* [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough
@ 2014-08-09 14:25 Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 01/10] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
                   ` (9 more replies)
  0 siblings, 10 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, joel.schopp,
	a.motakis, kvmarm

This RFC series aims at enabling KVM platform device passthrough.
It implements a VFIO platform device, derived from VFIO PCI device.

The VFIO platform device uses the host VFIO platform driver which must
be bound to the assigned device prior to the QEMU system start.

- the guest can directly access the device register space
- assigned device IRQs are transparently routed to the guest by
  QEMU/KVM (two methods are currently supported)
- iommu is transparently programmed to prevent the device from
  accessing physical pages outside of the guest address space
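
As an illustration of the IRQ signalling mentioned above, here is a minimal
sketch of the eventfd mechanism that the user-side (non-irqfd) method builds
on. It is not code from this series: the irq_eventfd_* helper names are
hypothetical, and the write() stands in for the host VFIO driver signalling
the eventfd when the physical IRQ fires.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/eventfd.h>

/* Hypothetical helper: create the eventfd that would be handed to VFIO. */
int irq_eventfd_create(void)
{
    return eventfd(0, EFD_CLOEXEC);
}

/*
 * Hypothetical helper: stand-in for the host driver signalling one IRQ.
 * Each write adds 1 to the eventfd counter.
 */
int irq_eventfd_signal(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: user-side handler. Reading returns the number of
 * signals accumulated since the last read and resets the counter to 0.
 * Blocks if nothing is pending (the fd is not opened with EFD_NONBLOCK).
 */
int64_t irq_eventfd_consume(int fd)
{
    uint64_t count = 0;

    assert(fd >= 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return -1;
    }
    return (int64_t)count;
}
```

In the user-side method, QEMU would run a handler like irq_eventfd_consume()
and then inject the virtual IRQ itself. With irqfd, the same file descriptor
is instead handed to KVM, so the injection no longer goes through user space.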

This series relies on the following QEMU patch series:

- Alex Graf's "Dynamic sysbus device allocation support"
  http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
  (up to "sysbus: Make devices spawnable via -device")
- [RFC v2] machvirt dynamic sysbus device instantiation

This patch series is made of the following patch files:

1-5) Modifications to PCI code to prepare for VFIO platform device
6) split of PCI specific code and generic code (move)
7) creation of the VFIO platform device, without irqfd support
   (MMIO direct access and IRQ assignment).
8-9) addition of irqfd/virqfd support
10) capability to dynamically instantiate the device
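
The split between generic and bus-specific code can be pictured with the
following minimal sketch. It is an assumption-laden illustration and not the
layout used in the patches: all Demo* types, their fields and demo_describe()
are hypothetical. Only the idea of a common base structure embedded in the
PCI and platform devices (here as a member named vbasedev) comes from the
series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum { DEMO_TYPE_PCI, DEMO_TYPE_PLATFORM };

/* Hypothetical base "class": state shared by every VFIO device flavour. */
typedef struct DemoVFIODevice {
    int type;   /* DEMO_TYPE_PCI or DEMO_TYPE_PLATFORM */
    int fd;     /* device file descriptor, unused in this sketch */
} DemoVFIODevice;

/* Hypothetical PCI flavour: embeds the base, adds PCI-only state. */
typedef struct DemoVFIOPCIDevice {
    const char *host_addr;
    DemoVFIODevice vbasedev;
} DemoVFIOPCIDevice;

/* Hypothetical platform flavour: embeds the base, adds a compat string. */
typedef struct DemoVFIOPlatformDevice {
    DemoVFIODevice vbasedev;
    const char *compat;
} DemoVFIOPlatformDevice;

/*
 * Generic code only sees the base pointer and dispatches on its type to
 * reach the flavour-specific structure.
 */
const char *demo_describe(DemoVFIODevice *vbasedev)
{
    assert(vbasedev != NULL);

    switch (vbasedev->type) {
    case DEMO_TYPE_PCI:
        return demo_container_of(vbasedev, DemoVFIOPCIDevice,
                                 vbasedev)->host_addr;
    case DEMO_TYPE_PLATFORM:
        return demo_container_of(vbasedev, DemoVFIOPlatformDevice,
                                 vbasedev)->compat;
    default:
        return "unknown";
    }
}
```

The point of the pattern is that code such as region handling or reset can
take the base pointer and work for both flavours, while each flavour keeps
its own fields private to its own file.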

v4->v5:
- rebase on v2.1.0 PCI code
- take into account Alex Williamson's comments on PCI code rework
  - trace updates in vfio_region_write/read
  - remove fd from VFIORegion
  - get/put cleanup
- bug fix: the BAR region's vbasedev field is now duly initialized
- misc cleanups in platform device
- device tree node generation removed from device and handled in
  hw/arm/dyn_sysbus_devtree.c
- remove "hw/vfio: add an example calxeda_xgmac": with removal of
  device tree node generation we do not have so many things to
  implement in that derived device yet. May be re-introduced later
  on if needed typically for reset/migration.
- no GSI routing table anymore

v3->v4 changes (Eric Auger, Alvise Rigo)
- rebase on last VFIO PCI code (v2.1.0-rc0)
- full git history rework to ease PCI code change review
- mv include files in hw/vfio
- DPRINTF reformatting temporarily moved out
- support of VFIO virq (removal of resamplefd handler on user-side)
- integration with sysbus dynamic instantiation framework
- removal of unrealize and cleanup routines until it is better
  understood what is really needed
- Support of VFIO for Amba devices should be handled in an inherited
  device to specialize the device tree generation (clock handle currently
  missing in framework however)
- "Always use eventfd as notifying mechanism" temporarily moved out
- static instantiation is not mainstream (although it remains possible)
  note if static instantiation is used, irqfd must be setup in machine file
  when virtual IRQ is known
- create the GSI routing table on qemu side

v2->v3 changes (Alvise Rigo, Eric Auger):
- Following Alex W's recommendations, further efforts to factorize the
  code between PCI and platform: introduction of VFIODevice and
  VFIORegion as base classes
- unique reset handler for platform and PCI
- cleanup following Kim's comments
- multiple IRQ support mechanics should be in place although not
  tested
- Better handling of MMIO multiple regions
- New features and fixes by Alvise (multiple compat string, exec
  flag, force eventfd usage, amba device tree support)
- irqfd support

v1->v2 changes (Kim Phillips, Eric Auger):
- IRQ initial support (legacy mode where eventfds are handled on
  user side)
- hacked dynamic instantiation

v1 (Kim Phillips):
- initial split between PCI and platform
- MMIO support only
- static instantiation

This series has the following kernel-side dependencies:

- [RFC Patch v6 0/20] VFIO support for platform devices
https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
- [Patch] ARM: KVM: Handle IPA unmapping on memory region deletion
https://patches.linaro.org/27691/
- [PATCH RFC] ARM: KVM: add irqfd support
http://www.gossamer-threads.com/lists/linux/kernel/1981144
- arm/arm64: KVM: Various VGIC cleanups and improvements
http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
- [PATCH] ARM: KVM: Enable the KVM-VFIO device
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-March/008629.html

Those kernel pieces can be found at:
git://git.linaro.org/people/eric.auger/linux.git (branch irqfd_integ_v4)

QEMU patch files and dependencies can be found at:
git://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v5)

The patch series was tested on Calxeda Midway (ARMv7), where one xgmac
is assigned to the KVM host while the second one is assigned to the guest.
Unfortunately only a single IRQ is exercised. The reworked PCI device is
not tested.

https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway

Best Regards

Eric




Eric Auger (9):
  hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
  hw/vfio/pci: introduce VFIODevice
  hw/vfio/pci: Introduce VFIORegion
  hw/vfio/pci: split vfio_get_device
  hw/vfio: create common module
  hw/vfio/platform: add vfio-platform support
  hw/intc/arm_gic_kvm: advertise irqfd
  hw/vfio/platform: Add irqfd support
  hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation

Kim Phillips (1):
  vfio: move hw/misc/vfio.c to hw/vfio/pci.c     Move vfio.h into
    include/hw/vfio

 LICENSE                          |    2 +-
 MAINTAINERS                      |    2 +-
 hw/Makefile.objs                 |    1 +
 hw/arm/dyn_sysbus_devtree.c      |  138 ++++
 hw/intc/arm_gic_kvm.c            |    2 +
 hw/misc/Makefile.objs            |    1 -
 hw/ppc/spapr_pci_vfio.c          |    2 +-
 hw/vfio/Makefile.objs            |    5 +
 hw/vfio/common.c                 |  990 +++++++++++++++++++++++++
 hw/{misc/vfio.c => vfio/pci.c}   | 1499 +++++++-------------------------------
 hw/vfio/platform.c               |  611 ++++++++++++++++
 include/hw/vfio/vfio-common.h    |  151 ++++
 include/hw/vfio/vfio-platform.h  |   77 ++
 include/hw/{misc => vfio}/vfio.h |    0
 14 files changed, 2260 insertions(+), 1221 deletions(-)
 create mode 100644 hw/vfio/Makefile.objs
 create mode 100644 hw/vfio/common.c
 rename hw/{misc/vfio.c => vfio/pci.c} (71%)
 create mode 100644 hw/vfio/platform.c
 create mode 100644 include/hw/vfio/vfio-common.h
 create mode 100644 include/hw/vfio/vfio-platform.h
 rename include/hw/{misc => vfio}/vfio.h (100%)

-- 
1.8.3.2


* [Qemu-devel] [PATCH v5 01/10] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 02/10] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, Kim Phillips, patches, will.deacon,
	agraf, stuart.yoder, Bharat.Bhushan, alex.williamson,
	joel.schopp, a.motakis, kvmarm

From: Kim Phillips <kim.phillips@linaro.org>

This is done in preparation for the addition of VFIO platform
device support.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
---
 LICENSE                          | 2 +-
 MAINTAINERS                      | 2 +-
 hw/Makefile.objs                 | 1 +
 hw/misc/Makefile.objs            | 1 -
 hw/ppc/spapr_pci_vfio.c          | 2 +-
 hw/vfio/Makefile.objs            | 3 +++
 hw/{misc/vfio.c => vfio/pci.c}   | 2 +-
 include/hw/{misc => vfio}/vfio.h | 0
 8 files changed, 8 insertions(+), 5 deletions(-)
 create mode 100644 hw/vfio/Makefile.objs
 rename hw/{misc/vfio.c => vfio/pci.c} (99%)
 rename include/hw/{misc => vfio}/vfio.h (100%)

diff --git a/LICENSE b/LICENSE
index da70e94..0e0b4b9 100644
--- a/LICENSE
+++ b/LICENSE
@@ -11,7 +11,7 @@ option) any later version.
 
 As of July 2013, contributions under version 2 of the GNU General Public
 License (and no later version) are only accepted for the following files
-or directories: bsd-user/, linux-user/, hw/misc/vfio.c, hw/xen/xen_pt*.
+or directories: bsd-user/, linux-user/, hw/vfio/, hw/xen/xen_pt*.
 
 3) The Tiny Code Generator (TCG) is released under the BSD license
    (see license headers in files).
diff --git a/MAINTAINERS b/MAINTAINERS
index 906f252..866e3c6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -619,7 +619,7 @@ F: tests/usb-hcd-ehci-test.c
 VFIO
 M: Alex Williamson <alex.williamson@redhat.com>
 S: Supported
-F: hw/misc/vfio.c
+F: hw/vfio/*
 
 vhost
 M: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 52a1464..73afa41 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,6 +26,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += ssi/
 devices-dirs-$(CONFIG_SOFTMMU) += timer/
 devices-dirs-$(CONFIG_TPM) += tpm/
 devices-dirs-$(CONFIG_SOFTMMU) += usb/
+devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 86f6243..9b77554 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -21,7 +21,6 @@ common-obj-$(CONFIG_MACIO) += macio/
 
 ifeq ($(CONFIG_PCI), y)
 obj-$(CONFIG_KVM) += ivshmem.o
-obj-$(CONFIG_LINUX) += vfio.o
 endif
 
 obj-$(CONFIG_REALVIEW) += arm_sysctl.o
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index d3bddf2..144912b 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -20,7 +20,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/pci-host/spapr.h"
 #include "linux/vfio.h"
-#include "hw/misc/vfio.h"
+#include "hw/vfio/vfio.h"
 
 static Property spapr_phb_vfio_properties[] = {
     DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
new file mode 100644
index 0000000..31c7dab
--- /dev/null
+++ b/hw/vfio/Makefile.objs
@@ -0,0 +1,3 @@
+ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_PCI) += pci.o
+endif
diff --git a/hw/misc/vfio.c b/hw/vfio/pci.c
similarity index 99%
rename from hw/misc/vfio.c
rename to hw/vfio/pci.c
index ba08adb..188fdd2 100644
--- a/hw/misc/vfio.c
+++ b/hw/vfio/pci.c
@@ -39,7 +39,7 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
-#include "hw/misc/vfio.h"
+#include "hw/vfio/vfio.h"
 
 /* #define DEBUG_VFIO */
 #ifdef DEBUG_VFIO
diff --git a/include/hw/misc/vfio.h b/include/hw/vfio/vfio.h
similarity index 100%
rename from include/hw/misc/vfio.h
rename to include/hw/vfio/vfio.h
-- 
1.8.3.2


* [Qemu-devel] [PATCH v5 02/10] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 01/10] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 03/10] hw/vfio/pci: introduce VFIODevice Eric Auger
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, joel.schopp,
	a.motakis, kvmarm

This prepares for the introduction of VFIOPlatformDevice

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 209 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 105 insertions(+), 104 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 188fdd2..c2cdd73 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -56,11 +56,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
-struct VFIODevice;
+struct VFIOPCIDevice;
 
 typedef struct VFIOQuirk {
     MemoryRegion mem;
-    struct VFIODevice *vdev;
+    struct VFIOPCIDevice *vdev;
     QLIST_ENTRY(VFIOQuirk) next;
     struct {
         uint32_t base_offset:TARGET_PAGE_BITS;
@@ -131,7 +131,7 @@ typedef struct VFIOMSIVector {
      */
     EventNotifier interrupt;
     EventNotifier kvm_interrupt;
-    struct VFIODevice *vdev; /* back pointer to device */
+    struct VFIOPCIDevice *vdev; /* back pointer to device */
     int virq;
     bool use;
 } VFIOMSIVector;
@@ -193,7 +193,7 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
-typedef struct VFIODevice {
+typedef struct VFIOPCIDevice {
     PCIDevice pdev;
     int fd;
     VFIOINTx intx;
@@ -211,7 +211,7 @@ typedef struct VFIODevice {
     VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
     VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
     PCIHostDeviceAddress host;
-    QLIST_ENTRY(VFIODevice) next;
+    QLIST_ENTRY(VFIOPCIDevice) next;
     struct VFIOGroup *group;
     EventNotifier err_notifier;
     uint32_t features;
@@ -226,13 +226,13 @@ typedef struct VFIODevice {
     bool has_pm_reset;
     bool needs_reset;
     bool rom_read_failed;
-} VFIODevice;
+} VFIOPCIDevice;
 
 typedef struct VFIOGroup {
     int fd;
     int groupid;
     VFIOContainer *container;
-    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_HEAD(, VFIOPCIDevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
     QLIST_ENTRY(VFIOGroup) container_next;
 } VFIOGroup;
@@ -276,16 +276,16 @@ static QLIST_HEAD(, VFIOGroup)
 static int vfio_kvm_device_fd = -1;
 #endif
 
-static void vfio_disable_interrupts(VFIODevice *vdev);
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled);
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 
 /*
  * Common VFIO interrupt disable
  */
-static void vfio_disable_irqindex(VFIODevice *vdev, int index)
+static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -301,7 +301,7 @@ static void vfio_disable_irqindex(VFIODevice *vdev, int index)
 /*
  * INTx
  */
-static void vfio_unmask_intx(VFIODevice *vdev)
+static void vfio_unmask_intx(VFIOPCIDevice *vdev)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -315,7 +315,7 @@ static void vfio_unmask_intx(VFIODevice *vdev)
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIODevice *vdev)
+static void vfio_mask_intx(VFIOPCIDevice *vdev)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -346,7 +346,7 @@ static void vfio_mask_intx(VFIODevice *vdev)
  */
 static void vfio_intx_mmap_enable(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (vdev->intx.pending) {
         timer_mod(vdev->intx.mmap_timer,
@@ -359,7 +359,7 @@ static void vfio_intx_mmap_enable(void *opaque)
 
 static void vfio_intx_interrupt(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (!event_notifier_test_and_clear(&vdev->intx.interrupt)) {
         return;
@@ -378,7 +378,7 @@ static void vfio_intx_interrupt(void *opaque)
     }
 }
 
-static void vfio_eoi(VFIODevice *vdev)
+static void vfio_eoi(VFIOPCIDevice *vdev)
 {
     if (!vdev->intx.pending) {
         return;
@@ -392,7 +392,7 @@ static void vfio_eoi(VFIODevice *vdev)
     vfio_unmask_intx(vdev);
 }
 
-static void vfio_enable_intx_kvm(VFIODevice *vdev)
+static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 {
 #ifdef CONFIG_KVM
     struct kvm_irqfd irqfd = {
@@ -471,7 +471,7 @@ fail:
 #endif
 }
 
-static void vfio_disable_intx_kvm(VFIODevice *vdev)
+static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
 {
 #ifdef CONFIG_KVM
     struct kvm_irqfd irqfd = {
@@ -516,7 +516,7 @@ static void vfio_disable_intx_kvm(VFIODevice *vdev)
 
 static void vfio_update_irq(PCIDevice *pdev)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     PCIINTxRoute route;
 
     if (vdev->interrupt != VFIO_INT_INTx) {
@@ -547,7 +547,7 @@ static void vfio_update_irq(PCIDevice *pdev)
     vfio_eoi(vdev);
 }
 
-static int vfio_enable_intx(VFIODevice *vdev)
+static int vfio_enable_intx(VFIOPCIDevice *vdev)
 {
     uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
     int ret, argsz;
@@ -613,7 +613,7 @@ static int vfio_enable_intx(VFIODevice *vdev)
     return 0;
 }
 
-static void vfio_disable_intx(VFIODevice *vdev)
+static void vfio_disable_intx(VFIOPCIDevice *vdev)
 {
     int fd;
 
@@ -640,7 +640,7 @@ static void vfio_disable_intx(VFIODevice *vdev)
 static void vfio_msi_interrupt(void *opaque)
 {
     VFIOMSIVector *vector = opaque;
-    VFIODevice *vdev = vector->vdev;
+    VFIOPCIDevice *vdev = vector->vdev;
     int nr = vector - vdev->msi_vectors;
 
     if (!event_notifier_test_and_clear(&vector->interrupt)) {
@@ -672,7 +672,7 @@ static void vfio_msi_interrupt(void *opaque)
     }
 }
 
-static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
+static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 {
     struct vfio_irq_set *irq_set;
     int ret = 0, i, argsz;
@@ -763,7 +763,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg)
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOMSIVector *vector;
     int ret;
 
@@ -852,7 +852,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
 
 static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
     DPRINTF("%s(%04x:%02x:%02x.%x) vector %d released\n", __func__,
@@ -891,7 +891,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
     }
 }
 
-static void vfio_enable_msix(VFIODevice *vdev)
+static void vfio_enable_msix(VFIOPCIDevice *vdev)
 {
     vfio_disable_interrupts(vdev);
 
@@ -924,7 +924,7 @@ static void vfio_enable_msix(VFIODevice *vdev)
             vdev->host.bus, vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_enable_msi(VFIODevice *vdev)
+static void vfio_enable_msi(VFIOPCIDevice *vdev)
 {
     int ret, i;
 
@@ -1002,7 +1002,7 @@ retry:
             vdev->host.function, vdev->nr_vectors);
 }
 
-static void vfio_disable_msi_common(VFIODevice *vdev)
+static void vfio_disable_msi_common(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1026,7 +1026,7 @@ static void vfio_disable_msi_common(VFIODevice *vdev)
     vfio_enable_intx(vdev);
 }
 
-static void vfio_disable_msix(VFIODevice *vdev)
+static void vfio_disable_msix(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1053,7 +1053,7 @@ static void vfio_disable_msix(VFIODevice *vdev)
             vdev->host.bus, vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_disable_msi(VFIODevice *vdev)
+static void vfio_disable_msi(VFIOPCIDevice *vdev)
 {
     vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
     vfio_disable_msi_common(vdev);
@@ -1062,7 +1062,7 @@ static void vfio_disable_msi(VFIODevice *vdev)
             vdev->host.bus, vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_update_msi(VFIODevice *vdev)
+static void vfio_update_msi(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1115,7 +1115,7 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
 
 #ifdef DEBUG_VFIO
     {
-        VFIODevice *vdev = container_of(bar, VFIODevice, bars[bar->nr]);
+        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
 
         DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx", 0x%"PRIx64
                 ", %d)\n", __func__, vdev->host.domain, vdev->host.bus,
@@ -1132,7 +1132,7 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
      * which access will service the interrupt, so we're potentially
      * getting quite a few host interrupts per guest interrupt.
      */
-    vfio_eoi(container_of(bar, VFIODevice, bars[bar->nr]));
+    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
 }
 
 static uint64_t vfio_bar_read(void *opaque,
@@ -1170,7 +1170,7 @@ static uint64_t vfio_bar_read(void *opaque,
 
 #ifdef DEBUG_VFIO
     {
-        VFIODevice *vdev = container_of(bar, VFIODevice, bars[bar->nr]);
+        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
 
         DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx
                 ", %d) = 0x%"PRIx64"\n", __func__, vdev->host.domain,
@@ -1180,7 +1180,7 @@ static uint64_t vfio_bar_read(void *opaque,
 #endif
 
     /* Same as write above */
-    vfio_eoi(container_of(bar, VFIODevice, bars[bar->nr]));
+    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
 
     return data;
 }
@@ -1191,7 +1191,7 @@ static const MemoryRegionOps vfio_bar_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static void vfio_pci_load_rom(VFIODevice *vdev)
+static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
     struct vfio_region_info reg_info = {
         .argsz = sizeof(reg_info),
@@ -1249,7 +1249,7 @@ static void vfio_pci_load_rom(VFIODevice *vdev)
 
 static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1299,7 +1299,7 @@ static const MemoryRegionOps vfio_rom_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static bool vfio_blacklist_opt_rom(VFIODevice *vdev)
+static bool vfio_blacklist_opt_rom(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t vendor_id, device_id;
@@ -1319,7 +1319,7 @@ static bool vfio_blacklist_opt_rom(VFIODevice *vdev)
     return false;
 }
 
-static void vfio_pci_size_rom(VFIODevice *vdev)
+static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 {
     uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
     off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
@@ -1498,7 +1498,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
                                                hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     uint64_t data;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.read_flags) &&
@@ -1531,7 +1531,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
                                             uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     if (ranges_overlap(addr, size,
                        quirk->data.address_offset, quirk->data.address_size)) {
@@ -1585,7 +1585,7 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
                                         hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
     hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
     uint64_t data;
@@ -1615,7 +1615,7 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
                                      uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
     hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
 
@@ -1660,7 +1660,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
                                         hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     uint64_t data = vfio_pci_read_config(&vdev->pdev,
                                          PCI_BASE_ADDRESS_0 + (4 * 4) + 1,
                                          size);
@@ -1674,7 +1674,7 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
+static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1717,7 +1717,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
  * that only read-only access is provided, but we drop writes when the window
  * is enabled to config space nonetheless.
  */
-static void vfio_probe_ati_bar4_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1779,7 +1779,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
                                                hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     switch (addr) {
     case 4: /* address */
@@ -1821,7 +1821,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
                                             uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     switch (addr) {
     case 4: /* address */
@@ -1868,7 +1868,7 @@ static const MemoryRegionOps vfio_rtl8168_window_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_probe_rtl8168_bar2_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1896,7 +1896,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIODevice *vdev, int nr)
 /*
  * Trap the BAR2 MMIO window to config space as well.
  */
-static void vfio_probe_ati_bar2_4000_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1964,7 +1964,7 @@ static uint64_t vfio_nvidia_3d0_quirk_read(void *opaque,
                                            hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
     uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
                                   addr + quirk->data.base_offset, size);
@@ -1983,7 +1983,7 @@ static void vfio_nvidia_3d0_quirk_write(void *opaque, hwaddr addr,
                                         uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
 
     switch (quirk->data.flags) {
@@ -2030,7 +2030,7 @@ static const MemoryRegionOps vfio_nvidia_3d0_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_vga_probe_nvidia_3d0_quirk(VFIODevice *vdev)
+static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2122,7 +2122,7 @@ static const MemoryRegionOps vfio_nvidia_bar5_window_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_probe_nvidia_bar5_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2157,7 +2157,7 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
                                           uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
 
@@ -2190,7 +2190,7 @@ static const MemoryRegionOps vfio_nvidia_88000_quirk = {
  *
  * Here's offset 0x88000...
  */
-static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2224,7 +2224,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
 /*
  * And here's the same for BAR0 offset 0x1800...
  */
-static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2268,13 +2268,13 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice *vdev, int nr)
 /*
  * Common quirk probe entry points.
  */
-static void vfio_vga_quirk_setup(VFIODevice *vdev)
+static void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
 {
     vfio_vga_probe_ati_3c3_quirk(vdev);
     vfio_vga_probe_nvidia_3d0_quirk(vdev);
 }
 
-static void vfio_vga_quirk_teardown(VFIODevice *vdev)
+static void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -2289,7 +2289,7 @@ static void vfio_vga_quirk_teardown(VFIODevice *vdev)
     }
 }
 
-static void vfio_bar_quirk_setup(VFIODevice *vdev, int nr)
+static void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
 {
     vfio_probe_ati_bar4_window_quirk(vdev, nr);
     vfio_probe_ati_bar2_4000_quirk(vdev, nr);
@@ -2299,7 +2299,7 @@ static void vfio_bar_quirk_setup(VFIODevice *vdev, int nr)
     vfio_probe_rtl8168_bar2_window_quirk(vdev, nr);
 }
 
-static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
+static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
@@ -2317,7 +2317,7 @@ static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
  */
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
 
     memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
@@ -2352,7 +2352,7 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     uint32_t val_le = cpu_to_le32(val);
 
     DPRINTF("%s(%04x:%02x:%02x.%x, @0x%x, 0x%x, len=0x%x)\n", __func__,
@@ -2709,7 +2709,7 @@ static void vfio_listener_release(VFIOContainer *container)
 /*
  * Interrupt setup
  */
-static void vfio_disable_interrupts(VFIODevice *vdev)
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
 {
     switch (vdev->interrupt) {
     case VFIO_INT_INTx:
@@ -2724,7 +2724,7 @@ static void vfio_disable_interrupts(VFIODevice *vdev)
     }
 }
 
-static int vfio_setup_msi(VFIODevice *vdev, int pos)
+static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
 {
     uint16_t ctrl;
     bool msi_64bit, msi_maskbit;
@@ -2764,7 +2764,7 @@ static int vfio_setup_msi(VFIODevice *vdev, int pos)
  * need to first look for where the MSI-X table lives.  So we
  * unfortunately split MSI-X setup across two functions.
  */
-static int vfio_early_setup_msix(VFIODevice *vdev)
+static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
 {
     uint8_t pos;
     uint16_t ctrl;
@@ -2810,7 +2810,7 @@ static int vfio_early_setup_msix(VFIODevice *vdev)
     return 0;
 }
 
-static int vfio_setup_msix(VFIODevice *vdev, int pos)
+static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
 {
     int ret;
 
@@ -2830,7 +2830,7 @@ static int vfio_setup_msix(VFIODevice *vdev, int pos)
     return 0;
 }
 
-static void vfio_teardown_msi(VFIODevice *vdev)
+static void vfio_teardown_msi(VFIOPCIDevice *vdev)
 {
     msi_uninit(&vdev->pdev);
 
@@ -2843,7 +2843,7 @@ static void vfio_teardown_msi(VFIODevice *vdev)
 /*
  * Resource setup
  */
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled)
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
 {
     int i;
 
@@ -2861,7 +2861,7 @@ static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled)
     }
 }
 
-static void vfio_unmap_bar(VFIODevice *vdev, int nr)
+static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
@@ -2884,7 +2884,7 @@ static void vfio_unmap_bar(VFIODevice *vdev, int nr)
     memory_region_destroy(&bar->mem);
 }
 
-static int vfio_mmap_bar(VFIODevice *vdev, VFIOBAR *bar,
+static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
                          MemoryRegion *mem, MemoryRegion *submem,
                          void **map, size_t size, off_t offset,
                          const char *name)
@@ -2922,7 +2922,7 @@ empty_region:
     return ret;
 }
 
-static void vfio_map_bar(VFIODevice *vdev, int nr)
+static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
     unsigned size = bar->size;
@@ -2991,7 +2991,7 @@ static void vfio_map_bar(VFIODevice *vdev, int nr)
     vfio_bar_quirk_setup(vdev, nr);
 }
 
-static void vfio_map_bars(VFIODevice *vdev)
+static void vfio_map_bars(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -3023,7 +3023,7 @@ static void vfio_map_bars(VFIODevice *vdev)
     }
 }
 
-static void vfio_unmap_bars(VFIODevice *vdev)
+static void vfio_unmap_bars(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -3062,7 +3062,7 @@ static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask)
     pci_set_word(buf, (pci_get_word(buf) & ~mask) | val);
 }
 
-static void vfio_add_emulated_word(VFIODevice *vdev, int pos,
+static void vfio_add_emulated_word(VFIOPCIDevice *vdev, int pos,
                                    uint16_t val, uint16_t mask)
 {
     vfio_set_word_bits(vdev->pdev.config + pos, val, mask);
@@ -3075,7 +3075,7 @@ static void vfio_set_long_bits(uint8_t *buf, uint32_t val, uint32_t mask)
     pci_set_long(buf, (pci_get_long(buf) & ~mask) | val);
 }
 
-static void vfio_add_emulated_long(VFIODevice *vdev, int pos,
+static void vfio_add_emulated_long(VFIOPCIDevice *vdev, int pos,
                                    uint32_t val, uint32_t mask)
 {
     vfio_set_long_bits(vdev->pdev.config + pos, val, mask);
@@ -3083,7 +3083,7 @@ static void vfio_add_emulated_long(VFIODevice *vdev, int pos,
     vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask);
 }
 
-static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size)
+static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size)
 {
     uint16_t flags;
     uint8_t type;
@@ -3175,7 +3175,7 @@ static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size)
     return pos;
 }
 
-static void vfio_check_pcie_flr(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_pcie_flr(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint32_t cap = pci_get_long(vdev->pdev.config + pos + PCI_EXP_DEVCAP);
 
@@ -3187,7 +3187,7 @@ static void vfio_check_pcie_flr(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static void vfio_check_pm_reset(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_pm_reset(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint16_t csr = pci_get_word(vdev->pdev.config + pos + PCI_PM_CTRL);
 
@@ -3199,7 +3199,7 @@ static void vfio_check_pm_reset(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static void vfio_check_af_flr(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint8_t cap = pci_get_byte(vdev->pdev.config + pos + PCI_AF_CAP);
 
@@ -3211,7 +3211,7 @@ static void vfio_check_af_flr(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos)
+static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint8_t cap_id, next, size;
@@ -3286,7 +3286,7 @@ static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos)
     return 0;
 }
 
-static int vfio_add_capabilities(VFIODevice *vdev)
+static int vfio_add_capabilities(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
 
@@ -3298,7 +3298,7 @@ static int vfio_add_capabilities(VFIODevice *vdev)
     return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
 }
 
-static void vfio_pci_pre_reset(VFIODevice *vdev)
+static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t cmd;
@@ -3335,7 +3335,7 @@ static void vfio_pci_pre_reset(VFIODevice *vdev)
     vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
 }
 
-static void vfio_pci_post_reset(VFIODevice *vdev)
+static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
 {
     vfio_enable_intx(vdev);
 }
@@ -3347,7 +3347,7 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
             host1->slot == host2->slot && host1->function == host2->function);
 }
 
-static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
+static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 {
     VFIOGroup *group;
     struct vfio_pci_hot_reset_info *info;
@@ -3397,7 +3397,7 @@ static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
     /* Verify that we have all the groups required */
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
-        VFIODevice *tmp;
+        VFIOPCIDevice *tmp;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3489,7 +3489,7 @@ out:
     /* Re-enable INTx on affected devices */
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
-        VFIODevice *tmp;
+        VFIOPCIDevice *tmp;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3539,12 +3539,12 @@ out_single:
  * _one() will only do a hot reset for the one in-use devices case, calling
  * _multi() will do nothing if a _one() would have been sufficient.
  */
-static int vfio_pci_hot_reset_one(VFIODevice *vdev)
+static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
 {
     return vfio_pci_hot_reset(vdev, true);
 }
 
-static int vfio_pci_hot_reset_multi(VFIODevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
 {
     return vfio_pci_hot_reset(vdev, false);
 }
@@ -3552,7 +3552,7 @@ static int vfio_pci_hot_reset_multi(VFIODevice *vdev)
 static void vfio_pci_reset_handler(void *opaque)
 {
     VFIOGroup *group;
-    VFIODevice *vdev;
+    VFIOPCIDevice *vdev;
 
     QLIST_FOREACH(group, &group_list, next) {
         QLIST_FOREACH(vdev, &group->device_list, next) {
@@ -3890,7 +3890,8 @@ static void vfio_put_group(VFIOGroup *group)
     }
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev)
+static int vfio_get_device(VFIOGroup *group, const char *name,
+                           VFIOPCIDevice *vdev)
 {
     struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
     struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
@@ -4044,7 +4045,7 @@ error:
     return ret;
 }
 
-static void vfio_put_device(VFIODevice *vdev)
+static void vfio_put_device(VFIOPCIDevice *vdev)
 {
     QLIST_REMOVE(vdev, next);
     vdev->group = NULL;
@@ -4058,7 +4059,7 @@ static void vfio_put_device(VFIODevice *vdev)
 
 static void vfio_err_notifier_handler(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
         return;
@@ -4087,7 +4088,7 @@ static void vfio_err_notifier_handler(void *opaque)
  * and continue after disabling error recovery support for the
  * device.
  */
-static void vfio_register_err_notifier(VFIODevice *vdev)
+static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
 {
     int ret;
     int argsz;
@@ -4128,7 +4129,7 @@ static void vfio_register_err_notifier(VFIODevice *vdev)
     g_free(irq_set);
 }
 
-static void vfio_unregister_err_notifier(VFIODevice *vdev)
+static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
 {
     int argsz;
     struct vfio_irq_set *irq_set;
@@ -4163,7 +4164,7 @@ static void vfio_unregister_err_notifier(VFIODevice *vdev)
 
 static int vfio_initfn(PCIDevice *pdev)
 {
-    VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOGroup *group;
     char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
     ssize_t len;
@@ -4317,7 +4318,7 @@ out_put:
 
 static void vfio_exitfn(PCIDevice *pdev)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOGroup *group = vdev->group;
 
     vfio_unregister_err_notifier(vdev);
@@ -4337,7 +4338,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 static void vfio_pci_reset(DeviceState *dev)
 {
     PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 
     DPRINTF("%s(%04x:%02x:%02x.%x)\n", __func__, vdev->host.domain,
             vdev->host.bus, vdev->host.slot, vdev->host.function);
@@ -4369,16 +4370,16 @@ post_reset:
 }
 
 static Property vfio_pci_dev_properties[] = {
-    DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIODevice, host),
-    DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIODevice,
+    DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
+    DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
                        intx.mmap_timeout, 1100),
-    DEFINE_PROP_BIT("x-vga", VFIODevice, features,
+    DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
                     VFIO_FEATURE_ENABLE_VGA_BIT, false),
-    DEFINE_PROP_INT32("bootindex", VFIODevice, bootindex, -1),
+    DEFINE_PROP_INT32("bootindex", VFIOPCIDevice, bootindex, -1),
     /*
      * TODO - support passed fds... is this necessary?
-     * DEFINE_PROP_STRING("vfiofd", VFIODevice, vfiofd_name),
-     * DEFINE_PROP_STRING("vfiogroupfd, VFIODevice, vfiogroupfd_name),
+     * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
+     * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
      */
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -4408,7 +4409,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 static const TypeInfo vfio_pci_dev_info = {
     .name = "vfio-pci",
     .parent = TYPE_PCI_DEVICE,
-    .instance_size = sizeof(VFIODevice),
+    .instance_size = sizeof(VFIOPCIDevice),
     .class_init = vfio_pci_dev_class_init,
 };
 
-- 
1.8.3.2


* [Qemu-devel] [PATCH v5 03/10] hw/vfio/pci: introduce VFIODevice
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 01/10] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 02/10] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-12  2:34   ` David Gibson
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 04/10] hw/vfio/pci: Introduce VFIORegion Eric Auger
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, joel.schopp,
	a.motakis, kvmarm

Introduce the VFIODevice struct, which is going to be shared by
VFIOPCIDevice and VFIOPlatformDevice.

Additional fields will be added to it in later patches, for review
convenience.

The group's device_list becomes a list of VFIODevice.

This requires reworking the reset handler, which becomes generic and
calls VFIODevice ops that are specialized by each device type that
embeds a VFIODevice. Functions that iterate over this list must also
take into account that the devices may be something other than
VFIOPCIDevice. The type field is used to discriminate them.
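
As a rough illustration of the pattern described above (not the code
from this patch), the sketch below uses simplified stand-in types. All
demo_* / Demo* names, the singly linked list and the resets_done
counter are assumptions made for the example; the patch itself uses
QLIST and the real VFIO structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the container_of() helper (hypothetical name). */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum { DEMO_TYPE_PCI = 0, DEMO_TYPE_PLATFORM = 1 };

typedef struct DemoDevice DemoDevice;

typedef struct DemoDeviceOps {
    bool (*compute_needs_reset)(DemoDevice *vbasedev);
    int (*hot_reset_multi)(DemoDevice *vbasedev);
} DemoDeviceOps;

/* Generic part, embedded in every specialized device. */
struct DemoDevice {
    DemoDevice *next;   /* simplified list link, not a QLIST */
    int type;
    bool reset_works;
    bool needs_reset;
    DemoDeviceOps *ops;
};

typedef struct DemoPCIDevice {
    int slot;
    DemoDevice vbasedev;
    bool has_flr;
    bool has_pm_reset;
    int resets_done;    /* only here so the example is observable */
} DemoPCIDevice;

static bool demo_pci_compute_needs_reset(DemoDevice *vbasedev)
{
    DemoPCIDevice *vdev =
        demo_container_of(vbasedev, DemoPCIDevice, vbasedev);

    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
        vbasedev->needs_reset = true;
    }
    return vbasedev->needs_reset;
}

static int demo_pci_hot_reset_multi(DemoDevice *vbasedev)
{
    DemoPCIDevice *vdev =
        demo_container_of(vbasedev, DemoPCIDevice, vbasedev);

    vbasedev->needs_reset = false;
    vdev->resets_done++;
    return 0;
}

static DemoDeviceOps demo_pci_ops = {
    .compute_needs_reset = demo_pci_compute_needs_reset,
    .hot_reset_multi = demo_pci_hot_reset_multi,
};

/* Generic handler: two passes, dispatching through the ops table. */
static void demo_reset_handler(DemoDevice *head)
{
    DemoDevice *d;

    for (d = head; d; d = d->next) {
        assert(d->ops != NULL);
        d->ops->compute_needs_reset(d);
    }
    for (d = head; d; d = d->next) {
        if (d->needs_reset) {
            d->ops->hot_reset_multi(d);
        }
    }
}

/* PCI-only walker: skip entries of another type before downcasting. */
static DemoPCIDevice *demo_find_pci_by_slot(DemoDevice *head, int slot)
{
    DemoDevice *d;

    for (d = head; d; d = d->next) {
        DemoPCIDevice *tmp;

        if (d->type != DEMO_TYPE_PCI) {
            continue;
        }
        tmp = demo_container_of(d, DemoPCIDevice, vbasedev);
        if (tmp->slot == slot) {
            return tmp;
        }
    }
    return NULL;
}
```

The type check before the downcast matters because container_of() on
an entry that is not embedded in a PCI device would yield a pointer to
unrelated memory.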

Take the opportunity to change the prototypes of vfio_unmask_intx,
vfio_mask_intx and vfio_disable_irqindex, which now apply to
VFIODevice. The first two are renamed to vfio_unmask_irqindex and
vfio_mask_irqindex. The index is passed as a parameter in anticipation
of their use for platform IRQs.
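
For illustration only, a minimal sketch of index-parameterized
mask/unmask helpers follows. The demo_* names are hypothetical and the
helpers take a bare file descriptor instead of a VFIODevice; only
struct vfio_irq_set, the VFIO_IRQ_SET_* flags and VFIO_DEVICE_SET_IRQS
come from <linux/vfio.h>. It assumes a Linux host with the kernel uapi
headers installed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fill a data-less vfio_irq_set for one interrupt of the given index. */
static void demo_fill_irq_set(struct vfio_irq_set *irq_set,
                              uint32_t action, int index)
{
    assert(irq_set != NULL);
    memset(irq_set, 0, sizeof(*irq_set));
    irq_set->argsz = sizeof(*irq_set);
    irq_set->flags = VFIO_IRQ_SET_DATA_NONE | action;
    irq_set->index = index;
    irq_set->start = 0;
    irq_set->count = 1;
}

/* The caller picks the index, so nothing here is PCI specific. */
static int demo_unmask_irqindex(int fd, int index)
{
    struct vfio_irq_set irq_set;

    demo_fill_irq_set(&irq_set, VFIO_IRQ_SET_ACTION_UNMASK, index);
    return ioctl(fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}

static int demo_mask_irqindex(int fd, int index)
{
    struct vfio_irq_set irq_set;

    demo_fill_irq_set(&irq_set, VFIO_IRQ_SET_ACTION_MASK, index);
    return ioctl(fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}
```

A PCI caller would pass VFIO_PCI_INTX_IRQ_INDEX, as the converted call
sites in this patch do; a platform caller would pass whichever index
its device exposes. Unlike the void functions in the patch, the sketch
returns the ioctl result so that a caller can check it.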

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4->v5:
- fix style issues
- in vfio_initfn, rework allocation of vdev->vbasedev.name and
  replace snprintf by g_strdup_printf
---
 hw/vfio/pci.c | 239 +++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 146 insertions(+), 93 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c2cdd73..ae827c5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -56,6 +56,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
+enum {
+    VFIO_DEVICE_TYPE_PCI = 0,
+    VFIO_DEVICE_TYPE_PLATFORM = 1,
+};
+
 struct VFIOPCIDevice;
 
 typedef struct VFIOQuirk {
@@ -193,9 +198,27 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
+typedef struct VFIODeviceOps VFIODeviceOps;
+
+typedef struct VFIODevice {
+    QLIST_ENTRY(VFIODevice) next;
+    struct VFIOGroup *group;
+    char *name;
+    int fd;
+    int type;
+    bool reset_works;
+    bool needs_reset;
+    VFIODeviceOps *ops;
+} VFIODevice;
+
+struct VFIODeviceOps {
+    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
+    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+};
+
 typedef struct VFIOPCIDevice {
     PCIDevice pdev;
-    int fd;
+    VFIODevice vbasedev;
     VFIOINTx intx;
     unsigned int config_size;
     uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
@@ -211,20 +234,16 @@ typedef struct VFIOPCIDevice {
     VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
     VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
     PCIHostDeviceAddress host;
-    QLIST_ENTRY(VFIOPCIDevice) next;
-    struct VFIOGroup *group;
     EventNotifier err_notifier;
     uint32_t features;
 #define VFIO_FEATURE_ENABLE_VGA_BIT 0
 #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
     int32_t bootindex;
     uint8_t pm_cap;
-    bool reset_works;
     bool has_vga;
     bool pci_aer;
     bool has_flr;
     bool has_pm_reset;
-    bool needs_reset;
     bool rom_read_failed;
 } VFIOPCIDevice;
 
@@ -232,7 +251,7 @@ typedef struct VFIOGroup {
     int fd;
     int groupid;
     VFIOContainer *container;
-    QLIST_HEAD(, VFIOPCIDevice) device_list;
+    QLIST_HEAD(, VFIODevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
     QLIST_ENTRY(VFIOGroup) container_next;
 } VFIOGroup;
@@ -285,7 +304,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 /*
  * Common VFIO interrupt disable
  */
-static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
+static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -295,37 +314,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
         .count = 0,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
 /*
  * INTx
  */
-static void vfio_unmask_intx(VFIOPCIDevice *vdev)
+static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
         .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
-        .index = VFIO_PCI_INTX_IRQ_INDEX,
+        .index = index,
         .start = 0,
         .count = 1,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIOPCIDevice *vdev)
+static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
         .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
-        .index = VFIO_PCI_INTX_IRQ_INDEX,
+        .index = index,
         .start = 0,
         .count = 1,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 #endif
 
@@ -389,7 +408,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -412,7 +431,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
     /* Get to a known interrupt state */
     qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
-    vfio_mask_intx(vdev);
+    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
 
@@ -442,7 +461,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
     *pfd = irqfd.resamplefd;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     g_free(irq_set);
     if (ret) {
         error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
@@ -450,7 +469,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
     }
 
     /* Let'em rip */
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
     vdev->intx.kvm_accel = true;
 
@@ -467,7 +486,7 @@ fail_irqfd:
     event_notifier_cleanup(&vdev->intx.unmask);
 fail:
     qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 #endif
 }
 
@@ -488,7 +507,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
      * Get to a known state, hardware masked, QEMU ready to accept new
      * interrupts, QEMU IRQ de-asserted.
      */
-    vfio_mask_intx(vdev);
+    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
 
@@ -506,7 +525,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
     vdev->intx.kvm_accel = false;
 
     /* If we've missed an event, let it re-fire through QEMU */
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
     DPRINTF("%s(%04x:%02x:%02x.%x) KVM INTx accel disabled\n",
             __func__, vdev->host.domain, vdev->host.bus,
@@ -594,7 +613,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
     *pfd = event_notifier_get_fd(&vdev->intx.interrupt);
     qemu_set_fd_handler(*pfd, vfio_intx_interrupt, NULL, vdev);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     g_free(irq_set);
     if (ret) {
         error_report("vfio: Error: Failed to setup INTx fd: %m");
@@ -619,7 +638,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
 
     timer_del(vdev->intx.mmap_timer);
     vfio_disable_intx_kvm(vdev);
-    vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
     vfio_mmap_set_enabled(vdev, true);
@@ -709,7 +728,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
         fds[i] = fd;
     }
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
     g_free(irq_set);
 
@@ -806,7 +825,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
      * increase them as needed.
      */
     if (vdev->nr_vectors < nr + 1) {
-        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
+        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
         vdev->nr_vectors = nr + 1;
         ret = vfio_enable_vectors(vdev, true);
         if (ret) {
@@ -834,7 +853,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             *pfd = event_notifier_get_fd(&vector->interrupt);
         }
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
         g_free(irq_set);
         if (ret) {
             error_report("vfio: failed to modify vector, %d", ret);
@@ -885,7 +904,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 
         *pfd = event_notifier_get_fd(&vector->interrupt);
 
-        ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+        ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
         g_free(irq_set);
     }
@@ -1044,7 +1063,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
     }
 
     if (vdev->nr_vectors) {
-        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
+        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
     }
 
     vfio_disable_msi_common(vdev);
@@ -1055,7 +1074,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
 
 static void vfio_disable_msi(VFIOPCIDevice *vdev)
 {
-    vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
+    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
     vfio_disable_msi_common(vdev);
 
     DPRINTF("%s(%04x:%02x:%02x.%x)\n", __func__, vdev->host.domain,
@@ -1201,7 +1220,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     off_t off = 0;
     size_t bytes;
 
-    if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
+    if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
         error_report("vfio: Error getting ROM info: %m");
         return;
     }
@@ -1231,7 +1250,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     memset(vdev->rom, 0xff, size);
 
     while (size) {
-        bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off);
+        bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
+                      size, vdev->rom_offset + off);
         if (bytes == 0) {
             break;
         } else if (bytes > 0) {
@@ -1325,6 +1345,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
     off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
     DeviceState *dev = DEVICE(vdev);
     char name[32];
+    int fd = vdev->vbasedev.fd;
 
     if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
         /* Since pci handles romfile, just print a message and return */
@@ -1343,10 +1364,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
      * Use the same size ROM BAR as the physical device.  The contents
      * will get filled in later when the guest tries to read it.
      */
-    if (pread(vdev->fd, &orig, 4, offset) != 4 ||
-        pwrite(vdev->fd, &size, 4, offset) != 4 ||
-        pread(vdev->fd, &size, 4, offset) != 4 ||
-        pwrite(vdev->fd, &orig, 4, offset) != 4) {
+    if (pread(fd, &orig, 4, offset) != 4 ||
+        pwrite(fd, &size, 4, offset) != 4 ||
+        pread(fd, &size, 4, offset) != 4 ||
+        pwrite(fd, &orig, 4, offset) != 4) {
         error_report("%s(%04x:%02x:%02x.%x) failed: %m",
                      __func__, vdev->host.domain, vdev->host.bus,
                      vdev->host.slot, vdev->host.function);
@@ -2330,7 +2351,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
     if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
         ssize_t ret;
 
-        ret = pread(vdev->fd, &phys_val, len, vdev->config_offset + addr);
+        ret = pread(vdev->vbasedev.fd, &phys_val, len,
+                    vdev->config_offset + addr);
         if (ret != len) {
             error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
                          __func__, vdev->host.domain, vdev->host.bus,
@@ -2360,7 +2382,8 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
             vdev->host.function, addr, val, len);
 
     /* Write everything to VFIO, let it filter out what we can't write */
-    if (pwrite(vdev->fd, &val_le, len, vdev->config_offset + addr) != len) {
+    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
+                != len) {
         error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
                      __func__, vdev->host.domain, vdev->host.bus,
                      vdev->host.slot, vdev->host.function, addr, val, len);
@@ -2730,7 +2753,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
     bool msi_64bit, msi_maskbit;
     int ret, entries;
 
-    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
+    if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
               vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
         return -errno;
     }
@@ -2769,23 +2792,24 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
     uint8_t pos;
     uint16_t ctrl;
     uint32_t table, pba;
+    int fd = vdev->vbasedev.fd;
 
     pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
     if (!pos) {
         return 0;
     }
 
-    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
+    if (pread(fd, &ctrl, sizeof(ctrl),
               vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
         return -errno;
     }
 
-    if (pread(vdev->fd, &table, sizeof(table),
+    if (pread(fd, &table, sizeof(table),
               vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
         return -errno;
     }
 
-    if (pread(vdev->fd, &pba, sizeof(pba),
+    if (pread(fd, &pba, sizeof(pba),
               vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
         return -errno;
     }
@@ -2941,7 +2965,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
              vdev->host.function, nr);
 
     /* Determine what type of BAR this is for registration */
-    ret = pread(vdev->fd, &pci_bar, sizeof(pci_bar),
+    ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
                 vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
     if (ret != sizeof(pci_bar)) {
         error_report("vfio: Failed to read BAR %d (%m)", nr);
@@ -3362,12 +3386,12 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
             single ? "one" : "multi");
 
     vfio_pci_pre_reset(vdev);
-    vdev->needs_reset = false;
+    vdev->vbasedev.needs_reset = false;
 
     info = g_malloc0(sizeof(*info));
     info->argsz = sizeof(*info);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret && errno != ENOSPC) {
         ret = -errno;
         if (!vdev->has_pm_reset) {
@@ -3383,7 +3407,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     info->argsz = sizeof(*info) + (count * sizeof(*devices));
     devices = &info->devices[0];
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret) {
         ret = -errno;
         error_report("vfio: hot reset info failed: %m");
@@ -3398,6 +3422,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
         VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3429,7 +3454,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
         }
 
         /* Prep dependent devices for reset and clear our marker. */
-        QLIST_FOREACH(tmp, &group->device_list, next) {
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
             if (vfio_pci_host_match(&host, &tmp->host)) {
                 if (single) {
                     DPRINTF("vfio: found another in-use device "
@@ -3439,7 +3468,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
                     goto out_single;
                 }
                 vfio_pci_pre_reset(tmp);
-                tmp->needs_reset = false;
+                tmp->vbasedev.needs_reset = false;
                 multi = true;
                 break;
             }
@@ -3478,7 +3507,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     }
 
     /* Bus reset! */
-    ret = ioctl(vdev->fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
     g_free(reset);
 
     DPRINTF("%04x:%02x:%02x.%x hot reset: %s\n", vdev->host.domain,
@@ -3490,6 +3519,7 @@ out:
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
         VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3510,7 +3540,11 @@ out:
             break;
         }
 
-        QLIST_FOREACH(tmp, &group->device_list, next) {
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
             if (vfio_pci_host_match(&host, &tmp->host)) {
                 vfio_pci_post_reset(tmp);
                 break;
@@ -3544,28 +3578,41 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
     return vfio_pci_hot_reset(vdev, true);
 }
 
-static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
 {
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     return vfio_pci_hot_reset(vdev, false);
 }
 
-static void vfio_pci_reset_handler(void *opaque)
+static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
+        vbasedev->needs_reset = true;
+    }
+    return vbasedev->needs_reset;
+}
+
+static VFIODeviceOps vfio_pci_ops = {
+    .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
+    .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
+};
+
+static void vfio_reset_handler(void *opaque)
 {
     VFIOGroup *group;
-    VFIOPCIDevice *vdev;
+    VFIODevice *vbasedev;
 
     QLIST_FOREACH(group, &group_list, next) {
-        QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
-                vdev->needs_reset = true;
-            }
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
         }
     }
 
     QLIST_FOREACH(group, &group_list, next) {
-        QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (vdev->needs_reset) {
-                vfio_pci_hot_reset_multi(vdev);
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->needs_reset) {
+                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
             }
         }
     }
@@ -3854,7 +3901,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
     }
 
     if (QLIST_EMPTY(&group_list)) {
-        qemu_register_reset(vfio_pci_reset_handler, NULL);
+        qemu_register_reset(vfio_reset_handler, NULL);
     }
 
     QLIST_INSERT_HEAD(&group_list, group, next);
@@ -3886,7 +3933,7 @@ static void vfio_put_group(VFIOGroup *group)
     g_free(group);
 
     if (QLIST_EMPTY(&group_list)) {
-        qemu_unregister_reset(vfio_pci_reset_handler, NULL);
+        qemu_unregister_reset(vfio_reset_handler, NULL);
     }
 }
 
@@ -3907,12 +3954,12 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         return ret;
     }
 
-    vdev->fd = ret;
-    vdev->group = group;
-    QLIST_INSERT_HEAD(&group->device_list, vdev, next);
+    vdev->vbasedev.fd = ret;
+    vdev->vbasedev.group = group;
+    QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
 
     /* Sanity check device */
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
     if (ret) {
         error_report("vfio: error getting device info: %m");
         goto error;
@@ -3926,7 +3973,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         goto error;
     }
 
-    vdev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+    vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
 
     if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
         error_report("vfio: unexpected number of io regions %u",
@@ -3942,7 +3989,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
         reg_info.index = i;
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
         if (ret) {
             error_report("vfio: Error getting region %d info: %m", i);
             goto error;
@@ -3956,14 +4003,14 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         vdev->bars[i].flags = reg_info.flags;
         vdev->bars[i].size = reg_info.size;
         vdev->bars[i].fd_offset = reg_info.offset;
-        vdev->bars[i].fd = vdev->fd;
+        vdev->bars[i].fd = vdev->vbasedev.fd;
         vdev->bars[i].nr = i;
         QLIST_INIT(&vdev->bars[i].quirks);
     }
 
     reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
     if (ret) {
         error_report("vfio: Error getting config info: %m");
         goto error;
@@ -3987,7 +4034,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
             .index = VFIO_PCI_VGA_REGION_INDEX,
          };
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
         if (ret) {
             error_report(
                 "vfio: Device does not support requested feature x-vga");
@@ -4004,7 +4051,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         }
 
         vdev->vga.fd_offset = vga_info.offset;
-        vdev->vga.fd = vdev->fd;
+        vdev->vga.fd = vdev->vbasedev.fd;
 
         vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
         vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
@@ -4022,7 +4069,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     }
     irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
     if (ret) {
         /* This can fail for an old kernel or legacy PCI dev */
         DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
@@ -4038,19 +4085,20 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
 
 error:
     if (ret) {
-        QLIST_REMOVE(vdev, next);
-        vdev->group = NULL;
-        close(vdev->fd);
+        QLIST_REMOVE(&vdev->vbasedev, next);
+        vdev->vbasedev.group = NULL;
+        close(vdev->vbasedev.fd);
     }
     return ret;
 }
 
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
-    QLIST_REMOVE(vdev, next);
-    vdev->group = NULL;
+    QLIST_REMOVE(&vdev->vbasedev, next);
+    vdev->vbasedev.group = NULL;
     DPRINTF("vfio_put_device: close vdev->fd\n");
-    close(vdev->fd);
+    close(vdev->vbasedev.fd);
+    g_free(vdev->vbasedev.name);
     if (vdev->msix) {
         g_free(vdev->msix);
         vdev->msix = NULL;
@@ -4119,7 +4167,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
     *pfd = event_notifier_get_fd(&vdev->err_notifier);
     qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     if (ret) {
         error_report("vfio: Failed to set up error notification");
         qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
@@ -4152,7 +4200,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
     pfd = (int32_t *)&irq_set->data;
     *pfd = -1;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     if (ret) {
         error_report("vfio: Failed to de-assign error fd: %m");
     }
@@ -4164,7 +4212,8 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
 
 static int vfio_initfn(PCIDevice *pdev)
 {
-    VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIODevice *vbasedev_iter;
     VFIOGroup *group;
     char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
     ssize_t len;
@@ -4182,6 +4231,13 @@ static int vfio_initfn(PCIDevice *pdev)
         return -errno;
     }
 
+    vdev->vbasedev.ops = &vfio_pci_ops;
+
+    vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
+    vdev->vbasedev.name = g_strdup_printf("%04x:%02x:%02x.%01x",
+            vdev->host.domain, vdev->host.bus, vdev->host.slot,
+            vdev->host.function);
+
     strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
 
     len = readlink(path, iommu_group_path, sizeof(path));
@@ -4211,12 +4267,8 @@ static int vfio_initfn(PCIDevice *pdev)
             vdev->host.domain, vdev->host.bus, vdev->host.slot,
             vdev->host.function);
 
-    QLIST_FOREACH(pvdev, &group->device_list, next) {
-        if (pvdev->host.domain == vdev->host.domain &&
-            pvdev->host.bus == vdev->host.bus &&
-            pvdev->host.slot == vdev->host.slot &&
-            pvdev->host.function == vdev->host.function) {
-
+    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+        if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
             error_report("vfio: error: device %s is already attached", path);
             vfio_put_group(group);
             return -EBUSY;
@@ -4231,7 +4283,7 @@ static int vfio_initfn(PCIDevice *pdev)
     }
 
     /* Get a copy of config space */
-    ret = pread(vdev->fd, vdev->pdev.config,
+    ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
                 MIN(pci_config_size(&vdev->pdev), vdev->config_size),
                 vdev->config_offset);
     if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
@@ -4319,7 +4371,7 @@ out_put:
 static void vfio_exitfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
-    VFIOGroup *group = vdev->group;
+    VFIOGroup *group = vdev->vbasedev.group;
 
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
@@ -4345,8 +4397,9 @@ static void vfio_pci_reset(DeviceState *dev)
 
     vfio_pci_pre_reset(vdev);
 
-    if (vdev->reset_works && (vdev->has_flr || !vdev->has_pm_reset) &&
-        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
+    if (vdev->vbasedev.reset_works &&
+        (vdev->has_flr || !vdev->has_pm_reset) &&
+        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
         DPRINTF("%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET\n", vdev->host.domain,
             vdev->host.bus, vdev->host.slot, vdev->host.function);
         goto post_reset;
@@ -4358,8 +4411,8 @@ static void vfio_pci_reset(DeviceState *dev)
     }
 
     /* If nothing else works and the device supports PM reset, use it */
-    if (vdev->reset_works && vdev->has_pm_reset &&
-        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
+    if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
+        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
         DPRINTF("%04x:%02x:%02x.%x PCI PM Reset\n", vdev->host.domain,
             vdev->host.bus, vdev->host.slot, vdev->host.function);
         goto post_reset;
-- 
1.8.3.2
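
For readers following the reset rework in the diff above, here is a minimal
standalone sketch of the pattern, not the QEMU code itself. BaseDev, PciDev,
pci_dev_init and reset_all are hypothetical stand-ins for VFIODevice,
VFIOPCIDevice and vfio_reset_handler. Clearing needs_reset inside the
hot-reset callback is an assumption made only to keep the sketch
self-consistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Recover the enclosing structure from a pointer to an embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct BaseDev BaseDev;

/* Hypothetical counterpart of VFIODeviceOps. */
typedef struct BaseDevOps {
    bool (*compute_needs_reset)(BaseDev *b);
    int (*hot_reset_multi)(BaseDev *b);
} BaseDevOps;

/* Hypothetical counterpart of VFIODevice: the bus-agnostic part. */
struct BaseDev {
    const BaseDevOps *ops;
    bool reset_works;
    bool needs_reset;
};

/* Hypothetical counterpart of VFIOPCIDevice: embeds the generic part. */
typedef struct PciDev {
    BaseDev base;
    bool has_flr;
    bool has_pm_reset;
    int hot_resets;     /* illustration only: counts multi-device resets */
} PciDev;

static bool pci_compute_needs_reset(BaseDev *b)
{
    PciDev *p = container_of(b, PciDev, base);

    /* Same condition as vfio_pci_compute_needs_reset in the patch. */
    if (!b->reset_works || (!p->has_flr && p->has_pm_reset)) {
        b->needs_reset = true;
    }
    return b->needs_reset;
}

static int pci_hot_reset_multi(BaseDev *b)
{
    PciDev *p = container_of(b, PciDev, base);

    p->hot_resets++;
    b->needs_reset = false;   /* assumption: a completed reset clears the flag */
    return 0;
}

static const BaseDevOps pci_ops = {
    .compute_needs_reset = pci_compute_needs_reset,
    .hot_reset_multi = pci_hot_reset_multi,
};

static void pci_dev_init(PciDev *p, bool reset_works,
                         bool has_flr, bool has_pm_reset)
{
    memset(p, 0, sizeof(*p));
    p->base.ops = &pci_ops;
    p->base.reset_works = reset_works;
    p->has_flr = has_flr;
    p->has_pm_reset = has_pm_reset;
}

/* Two passes, as in the handler: flag every device, then reset the flagged. */
static int reset_all(BaseDev **devs, size_t n)
{
    int resets = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(devs[i]->ops != NULL);
        devs[i]->ops->compute_needs_reset(devs[i]);
    }
    for (i = 0; i < n; i++) {
        if (devs[i]->needs_reset) {
            devs[i]->ops->hot_reset_multi(devs[i]);
            resets++;
        }
    }
    return resets;
}
```

The generic loop only touches BaseDev and its ops table, so a second device
type could join the same list by supplying its own callbacks.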


* [Qemu-devel] [PATCH v5 04/10] hw/vfio/pci: Introduce VFIORegion
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
                   ` (2 preceding siblings ...)
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 03/10] hw/vfio/pci: introduce VFIODevice Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device Eric Auger
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, joel.schopp,
	a.motakis, kvmarm

This structure is going to be shared by VFIOPCIDevice and
VFIOPlatformDevice. VFIOBAR includes it.

vfio_eoi becomes a VFIODevice op, specialized by the parent device.
This makes it possible to turn vfio_bar_write/read into generic
vfio_region_write/read, which VFIOPlatformDevice will use too.

vfio_mmap_bar becomes vfio_mmap_region.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4->v5:
- remove fd field from VFIORegion
- change error_report format string in vfio_region_write/read
- remove #ifdef DEBUG_VFIO in the same function
- correct missing initialization of bar region's vbasedev field
- change Object * parameter name of vfio_mmap_region and remove
  useless OBJECT()
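
The region/device split described in the commit message can be sketched in
isolation as follows. This is only an illustration under stated assumptions:
BaseDev, Region, PciDev, pci_dev_init and region_read are hypothetical
stand-ins for VFIODevice, VFIORegion, VFIOPCIDevice and vfio_region_read, and
an ordinary file descriptor plays the role of the VFIO device fd. The
accessor reaches the fd and the EOI callback through the region's
back-pointer, so it needs no PCI knowledge.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Recover the enclosing structure from a pointer to an embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct BaseDev BaseDev;

/* Hypothetical counterpart of VFIODeviceOps, reduced to the EOI op. */
typedef struct BaseDevOps {
    void (*eoi)(BaseDev *b);
} BaseDevOps;

struct BaseDev {
    const BaseDevOps *ops;
    int fd;                 /* single device fd, no longer copied per region */
};

/* Hypothetical counterpart of VFIORegion. */
typedef struct Region {
    BaseDev *basedev;       /* back-pointer to the owning device */
    off_t fd_offset;        /* offset of the region within the device fd */
    size_t size;
} Region;

typedef struct PciDev {
    BaseDev base;
    Region bar0;
    int eoi_count;          /* illustration only */
} PciDev;

static void pci_eoi(BaseDev *b)
{
    PciDev *p = container_of(b, PciDev, base);

    p->eoi_count++;
}

static const BaseDevOps pci_ops = {
    .eoi = pci_eoi,
};

static void pci_dev_init(PciDev *p, int fd, off_t bar0_offset,
                         size_t bar0_size)
{
    memset(p, 0, sizeof(*p));
    p->base.ops = &pci_ops;
    p->base.fd = fd;
    p->bar0.basedev = &p->base;
    p->bar0.fd_offset = bar0_offset;
    p->bar0.size = bar0_size;
}

/* Generic accessor: works for any device type that fills in Region. */
static uint64_t region_read(Region *r, off_t addr, unsigned size)
{
    BaseDev *b = r->basedev;
    uint8_t buf[8];
    uint64_t data = 0;
    unsigned i;

    assert(size <= sizeof(buf));
    if (pread(b->fd, buf, size, r->fd_offset + addr) != (ssize_t)size) {
        return (uint64_t)-1;
    }
    /* The device data is little endian; assemble it byte by byte. */
    for (i = 0; i < size; i++) {
        data |= (uint64_t)buf[i] << (8 * i);
    }
    /* Every access signals an EOI through the device-specific op. */
    b->ops->eoi(b);
    return data;
}
```

A platform device would reuse region_read unchanged and only provide its own
eoi callback.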
---
 hw/vfio/pci.c | 194 +++++++++++++++++++++++++++++++---------------------------
 1 file changed, 103 insertions(+), 91 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ae827c5..1a24398 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -86,15 +86,19 @@ typedef struct VFIOQuirk {
     } data;
 } VFIOQuirk;
 
-typedef struct VFIOBAR {
-    off_t fd_offset; /* offset of BAR within device fd */
-    int fd; /* device fd, allows us to pass VFIOBAR as opaque data */
+typedef struct VFIORegion {
+    struct VFIODevice *vbasedev;
+    off_t fd_offset; /* offset of region within device fd */
     MemoryRegion mem; /* slow, read/write access */
     MemoryRegion mmap_mem; /* direct mapped access */
     void *mmap;
     size_t size;
     uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
-    uint8_t nr; /* cache the BAR number for debug */
+    uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOBAR {
+    VFIORegion region;
     bool ioport;
     bool mem64;
     QLIST_HEAD(, VFIOQuirk) quirks;
@@ -214,6 +218,7 @@ typedef struct VFIODevice {
 struct VFIODeviceOps {
     bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+    void (*vfio_eoi)(VFIODevice *vdev);
 };
 
 typedef struct VFIOPCIDevice {
@@ -397,8 +402,10 @@ static void vfio_intx_interrupt(void *opaque)
     }
 }
 
-static void vfio_eoi(VFIOPCIDevice *vdev)
+static void vfio_eoi(VFIODevice *vbasedev)
 {
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
     if (!vdev->intx.pending) {
         return;
     }
@@ -408,7 +415,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
-    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
+    vfio_unmask_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -563,7 +570,7 @@ static void vfio_update_irq(PCIDevice *pdev)
     vfio_enable_intx_kvm(vdev);
 
     /* Re-enable the interrupt in cased we missed an EOI */
-    vfio_eoi(vdev);
+    vfio_eoi(&vdev->vbasedev);
 }
 
 static int vfio_enable_intx(VFIOPCIDevice *vdev)
@@ -1101,10 +1108,11 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
 /*
  * IO Port/MMIO - Beware of the endians, VFIO is always little endian
  */
-static void vfio_bar_write(void *opaque, hwaddr addr,
+static void vfio_region_write(void *opaque, hwaddr addr,
                            uint64_t data, unsigned size)
 {
-    VFIOBAR *bar = opaque;
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1127,21 +1135,16 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
         break;
     }
 
-    if (pwrite(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
-        error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
-                     __func__, addr, data, size);
+    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+                     ",%d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, data, size);
     }
 
-#ifdef DEBUG_VFIO
-    {
-        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
-        DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx", 0x%"PRIx64
-                ", %d)\n", __func__, vdev->host.domain, vdev->host.bus,
-                vdev->host.slot, vdev->host.function, bar->nr, addr,
-                data, size);
-    }
-#endif
+    DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+            ", %d)\n", __func__, vbasedev->name,
+            region->nr, addr, data, size);
 
     /*
      * A read or write to a BAR always signals an INTx EOI.  This will
@@ -1151,13 +1154,15 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
      * which access will service the interrupt, so we're potentially
      * getting quite a few host interrupts per guest interrupt.
      */
-    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+    vbasedev->ops->vfio_eoi(vbasedev);
+
 }
 
-static uint64_t vfio_bar_read(void *opaque,
+static uint64_t vfio_region_read(void *opaque,
                               hwaddr addr, unsigned size)
 {
-    VFIOBAR *bar = opaque;
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1166,9 +1171,10 @@ static uint64_t vfio_bar_read(void *opaque,
     } buf;
     uint64_t data = 0;
 
-    if (pread(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
-        error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
-                     __func__, addr, size);
+    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, size);
         return (uint64_t)-1;
     }
 
@@ -1187,26 +1193,19 @@ static uint64_t vfio_bar_read(void *opaque,
         break;
     }
 
-#ifdef DEBUG_VFIO
-    {
-        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
-        DPRINTF("%s(%04x:%02x:%02x.%x:BAR%d+0x%"HWADDR_PRIx
-                ", %d) = 0x%"PRIx64"\n", __func__, vdev->host.domain,
-                vdev->host.bus, vdev->host.slot, vdev->host.function,
-                bar->nr, addr, size, data);
-    }
-#endif
+    DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
+            __func__, vbasedev->name,
+            region->nr, addr, size, data);
 
     /* Same as write above */
-    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+    vbasedev->ops->vfio_eoi(vbasedev);
 
     return data;
 }
 
-static const MemoryRegionOps vfio_bar_ops = {
-    .read = vfio_bar_read,
-    .write = vfio_bar_write,
+static const MemoryRegionOps vfio_region_ops = {
+    .read = vfio_region_read,
+    .write = vfio_region_write,
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
@@ -1541,8 +1540,8 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
                 vdev->host.bus, vdev->host.slot, vdev->host.function,
                 quirk->data.bar, addr, size, data);
     } else {
-        data = vfio_bar_read(&vdev->bars[quirk->data.bar],
-                             addr + quirk->data.base_offset, size);
+        data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                                addr + quirk->data.base_offset, size);
     }
 
     return data;
@@ -1592,7 +1591,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
         return;
     }
 
-    vfio_bar_write(&vdev->bars[quirk->data.bar],
+    vfio_region_write(&vdev->bars[quirk->data.bar].region,
                    addr + quirk->data.base_offset, data, size);
 }
 
@@ -1626,7 +1625,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
                 vdev->host.bus, vdev->host.slot, vdev->host.function,
                 quirk->data.bar, addr + base, size, data);
     } else {
-        data = vfio_bar_read(&vdev->bars[quirk->data.bar], addr + base, size);
+        data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                                addr + base, size);
     }
 
     return data;
@@ -1655,7 +1655,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
                 vdev->host.domain, vdev->host.bus, vdev->host.slot,
                 vdev->host.function, quirk->data.bar, addr + base, data, size);
     } else {
-        vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+        vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                          addr + base, data, size);
     }
 }
 
@@ -1708,7 +1709,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
      * As long as the BAR is >= 256 bytes it will be aligned such that the
      * lower byte is always zero.  Filter out anything else, if it exists.
      */
-    if (!vdev->bars[4].ioport || vdev->bars[4].size < 256) {
+    if (!vdev->bars[4].ioport || vdev->bars[4].region.size < 256) {
         return;
     }
 
@@ -1761,7 +1762,7 @@ static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev),
                           &vfio_generic_window_quirk, quirk,
                           "vfio-ati-bar4-window-quirk", 8);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.base_offset, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1835,7 +1836,8 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
             memory_region_name(&quirk->mem), vdev->host.domain,
             vdev->host.bus, vdev->host.slot, vdev->host.function);
 
-    return vfio_bar_read(&vdev->bars[quirk->data.bar], addr + 0x70, size);
+    return vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                            addr + 0x70, size);
 }
 
 static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
@@ -1875,7 +1877,8 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
             memory_region_name(&quirk->mem), vdev->host.domain,
             vdev->host.bus, vdev->host.slot, vdev->host.function);
 
-    vfio_bar_write(&vdev->bars[quirk->data.bar], addr + 0x70, data, size);
+    vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                      addr + 0x70, data, size);
 }
 
 static const MemoryRegionOps vfio_rtl8168_window_quirk = {
@@ -1905,7 +1908,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_rtl8168_window_quirk,
                           quirk, "vfio-rtl8168-window-quirk", 8);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                                         0x70, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1938,7 +1941,7 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-ati-bar2-4000-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2057,7 +2060,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
     VFIOQuirk *quirk;
 
     if (pci_get_word(pdev->config + PCI_VENDOR_ID) != PCI_VENDOR_ID_NVIDIA ||
-        !vdev->bars[1].size) {
+        !vdev->bars[1].region.size) {
         return;
     }
 
@@ -2165,7 +2168,8 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev),
                           &vfio_nvidia_bar5_window_quirk, quirk,
                           "vfio-nvidia-bar5-window-quirk", 16);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 1);
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+                                        0, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
@@ -2192,7 +2196,8 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
      */
     if ((pdev->cap_present & QEMU_PCI_CAP_MSI) &&
         vfio_range_contained(addr, size, pdev->msi_cap, PCI_MSI_FLAGS)) {
-        vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+        vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                          addr + base, data, size);
     }
 }
 
@@ -2231,7 +2236,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
                           quirk, "vfio-nvidia-bar0-88000-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2257,7 +2262,8 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 
     /* Log the chipset ID */
     DPRINTF("Nvidia NV%02x\n",
-            (unsigned int)(vfio_bar_read(&vdev->bars[0], 0, 4) >> 20) & 0xff);
+            (unsigned int)(vfio_region_read(&vdev->bars[0].region, 0, 4) >> 20)
+                           & 0xff);
 
     quirk = g_malloc0(sizeof(*quirk));
     quirk->vdev = vdev;
@@ -2269,7 +2275,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-nvidia-bar0-1800-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2326,7 +2332,7 @@ static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
 
     while (!QLIST_EMPTY(&bar->quirks)) {
         VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
-        memory_region_del_subregion(&bar->mem, &quirk->mem);
+        memory_region_del_subregion(&bar->region.mem, &quirk->mem);
         memory_region_destroy(&quirk->mem);
         QLIST_REMOVE(quirk, next);
         g_free(quirk);
@@ -2839,9 +2845,9 @@ static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
     int ret;
 
     ret = msix_init(&vdev->pdev, vdev->msix->entries,
-                    &vdev->bars[vdev->msix->table_bar].mem,
+                    &vdev->bars[vdev->msix->table_bar].region.mem,
                     vdev->msix->table_bar, vdev->msix->table_offset,
-                    &vdev->bars[vdev->msix->pba_bar].mem,
+                    &vdev->bars[vdev->msix->pba_bar].region.mem,
                     vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
     if (ret < 0) {
         if (ret == -ENOTSUP) {
@@ -2859,8 +2865,9 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
     msi_uninit(&vdev->pdev);
 
     if (vdev->msix) {
-        msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
-                    &vdev->bars[vdev->msix->pba_bar].mem);
+        msix_uninit(&vdev->pdev,
+                    &vdev->bars[vdev->msix->table_bar].region.mem,
+                    &vdev->bars[vdev->msix->pba_bar].region.mem);
     }
 }
 
@@ -2874,11 +2881,11 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
     for (i = 0; i < PCI_ROM_SLOT; i++) {
         VFIOBAR *bar = &vdev->bars[i];
 
-        if (!bar->size) {
+        if (!bar->region.size) {
             continue;
         }
 
-        memory_region_set_enabled(&bar->mmap_mem, enabled);
+        memory_region_set_enabled(&bar->region.mmap_mem, enabled);
         if (vdev->msix && vdev->msix->table_bar == i) {
             memory_region_set_enabled(&vdev->msix->mmap_mem, enabled);
         }
@@ -2889,56 +2896,58 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
-    if (!bar->size) {
+    if (!bar->region.size) {
         return;
     }
 
     vfio_bar_quirk_teardown(vdev, nr);
 
-    memory_region_del_subregion(&bar->mem, &bar->mmap_mem);
-    munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
-    memory_region_destroy(&bar->mmap_mem);
+    memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
+    munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
+    memory_region_destroy(&bar->region.mmap_mem);
 
     if (vdev->msix && vdev->msix->table_bar == nr) {
-        memory_region_del_subregion(&bar->mem, &vdev->msix->mmap_mem);
+        memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
         munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
         memory_region_destroy(&vdev->msix->mmap_mem);
     }
 
-    memory_region_destroy(&bar->mem);
+    memory_region_destroy(&bar->region.mem);
 }
 
-static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
+static int vfio_mmap_region(Object *obj, VFIORegion *region,
                          MemoryRegion *mem, MemoryRegion *submem,
                          void **map, size_t size, off_t offset,
                          const char *name)
 {
     int ret = 0;
+    VFIODevice *vbasedev = region->vbasedev;
 
-    if (VFIO_ALLOW_MMAP && size && bar->flags & VFIO_REGION_INFO_FLAG_MMAP) {
+    if (VFIO_ALLOW_MMAP && size && region->flags &
+        VFIO_REGION_INFO_FLAG_MMAP) {
         int prot = 0;
 
-        if (bar->flags & VFIO_REGION_INFO_FLAG_READ) {
+        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
             prot |= PROT_READ;
         }
 
-        if (bar->flags & VFIO_REGION_INFO_FLAG_WRITE) {
+        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
             prot |= PROT_WRITE;
         }
 
         *map = mmap(NULL, size, prot, MAP_SHARED,
-                    bar->fd, bar->fd_offset + offset);
+                    vbasedev->fd, region->fd_offset + offset);
         if (*map == MAP_FAILED) {
             *map = NULL;
             ret = -errno;
             goto empty_region;
         }
 
-        memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);
+        memory_region_init_ram_ptr(submem, obj, name, size, *map);
     } else {
 empty_region:
         /* Create a zero sized sub-region to make cleanup easy. */
-        memory_region_init(submem, OBJECT(vdev), name, 0);
+        memory_region_init(submem, obj, name, 0);
     }
 
     memory_region_add_subregion(mem, offset, submem);
@@ -2949,7 +2958,7 @@ empty_region:
 static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
-    unsigned size = bar->size;
+    unsigned size = bar->region.size;
     char name[64];
     uint32_t pci_bar;
     uint8_t type;
@@ -2979,9 +2988,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
                                     ~PCI_BASE_ADDRESS_MEM_MASK);
 
     /* A "slow" read/write mapping underlies all BARs */
-    memory_region_init_io(&bar->mem, OBJECT(vdev), &vfio_bar_ops,
+    memory_region_init_io(&bar->region.mem, OBJECT(vdev), &vfio_region_ops,
                           bar, name, size);
-    pci_register_bar(&vdev->pdev, nr, type, &bar->mem);
+    pci_register_bar(&vdev->pdev, nr, type, &bar->region.mem);
 
     /*
      * We can't mmap areas overlapping the MSIX vector table, so we
@@ -2992,8 +3001,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
     }
 
     strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
-    if (vfio_mmap_bar(vdev, bar, &bar->mem,
-                      &bar->mmap_mem, &bar->mmap, size, 0, name)) {
+    if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
+                      &bar->region.mmap_mem, &bar->region.mmap,
+                      size, 0, name)) {
         error_report("%s unsupported. Performance may be slow", name);
     }
 
@@ -3003,10 +3013,11 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
         start = HOST_PAGE_ALIGN(vdev->msix->table_offset +
                                 (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE));
 
-        size = start < bar->size ? bar->size - start : 0;
+        size = start < bar->region.size ? bar->region.size - start : 0;
         strncat(name, " msix-hi", sizeof(name) - strlen(name) - 1);
         /* VFIOMSIXInfo contains another MemoryRegion for this mapping */
-        if (vfio_mmap_bar(vdev, bar, &bar->mem, &vdev->msix->mmap_mem,
+        if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
+                          &vdev->msix->mmap_mem,
                           &vdev->msix->mmap, size, start, name)) {
             error_report("%s unsupported. Performance may be slow", name);
         }
@@ -3596,6 +3607,7 @@ static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
 static VFIODeviceOps vfio_pci_ops = {
     .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
+    .vfio_eoi = vfio_eoi,
 };
 
 static void vfio_reset_handler(void *opaque)
@@ -4000,11 +4012,11 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
                 (unsigned long)reg_info.size, (unsigned long)reg_info.offset,
                 (unsigned long)reg_info.flags);
 
-        vdev->bars[i].flags = reg_info.flags;
-        vdev->bars[i].size = reg_info.size;
-        vdev->bars[i].fd_offset = reg_info.offset;
-        vdev->bars[i].fd = vdev->vbasedev.fd;
-        vdev->bars[i].nr = i;
+        vdev->bars[i].region.vbasedev = &vdev->vbasedev;
+        vdev->bars[i].region.flags = reg_info.flags;
+        vdev->bars[i].region.size = reg_info.size;
+        vdev->bars[i].region.fd_offset = reg_info.offset;
+        vdev->bars[i].region.nr = i;
         QLIST_INIT(&vdev->bars[i].quirks);
     }
 
-- 
1.8.3.2


* [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
                   ` (3 preceding siblings ...)
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 04/10] hw/vfio/pci: Introduce VFIORegion Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-12  2:41   ` David Gibson
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module Eric Auger
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, joel.schopp,
	a.motakis, kvmarm

vfio_get_device now takes a VFIODevice as argument. The function is split
into four functional parts: dev_info query, device check, region populate
and interrupt populate. The last three are specialized per parent device
and are added to VFIODeviceOps.

Three new fields (num_irqs, num_regions and flags) are introduced in
VFIODevice to store the dev_info contents.

vfio_put_base_device is created to hold the teardown common to all device
types: list removal, group pointer reset and fd close.
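
The split described above can be sketched as a small ops-table dispatch.
This is only an illustration under assumptions: DemoDevice, DemoDeviceOps,
the demo_* functions and the DEMO_* constants are hypothetical stand-ins,
not QEMU types, and the ioctl that fills dev_info is replaced by plain
function arguments.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for VFIODevice / VFIODeviceOps (not QEMU code). */
typedef struct DemoDevice DemoDevice;

typedef struct DemoDeviceOps {
    int (*check_device)(DemoDevice *dev);
    int (*populate_regions)(DemoDevice *dev);
    int (*populate_interrupts)(DemoDevice *dev);
} DemoDeviceOps;

struct DemoDevice {
    const char *name;
    unsigned int num_irqs;      /* copied from the queried device info */
    unsigned int num_regions;
    unsigned int flags;
    int released;               /* set once the common teardown has run */
    const DemoDeviceOps *ops;
};

/* Illustrative values only; the real limits come from linux/vfio.h. */
#define DEMO_FLAG_PCI    0x2u
#define DEMO_MIN_REGIONS 8u
#define DEMO_MIN_IRQS    3u

/* Common teardown, shared by every device type. */
static void demo_put_base_device(DemoDevice *dev)
{
    dev->released = 1;
}

/* Generic part: store the queried info, then run the per-type hooks. */
static int demo_get_device(DemoDevice *dev, unsigned int flags,
                           unsigned int num_regions, unsigned int num_irqs)
{
    int ret;

    assert(dev != NULL && dev->ops != NULL);

    dev->flags = flags;
    dev->num_regions = num_regions;
    dev->num_irqs = num_irqs;

    ret = dev->ops->check_device(dev);
    if (!ret) {
        ret = dev->ops->populate_regions(dev);
    }
    if (!ret) {
        ret = dev->ops->populate_interrupts(dev);
    }
    if (ret) {
        fprintf(stderr, "demo: setting up device %s failed\n", dev->name);
        demo_put_base_device(dev);
    }
    return ret;
}

/* PCI-like specialization of the three hooks. */
static int demo_pci_check_device(DemoDevice *dev)
{
    if (!(dev->flags & DEMO_FLAG_PCI) ||
        dev->num_regions < DEMO_MIN_REGIONS ||
        dev->num_irqs < DEMO_MIN_IRQS) {
        return -1;
    }
    return 0;
}

static int demo_pci_populate_regions(DemoDevice *dev)
{
    (void)dev;  /* a real hook would query each region here */
    return 0;
}

static int demo_pci_populate_interrupts(DemoDevice *dev)
{
    (void)dev;  /* a real hook would query IRQ info here */
    return 0;
}

static const DemoDeviceOps demo_pci_ops = {
    .check_device = demo_pci_check_device,
    .populate_regions = demo_pci_populate_regions,
    .populate_interrupts = demo_pci_populate_interrupts,
};
```

A platform device would supply its own DemoDeviceOps instance and reuse
demo_get_device unchanged, which is the point of moving the three hooks
into the ops structure.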

---

v4->v5:
- cleanup of error handling and get/put operations in
  vfio_check_device, vfio_populate_regions, vfio_populate_interrupts and
  vfio_get_device.
  - correct misuse of errno
  - vfio_populate_regions always returns 0
  - VFIODevice .name deallocation done in vfio_put_device instead of
    vfio_put_base_device
  - vfio_put_base_device done at vfio_get_device level.
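
The notes above do not say what the errno misuse was. As an assumption
only, one common pitfall is reading errno after another library call has
had a chance to overwrite it. A minimal sketch of the usual remedy, with a
hypothetical helper demo_open_ro that is not part of this patch:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a path read-only and return the file
 * descriptor, or a negative errno value on failure.
 */
static int demo_open_ro(const char *path)
{
    int fd, saved_errno;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        /* Save errno first: the fprintf() below may modify it. */
        saved_errno = errno;
        fprintf(stderr, "demo: cannot open %s: %s\n",
                path, strerror(saved_errno));
        return -saved_errno;
    }
    return fd;
}
```

Returning the saved value keeps the error code tied to the call that
actually failed, regardless of what the reporting code does afterwards.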

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 181 ++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 120 insertions(+), 61 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1a24398..5f218b7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -213,12 +213,18 @@ typedef struct VFIODevice {
     bool reset_works;
     bool needs_reset;
     VFIODeviceOps *ops;
+    unsigned int num_irqs;
+    unsigned int num_regions;
+    unsigned int flags;
 } VFIODevice;
 
 struct VFIODeviceOps {
     bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
     void (*vfio_eoi)(VFIODevice *vdev);
+    int (*vfio_check_device)(VFIODevice *vdev);
+    int (*vfio_populate_regions)(VFIODevice *vdev);
+    int (*vfio_populate_interrupts)(VFIODevice *vdev);
 };
 
 typedef struct VFIOPCIDevice {
@@ -305,6 +311,10 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_put_base_device(VFIODevice *vbasedev);
+static int vfio_check_device(VFIODevice *vbasedev);
+static int vfio_populate_regions(VFIODevice *vbasedev);
+static int vfio_populate_interrupts(VFIODevice *vbasedev);
 
 /*
  * Common VFIO interrupt disable
@@ -3608,6 +3618,9 @@ static VFIODeviceOps vfio_pci_ops = {
     .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
     .vfio_eoi = vfio_eoi,
+    .vfio_check_device = vfio_check_device,
+    .vfio_populate_regions = vfio_populate_regions,
+    .vfio_populate_interrupts = vfio_populate_interrupts,
 };
 
 static void vfio_reset_handler(void *opaque)
@@ -3949,54 +3962,52 @@ static void vfio_put_group(VFIOGroup *group)
     }
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name,
-                           VFIOPCIDevice *vdev)
+static int vfio_check_device(VFIODevice *vbasedev)
 {
-    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
-    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
-    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
-    int ret, i;
-
-    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
-    if (ret < 0) {
-        error_report("vfio: error getting device %s from group %d: %m",
-                     name, group->groupid);
-        error_printf("Verify all devices in group %d are bound to vfio-pci "
-                     "or pci-stub and not already in use\n", group->groupid);
-        return ret;
+    if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
+        error_report("vfio: Um, this isn't a PCI device");
+        goto error;
     }
-
-    vdev->vbasedev.fd = ret;
-    vdev->vbasedev.group = group;
-    QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
-
-    /* Sanity check device */
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
-    if (ret) {
-        error_report("vfio: error getting device info: %m");
+    if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
+        error_report("vfio: unexpected number of io regions %u",
+                     vbasedev->num_regions);
         goto error;
     }
-
-    DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
-            dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
-
-    if (!(dev_info.flags & VFIO_DEVICE_FLAGS_PCI)) {
-        error_report("vfio: Um, this isn't a PCI device");
+    if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
+        error_report("vfio: unexpected number of irqs %u",
+                     vbasedev->num_irqs);
         goto error;
     }
+    return 0;
+error:
+    return -1;
+}
 
-    vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+static int vfio_populate_interrupts(VFIODevice *vbasedev)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+    int ret;
+    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+    irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
-    if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
-        error_report("vfio: unexpected number of io regions %u",
-                     dev_info.num_regions);
-        goto error;
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+    if (ret) {
+        /* This can fail for an old kernel or legacy PCI dev */
+        DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
+    } else if (irq_info.count == 1) {
+        vdev->pci_aer = true;
+    } else {
+        error_report("vfio: %s Could not enable error recovery for the device",
+                     vbasedev->name);
     }
+    return 0;
+}
 
-    if (dev_info.num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
-        error_report("vfio: unexpected number of irqs %u", dev_info.num_irqs);
-        goto error;
-    }
+static int vfio_populate_regions(VFIODevice *vbasedev)
+{
+    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+    int i, ret = 0;
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 
     for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
         reg_info.index = i;
@@ -4007,7 +4018,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
             goto error;
         }
 
-        DPRINTF("Device %s region %d:\n", name, i);
+        DPRINTF("Device %s region %d:\n", vbasedev->name, i);
         DPRINTF("  size: 0x%lx, offset: 0x%lx, flags: 0x%lx\n",
                 (unsigned long)reg_info.size, (unsigned long)reg_info.offset,
                 (unsigned long)reg_info.flags);
@@ -4028,7 +4039,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         goto error;
     }
 
-    DPRINTF("Device %s config:\n", name);
+    DPRINTF("Device %s config:\n", vbasedev->name);
     DPRINTF("  size: 0x%lx, offset: 0x%lx, flags: 0x%lx\n",
             (unsigned long)reg_info.size, (unsigned long)reg_info.offset,
             (unsigned long)reg_info.flags);
@@ -4040,7 +4051,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     vdev->config_offset = reg_info.offset;
 
     if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
-        dev_info.num_regions > VFIO_PCI_VGA_REGION_INDEX) {
+        vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
         struct vfio_region_info vga_info = {
             .argsz = sizeof(vga_info),
             .index = VFIO_PCI_VGA_REGION_INDEX,
@@ -4059,6 +4070,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
             error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx",
                          (unsigned long)vga_info.flags,
                          (unsigned long)vga_info.size);
+            ret = -1;
             goto error;
         }
 
@@ -4079,42 +4091,89 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
 
         vdev->has_vga = true;
     }
-    irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
+error:
+    return ret;
+}
+
+static int vfio_get_device(VFIOGroup *group, const char *name,
+                           VFIODevice *vbasedev)
+{
+    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+    int ret;
 
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    if (ret < 0) {
+        error_report("vfio: error getting device %s from group %d: %m",
+                     name, group->groupid);
+        error_printf("Verify all devices in group %d are bound to vfio-pci "
+                     "or pci-stub and not already in use\n", group->groupid);
+        return ret;
+    }
+
+    vbasedev->fd = ret;
+    vbasedev->group = group;
+    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
     if (ret) {
-        /* This can fail for an old kernel or legacy PCI dev */
-        DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure: %m\n");
-        ret = 0;
-    } else if (irq_info.count == 1) {
-        vdev->pci_aer = true;
-    } else {
-        error_report("vfio: %04x:%02x:%02x.%x "
-                     "Could not enable error recovery for the device",
-                     vdev->host.domain, vdev->host.bus, vdev->host.slot,
-                     vdev->host.function);
+        error_report("vfio: error getting device info: %m");
+        goto error;
+    }
+
+    vbasedev->num_irqs = dev_info.num_irqs;
+    vbasedev->num_regions = dev_info.num_regions;
+    vbasedev->flags = dev_info.flags;
+
+    DPRINTF("Device %s flags: %u, regions: %u, irqs: %u\n", name,
+            dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
+
+    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+    /* call device specific functions */
+    ret = vbasedev->ops->vfio_check_device(vbasedev);
+    if (ret) {
+        error_report("vfio: error when checking device %s",
+                     vbasedev->name);
+        goto error;
+    }
+    ret = vbasedev->ops->vfio_populate_regions(vbasedev);
+    if (ret) {
+        error_report("vfio: error when populating regions of device %s",
+                     vbasedev->name);
+        goto error;
+    }
+    ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
+    if (ret) {
+        error_report("vfio: error when populating interrupts of device %s",
+                     vbasedev->name);
+        goto error;
     }
 
 error:
     if (ret) {
-        QLIST_REMOVE(&vdev->vbasedev, next);
-        vdev->vbasedev.group = NULL;
-        close(vdev->vbasedev.fd);
+        vfio_put_base_device(vbasedev);
     }
     return ret;
 }
 
+void vfio_put_base_device(VFIODevice *vbasedev)
+{
+    QLIST_REMOVE(vbasedev, next);
+    vbasedev->group = NULL;
+    DPRINTF("vfio_put_base_device: close vbasedev->fd\n");
+    close(vbasedev->fd);
+}
+
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
-    QLIST_REMOVE(&vdev->vbasedev, next);
-    vdev->vbasedev.group = NULL;
-    DPRINTF("vfio_put_device: close vdev->fd\n");
-    close(vdev->vbasedev.fd);
     g_free(vdev->vbasedev.name);
     if (vdev->msix) {
         g_free(vdev->msix);
         vdev->msix = NULL;
     }
+    vfio_put_base_device(&vdev->vbasedev);
 }
 
 static void vfio_err_notifier_handler(void *opaque)
@@ -4287,7 +4346,7 @@ static int vfio_initfn(PCIDevice *pdev)
         }
     }
 
-    ret = vfio_get_device(group, path, vdev);
+    ret = vfio_get_device(group, path, &vdev->vbasedev);
     if (ret) {
         error_report("vfio: failed to get device %s", path);
         vfio_put_group(group);
-- 
1.8.3.2


* [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
                   ` (4 preceding siblings ...)
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-11 19:20   ` Alex Williamson
                     ` (2 more replies)
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support Eric Auger
                   ` (3 subsequent siblings)
  9 siblings, 3 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, Kim Phillips, patches, will.deacon,
	agraf, stuart.yoder, Bharat.Bhushan, alex.williamson,
	joel.schopp, a.motakis, kvmarm

A new common module is created. It implements all functions
that are not specific to a device type (PCI, platform).

This patch consists only of code movement (no functional changes).

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4 -> v5:
- integrate "sPAPR/IOMMU: Fix TCE entry permission"
- VFIODevice .name dealloc removed from vfio_put_base_device
- add some includes according to vfio inclusion policy

v3 -> v4:
[Eric Auger]
The move is done after all the PCI modifications that anticipate
VFIO platform needs. The purpose is to ease the whole
review process.

<= v3
First split done by Kim Phillips
---
 hw/vfio/Makefile.objs         |    1 +
 hw/vfio/common.c              |  990 ++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c                 | 1070 +----------------------------------------
 include/hw/vfio/vfio-common.h |  151 ++++++
 4 files changed, 1147 insertions(+), 1065 deletions(-)
 create mode 100644 hw/vfio/common.c
 create mode 100644 include/hw/vfio/vfio-common.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 31c7dab..e31f30e 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,3 +1,4 @@
 ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 endif
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
new file mode 100644
index 0000000..297c508
--- /dev/null
+++ b/hw/vfio/common.c
@@ -0,0 +1,990 @@
+/*
+ * generic functions used by VFIO devices
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ *  Adapted for KVM by Qumranet.
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ */
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/vfio.h>
+
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "hw/hw.h"
+#include "qemu/error-report.h"
+#include "sysemu/kvm.h"
+
+QLIST_HEAD(, VFIOGroup)
+    group_list = QLIST_HEAD_INITIALIZER(group_list);
+
+QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
+    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
+
+#ifdef CONFIG_KVM
+/*
+ * We have a single VFIO pseudo device per KVM VM.  Once created it lives
+ * for the life of the VM.  Closing the file descriptor only drops our
+ * reference to it and the device's reference to kvm.  Therefore once
+ * initialized, this file descriptor is only released on QEMU exit and
+ * we'll re-use it should another vfio device be attached before then.
+ */
+static int vfio_kvm_device_fd = -1;
+#endif
+
+/*
+ * Common VFIO interrupt disable
+ */
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
+        .index = index,
+        .start = 0,
+        .count = 0,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
+        .index = index,
+        .start = 0,
+        .count = 1,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
+void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+        .index = index,
+        .start = 0,
+        .count = 1,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+#endif
+
+/*
+ * IO Port/MMIO - Beware of the endians, VFIO is always little endian
+ */
+void vfio_region_write(void *opaque, hwaddr addr,
+                       uint64_t data, unsigned size)
+{
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
+    union {
+        uint8_t byte;
+        uint16_t word;
+        uint32_t dword;
+        uint64_t qword;
+    } buf;
+
+    switch (size) {
+    case 1:
+        buf.byte = data;
+        break;
+    case 2:
+        buf.word = data;
+        break;
+    case 4:
+        buf.dword = data;
+        break;
+    default:
+        hw_error("vfio: unsupported write size, %d bytes", size);
+        break;
+    }
+
+    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+                     ",%d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, data, size);
+    }
+
+    DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+            ", %d)\n", __func__, vbasedev->name,
+            region->nr, addr, data, size);
+
+    /*
+     * A read or write to a BAR always signals an INTx EOI.  This will
+     * do nothing if not pending (including not in INTx mode).  We assume
+     * that a BAR access is in response to an interrupt and that BAR
+     * accesses will service the interrupt.  Unfortunately, we don't know
+     * which access will service the interrupt, so we're potentially
+     * getting quite a few host interrupts per guest interrupt.
+     */
+    vbasedev->ops->vfio_eoi(vbasedev);
+}
+
+uint64_t vfio_region_read(void *opaque,
+                          hwaddr addr, unsigned size)
+{
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
+    union {
+        uint8_t byte;
+        uint16_t word;
+        uint32_t dword;
+        uint64_t qword;
+    } buf;
+    uint64_t data = 0;
+
+    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, size);
+        return (uint64_t)-1;
+    }
+
+    switch (size) {
+    case 1:
+        data = buf.byte;
+        break;
+    case 2:
+        data = buf.word;
+        break;
+    case 4:
+        data = buf.dword;
+        break;
+    default:
+        hw_error("vfio: unsupported read size, %d bytes", size);
+        break;
+    }
+
+    DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
+            __func__, vbasedev->name,
+            region->nr, addr, size, data);
+
+    /* Same as write above */
+    vbasedev->ops->vfio_eoi(vbasedev);
+
+    return data;
+}
+
+const MemoryRegionOps vfio_region_ops = {
+    .read = vfio_region_read,
+    .write = vfio_region_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+/*
+ * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
+ */
+static int vfio_dma_unmap(VFIOContainer *container,
+                          hwaddr iova, ram_addr_t size)
+{
+    struct vfio_iommu_type1_dma_unmap unmap = {
+        .argsz = sizeof(unmap),
+        .flags = 0,
+        .iova = iova,
+        .size = size,
+    };
+
+    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        DPRINTF("VFIO_UNMAP_DMA: %d\n", -errno);
+        return -errno;
+    }
+
+    return 0;
+}
+
+static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
+                        ram_addr_t size, void *vaddr, bool readonly)
+{
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_READ,
+        .vaddr = (__u64)(uintptr_t)vaddr,
+        .iova = iova,
+        .size = size,
+    };
+
+    if (!readonly) {
+        map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
+    }
+
+    /*
+     * Try the mapping, if it fails with EBUSY, unmap the region and try
+     * again.  This shouldn't be necessary, but we sometimes see it in
+     * the VGA ROM space.
+     */
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
+        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
+         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
+        return 0;
+    }
+
+    DPRINTF("VFIO_MAP_DMA: %d\n", -errno);
+    return -errno;
+}
+
+static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+{
+    return (!memory_region_is_ram(section->mr) &&
+            !memory_region_is_iommu(section->mr)) ||
+           /*
+            * Sizing an enabled 64-bit BAR can cause spurious mappings to
+            * addresses in the upper part of the 64-bit address space.  These
+            * are never accessed by the CPU and beyond the address width of
+            * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
+            */
+           section->offset_within_address_space & (1ULL << 63);
+}
+
+static void vfio_iommu_map_notify(Notifier *n, void *data)
+{
+    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+    VFIOContainer *container = giommu->container;
+    IOMMUTLBEntry *iotlb = data;
+    MemoryRegion *mr;
+    hwaddr xlat;
+    hwaddr len = iotlb->addr_mask + 1;
+    void *vaddr;
+    int ret;
+
+    DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
+            iotlb->iova, iotlb->iova + iotlb->addr_mask);
+
+    /*
+     * The IOMMU TLB entry we have just covers translation through
+     * this IOMMU to its immediate target.  We need to translate
+     * it the rest of the way through to memory.
+     */
+    mr = address_space_translate(&address_space_memory,
+                                 iotlb->translated_addr,
+                                 &xlat, &len, iotlb->perm & IOMMU_WO);
+    if (!memory_region_is_ram(mr)) {
+        DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
+                xlat);
+        return;
+    }
+    /*
+     * Translation truncates length to the IOMMU page size,
+     * check that it did not truncate too much.
+     */
+    if (len & iotlb->addr_mask) {
+        DPRINTF("iommu has granularity incompatible with target AS\n");
+        return;
+    }
+
+    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        vaddr = memory_region_get_ram_ptr(mr) + xlat;
+
+        ret = vfio_dma_map(container, iotlb->iova,
+                           iotlb->addr_mask + 1, vaddr,
+                           !(iotlb->perm & IOMMU_WO) || mr->readonly);
+        if (ret) {
+            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         container, iotlb->iova,
+                         iotlb->addr_mask + 1, vaddr, ret);
+        }
+    } else {
+        ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
+        if (ret) {
+            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         container, iotlb->iova,
+                         iotlb->addr_mask + 1, ret);
+        }
+    }
+}
+
+static void vfio_listener_region_add(MemoryListener *listener,
+                                     MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            iommu_data.type1.listener);
+    hwaddr iova, end;
+    Int128 llend;
+    void *vaddr;
+    int ret;
+
+    if (vfio_listener_skipped_section(section)) {
+        DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
+                section->offset_within_address_space,
+                section->offset_within_address_space +
+                int128_get64(int128_sub(section->size, int128_one())));
+        return;
+    }
+
+    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+        error_report("%s received unaligned region", __func__);
+        return;
+    }
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+    if (int128_ge(int128_make64(iova), llend)) {
+        return;
+    }
+
+    memory_region_ref(section->mr);
+
+    if (memory_region_is_iommu(section->mr)) {
+        VFIOGuestIOMMU *giommu;
+
+        DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
+                iova, int128_get64(int128_sub(llend, int128_one())));
+        /*
+         * FIXME: We should do some checking to see if the
+         * capabilities of the host VFIO IOMMU are adequate to model
+         * the guest IOMMU
+         *
+         * FIXME: For VFIO iommu types which have KVM acceleration to
+         * avoid bouncing all map/unmaps through qemu this way, this
+         * would be the right place to wire that up (tell the KVM
+         * device emulation the VFIO iommu handles to use).
+         */
+        /*
+         * This assumes that the guest IOMMU is empty of
+         * mappings at this point.
+         *
+         * One way of doing this is:
+         * 1. Avoid sharing IOMMUs between emulated devices or different
+         * IOMMU groups.
+         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
+         * there are some mappings in IOMMU.
+         *
+         * VFIO on SPAPR does that. Other IOMMU models may do that differently,
+         * they must make sure there are no existing mappings or
+         * loop through existing mappings to map them into VFIO.
+         */
+        giommu = g_malloc0(sizeof(*giommu));
+        giommu->iommu = section->mr;
+        giommu->container = container;
+        giommu->n.notify = vfio_iommu_map_notify;
+        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+
+        return;
+    }
+
+    /* Here we assume that memory_region_is_ram(section->mr)==true */
+
+    end = int128_get64(llend);
+    vaddr = memory_region_get_ram_ptr(section->mr) +
+            section->offset_within_region +
+            (iova - section->offset_within_address_space);
+
+    DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
+            iova, end - 1, vaddr);
+
+    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
+    if (ret) {
+        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                     container, iova, end - iova, vaddr, ret);
+
+        /*
+         * On the initfn path, store the first error in the container so we
+         * can gracefully fail.  Runtime, there's not much we can do other
+         * than throw a hardware error.
+         */
+        if (!container->iommu_data.type1.initialized) {
+            if (!container->iommu_data.type1.error) {
+                container->iommu_data.type1.error = ret;
+            }
+        } else {
+            hw_error("vfio: DMA mapping failed, unable to continue");
+        }
+    }
+}
+
+static void vfio_listener_region_del(MemoryListener *listener,
+                                     MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            iommu_data.type1.listener);
+    hwaddr iova, end;
+    int ret;
+
+    if (vfio_listener_skipped_section(section)) {
+        DPRINTF("SKIPPING region_del %"HWADDR_PRIx" - %"PRIx64"\n",
+                section->offset_within_address_space,
+                section->offset_within_address_space +
+                int128_get64(int128_sub(section->size, int128_one())));
+        return;
+    }
+
+    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+        error_report("%s received unaligned region", __func__);
+        return;
+    }
+
+    if (memory_region_is_iommu(section->mr)) {
+        VFIOGuestIOMMU *giommu;
+
+        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+            if (giommu->iommu == section->mr) {
+                memory_region_unregister_iommu_notifier(&giommu->n);
+                QLIST_REMOVE(giommu, giommu_next);
+                g_free(giommu);
+                break;
+            }
+        }
+
+        /*
+         * FIXME: We assume the one big unmap below is adequate to
+         * remove any individual page mappings in the IOMMU which
+         * might have been copied into VFIO. This works for a page table
+         * based IOMMU where a big unmap flattens a large range of IO-PTEs.
+         * That may not be true for all IOMMU types.
+         */
+    }
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    end = (section->offset_within_address_space + int128_get64(section->size)) &
+          TARGET_PAGE_MASK;
+
+    if (iova >= end) {
+        return;
+    }
+
+    DPRINTF("region_del %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
+            iova, end - 1);
+
+    ret = vfio_dma_unmap(container, iova, end - iova);
+    memory_region_unref(section->mr);
+    if (ret) {
+        error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                     "0x%"HWADDR_PRIx") = %d (%m)",
+                     container, iova, end - iova, ret);
+    }
+}
+
+const MemoryListener vfio_memory_listener = {
+    .region_add = vfio_listener_region_add,
+    .region_del = vfio_listener_region_del,
+};
+
+void vfio_listener_release(VFIOContainer *container)
+{
+    memory_listener_unregister(&container->iommu_data.type1.listener);
+}
+
+int vfio_mmap_region(Object *obj, VFIORegion *region,
+                     MemoryRegion *mem, MemoryRegion *submem,
+                     void **map, size_t size, off_t offset,
+                     const char *name)
+{
+    int ret = 0;
+
+    if (VFIO_ALLOW_MMAP && size &&
+        (region->flags & VFIO_REGION_INFO_FLAG_MMAP)) {
+        int prot = 0;
+
+        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
+            prot |= PROT_READ;
+        }
+
+        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
+            prot |= PROT_WRITE;
+        }
+
+        *map = mmap(NULL, size, prot, MAP_SHARED,
+                    region->vbasedev->fd,
+                    region->fd_offset + offset);
+        if (*map == MAP_FAILED) {
+            *map = NULL;
+            ret = -errno;
+            goto empty_region;
+        }
+
+        memory_region_init_ram_ptr(submem, obj, name, size, *map);
+    } else {
+empty_region:
+        /* Create a zero sized sub-region to make cleanup easy. */
+        memory_region_init(submem, obj, name, 0);
+    }
+
+    memory_region_add_subregion(mem, offset, submem);
+
+    return ret;
+}
+
+void vfio_reset_handler(void *opaque)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    QLIST_FOREACH(group, &group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
+        }
+    }
+
+    QLIST_FOREACH(group, &group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->needs_reset) {
+                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
+            }
+        }
+    }
+}
+
+static void vfio_kvm_device_add_group(VFIOGroup *group)
+{
+#ifdef CONFIG_KVM
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_ADD,
+        .addr = (uint64_t)(unsigned long)&group->fd,
+    };
+
+    if (!kvm_enabled()) {
+        return;
+    }
+
+    if (vfio_kvm_device_fd < 0) {
+        struct kvm_create_device cd = {
+            .type = KVM_DEV_TYPE_VFIO,
+        };
+
+        if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
+            DPRINTF("KVM_CREATE_DEVICE: %m\n");
+            return;
+        }
+
+        vfio_kvm_device_fd = cd.fd;
+    }
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        error_report("Failed to add group %d to KVM VFIO device: %m",
+                     group->groupid);
+    }
+#endif
+}
+
+static void vfio_kvm_device_del_group(VFIOGroup *group)
+{
+#ifdef CONFIG_KVM
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_DEL,
+        .addr = (uint64_t)(unsigned long)&group->fd,
+    };
+
+    if (vfio_kvm_device_fd < 0) {
+        return;
+    }
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        error_report("Failed to remove group %d from KVM VFIO device: %m",
+                     group->groupid);
+    }
+#endif
+}
+
+static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
+{
+    VFIOAddressSpace *space;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        if (space->as == as) {
+            return space;
+        }
+    }
+
+    /* No suitable VFIOAddressSpace, create a new one */
+    space = g_malloc0(sizeof(*space));
+    space->as = as;
+    QLIST_INIT(&space->containers);
+
+    QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
+
+    return space;
+}
+
+static void vfio_put_address_space(VFIOAddressSpace *space)
+{
+    if (QLIST_EMPTY(&space->containers)) {
+        QLIST_REMOVE(space, list);
+        g_free(space);
+    }
+}
+
+static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
+{
+    VFIOContainer *container;
+    int ret, fd;
+    VFIOAddressSpace *space;
+
+    space = vfio_get_address_space(as);
+
+    QLIST_FOREACH(container, &space->containers, next) {
+        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+            group->container = container;
+            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+            return 0;
+        }
+    }
+
+    fd = qemu_open("/dev/vfio/vfio", O_RDWR);
+    if (fd < 0) {
+        error_report("vfio: failed to open /dev/vfio/vfio: %m");
+        ret = -errno;
+        goto put_space_exit;
+    }
+
+    ret = ioctl(fd, VFIO_GET_API_VERSION);
+    if (ret != VFIO_API_VERSION) {
+        error_report("vfio: supported vfio version: %d, "
+                     "reported version: %d", VFIO_API_VERSION, ret);
+        ret = -EINVAL;
+        goto close_fd_exit;
+    }
+
+    container = g_malloc0(sizeof(*container));
+    container->space = space;
+    container->fd = fd;
+
+    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
+        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
+        if (ret) {
+            error_report("vfio: failed to set group container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
+        if (ret) {
+            error_report("vfio: failed to set iommu for container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        container->iommu_data.type1.listener = vfio_memory_listener;
+        container->iommu_data.release = vfio_listener_release;
+
+        memory_listener_register(&container->iommu_data.type1.listener,
+                                 container->space->as);
+
+        if (container->iommu_data.type1.error) {
+            ret = container->iommu_data.type1.error;
+            error_report("vfio: memory listener initialization failed for container");
+            goto listener_release_exit;
+        }
+
+        container->iommu_data.type1.initialized = true;
+
+    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
+        if (ret) {
+            error_report("vfio: failed to set group container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+        if (ret) {
+            error_report("vfio: failed to set iommu for container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        /*
+         * The host kernel code implementing VFIO_IOMMU_DISABLE is called
+         * when container fd is closed so we do not call it explicitly
+         * in this file.
+         */
+        ret = ioctl(fd, VFIO_IOMMU_ENABLE);
+        if (ret) {
+            error_report("vfio: failed to enable container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        container->iommu_data.type1.listener = vfio_memory_listener;
+        container->iommu_data.release = vfio_listener_release;
+
+        memory_listener_register(&container->iommu_data.type1.listener,
+                                 container->space->as);
+
+    } else {
+        error_report("vfio: No available IOMMU models");
+        ret = -EINVAL;
+        goto free_container_exit;
+    }
+
+    QLIST_INIT(&container->group_list);
+    QLIST_INSERT_HEAD(&space->containers, container, next);
+
+    group->container = container;
+    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+
+    return 0;
+
+listener_release_exit:
+    vfio_listener_release(container);
+
+free_container_exit:
+    g_free(container);
+
+close_fd_exit:
+    close(fd);
+
+put_space_exit:
+    vfio_put_address_space(space);
+
+    return ret;
+}
+
+static void vfio_disconnect_container(VFIOGroup *group)
+{
+    VFIOContainer *container = group->container;
+
+    if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
+        error_report("vfio: error disconnecting group %d from container: %m",
+                     group->groupid);
+    }
+
+    QLIST_REMOVE(group, container_next);
+    group->container = NULL;
+
+    if (QLIST_EMPTY(&container->group_list)) {
+        VFIOAddressSpace *space = container->space;
+
+        if (container->iommu_data.release) {
+            container->iommu_data.release(container);
+        }
+        QLIST_REMOVE(container, next);
+        DPRINTF("vfio_disconnect_container: close container->fd\n");
+        close(container->fd);
+        g_free(container);
+
+        vfio_put_address_space(space);
+    }
+}
+
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
+{
+    VFIOGroup *group;
+    char path[32];
+    struct vfio_group_status status = { .argsz = sizeof(status) };
+
+    QLIST_FOREACH(group, &group_list, next) {
+        if (group->groupid == groupid) {
+            /* Found it.  Now is it already in the right context? */
+            if (group->container->space->as == as) {
+                return group;
+            } else {
+                error_report("vfio: group %d used in multiple address spaces",
+                             group->groupid);
+                return NULL;
+            }
+        }
+    }
+
+    group = g_malloc0(sizeof(*group));
+
+    snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
+    group->fd = qemu_open(path, O_RDWR);
+    if (group->fd < 0) {
+        error_report("vfio: error opening %s: %m", path);
+        goto free_group_exit;
+    }
+
+    if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
+        error_report("vfio: error getting group status: %m");
+        goto close_fd_exit;
+    }
+
+    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+        error_report("vfio: error, group %d is not viable, please ensure "
+                     "all devices within the iommu_group are bound to their "
+                     "vfio bus driver.", groupid);
+        goto close_fd_exit;
+    }
+
+    group->groupid = groupid;
+    QLIST_INIT(&group->device_list);
+
+    if (vfio_connect_container(group, as)) {
+        error_report("vfio: failed to setup container for group %d", groupid);
+        goto close_fd_exit;
+    }
+
+    if (QLIST_EMPTY(&group_list)) {
+        qemu_register_reset(vfio_reset_handler, NULL);
+    }
+
+    QLIST_INSERT_HEAD(&group_list, group, next);
+
+    vfio_kvm_device_add_group(group);
+
+    return group;
+
+close_fd_exit:
+    close(group->fd);
+
+free_group_exit:
+    g_free(group);
+
+    return NULL;
+}
+
+void vfio_put_group(VFIOGroup *group)
+{
+    if (!QLIST_EMPTY(&group->device_list)) {
+        return;
+    }
+
+    vfio_kvm_device_del_group(group);
+    vfio_disconnect_container(group);
+    QLIST_REMOVE(group, next);
+    DPRINTF("vfio_put_group: close group->fd\n");
+    close(group->fd);
+    g_free(group);
+
+    if (QLIST_EMPTY(&group_list)) {
+        qemu_unregister_reset(vfio_reset_handler, NULL);
+    }
+}
+
+int vfio_get_device(VFIOGroup *group, const char *name,
+                    VFIODevice *vbasedev)
+{
+    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+    int ret;
+
+    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    if (ret < 0) {
+        error_report("vfio: error getting device %s from group %d: %m",
+                     name, group->groupid);
+        error_printf("Verify all devices in group %d are bound to vfio-<bus> "
+                     "or pci-stub and not already in use\n", group->groupid);
+        return ret;
+    }
+
+    vbasedev->fd = ret;
+    vbasedev->group = group;
+    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+    if (ret) {
+        error_report("vfio: error getting device info: %m");
+        goto error;
+    }
+
+    vbasedev->num_irqs = dev_info.num_irqs;
+    vbasedev->num_regions = dev_info.num_regions;
+    vbasedev->flags = dev_info.flags;
+
+    DPRINTF("Device %s flags: %u, regions: %u, irqs: %u\n", name,
+            dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
+
+    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+    /* call device specific functions */
+    ret = vbasedev->ops->vfio_check_device(vbasedev);
+    if (ret) {
+        error_report("vfio: error when checking device %s",
+                     vbasedev->name);
+        goto error;
+    }
+    ret = vbasedev->ops->vfio_populate_regions(vbasedev);
+    if (ret) {
+        error_report("vfio: error when populating regions of device %s",
+                     vbasedev->name);
+        goto error;
+    }
+    ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
+    if (ret) {
+        error_report("vfio: error when populating interrupts of device %s",
+                     vbasedev->name);
+        goto error;
+    }
+
+error:
+    if (ret) {
+        vfio_put_base_device(vbasedev);
+    }
+    return ret;
+}
+
+void vfio_put_base_device(VFIODevice *vbasedev)
+{
+    QLIST_REMOVE(vbasedev, next);
+    vbasedev->group = NULL;
+    DPRINTF("vfio_put_base_device: close vbasedev->fd\n");
+    close(vbasedev->fd);
+}
+
+static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
+                                   int req, void *param)
+{
+    VFIOGroup *group;
+    VFIOContainer *container;
+    int ret = -1;
+
+    group = vfio_get_group(groupid, as);
+    if (!group) {
+        error_report("vfio: group %d not registered", groupid);
+        return ret;
+    }
+
+    container = group->container;
+    if (container) {
+        ret = ioctl(container->fd, req, param);
+        if (ret < 0) {
+            error_report("vfio: failed to ioctl container: ret=%d, %s",
+                         ret, strerror(errno));
+        }
+    }
+
+    vfio_put_group(group);
+
+    return ret;
+}
+
+int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
+                         int req, void *param)
+{
+    /* We allow only certain ioctls to the container */
+    switch (req) {
+    case VFIO_CHECK_EXTENSION:
+    case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+        break;
+    default:
+        /* Return an error on unknown requests */
+        error_report("vfio: unsupported ioctl %X", req);
+        return -1;
+    }
+
+    return vfio_container_do_ioctl(as, groupid, req, param);
+}
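
Note on the listener code moved into the common file above: both region_add and region_del trim a section to whole target pages before calling the DMA map/unmap helpers (start rounded up, end rounded down, nothing done if the result is empty). The following is only a standalone sketch of that arithmetic, not code from this series. It assumes a fixed 4 KiB page, and the names EX_PAGE_SIZE, EX_PAGE_MASK and ex_page_trim are hypothetical; QEMU uses TARGET_PAGE_ALIGN and TARGET_PAGE_MASK instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4 KiB page; QEMU derives the real value per target. */
#define EX_PAGE_SIZE 4096ULL
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))

/*
 * Trim [start, start + size) to whole pages: round the start up and
 * the end down. Returns false when no whole page remains, which is
 * the case where the listeners return without mapping or unmapping.
 */
static bool ex_page_trim(uint64_t start, uint64_t size,
                         uint64_t *iova, uint64_t *end)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert((EX_PAGE_SIZE & (EX_PAGE_SIZE - 1)) == 0);

    *iova = (start + EX_PAGE_SIZE - 1) & EX_PAGE_MASK;
    *end = (start + size) & EX_PAGE_MASK;
    return *iova < *end;
}
```

With these assumptions, a section starting at 0x1800 with size 0x1000 contains no whole page and is skipped, while one starting at 0x1001 with size 0x2fff is trimmed to the range 0x2000 to 0x3fff.
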
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5f218b7..d2ccb3b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -39,27 +39,12 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
-#include "hw/vfio/vfio.h"
+#include "hw/vfio/vfio-common.h"
 
-/* #define DEBUG_VFIO */
-#ifdef DEBUG_VFIO
-#define DPRINTF(fmt, ...) \
-    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-    do { } while (0)
-#endif
-
-/* Extra debugging, trap acceleration paths for more logging */
-#define VFIO_ALLOW_MMAP 1
-#define VFIO_ALLOW_KVM_INTX 1
-#define VFIO_ALLOW_KVM_MSI 1
-#define VFIO_ALLOW_KVM_MSIX 1
-
-enum {
-    VFIO_DEVICE_TYPE_PCI = 0,
-    VFIO_DEVICE_TYPE_PLATFORM = 1,
-};
+extern const MemoryRegionOps vfio_region_ops;
+extern const MemoryListener vfio_memory_listener;
+extern QLIST_HEAD(, VFIOGroup) group_list;
+extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
 
 struct VFIOPCIDevice;
 
@@ -86,17 +71,6 @@ typedef struct VFIOQuirk {
     } data;
 } VFIOQuirk;
 
-typedef struct VFIORegion {
-    struct VFIODevice *vbasedev;
-    off_t fd_offset; /* offset of region within device fd */
-    MemoryRegion mem; /* slow, read/write access */
-    MemoryRegion mmap_mem; /* direct mapped access */
-    void *mmap;
-    size_t size;
-    uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
-    uint8_t nr; /* cache the region number for debug */
-} VFIORegion;
-
 typedef struct VFIOBAR {
     VFIORegion region;
     bool ioport;
@@ -152,45 +126,6 @@ enum {
     VFIO_INT_MSIX = 3,
 };
 
-typedef struct VFIOAddressSpace {
-    AddressSpace *as;
-    QLIST_HEAD(, VFIOContainer) containers;
-    QLIST_ENTRY(VFIOAddressSpace) list;
-} VFIOAddressSpace;
-
-static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
-    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
-
-struct VFIOGroup;
-
-typedef struct VFIOType1 {
-    MemoryListener listener;
-    int error;
-    bool initialized;
-} VFIOType1;
-
-typedef struct VFIOContainer {
-    VFIOAddressSpace *space;
-    int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-    struct {
-        /* enable abstraction to support various iommu backends */
-        union {
-            VFIOType1 type1;
-        };
-        void (*release)(struct VFIOContainer *);
-    } iommu_data;
-    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
-    QLIST_HEAD(, VFIOGroup) group_list;
-    QLIST_ENTRY(VFIOContainer) next;
-} VFIOContainer;
-
-typedef struct VFIOGuestIOMMU {
-    VFIOContainer *container;
-    MemoryRegion *iommu;
-    Notifier n;
-    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
-
 /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
 typedef struct VFIOMSIXInfo {
     uint8_t table_bar;
@@ -202,31 +137,6 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
-typedef struct VFIODeviceOps VFIODeviceOps;
-
-typedef struct VFIODevice {
-    QLIST_ENTRY(VFIODevice) next;
-    struct VFIOGroup *group;
-    char *name;
-    int fd;
-    int type;
-    bool reset_works;
-    bool needs_reset;
-    VFIODeviceOps *ops;
-    unsigned int num_irqs;
-    unsigned int num_regions;
-    unsigned int flags;
-} VFIODevice;
-
-struct VFIODeviceOps {
-    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
-    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
-    void (*vfio_eoi)(VFIODevice *vdev);
-    int (*vfio_check_device)(VFIODevice *vdev);
-    int (*vfio_populate_regions)(VFIODevice *vdev);
-    int (*vfio_populate_interrupts)(VFIODevice *vdev);
-};
-
 typedef struct VFIOPCIDevice {
     PCIDevice pdev;
     VFIODevice vbasedev;
@@ -258,15 +168,6 @@ typedef struct VFIOPCIDevice {
     bool rom_read_failed;
 } VFIOPCIDevice;
 
-typedef struct VFIOGroup {
-    int fd;
-    int groupid;
-    VFIOContainer *container;
-    QLIST_HEAD(, VFIODevice) device_list;
-    QLIST_ENTRY(VFIOGroup) next;
-    QLIST_ENTRY(VFIOGroup) container_next;
-} VFIOGroup;
-
 typedef struct VFIORomBlacklistEntry {
     uint16_t vendor_id;
     uint16_t device_id;
@@ -292,78 +193,16 @@ static const VFIORomBlacklistEntry romblacklist[] = {
 
 #define MSIX_CAP_LENGTH 12
 
-static QLIST_HEAD(, VFIOGroup)
-    group_list = QLIST_HEAD_INITIALIZER(group_list);
-
-#ifdef CONFIG_KVM
-/*
- * We have a single VFIO pseudo device per KVM VM.  Once created it lives
- * for the life of the VM.  Closing the file descriptor only drops our
- * reference to it and the device's reference to kvm.  Therefore once
- * initialized, this file descriptor is only released on QEMU exit and
- * we'll re-use it should another vfio device be attached before then.
- */
-static int vfio_kvm_device_fd = -1;
-#endif
-
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
-static void vfio_put_base_device(VFIODevice *vbasedev);
 static int vfio_check_device(VFIODevice *vbasedev);
 static int vfio_populate_regions(VFIODevice *vbasedev);
 static int vfio_populate_interrupts(VFIODevice *vbasedev);
 
 /*
- * Common VFIO interrupt disable
- */
-static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
-        .index = index,
-        .start = 0,
-        .count = 0,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-/*
- * INTx
- */
-static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
-        .index = index,
-        .start = 0,
-        .count = 1,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
-        .index = index,
-        .start = 0,
-        .count = 1,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-#endif
-
-/*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
  * also be a huge overhead.  We try to get the best of both worlds by
  * waiting until an interrupt to disable mmaps (subsequent transitions
@@ -1115,110 +954,6 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
     }
 }
 
-/*
- * IO Port/MMIO - Beware of the endians, VFIO is always little endian
- */
-static void vfio_region_write(void *opaque, hwaddr addr,
-                           uint64_t data, unsigned size)
-{
-    VFIORegion *region = opaque;
-    VFIODevice *vbasedev = region->vbasedev;
-    union {
-        uint8_t byte;
-        uint16_t word;
-        uint32_t dword;
-        uint64_t qword;
-    } buf;
-
-    switch (size) {
-    case 1:
-        buf.byte = data;
-        break;
-    case 2:
-        buf.word = data;
-        break;
-    case 4:
-        buf.dword = data;
-        break;
-    default:
-        hw_error("vfio: unsupported write size, %d bytes", size);
-        break;
-    }
-
-    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
-        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
-                     ",%d) failed: %m",
-                     __func__, vbasedev->name, region->nr,
-                     addr, data, size);
-    }
-
-    DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
-            ", %d)\n", __func__, vbasedev->name,
-            region->nr, addr, data, size);
-
-    /*
-     * A read or write to a BAR always signals an INTx EOI.  This will
-     * do nothing if not pending (including not in INTx mode).  We assume
-     * that a BAR access is in response to an interrupt and that BAR
-     * accesses will service the interrupt.  Unfortunately, we don't know
-     * which access will service the interrupt, so we're potentially
-     * getting quite a few host interrupts per guest interrupt.
-     */
-    vbasedev->ops->vfio_eoi(vbasedev);
-
-}
-
-static uint64_t vfio_region_read(void *opaque,
-                              hwaddr addr, unsigned size)
-{
-    VFIORegion *region = opaque;
-    VFIODevice *vbasedev = region->vbasedev;
-    union {
-        uint8_t byte;
-        uint16_t word;
-        uint32_t dword;
-        uint64_t qword;
-    } buf;
-    uint64_t data = 0;
-
-    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
-        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
-                     __func__, vbasedev->name, region->nr,
-                     addr, size);
-        return (uint64_t)-1;
-    }
-
-    switch (size) {
-    case 1:
-        data = buf.byte;
-        break;
-    case 2:
-        data = buf.word;
-        break;
-    case 4:
-        data = buf.dword;
-        break;
-    default:
-        hw_error("vfio: unsupported read size, %d bytes", size);
-        break;
-    }
-
-    DPRINTF("%s(%s:region%d+0x%"HWADDR_PRIx", %d) = 0x%"PRIx64"\n",
-            __func__, vdev->name,
-            region->nr, addr, size, data);
-
-    /* Same as write above */
-    vbasedev->ops->vfio_eoi(vbasedev);
-
-    return data;
-}
-
-static const MemoryRegionOps vfio_region_ops = {
-    .read = vfio_region_read,
-    .write = vfio_region_write,
-    .endianness = DEVICE_NATIVE_ENDIAN,
-};
-
 static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
     struct vfio_region_info reg_info = {
@@ -2445,307 +2180,6 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
 }
 
 /*
- * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
- */
-static int vfio_dma_unmap(VFIOContainer *container,
-                          hwaddr iova, ram_addr_t size)
-{
-    struct vfio_iommu_type1_dma_unmap unmap = {
-        .argsz = sizeof(unmap),
-        .flags = 0,
-        .iova = iova,
-        .size = size,
-    };
-
-    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-        DPRINTF("VFIO_UNMAP_DMA: %d\n", -errno);
-        return -errno;
-    }
-
-    return 0;
-}
-
-static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
-                        ram_addr_t size, void *vaddr, bool readonly)
-{
-    struct vfio_iommu_type1_dma_map map = {
-        .argsz = sizeof(map),
-        .flags = VFIO_DMA_MAP_FLAG_READ,
-        .vaddr = (__u64)(uintptr_t)vaddr,
-        .iova = iova,
-        .size = size,
-    };
-
-    if (!readonly) {
-        map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
-    }
-
-    /*
-     * Try the mapping, if it fails with EBUSY, unmap the region and try
-     * again.  This shouldn't be necessary, but we sometimes see it in
-     * the the VGA ROM space.
-     */
-    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
-         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
-        return 0;
-    }
-
-    DPRINTF("VFIO_MAP_DMA: %d\n", -errno);
-    return -errno;
-}
-
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
-{
-    return (!memory_region_is_ram(section->mr) &&
-            !memory_region_is_iommu(section->mr)) ||
-           /*
-            * Sizing an enabled 64-bit BAR can cause spurious mappings to
-            * addresses in the upper part of the 64-bit address space.  These
-            * are never accessed by the CPU and beyond the address width of
-            * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
-            */
-           section->offset_within_address_space & (1ULL << 63);
-}
-
-static void vfio_iommu_map_notify(Notifier *n, void *data)
-{
-    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
-    VFIOContainer *container = giommu->container;
-    IOMMUTLBEntry *iotlb = data;
-    MemoryRegion *mr;
-    hwaddr xlat;
-    hwaddr len = iotlb->addr_mask + 1;
-    void *vaddr;
-    int ret;
-
-    DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
-            iotlb->iova, iotlb->iova + iotlb->addr_mask);
-
-    /*
-     * The IOMMU TLB entry we have just covers translation through
-     * this IOMMU to its immediate target.  We need to translate
-     * it the rest of the way through to memory.
-     */
-    mr = address_space_translate(&address_space_memory,
-                                 iotlb->translated_addr,
-                                 &xlat, &len, iotlb->perm & IOMMU_WO);
-    if (!memory_region_is_ram(mr)) {
-        DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
-                xlat);
-        return;
-    }
-    /*
-     * Translation truncates length to the IOMMU page size,
-     * check that it did not truncate too much.
-     */
-    if (len & iotlb->addr_mask) {
-        DPRINTF("iommu has granularity incompatible with target AS\n");
-        return;
-    }
-
-    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
-        vaddr = memory_region_get_ram_ptr(mr) + xlat;
-
-        ret = vfio_dma_map(container, iotlb->iova,
-                           iotlb->addr_mask + 1, vaddr,
-                           !(iotlb->perm & IOMMU_WO) || mr->readonly);
-        if (ret) {
-            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                         container, iotlb->iova,
-                         iotlb->addr_mask + 1, vaddr, ret);
-        }
-    } else {
-        ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
-        if (ret) {
-            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
-                         container, iotlb->iova,
-                         iotlb->addr_mask + 1, ret);
-        }
-    }
-}
-
-static void vfio_listener_region_add(MemoryListener *listener,
-                                     MemoryRegionSection *section)
-{
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
-    hwaddr iova, end;
-    Int128 llend;
-    void *vaddr;
-    int ret;
-
-    if (vfio_listener_skipped_section(section)) {
-        DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                int128_get64(int128_sub(section->size, int128_one())));
-        return;
-    }
-
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
-        error_report("%s received unaligned region", __func__);
-        return;
-    }
-
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-    llend = int128_make64(section->offset_within_address_space);
-    llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
-
-    if (int128_ge(int128_make64(iova), llend)) {
-        return;
-    }
-
-    memory_region_ref(section->mr);
-
-    if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
-
-        DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
-                iova, int128_get64(int128_sub(llend, int128_one())));
-        /*
-         * FIXME: We should do some checking to see if the
-         * capabilities of the host VFIO IOMMU are adequate to model
-         * the guest IOMMU
-         *
-         * FIXME: For VFIO iommu types which have KVM acceleration to
-         * avoid bouncing all map/unmaps through qemu this way, this
-         * would be the right place to wire that up (tell the KVM
-         * device emulation the VFIO iommu handles to use).
-         */
-        /*
-         * This assumes that the guest IOMMU is empty of
-         * mappings at this point.
-         *
-         * One way of doing this is:
-         * 1. Avoid sharing IOMMUs between emulated devices or different
-         * IOMMU groups.
-         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
-         * there are some mappings in IOMMU.
-         *
-         * VFIO on SPAPR does that. Other IOMMU models may do that different,
-         * they must make sure there are no existing mappings or
-         * loop through existing mappings to map them into VFIO.
-         */
-        giommu = g_malloc0(sizeof(*giommu));
-        giommu->iommu = section->mr;
-        giommu->container = container;
-        giommu->n.notify = vfio_iommu_map_notify;
-        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
-        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
-
-        return;
-    }
-
-    /* Here we assume that memory_region_is_ram(section->mr)==true */
-
-    end = int128_get64(llend);
-    vaddr = memory_region_get_ram_ptr(section->mr) +
-            section->offset_within_region +
-            (iova - section->offset_within_address_space);
-
-    DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
-            iova, end - 1, vaddr);
-
-    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
-    if (ret) {
-        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                     container, iova, end - iova, vaddr, ret);
-
-        /*
-         * On the initfn path, store the first error in the container so we
-         * can gracefully fail.  Runtime, there's not much we can do other
-         * than throw a hardware error.
-         */
-        if (!container->iommu_data.type1.initialized) {
-            if (!container->iommu_data.type1.error) {
-                container->iommu_data.type1.error = ret;
-            }
-        } else {
-            hw_error("vfio: DMA mapping failed, unable to continue");
-        }
-    }
-}
-
-static void vfio_listener_region_del(MemoryListener *listener,
-                                     MemoryRegionSection *section)
-{
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
-    hwaddr iova, end;
-    int ret;
-
-    if (vfio_listener_skipped_section(section)) {
-        DPRINTF("SKIPPING region_del %"HWADDR_PRIx" - %"PRIx64"\n",
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                int128_get64(int128_sub(section->size, int128_one())));
-        return;
-    }
-
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
-        error_report("%s received unaligned region", __func__);
-        return;
-    }
-
-    if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
-
-        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
-            if (giommu->iommu == section->mr) {
-                memory_region_unregister_iommu_notifier(&giommu->n);
-                QLIST_REMOVE(giommu, giommu_next);
-                g_free(giommu);
-                break;
-            }
-        }
-
-        /*
-         * FIXME: We assume the one big unmap below is adequate to
-         * remove any individual page mappings in the IOMMU which
-         * might have been copied into VFIO. This works for a page table
-         * based IOMMU where a big unmap flattens a large range of IO-PTEs.
-         * That may not be true for all IOMMU types.
-         */
-    }
-
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-    end = (section->offset_within_address_space + int128_get64(section->size)) &
-          TARGET_PAGE_MASK;
-
-    if (iova >= end) {
-        return;
-    }
-
-    DPRINTF("region_del %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
-            iova, end - 1);
-
-    ret = vfio_dma_unmap(container, iova, end - iova);
-    memory_region_unref(section->mr);
-    if (ret) {
-        error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx") = %d (%m)",
-                     container, iova, end - iova, ret);
-    }
-}
-
-static MemoryListener vfio_memory_listener = {
-    .region_add = vfio_listener_region_add,
-    .region_del = vfio_listener_region_del,
-};
-
-static void vfio_listener_release(VFIOContainer *container)
-{
-    memory_listener_unregister(&container->iommu_data.type1.listener);
-}
-
-/*
  * Interrupt setup
  */
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
@@ -2925,46 +2359,6 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
     memory_region_destroy(&bar->region.mem);
 }
 
-static int vfio_mmap_region(Object *obj, VFIORegion *region,
-                         MemoryRegion *mem, MemoryRegion *submem,
-                         void **map, size_t size, off_t offset,
-                         const char *name)
-{
-    int ret = 0;
-    VFIODevice *vbasedev = region->vbasedev;
-
-    if (VFIO_ALLOW_MMAP && size && region->flags &
-        VFIO_REGION_INFO_FLAG_MMAP) {
-        int prot = 0;
-
-        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
-            prot |= PROT_READ;
-        }
-
-        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
-            prot |= PROT_WRITE;
-        }
-
-        *map = mmap(NULL, size, prot, MAP_SHARED,
-                    vbasedev->fd, region->fd_offset + offset);
-        if (*map == MAP_FAILED) {
-            *map = NULL;
-            ret = -errno;
-            goto empty_region;
-        }
-
-        memory_region_init_ram_ptr(submem, obj, name, size, *map);
-    } else {
-empty_region:
-        /* Create a zero sized sub-region to make cleanup easy. */
-        memory_region_init(submem, obj, name, 0);
-    }
-
-    memory_region_add_subregion(mem, offset, submem);
-
-    return ret;
-}
-
 static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
@@ -3623,345 +3017,6 @@ static VFIODeviceOps vfio_pci_ops = {
     .vfio_populate_interrupts = vfio_populate_interrupts,
 };
 
-static void vfio_reset_handler(void *opaque)
-{
-    VFIOGroup *group;
-    VFIODevice *vbasedev;
-
-    QLIST_FOREACH(group, &group_list, next) {
-        QLIST_FOREACH(vbasedev, &group->device_list, next) {
-            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
-        }
-    }
-
-    QLIST_FOREACH(group, &group_list, next) {
-        QLIST_FOREACH(vbasedev, &group->device_list, next) {
-            if (vbasedev->needs_reset) {
-                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
-            }
-        }
-    }
-}
-
-static void vfio_kvm_device_add_group(VFIOGroup *group)
-{
-#ifdef CONFIG_KVM
-    struct kvm_device_attr attr = {
-        .group = KVM_DEV_VFIO_GROUP,
-        .attr = KVM_DEV_VFIO_GROUP_ADD,
-        .addr = (uint64_t)(unsigned long)&group->fd,
-    };
-
-    if (!kvm_enabled()) {
-        return;
-    }
-
-    if (vfio_kvm_device_fd < 0) {
-        struct kvm_create_device cd = {
-            .type = KVM_DEV_TYPE_VFIO,
-        };
-
-        if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
-            DPRINTF("KVM_CREATE_DEVICE: %m\n");
-            return;
-        }
-
-        vfio_kvm_device_fd = cd.fd;
-    }
-
-    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-        error_report("Failed to add group %d to KVM VFIO device: %m",
-                     group->groupid);
-    }
-#endif
-}
-
-static void vfio_kvm_device_del_group(VFIOGroup *group)
-{
-#ifdef CONFIG_KVM
-    struct kvm_device_attr attr = {
-        .group = KVM_DEV_VFIO_GROUP,
-        .attr = KVM_DEV_VFIO_GROUP_DEL,
-        .addr = (uint64_t)(unsigned long)&group->fd,
-    };
-
-    if (vfio_kvm_device_fd < 0) {
-        return;
-    }
-
-    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-        error_report("Failed to remove group %d from KVM VFIO device: %m",
-                     group->groupid);
-    }
-#endif
-}
-
-static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
-{
-    VFIOAddressSpace *space;
-
-    QLIST_FOREACH(space, &vfio_address_spaces, list) {
-        if (space->as == as) {
-            return space;
-        }
-    }
-
-    /* No suitable VFIOAddressSpace, create a new one */
-    space = g_malloc0(sizeof(*space));
-    space->as = as;
-    QLIST_INIT(&space->containers);
-
-    QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
-
-    return space;
-}
-
-static void vfio_put_address_space(VFIOAddressSpace *space)
-{
-    if (QLIST_EMPTY(&space->containers)) {
-        QLIST_REMOVE(space, list);
-        g_free(space);
-    }
-}
-
-static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
-{
-    VFIOContainer *container;
-    int ret, fd;
-    VFIOAddressSpace *space;
-
-    space = vfio_get_address_space(as);
-
-    QLIST_FOREACH(container, &space->containers, next) {
-        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
-            group->container = container;
-            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-            return 0;
-        }
-    }
-
-    fd = qemu_open("/dev/vfio/vfio", O_RDWR);
-    if (fd < 0) {
-        error_report("vfio: failed to open /dev/vfio/vfio: %m");
-        ret = -errno;
-        goto put_space_exit;
-    }
-
-    ret = ioctl(fd, VFIO_GET_API_VERSION);
-    if (ret != VFIO_API_VERSION) {
-        error_report("vfio: supported vfio version: %d, "
-                     "reported version: %d", VFIO_API_VERSION, ret);
-        ret = -EINVAL;
-        goto close_fd_exit;
-    }
-
-    container = g_malloc0(sizeof(*container));
-    container->space = space;
-    container->fd = fd;
-
-    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
-        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
-        if (ret) {
-            error_report("vfio: failed to set group container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
-        if (ret) {
-            error_report("vfio: failed to set iommu for container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 &address_space_memory);
-
-        if (container->iommu_data.type1.error) {
-            ret = container->iommu_data.type1.error;
-            error_report("vfio: memory listener initialization failed for container");
-            goto listener_release_exit;
-        }
-
-        container->iommu_data.type1.initialized = true;
-
-    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
-        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
-        if (ret) {
-            error_report("vfio: failed to set group container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
-        if (ret) {
-            error_report("vfio: failed to set iommu for container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        /*
-         * The host kernel code implementing VFIO_IOMMU_DISABLE is called
-         * when container fd is closed so we do not call it explicitly
-         * in this file.
-         */
-        ret = ioctl(fd, VFIO_IOMMU_ENABLE);
-        if (ret) {
-            error_report("vfio: failed to enable container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
-    } else {
-        error_report("vfio: No available IOMMU models");
-        ret = -EINVAL;
-        goto free_container_exit;
-    }
-
-    QLIST_INIT(&container->group_list);
-    QLIST_INSERT_HEAD(&space->containers, container, next);
-
-    group->container = container;
-    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-
-    return 0;
-
-listener_release_exit:
-    vfio_listener_release(container);
-
-free_container_exit:
-    g_free(container);
-
-close_fd_exit:
-    close(fd);
-
-put_space_exit:
-    vfio_put_address_space(space);
-
-    return ret;
-}
-
-static void vfio_disconnect_container(VFIOGroup *group)
-{
-    VFIOContainer *container = group->container;
-
-    if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
-        error_report("vfio: error disconnecting group %d from container",
-                     group->groupid);
-    }
-
-    QLIST_REMOVE(group, container_next);
-    group->container = NULL;
-
-    if (QLIST_EMPTY(&container->group_list)) {
-        VFIOAddressSpace *space = container->space;
-
-        if (container->iommu_data.release) {
-            container->iommu_data.release(container);
-        }
-        QLIST_REMOVE(container, next);
-        DPRINTF("vfio_disconnect_container: close container->fd\n");
-        close(container->fd);
-        g_free(container);
-
-        vfio_put_address_space(space);
-    }
-}
-
-static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
-{
-    VFIOGroup *group;
-    char path[32];
-    struct vfio_group_status status = { .argsz = sizeof(status) };
-
-    QLIST_FOREACH(group, &group_list, next) {
-        if (group->groupid == groupid) {
-            /* Found it.  Now is it already in the right context? */
-            if (group->container->space->as == as) {
-                return group;
-            } else {
-                error_report("vfio: group %d used in multiple address spaces",
-                             group->groupid);
-                return NULL;
-            }
-        }
-    }
-
-    group = g_malloc0(sizeof(*group));
-
-    snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
-    group->fd = qemu_open(path, O_RDWR);
-    if (group->fd < 0) {
-        error_report("vfio: error opening %s: %m", path);
-        goto free_group_exit;
-    }
-
-    if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
-        error_report("vfio: error getting group status: %m");
-        goto close_fd_exit;
-    }
-
-    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
-        error_report("vfio: error, group %d is not viable, please ensure "
-                     "all devices within the iommu_group are bound to their "
-                     "vfio bus driver.", groupid);
-        goto close_fd_exit;
-    }
-
-    group->groupid = groupid;
-    QLIST_INIT(&group->device_list);
-
-    if (vfio_connect_container(group, as)) {
-        error_report("vfio: failed to setup container for group %d", groupid);
-        goto close_fd_exit;
-    }
-
-    if (QLIST_EMPTY(&group_list)) {
-        qemu_register_reset(vfio_reset_handler, NULL);
-    }
-
-    QLIST_INSERT_HEAD(&group_list, group, next);
-
-    vfio_kvm_device_add_group(group);
-
-    return group;
-
-close_fd_exit:
-    close(group->fd);
-
-free_group_exit:
-    g_free(group);
-
-    return NULL;
-}
-
-static void vfio_put_group(VFIOGroup *group)
-{
-    if (!QLIST_EMPTY(&group->device_list)) {
-        return;
-    }
-
-    vfio_kvm_device_del_group(group);
-    vfio_disconnect_container(group);
-    QLIST_REMOVE(group, next);
-    DPRINTF("vfio_put_group: close group->fd\n");
-    close(group->fd);
-    g_free(group);
-
-    if (QLIST_EMPTY(&group_list)) {
-        qemu_unregister_reset(vfio_reset_handler, NULL);
-    }
-}
-
 static int vfio_check_device(VFIODevice *vbasedev)
 {
     if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
@@ -4095,77 +3150,6 @@ error:
     return ret;
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name,
-                           VFIODevice *vbasedev)
-{
-    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
-    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
-    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
-    int ret;
-
-    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
-    if (ret < 0) {
-        error_report("vfio: error getting device %s from group %d: %m",
-                     name, group->groupid);
-        error_printf("Verify all devices in group %d are bound to vfio-pci "
-                     "or pci-stub and not already in use\n", group->groupid);
-        return ret;
-    }
-
-    vbasedev->fd = ret;
-    vbasedev->group = group;
-    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
-
-    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
-    if (ret) {
-        error_report("vfio: error getting device info: %m");
-        goto error;
-    }
-
-    vbasedev->num_irqs = dev_info.num_irqs;
-    vbasedev->num_regions = dev_info.num_regions;
-    vbasedev->flags = dev_info.flags;
-
-    DPRINTF("Device %s flags: %u, regions: %u, irgs: %u\n", name,
-            dev_info.flags, dev_info.num_regions, dev_info.num_irqs);
-
-    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
-
-    /* call device specific functions */
-    ret = vbasedev->ops->vfio_check_device(vbasedev);
-    if (ret) {
-        error_report("vfio: error when checking device %s\n",
-                     vbasedev->name);
-        goto error;
-    }
-    ret = vbasedev->ops->vfio_populate_regions(vbasedev);
-    if (ret) {
-        error_report("vfio: error when populating regions of device %s\n",
-                     vbasedev->name);
-        goto error;
-    }
-    ret = vbasedev->ops->vfio_populate_interrupts(vbasedev);
-    if (ret) {
-        error_report("vfio: error when populating interrupts of device %s\n",
-                     vbasedev->name);
-        goto error;
-    }
-
-error:
-    if (ret) {
-        vfio_put_base_device(vbasedev);
-    }
-    return ret;
-}
-
-void vfio_put_base_device(VFIODevice *vbasedev)
-{
-    QLIST_REMOVE(vbasedev, next);
-    vbasedev->group = NULL;
-    DPRINTF("vfio_put_base_device: close vdev->fd\n");
-    close(vbasedev->fd);
-}
-
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
     g_free(vdev->vbasedev.name);
@@ -4543,47 +3527,3 @@ static void register_vfio_pci_dev_type(void)
 }
 
 type_init(register_vfio_pci_dev_type)
-
-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
-                                   int req, void *param)
-{
-    VFIOGroup *group;
-    VFIOContainer *container;
-    int ret = -1;
-
-    group = vfio_get_group(groupid, as);
-    if (!group) {
-        error_report("vfio: group %d not registered", groupid);
-        return ret;
-    }
-
-    container = group->container;
-    if (group->container) {
-        ret = ioctl(container->fd, req, param);
-        if (ret < 0) {
-            error_report("vfio: failed to ioctl container: ret=%d, %s",
-                         ret, strerror(errno));
-        }
-    }
-
-    vfio_put_group(group);
-
-    return ret;
-}
-
-int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
-                         int req, void *param)
-{
-    /* We allow only certain ioctls to the container */
-    switch (req) {
-    case VFIO_CHECK_EXTENSION:
-    case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
-        break;
-    default:
-        /* Return an error on unknown requests */
-        error_report("vfio: unsupported ioctl %X", req);
-        return -1;
-    }
-
-    return vfio_container_do_ioctl(as, groupid, req, param);
-}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
new file mode 100644
index 0000000..4684ee5
--- /dev/null
+++ b/include/hw/vfio/vfio-common.h
@@ -0,0 +1,151 @@
+/*
+ * common header for vfio based device assignment support
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ *  Adapted for KVM by Qumranet.
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ */
+#ifndef HW_VFIO_VFIO_COMMON_H
+#define HW_VFIO_VFIO_COMMON_H
+
+#include "qemu-common.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "qemu/queue.h"
+#include "qemu/notify.h"
+
+/*#define DEBUG_VFIO*/
+#ifdef DEBUG_VFIO
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+/* Extra debugging, trap acceleration paths for more logging */
+#define VFIO_ALLOW_MMAP 1
+#define VFIO_ALLOW_KVM_INTX 1
+#define VFIO_ALLOW_KVM_MSI 1
+#define VFIO_ALLOW_KVM_MSIX 1
+
+enum {
+    VFIO_DEVICE_TYPE_PCI = 0,
+    VFIO_DEVICE_TYPE_PLATFORM = 1,
+};
+
+typedef struct VFIORegion {
+    struct VFIODevice *vbasedev;
+    off_t fd_offset; /* offset of region within device fd */
+    MemoryRegion mem; /* slow, read/write access */
+    MemoryRegion mmap_mem; /* direct mapped access */
+    void *mmap;
+    size_t size;
+    uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
+    uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOAddressSpace {
+    AddressSpace *as;
+    QLIST_HEAD(, VFIOContainer) containers;
+    QLIST_ENTRY(VFIOAddressSpace) list;
+} VFIOAddressSpace;
+
+struct VFIOGroup;
+
+typedef struct VFIOType1 {
+    MemoryListener listener;
+    int error;
+    bool initialized;
+} VFIOType1;
+
+typedef struct VFIOContainer {
+    VFIOAddressSpace *space;
+    int fd; /* /dev/vfio/vfio, empowered by the attached groups */
+    struct {
+        /* enable abstraction to support various iommu backends */
+        union {
+            VFIOType1 type1;
+        };
+        void (*release)(struct VFIOContainer *);
+    } iommu_data;
+    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+    QLIST_HEAD(, VFIOGroup) group_list;
+    QLIST_ENTRY(VFIOContainer) next;
+} VFIOContainer;
+
+typedef struct VFIOGuestIOMMU {
+    VFIOContainer *container;
+    MemoryRegion *iommu;
+    Notifier n;
+    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
+} VFIOGuestIOMMU;
+
+typedef struct VFIODeviceOps VFIODeviceOps;
+
+typedef struct VFIODevice {
+    QLIST_ENTRY(VFIODevice) next;
+    struct VFIOGroup *group;
+    char *name;
+    int fd;
+    int type;
+    bool reset_works;
+    bool needs_reset;
+    VFIODeviceOps *ops;
+    unsigned int num_irqs;
+    unsigned int num_regions;
+    unsigned int flags;
+} VFIODevice;
+
+struct VFIODeviceOps {
+    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
+    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+    void (*vfio_eoi)(VFIODevice *vdev);
+    int (*vfio_check_device)(VFIODevice *vdev);
+    int (*vfio_populate_regions)(VFIODevice *vdev);
+    int (*vfio_populate_interrupts)(VFIODevice *vdev);
+};
+
+typedef struct VFIOGroup {
+    int fd;
+    int groupid;
+    VFIOContainer *container;
+    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_ENTRY(VFIOGroup) next;
+    QLIST_ENTRY(VFIOGroup) container_next;
+} VFIOGroup;
+
+void vfio_put_base_device(VFIODevice *vbasedev);
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
+void vfio_unmask_irqindex(VFIODevice *vbasedev, int index);
+#ifdef CONFIG_KVM
+void vfio_mask_irqindex(VFIODevice *vbasedev, int index);
+#endif
+void vfio_region_write(void *opaque, hwaddr addr,
+                           uint64_t data, unsigned size);
+uint64_t vfio_region_read(void *opaque,
+                          hwaddr addr, unsigned size);
+void vfio_listener_release(VFIOContainer *container);
+int vfio_mmap_region(Object *vdev, VFIORegion *region,
+                     MemoryRegion *mem, MemoryRegion *submem,
+                     void **map, size_t size, off_t offset,
+                     const char *name);
+void vfio_reset_handler(void *opaque);
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as);
+void vfio_put_group(VFIOGroup *group);
+int vfio_get_device(VFIOGroup *group, const char *name,
+                    VFIODevice *vbasedev);
+
+#endif /* !HW_VFIO_VFIO_COMMON_H */
-- 
1.8.3.2


* [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
                   ` (5 preceding siblings ...)
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-11  9:36   ` Alexander Graf
  2014-08-11 20:13   ` Alex Williamson
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd Eric Auger
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, Kim Phillips, patches, will.deacon,
	agraf, stuart.yoder, Bharat.Bhushan, alex.williamson,
	joel.schopp, a.motakis, kvmarm

Minimal VFIO platform implementation supporting
- register space user mapping,
- IRQ assignment based on eventfds handled on qemu side.

irqfd kernel acceleration comes in a subsequent patch.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4 -> v5:
- vfio-platform.h included first
- cleanup error handling in *populate*, vfio_get_device,
  vfio_enable_intp
- vfio_put_device not called anymore
- add some includes to follow vfio policy

v3 -> v4:
[Eric Auger]
- merge of "vfio: Add initial IRQ support in platform device"
  to get a fully functional patch, although performance is limited.
- removal of the unrealize function since, as I currently understand,
  it is only used with the device hot-plug feature.

v2 -> v3:
[Eric Auger]
- further factorization between PCI and platform (VFIORegion,
  VFIODevice). same level of functionality.

<= v2:
[Kim Phillips]
- Initial creation of the device supporting register space mapping
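
For reviewers, the eventfd-based IRQ handling summarized in the commit
message (one IRQ active at a time, later IRQs queued until the guest
completes the active one) can be sketched as a small standalone model.
This is only an illustration under stated assumptions: irq_model,
model_trigger, model_eoi and MAX_IRQS are hypothetical names that do not
appear in the patch, a fixed-size ring buffer stands in for the QSIMPLEQ
pending queue, and eventfds, the mmap timer and the fast/slow path switch
are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IRQS 8 /* hypothetical bound, not taken from the patch */

enum irq_state { IRQ_INACTIVE, IRQ_PENDING, IRQ_ACTIVE };

struct irq_model {
    enum irq_state state[MAX_IRQS];
    int level[MAX_IRQS];    /* virtual IRQ line level seen by the guest */
    int pending[MAX_IRQS];  /* FIFO of pins waiting behind the active IRQ */
    size_t head, count;
};

static void model_init(struct irq_model *m)
{
    /* IRQ_INACTIVE is 0, so zeroing marks every pin inactive. */
    memset(m, 0, sizeof(*m));
}

/* Returns the currently active pin, or -1 if no IRQ is active. */
static int model_active_pin(const struct irq_model *m)
{
    for (int pin = 0; pin < MAX_IRQS; pin++) {
        if (m->state[pin] == IRQ_ACTIVE) {
            return pin;
        }
    }
    return -1;
}

/* Models the fd handler: forward the IRQ, or queue it if one is active. */
static void model_trigger(struct irq_model *m, int pin)
{
    assert(pin >= 0 && pin < MAX_IRQS);

    if (model_active_pin(m) >= 0) {
        /* A pin is assumed never to be queued twice, so the ring fits. */
        assert(m->count < MAX_IRQS);
        m->state[pin] = IRQ_PENDING;
        m->pending[(m->head + m->count) % MAX_IRQS] = pin;
        m->count++;
        return;
    }

    m->state[pin] = IRQ_ACTIVE;
    m->level[pin] = 1;
}

/*
 * Models EOI: complete the active IRQ (if any), then hand the oldest
 * pending IRQ to the trigger path. Returns the completed pin or -1.
 */
static int model_eoi(struct irq_model *m)
{
    int done = model_active_pin(m);

    if (done >= 0) {
        m->state[done] = IRQ_INACTIVE;
        m->level[done] = 0;
    }

    if (m->count > 0) {
        int next = m->pending[m->head];

        m->head = (m->head + 1) % MAX_IRQS;
        m->count--;
        model_trigger(m, next);
    }
    return done;
}
```

In this model, triggering pin 2 and then pin 5 leaves pin 5 pending until
model_eoi() completes pin 2, at which point pin 5 becomes the active IRQ.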
---
 hw/vfio/Makefile.objs           |   1 +
 hw/vfio/platform.c              | 517 ++++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-platform.h |  77 ++++++
 3 files changed, 595 insertions(+)
 create mode 100644 hw/vfio/platform.c
 create mode 100644 include/hw/vfio/vfio-platform.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index e31f30e..c5c76fe 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,5 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
+obj-$(CONFIG_SOFTMMU) += platform.o
 endif
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
new file mode 100644
index 0000000..f1a1b55
--- /dev/null
+++ b/hw/vfio/platform.c
@@ -0,0 +1,517 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Kim Phillips <kim.phillips@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI device assignment support:
+ *  Copyright Red Hat, Inc. 2012
+ */
+
+#include <linux/vfio.h>
+#include <sys/ioctl.h>
+
+#include "hw/vfio/vfio-platform.h"
+#include "qemu/error-report.h"
+#include "qemu/range.h"
+#include "sysemu/sysemu.h"
+#include "exec/memory.h"
+#include "qemu/queue.h"
+#include "hw/sysbus.h"
+
+extern const MemoryRegionOps vfio_region_ops;
+extern const MemoryListener vfio_memory_listener;
+extern QLIST_HEAD(, VFIOGroup) group_list;
+extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
+void vfio_put_device(VFIOPlatformDevice *vdev);
+
+/*
+ * It is mandatory to pass a VFIOPlatformDevice since VFIODevice
+ * is not a QOM Object and cannot be passed to memory region functions
+ */
+static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
+{
+    VFIORegion *region = vdev->regions[nr];
+    unsigned size = region->size;
+    char name[64];
+
+    if (!size) {
+        return;
+    }
+
+    snprintf(name, sizeof(name), "VFIO %s region %d",
+             vdev->vbasedev.name, nr);
+
+    /* A "slow" read/write mapping underlies all regions */
+    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
+                          region, name, size);
+
+    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
+
+    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
+                         &region->mmap_mem, &region->mmap, size, 0, name)) {
+        error_report("%s unsupported. Performance may be slow", name);
+    }
+}
+
+static void print_regions(VFIOPlatformDevice *vdev)
+{
+    int i;
+
+    DPRINTF("Device \"%s\" counts %d region(s):\n",
+             vdev->vbasedev.name, vdev->vbasedev.num_regions);
+
+    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
+        DPRINTF("- region %d flags = 0x%lx, size = 0x%lx, "
+                "fd= %d, offset = 0x%lx\n",
+                vdev->regions[i]->nr,
+                (unsigned long)vdev->regions[i]->flags,
+                (unsigned long)vdev->regions[i]->size,
+                vdev->regions[i]->vbasedev->fd,
+                (unsigned long)vdev->regions[i]->fd_offset);
+    }
+}
+
+static int vfio_populate_regions(VFIODevice *vbasedev)
+{
+    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+    int i, ret = 0;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
+        reg_info.index = i;
+        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+        if (ret) {
+            error_report("vfio: Error getting region %d info: %m", i);
+            goto error;
+        }
+
+        vdev->regions[i]->flags = reg_info.flags;
+        vdev->regions[i]->size = reg_info.size;
+        vdev->regions[i]->fd_offset = reg_info.offset;
+        vdev->regions[i]->nr = i;
+        vdev->regions[i]->vbasedev = vbasedev;
+    }
+    print_regions(vdev);
+error:
+    return ret;
+}
+
+/* not implemented yet */
+static int vfio_platform_check_device(VFIODevice *vdev)
+{
+    return 0;
+}
+
+/* not implemented yet */
+static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
+{
+    return false;
+}
+
+/* not implemented yet */
+static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
+{
+    return 0;
+}
+
+/*
+ * eoi function is called on the first access to any MMIO region
+ * after an IRQ was triggered. It is assumed this access corresponds
+ * to the IRQ status register reset.
+ * With such a mechanism, a single IRQ can be handled at a time since
+ * there is no way to know which IRQ was completed by the guest.
+ * (we would need additional details about the IRQ status register mask)
+ */
+static void vfio_platform_eoi(VFIODevice *vbasedev)
+{
+    VFIOINTp *intp;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+    QLIST_FOREACH(intp, &vdev->intp_list, next) {
+        if (intp->state == VFIO_IRQ_ACTIVE) {
+            DPRINTF("EOI IRQ #%d fd=%d\n",
+                    intp->pin, event_notifier_get_fd(&intp->interrupt));
+            intp->state = VFIO_IRQ_INACTIVE;
+
+            /* deassert the virtual IRQ and unmask physical one */
+            qemu_set_irq(intp->qemuirq, 0);
+            vfio_unmask_irqindex(vbasedev, intp->pin);
+
+            /* a single IRQ can be active at a time */
+            break;
+        }
+    }
+
+    /* in case there are pending IRQs, handle them one at a time */
+    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
+        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
+        vfio_intp_interrupt(intp);
+        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
+    }
+}
+
+/*
+ * Enable/disable the fast path mode.
+ * fast path = MMIO region is mmapped (no KVM trap)
+ * slow path = MMIO region is trapped and region callbacks are called
+ * The slow path lets QEMU trap the guest's IRQ status register reset.
+ */
+
+static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
+{
+    VFIORegion *region;
+    int i;
+
+    DPRINTF("fast path = %d\n", enabled);
+
+    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
+        region = vdev->regions[i];
+
+        /* register space is unmapped to trap EOI */
+        memory_region_set_enabled(&region->mmap_mem, enabled);
+    }
+}
+
+/*
+ * Checks whether an IRQ is still active. If none is, the fast path
+ * mode (where the register space is mmapped) can be restored.
+ * If an IRQ is still active, we must keep trapping the IRQ status
+ * register reset with mmap disabled (slow path) and re-arm the timer.
+ * The function is called on each mmap_timer event.
+ * By construction a single fd is handled at a time. See the EOI comment
+ * for additional details.
+ */
+static void vfio_intp_mmap_enable(void *opaque)
+{
+    VFIOINTp *tmp;
+    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
+
+    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
+        if (tmp->state == VFIO_IRQ_ACTIVE) {
+            DPRINTF("IRQ #%d still active, stay in slow path\n",
+                    tmp->pin);
+            timer_mod(vdev->mmap_timer,
+                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                          vdev->mmap_timeout);
+            return;
+        }
+    }
+    DPRINTF("no active IRQ, restore fast path\n");
+    vfio_mmap_set_enabled(vdev, true);
+}
+
+/*
+ * The fd handler
+ */
+void vfio_intp_interrupt(void *opaque)
+{
+    int ret;
+    VFIOINTp *tmp, *intp = (VFIOINTp *)opaque;
+    VFIOPlatformDevice *vdev = intp->vdev;
+    bool one_active_irq = false;
+
+    /*
+     * First check whether there is an active IRQ.
+     * If so, the new IRQ cannot be handled until the
+     * active one is completed.
+     * By construction the active IRQ itself cannot fire again
+     * since the physical IRQ was disabled by the VFIO driver.
+     */
+    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
+        if (tmp->state == VFIO_IRQ_ACTIVE) {
+            one_active_irq = true;
+            break;
+        }
+    }
+    if (one_active_irq) {
+        /*
+         * the new IRQ gets a pending status and is pushed in
+         * the pending queue
+         */
+        intp->state = VFIO_IRQ_PENDING;
+        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
+                             intp, pqnext);
+        return;
+    }
+
+    /* no active IRQ, the new IRQ can be forwarded to the guest */
+    DPRINTF("Handle IRQ #%d (fd = %d)\n",
+            intp->pin, event_notifier_get_fd(&intp->interrupt));
+
+    ret = event_notifier_test_and_clear(&intp->interrupt);
+    if (!ret) {
+        DPRINTF("Error when clearing fd=%d\n",
+                event_notifier_get_fd(&intp->interrupt));
+    }
+
+    intp->state = VFIO_IRQ_ACTIVE;
+
+    /* sets slow path */
+    vfio_mmap_set_enabled(vdev, false);
+
+    /* trigger the virtual IRQ */
+    qemu_set_irq(intp->qemuirq, 1);
+
+    /* schedule the mmap timer which will restore mmap path after EOI */
+    if (vdev->mmap_timeout) {
+        timer_mod(vdev->mmap_timer,
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                      vdev->mmap_timeout);
+    }
+}
+
+static int vfio_enable_intp(VFIODevice *vbasedev, unsigned int index)
+{
+    struct vfio_irq_set *irq_set;
+    int32_t *pfd;
+    int ret, argsz;
+    int device = vbasedev->fd;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
+    VFIOINTp *intp;
+
+    /* allocate and populate a new VFIOINTp structure put in a queue list */
+    intp = g_malloc0(sizeof(*intp));
+    intp->vdev = vdev;
+    intp->pin = index;
+    intp->state = VFIO_IRQ_INACTIVE;
+    sysbus_init_irq(sbdev, &intp->qemuirq);
+
+    ret = event_notifier_init(&intp->interrupt, 0);
+    if (ret) {
+        g_free(intp);
+        error_report("vfio: Error: event_notifier_init failed");
+        return ret;
+    }
+
+    /* build the irq_set to be passed to the vfio kernel driver */
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+    irq_set->index = index;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    pfd = (int32_t *)&irq_set->data;
+
+    *pfd = event_notifier_get_fd(&intp->interrupt);
+
+    DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
+
+    qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
+
+    /*
+     * pass the index/fd binding to the kernel driver so that it
+     * triggers this fd on HW IRQ
+     */
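+    ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_set);
+    if (ret) {
+        ret = -errno; /* save errno before later calls can clobber it */
+        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
+        /* pfd points into irq_set, so use it before freeing irq_set */
+        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
+        event_notifier_cleanup(&intp->interrupt);
+        g_free(irq_set);
+        return ret;
+    }
+    g_free(irq_set);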
+
+    /* store the new intp in qlist */
+    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
+    return 0;
+}
+
+static int vfio_populate_interrupts(VFIODevice *vbasedev)
+{
+    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+    int i, ret;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                                    vfio_intp_mmap_enable, vdev);
+
+    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        irq.index = i;
+
+        DPRINTF("Retrieve IRQ info from vfio platform driver ...\n");
+
+        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+        if (ret) {
+            /* This can fail for an old kernel or legacy PCI dev */
+            error_printf("vfio: error getting device %s irq info",
+                         vbasedev->name);
+        } else {
+            DPRINTF("- IRQ index %d: count %d, flags=0x%x\n",
+                    irq.index, irq.count, irq.flags);
+
+            ret = vfio_enable_intp(vbasedev, irq.index);
+            if (ret) {
+                error_report("vfio: Error setting IRQ %d up", i);
+                return ret;
+            }
+        }
+    }
+    return 0;
+}
+
+static VFIODeviceOps vfio_platform_ops = {
+    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
+    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
+    .vfio_eoi = vfio_platform_eoi,
+    .vfio_check_device = vfio_platform_check_device,
+    .vfio_populate_regions = vfio_populate_regions,
+    .vfio_populate_interrupts = vfio_populate_interrupts,
+};
+
+static int vfio_base_device_init(VFIODevice *vbasedev)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev_iter;
+    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
+    ssize_t len;
+    struct stat st;
+    int groupid;
+    int ret;
+
+    /* name must be set prior to the call */
+    if (!vbasedev->name) {
+        return -EINVAL;
+    }
+
+    /* Check that the host device exists */
+    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
+             vbasedev->name);
+
+    if (stat(path, &st) < 0) {
+        error_report("vfio: error: no such host device: %s", path);
+        return -errno;
+    }
+
+    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
+    len = readlink(path, iommu_group_path, sizeof(path));
+    if (len <= 0 || len >= sizeof(path)) {
+        error_report("vfio: error no iommu_group for device");
+        return len < 0 ? -errno : -ENAMETOOLONG;
+    }
+
+    iommu_group_path[len] = 0;
+    group_name = basename(iommu_group_path);
+
+    if (sscanf(group_name, "%d", &groupid) != 1) {
+        error_report("vfio: error reading %s: %m", path);
+        return -errno;
+    }
+
+    DPRINTF("%s(%s) group %d\n", __func__, vbasedev->name, groupid);
+
+    group = vfio_get_group(groupid, &address_space_memory);
+    if (!group) {
+        error_report("vfio: failed to get group %d", groupid);
+        return -ENOENT;
+    }
+
+    snprintf(path, sizeof(path), "%s", vbasedev->name);
+
+    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
+            error_report("vfio: error: device %s is already attached", path);
+            vfio_put_group(group);
+            return -EBUSY;
+        }
+    }
+    ret = vfio_get_device(group, path, vbasedev);
+    if (ret) {
+        error_report("vfio: failed to get device %s", path);
+        vfio_put_group(group);
+    }
+    return ret;
+}
+
+void vfio_put_device(VFIOPlatformDevice *vdev)
+{
+    unsigned int i;
+    VFIODevice *vbasedev = &vdev->vbasedev;
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        g_free(vdev->regions[i]);
+    }
+    g_free(vdev->regions);
+    g_free(vdev->vbasedev.name);
+    vfio_put_base_device(&vdev->vbasedev);
+}
+
+static void vfio_platform_realize(DeviceState *dev, Error **errp)
+{
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    int i, ret;
+
+    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
+    vbasedev->ops = &vfio_platform_ops;
+
+    DPRINTF("vfio device %s, compat = %s\n", vbasedev->name, vdev->compat);
+
+    ret = vfio_base_device_init(vbasedev);
+    if (ret) {
+        return;
+    }
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        vfio_map_region(vdev, i);
+        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
+    }
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+    .name = TYPE_VFIO_PLATFORM,
+    .unmigratable = 1,
+};
+
+static Property vfio_platform_dev_properties[] = {
+    DEFINE_PROP_STRING("vfio_device", VFIOPlatformDevice, vbasedev.name),
+    DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
+    DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
+                       mmap_timeout, 1100),
+    DEFINE_PROP_BOOL("irqfd", VFIOPlatformDevice, irqfd_allowed, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vfio_platform_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = vfio_platform_realize;
+    dc->props = vfio_platform_dev_properties;
+    dc->vmsd = &vfio_platform_vmstate;
+    dc->desc = "VFIO-based platform device assignment";
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo vfio_platform_dev_info = {
+    .name = TYPE_VFIO_PLATFORM,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(VFIOPlatformDevice),
+    .class_init = vfio_platform_class_init,
+    .class_size = sizeof(VFIOPlatformDeviceClass),
+};
+
+static void register_vfio_platform_dev_type(void)
+{
+    type_register_static(&vfio_platform_dev_info);
+}
+
+type_init(register_vfio_platform_dev_type)
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
new file mode 100644
index 0000000..1ee072a
--- /dev/null
+++ b/include/hw/vfio/vfio-platform.h
@@ -0,0 +1,77 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Kim Phillips <kim.phillips@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI device assignment support:
+ *  Copyright Red Hat, Inc. 2012
+ */
+
+#ifndef HW_VFIO_VFIO_PLATFORM_H
+#define HW_VFIO_VFIO_PLATFORM_H
+
+#include "hw/sysbus.h"
+#include "hw/vfio/vfio-common.h"
+#include "qemu/event_notifier.h"
+#include "qemu/queue.h"
+#include "hw/irq.h"
+
+#define TYPE_VFIO_PLATFORM "vfio-platform"
+
+enum {
+    VFIO_IRQ_INACTIVE = 0,
+    VFIO_IRQ_PENDING = 1,
+    VFIO_IRQ_ACTIVE = 2,
+    /* VFIO_IRQ_ACTIVE_AND_PENDING cannot happen with VFIO */
+};
+
+typedef struct VFIOINTp {
+    QLIST_ENTRY(VFIOINTp) next; /* entry for IRQ list */
+    QSIMPLEQ_ENTRY(VFIOINTp) pqnext; /* entry for pending IRQ queue */
+    EventNotifier interrupt; /* eventfd triggered on interrupt */
+    EventNotifier unmask; /* eventfd for unmask on QEMU bypass */
+    qemu_irq qemuirq;
+    struct VFIOPlatformDevice *vdev; /* back pointer to device */
+    int state; /* inactive, pending, active */
+    bool kvm_accel; /* set when QEMU bypass through KVM enabled */
+    uint8_t pin; /* index */
+    uint8_t virtualID; /* virtual IRQ */
+} VFIOINTp;
+
+typedef struct VFIOPlatformDevice {
+    SysBusDevice sbdev;
+    VFIODevice vbasedev; /* not a QOM object */
+    VFIORegion **regions;
+    QLIST_HEAD(, VFIOINTp) intp_list; /* list of IRQ */
+    /* queue of pending IRQ */
+    QSIMPLEQ_HEAD(pending_intp_queue, VFIOINTp) pending_intp_queue;
+    char *compat; /* compatibility string */
+    bool irqfd_allowed;
+    uint32_t mmap_timeout; /* delay to re-enable mmaps after interrupt */
+    QEMUTimer *mmap_timer; /* enable mmaps after periods w/o interrupts */
+} VFIOPlatformDevice;
+
+
+typedef struct VFIOPlatformDeviceClass {
+    /*< private >*/
+    SysBusDeviceClass parent_class;
+    /*< public >*/
+} VFIOPlatformDeviceClass;
+
+#define VFIO_PLATFORM_DEVICE(obj) \
+     OBJECT_CHECK(VFIOPlatformDevice, (obj), TYPE_VFIO_PLATFORM)
+#define VFIO_PLATFORM_DEVICE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(VFIOPlatformDeviceClass, (klass), TYPE_VFIO_PLATFORM)
+#define VFIO_PLATFORM_DEVICE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(VFIOPlatformDeviceClass, (obj), TYPE_VFIO_PLATFORM)
+
+void vfio_intp_interrupt(void *opaque);
+void vfio_setup_irqfd(SysBusDevice *dev, int index, int virq);
+
+#endif
-- 
1.8.3.2

* [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
                   ` (6 preceding siblings ...)
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-11  9:37   ` Alexander Graf
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 09/10] hw/vfio/platform: Add irqfd support Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation Eric Auger
  9 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, joel.schopp,
	a.motakis, kvmarm

Set kvm_irqfds_allowed at the end of kvm_arm_gic_realize() to advertise
irqfd support.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/intc/arm_gic_kvm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
index 5038885..08b7bf9 100644
--- a/hw/intc/arm_gic_kvm.c
+++ b/hw/intc/arm_gic_kvm.c
@@ -576,6 +576,8 @@ static void kvm_arm_gic_realize(DeviceState *dev, Error **errp)
                             KVM_DEV_ARM_VGIC_GRP_ADDR,
                             KVM_VGIC_V2_ADDR_TYPE_CPU,
                             s->dev_fd);
+
+    kvm_irqfds_allowed = true;
 }
 
 static void kvm_arm_gic_class_init(ObjectClass *klass, void *data)
-- 
1.8.3.2

* [Qemu-devel] [PATCH v5 09/10] hw/vfio/platform: Add irqfd support
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
                   ` (7 preceding siblings ...)
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation Eric Auger
  9 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, joel.schopp,
	a.motakis, kvmarm

This patch optimizes IRQ handling by using the irqfd framework.

Instead of being handled in user space, the eventfds are handled in
the kernel using
- the KVM irqfd framework,
- the VFIO driver virqfd framework.

Virtual IRQ completion is trapped at the interrupt controller
instead of on the guest's first access to any region after the IRQ hit.
This removes the need for the fast/slow path swap.

Overall this brings significant performance improvements.

It depends on the host kernel KVM irqfd/GSI routing capability.

Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v4 -> v5:
- addition of sysemu/kvm.h header

v3 -> v4:
[Alvise Rigo]
Use of VFIO Platform driver v6 unmask/virqfd feature and removal
of resamplefd handler. Physical IRQ unmasking is now done in
VFIO driver.

v3:
[Eric Auger]
initial support with resamplefd handled on QEMU side since the
unmask was not supported on VFIO platform driver v5.
---
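Note for readers: the following is a minimal, standalone sketch of the
eventfd primitive that both the QEMU-side path and the irqfd path rely
on. It is not part of this patch and uses only the Linux eventfd(2) API.
The helper names signal_eventfd() and test_and_clear_eventfd() are made
up for the illustration; the second one mimics the role that
event_notifier_test_and_clear() plays in vfio_intp_interrupt().

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal the eventfd once, the way an interrupt
 * source would. Returns 0 on success, -1 on error.
 */
static int signal_eventfd(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: consume any pending signal on a non-blocking
 * eventfd. Returns 1 if it was signalled, 0 if not, -1 on error.
 * Several signals collapse into one because read() resets the counter.
 */
static int test_and_clear_eventfd(int fd)
{
    uint64_t count;
    ssize_t n = read(fd, &count, sizeof(count));

    if (n == (ssize_t)sizeof(count)) {
        return 1;
    }
    return (n < 0 && errno == EAGAIN) ? 0 : -1;
}
```

With an fd obtained from eventfd(0, EFD_NONBLOCK), two signal_eventfd()
calls followed by test_and_clear_eventfd() yield 1, and the next call
yields 0. With irqfd, KVM consumes the trigger fd in the kernel and
signals the resample fd when the guest completes the interrupt, so QEMU
no longer has to run this kind of handler for each IRQ.
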
 hw/vfio/platform.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index f1a1b55..e5c652c 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -23,6 +23,7 @@
 #include "exec/memory.h"
 #include "qemu/queue.h"
 #include "hw/sysbus.h"
+#include "sysemu/kvm.h"
 
 extern const MemoryRegionOps vfio_region_ops;
 extern const MemoryListener vfio_memory_listener;
@@ -367,6 +368,99 @@ static int vfio_populate_interrupts(VFIODevice *vbasedev)
     return 0;
 }
 
+static void vfio_enable_intp_kvm(VFIOINTp *intp)
+{
+#ifdef CONFIG_KVM
+    struct kvm_irqfd irqfd = {
+        .fd = event_notifier_get_fd(&intp->interrupt),
+        .gsi = intp->virtualID,
+        .flags = KVM_IRQFD_FLAG_RESAMPLE,
+    };
+
+    struct vfio_irq_set *irq_set;
+    int ret, argsz;
+    int32_t *pfd;
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+
+    if (!kvm_irqfds_enabled() ||
+        !kvm_check_extension(kvm_state, KVM_CAP_IRQFD_RESAMPLE)) {
+        return;
+    }
+
+    /* Get to a known interrupt state */
+    qemu_set_fd_handler(irqfd.fd, NULL, NULL, NULL);
+    vfio_mask_irqindex(vbasedev, intp->pin);
+    intp->state = VFIO_IRQ_INACTIVE;
+    qemu_set_irq(intp->qemuirq, 0);
+
+    /* Get an eventfd for resample/unmask */
+    if (event_notifier_init(&intp->unmask, 0)) {
+        error_report("vfio: Error: event_notifier_init failed eoi");
+        goto fail;
+    }
+
+    /* KVM triggers it, VFIO listens for it */
+    irqfd.resamplefd = event_notifier_get_fd(&intp->unmask);
+
+    if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
+        error_report("vfio: Error: Failed to setup resample irqfd: %m");
+        goto fail_irqfd;
+    }
+
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
+    irq_set->index = intp->pin;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    pfd = (int32_t *)&irq_set->data;
+
+    *pfd = irqfd.resamplefd;
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    g_free(irq_set);
+    if (ret) {
+        error_report("vfio: Error: Failed to setup IRQ unmask fd: %m");
+        goto fail_vfio;
+    }
+
+    vfio_unmask_irqindex(vbasedev, intp->pin);
+
+    intp->kvm_accel = true;
+
+    DPRINTF("%s irqfd pin=%d to virtID = %d fd=%d, resamplefd=%d\n",
+            __func__, intp->pin, intp->virtualID,
+            irqfd.fd, irqfd.resamplefd);
+    return;
+
+fail_vfio:
+    irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
+    kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
+fail_irqfd:
+    event_notifier_cleanup(&intp->unmask);
+fail:
+    qemu_set_fd_handler(irqfd.fd, vfio_intp_interrupt, NULL, intp);
+    vfio_unmask_irqindex(vbasedev, intp->pin);
+#endif
+}
+
+void vfio_setup_irqfd(SysBusDevice *s, int index, int virq)
+{
+    VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
+    VFIOINTp *intp;
+
+    QLIST_FOREACH(intp, &vdev->intp_list, next) {
+        if (intp->pin == index) {
+            intp->virtualID = virq;
+            DPRINTF("enable irqfd for irq index %d (virtual IRQ %d)\n",
+                    index, virq);
+            vfio_enable_intp_kvm(intp);
+        }
+    }
+}
+
 static VFIODeviceOps vfio_platform_ops = {
     .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
-- 
1.8.3.2

* [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
                   ` (8 preceding siblings ...)
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 09/10] hw/vfio/platform: Add irqfd support Eric Auger
@ 2014-08-09 14:25 ` Eric Auger
  2014-08-11  9:40   ` Alexander Graf
  2014-08-18 21:54   ` Joel Schopp
  9 siblings, 2 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-09 14:25 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, kim.phillips, a.rigo
  Cc: peter.maydell, eric.auger, patches, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, joel.schopp,
	a.motakis, kvmarm

Generate the device tree nodes of VFIO devices, if any are instantiated
with the -device option. VFIO devices that require more complex node
generation can be handled before this generic case.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
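Note for readers: the following is a standalone sketch of the compat
string conversion that format_compat() performs below ('/' becomes ',',
';' becomes NUL). It is not part of this patch. Unlike the patch code it
works in place and returns the full length, embedded and final NUL bytes
included, which is the length a multi-string "compatible" property
needs. The name compat_to_fdt_stringlist() is hypothetical and does not
exist in QEMU.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert "a/b;c/d" into the string list
 * "a,b\0c,d\0" in place and return its total length in bytes.
 */
static size_t compat_to_fdt_stringlist(char *compat)
{
    /* measure before inserting NULs, and count the final NUL too */
    size_t len = strlen(compat) + 1;
    size_t i;

    for (i = 0; i + 1 < len; i++) {
        if (compat[i] == '/') {
            compat[i] = ',';
        } else if (compat[i] == ';') {
            compat[i] = '\0';
        }
    }
    return len;
}
```

For example, the made-up value "vendor/dev;vendor/generic" becomes the
two strings "vendor,dev" and "vendor,generic", and the returned length
is 26. Passing that length to the property setter, rather than
strlen() of the converted buffer plus one, keeps the second string.
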
 hw/arm/dyn_sysbus_devtree.c | 138 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)

diff --git a/hw/arm/dyn_sysbus_devtree.c b/hw/arm/dyn_sysbus_devtree.c
index 56af62f..ac34f07 100644
--- a/hw/arm/dyn_sysbus_devtree.c
+++ b/hw/arm/dyn_sysbus_devtree.c
@@ -1,6 +1,139 @@
 #include "hw/arm/dyn_sysbus_devtree.h"
 #include "qemu/error-report.h"
 #include "sysemu/device_tree.h"
+#include "hw/vfio/vfio-platform.h"
+
+static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque);
+
+static char *format_compat(char *compat)
+{
+    char *str_ptr, *corrected_compat;
+    /*
+     * process compatibility property string passed by end-user
+     * replaces / by , and ; by NUL character
+     */
+    corrected_compat = g_strdup(compat);
+    /*
+     * the total length of the string has to include also the last
+     * NUL char.
+     */
+
+    str_ptr = corrected_compat;
+    while ((str_ptr = strchr(str_ptr, '/')) != NULL) {
+        *str_ptr = ',';
+    }
+
+    /* substitute ";" with the NUL char */
+    str_ptr = corrected_compat;
+    while ((str_ptr = strchr(str_ptr, ';')) != NULL) {
+        *str_ptr = '\0';
+    }
+
+    return corrected_compat;
+}
+
+static void wrap_vfio_fdt_add_node(SysBusDevice *sbdev, void *opaque)
+{
+    PlatformDevtreeData *data = opaque;
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    gchar irq_number_prop[8];
+    Object *obj = OBJECT(sbdev);
+    char *corrected_compat;
+    uint64_t irq_number;
+    int compat_str_len = strlen(vdev->compat)+1;
+    int i;
+
+    corrected_compat = format_compat(vdev->compat);
+    snprintf(vdev->compat, compat_str_len, "%s", corrected_compat);
+    g_free(corrected_compat);
+
+    vfio_fdt_add_device_node(sbdev, opaque);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
+        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
+                                                 + data->irq_start;
+        /*
+         * for setting irqfd up we must provide the virtual IRQ number
+         * which is the sum of irq_start and actual platform bus irq
+         * index. At realize point we do not have this info.
+         */
+        if (vdev->irqfd_allowed) {
+            vfio_setup_irqfd(sbdev, i, irq_number);
+        }
+    }
+}
+
+static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
+{
+    PlatformDevtreeData *data = opaque;
+    void *fdt = data->fdt;
+    const char *parent_node = data->node;
+    int compat_str_len;
+    char *nodename;
+    int i, ret;
+    uint32_t *irq_attr;
+    uint64_t *reg_attr;
+    uint64_t mmio_base;
+    uint64_t irq_number;
+    gchar mmio_base_prop[8];
+    gchar irq_number_prop[8];
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    Object *obj = OBJECT(sbdev);
+
+    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+                               vbasedev->name,
+                               mmio_base);
+
+    qemu_fdt_add_subnode(fdt, nodename);
+
+    compat_str_len = strlen(vdev->compat) + 1;
+    qemu_fdt_setprop(fdt, nodename, "compatible",
+                            vdev->compat, compat_str_len);
+
+    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
+        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
+        reg_attr[4*i] = 1;
+        reg_attr[4*i+1] = mmio_base;
+        reg_attr[4*i+2] = 1;
+        reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
+    }
+
+    ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
+                     vbasedev->num_regions*2, reg_attr);
+    if (ret < 0) {
+        error_report("could not set reg property of node %s", nodename);
+    }
+
+    irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
+        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
+                                                 + data->irq_start;
+        irq_attr[3*i] = cpu_to_be32(0);
+        irq_attr[3*i+1] = cpu_to_be32(irq_number);
+        irq_attr[3*i+2] = cpu_to_be32(0x4);
+    }
+
+    ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+                     irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
+    if (ret < 0) {
+        error_report("could not set interrupts property of node %s",
+                     nodename);
+    }
+
+    g_free(nodename);
+    g_free(irq_attr);
+    g_free(reg_attr);
+}
 
 int sysbus_device_create_devtree(Object *obj, void *opaque)
 {
@@ -17,6 +150,11 @@ int sysbus_device_create_devtree(Object *obj, void *opaque)
         return object_child_foreach(obj, sysbus_device_create_devtree, data);
     }
 
+    if (object_dynamic_cast(obj, TYPE_VFIO_PLATFORM)) {
+        wrap_vfio_fdt_add_node(sbdev, data);
+        matched = true;
+    }
+
     if (!matched) {
         error_report("Device %s is not supported by this machine yet.",
                      qdev_fw_name(DEVICE(dev)));
-- 
1.8.3.2

* Re: [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support Eric Auger
@ 2014-08-11  9:36   ` Alexander Graf
  2014-08-12  7:59     ` Bharat.Bhushan
  2014-08-11 20:13   ` Alex Williamson
  1 sibling, 1 reply; 50+ messages in thread
From: Alexander Graf @ 2014-08-11  9:36 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, Kim Phillips, joel.schopp, will.deacon,
	stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm


On 09.08.14 16:25, Eric Auger wrote:
> Minimal VFIO platform implementation supporting
> - register space user mapping,
> - IRQ assignment based on eventfds handled on qemu side.
>
> irqfd kernel acceleration comes in a subsequent patch.
>
> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>
> ---
>
> v4 -> v5:
> - vfio-plaform.h included first
> - cleanup error handling in *populate*, vfio_get_device,
>    vfio_enable_intp
> - vfio_put_device not called anymore
> - add some includes to follow vfio policy
>
> v3 -> v4:
> [Eric Auger]
> - merge of "vfio: Add initial IRQ support in platform device"
>    to get a full functional patch although perfs are limited.
> - removal of unrealize function since I currently understand
>    it is only used with device hot-plug feature.
>
> v2 -> v3:
> [Eric Auger]
> - further factorization between PCI and platform (VFIORegion,
>    VFIODevice). same level of functionality.
>
> <= v2:
> [Kim Philipps]
> - Initial Creation of the device supporting register space mapping
> ---
>   hw/vfio/Makefile.objs           |   1 +
>   hw/vfio/platform.c              | 517 ++++++++++++++++++++++++++++++++++++++++
>   include/hw/vfio/vfio-platform.h |  77 ++++++
>   3 files changed, 595 insertions(+)
>   create mode 100644 hw/vfio/platform.c
>   create mode 100644 include/hw/vfio/vfio-platform.h
>
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index e31f30e..c5c76fe 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,4 +1,5 @@
>   ifeq ($(CONFIG_LINUX), y)
>   obj-$(CONFIG_SOFTMMU) += common.o
>   obj-$(CONFIG_PCI) += pci.o
> +obj-$(CONFIG_SOFTMMU) += platform.o
>   endif
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> new file mode 100644
> index 0000000..f1a1b55
> --- /dev/null
> +++ b/hw/vfio/platform.c
> @@ -0,0 +1,517 @@
> +/*
> + * vfio based device assignment support - platform devices
> + *
> + * Copyright Linaro Limited, 2014
> + *
> + * Authors:
> + *  Kim Phillips <kim.phillips@linaro.org>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Based on vfio based PCI device assignment support:
> + *  Copyright Red Hat, Inc. 2012
> + */
> +
> +#include <linux/vfio.h>
> +#include <sys/ioctl.h>
> +
> +#include "hw/vfio/vfio-platform.h"
> +#include "qemu/error-report.h"
> +#include "qemu/range.h"
> +#include "sysemu/sysemu.h"
> +#include "exec/memory.h"
> +#include "qemu/queue.h"
> +#include "hw/sysbus.h"
> +
> +extern const MemoryRegionOps vfio_region_ops;
> +extern const MemoryListener vfio_memory_listener;
> +extern QLIST_HEAD(, VFIOGroup) group_list;
> +extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
> +void vfio_put_device(VFIOPlatformDevice *vdev);
> +
> +/*
> + * It is mandatory to pass a VFIOPlatformDevice since VFIODevice
> + * is not a QOM Object and cannot be passed to memory region functions
> +*/
> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
> +{
> +    VFIORegion *region = vdev->regions[nr];
> +    unsigned size = region->size;
> +    char name[64];
> +
> +    if (!size) {
> +        return;
> +    }
> +
> +    snprintf(name, sizeof(name), "VFIO %s region %d",
> +             vdev->vbasedev.name, nr);
> +
> +    /* A "slow" read/write mapping underlies all regions */
> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
> +                          region, name, size);
> +
> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
> +
> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
> +        error_report("%s unsupported. Performance may be slow", name);
> +    }
> +}
> +
> +static void print_regions(VFIOPlatformDevice *vdev)
> +{
> +    int i;
> +
> +    DPRINTF("Device \"%s\" counts %d region(s):\n",
> +             vdev->vbasedev.name, vdev->vbasedev.num_regions);
> +
> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
> +        DPRINTF("- region %d flags = 0x%lx, size = 0x%lx, "
> +                "fd= %d, offset = 0x%lx\n",
> +                vdev->regions[i]->nr,
> +                (unsigned long)vdev->regions[i]->flags,
> +                (unsigned long)vdev->regions[i]->size,
> +                vdev->regions[i]->vbasedev->fd,
> +                (unsigned long)vdev->regions[i]->fd_offset);
> +    }
> +}
> +
> +static int vfio_populate_regions(VFIODevice *vbasedev)
> +{
> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> +    int i, ret = 0;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +
> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
> +        reg_info.index = i;
> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> +        if (ret) {
> +            error_report("vfio: Error getting region %d info: %m", i);
> +            goto error;
> +        }
> +
> +        vdev->regions[i]->flags = reg_info.flags;
> +        vdev->regions[i]->size = reg_info.size;
> +        vdev->regions[i]->fd_offset = reg_info.offset;
> +        vdev->regions[i]->nr = i;
> +        vdev->regions[i]->vbasedev = vbasedev;
> +    }
> +    print_regions(vdev);
> +error:
> +    return ret;
> +}
> +
> +/* not implemented yet */
> +static int vfio_platform_check_device(VFIODevice *vdev)
> +{
> +    return 0;
> +}
> +
> +/* not implemented yet */
> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
> +{
> +return false;
> +}
> +
> +/* not implemented yet */
> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
> +{
> +return 0;
> +}
> +
> +/*
> + * eoi function is called on the first access to any MMIO region
> + * after an IRQ was triggered. It is assumed this access corresponds
> + * to the IRQ status register reset.
> + * With such a mechanism, a single IRQ can be handled at a time since
> + * there is no way to know which IRQ was completed by the guest.
> + * (we would need additional details about the IRQ status register mask)
> + */
> +static void vfio_platform_eoi(VFIODevice *vbasedev)
> +{
> +    VFIOINTp *intp;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +
> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
> +        if (intp->state == VFIO_IRQ_ACTIVE) {
> +            DPRINTF("EOI IRQ #%d fd=%d\n",
> +                    intp->pin, event_notifier_get_fd(&intp->interrupt));
> +            intp->state = VFIO_IRQ_INACTIVE;
> +
> +            /* deassert the virtual IRQ and unmask physical one */
> +            qemu_set_irq(intp->qemuirq, 0);
> +            vfio_unmask_irqindex(vbasedev, intp->pin);
> +
> +            /* a single IRQ can be active at a time */
> +            break;
> +        }
> +    }
> +
> +    /* in case there are pending IRQs, handle them one at a time */
> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
> +        vfio_intp_interrupt(intp);
> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
> +    }
> +}
> +
> +/*
> + * enable/disable the fast path mode
> + * fast path = MMIO region is mmaped (no KVM TRAP)
> + * slow path = MMIO region is trapped and region callbacks are called
> + * slow path enables to trap the IRQ status register guest reset
> +*/
> +
> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
> +{
> +    VFIORegion *region;
> +    int i;
> +
> +    DPRINTF("fast path = %d\n", enabled);
> +
> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
> +        region = vdev->regions[i];
> +
> +        /* register space is unmapped to trap EOI */
> +        memory_region_set_enabled(&region->mmap_mem, enabled);
> +    }
> +}
> +
> +/*
> + * Checks whether the IRQ is still pending. In the negative
> + * the fast path mode (where reg space is mmaped) can be restored.
> + * if the IRQ is still pending, we must keep on trapping IRQ status
> + * register reset with mmap disabled (slow path).
> + * the function is called on mmap_timer event.
> + * by construction a single fd is handled at a time. See EOI comment
> + * for additional details.
> + */
> +static void vfio_intp_mmap_enable(void *opaque)
> +{
> +    VFIOINTp *tmp;
> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
> +
> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
> +            DPRINTF("IRQ #%d still active, stay in slow path\n",
> +                    tmp->pin);
> +            timer_mod(vdev->mmap_timer,
> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                          vdev->mmap_timeout);
> +            return;
> +        }
> +    }
> +    DPRINTF("no active IRQ, restore fast path\n");
> +    vfio_mmap_set_enabled(vdev, true);
> +}
> +
> +/*
> + * The fd handler
> + */
> +void vfio_intp_interrupt(void *opaque)
> +{
> +    int ret;
> +    VFIOINTp *tmp, *intp = (VFIOINTp *)opaque;
> +    VFIOPlatformDevice *vdev = intp->vdev;
> +    bool one_active_irq = false;
> +
> +    /*
> +     * first check whether there is a pending IRQ
> +     * in the positive the new IRQ cannot be handled until the
> +     * active one is not completed.
> +     * by construction the same IRQ as the pending one cannot hit
> +     * since the physical IRQ was disabled by the VFIO driver
> +     */
> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
> +            one_active_irq = true;
> +            break;
> +        }
> +    }
> +    if (one_active_irq) {
> +        /*
> +         * the new IRQ gets a pending status and is pushed in
> +         * the pending queue
> +         */
> +        intp->state = VFIO_IRQ_PENDING;
> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
> +                             intp, pqnext);
> +        return;
> +    }
> +
> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
> +    DPRINTF("Handle IRQ #%d (fd = %d)\n",
> +            intp->pin, event_notifier_get_fd(&intp->interrupt));
> +
> +    ret = event_notifier_test_and_clear(&intp->interrupt);
> +    if (!ret) {
> +        DPRINTF("Error when clearing fd=%d\n",
> +                event_notifier_get_fd(&intp->interrupt));
> +    }
> +
> +    intp->state = VFIO_IRQ_ACTIVE;
> +
> +    /* sets slow path */
> +    vfio_mmap_set_enabled(vdev, false);
> +
> +    /* trigger the virtual IRQ */
> +    qemu_set_irq(intp->qemuirq, 1);
> +
> +    /* schedule the mmap timer which will restore mmap path after EOI*/
> +    if (vdev->mmap_timeout) {
> +        timer_mod(vdev->mmap_timer,
> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                      vdev->mmap_timeout);
> +    }
> +}
> +
> +static int vfio_enable_intp(VFIODevice *vbasedev, unsigned int index)
> +{
> +    struct vfio_irq_set *irq_set;
> +    int32_t *pfd;
> +    int ret, argsz;
> +    int device = vbasedev->fd;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
> +    VFIOINTp *intp;
> +
> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
> +    intp = g_malloc0(sizeof(*intp));
> +    intp->vdev = vdev;
> +    intp->pin = index;
> +    intp->state = VFIO_IRQ_INACTIVE;
> +    sysbus_init_irq(sbdev, &intp->qemuirq);
> +
> +    ret = event_notifier_init(&intp->interrupt, 0);
> +    if (ret) {
> +        g_free(intp);
> +        error_report("vfio: Error: event_notifier_init failed ");
> +        return ret;
> +    }
> +
> +    /* build the irq_set to be passed to the vfio kernel driver */
> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
> +
> +    irq_set = g_malloc0(argsz);
> +    irq_set->argsz = argsz;
> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
> +    irq_set->index = index;
> +    irq_set->start = 0;
> +    irq_set->count = 1;
> +    pfd = (int32_t *)&irq_set->data;
> +
> +    *pfd = event_notifier_get_fd(&intp->interrupt);
> +
> +    DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
> +
> +    qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
> +
> +    /*
> +     * pass the index/fd binding to the kernel driver so that it
> +     * triggers this fd on HW IRQ
> +     */
> +    ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_set);
> +    g_free(irq_set);
> +    if (ret) {
> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
> +        event_notifier_cleanup(&intp->interrupt);
> +        return -errno;
> +    }
> +
> +    /* store the new intp in qlist */
> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
> +    return 0;
> +}
> +
> +static int vfio_populate_interrupts(VFIODevice *vbasedev)
> +{
> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> +    int i, ret;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +
> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
> +                                    vfio_intp_mmap_enable, vdev);
> +
> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
> +
> +    for (i = 0; i < vbasedev->num_irqs; i++) {
> +        irq.index = i;
> +
> +        DPRINTF("Retrieve IRQ info from vfio platform driver ...\n");
> +
> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
> +        if (ret) {
> +            /* This can fail for an old kernel or legacy PCI dev */
> +            error_printf("vfio: error getting device %s irq info",
> +                         vbasedev->name);
> +        } else {
> +            DPRINTF("- IRQ index %d: count %d, flags=0x%x\n",
> +                    irq.index, irq.count, irq.flags);
> +
> +            ret = vfio_enable_intp(vbasedev, irq.index);
> +            if (ret) {
> +                error_report("vfio: Error setting IRQ %d up", i);
> +                return ret;
> +            }
> +        }
> +    }
> +    return 0;
> +}
> +
> +static VFIODeviceOps vfio_platform_ops = {
> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
> +    .vfio_eoi = vfio_platform_eoi,
> +    .vfio_check_device = vfio_platform_check_device,
> +    .vfio_populate_regions = vfio_populate_regions,
> +    .vfio_populate_interrupts = vfio_populate_interrupts,
> +};
> +
> +static int vfio_base_device_init(VFIODevice *vbasedev)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev_iter;
> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
> +    ssize_t len;
> +    struct stat st;
> +    int groupid;
> +    int ret;
> +
> +    /* name must be set prior to the call */
> +    if (!vbasedev->name) {
> +        return -EINVAL;
> +    }
> +
> +    /* Check that the host device exists */
> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
> +             vbasedev->name);
> +
> +    if (stat(path, &st) < 0) {
> +        error_report("vfio: error: no such host device: %s", path);
> +        return -errno;
> +    }
> +
> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
> +    len = readlink(path, iommu_group_path, sizeof(path));
> +    if (len <= 0 || len >= sizeof(path)) {
> +        error_report("vfio: error no iommu_group for device");
> +        return len < 0 ? -errno : ENAMETOOLONG;
> +    }
> +
> +    iommu_group_path[len] = 0;
> +    group_name = basename(iommu_group_path);
> +
> +    if (sscanf(group_name, "%d", &groupid) != 1) {
> +        error_report("vfio: error reading %s: %m", path);
> +        return -errno;
> +    }
> +
> +    DPRINTF("%s(%s) group %d\n", __func__, vbasedev->name, groupid);
> +
> +    group = vfio_get_group(groupid, &address_space_memory);
> +    if (!group) {
> +        error_report("vfio: failed to get group %d", groupid);
> +        return -ENOENT;
> +    }
> +
> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
> +
> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
> +            error_report("vfio: error: device %s is already attached", path);
> +            vfio_put_group(group);
> +            return -EBUSY;
> +        }
> +    }
> +    ret = vfio_get_device(group, path, vbasedev);
> +    if (ret) {
> +        error_report("vfio: failed to get device %s", path);
> +        vfio_put_group(group);
> +    }
> + return ret;
> +}
> +
> +void vfio_put_device(VFIOPlatformDevice *vdev)
> +{
> +    unsigned int i;
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +            g_free(vdev->regions[i]);
> +    }
> +    g_free(vdev->regions);
> +    g_free(vdev->vbasedev.name);
> +    vfio_put_base_device(&vdev->vbasedev);
> +}
> +
> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
> +{
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    int i, ret;
> +
> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
> +    vbasedev->ops = &vfio_platform_ops;
> +
> +    DPRINTF("vfio device %s, compat = %s\n", vbasedev->name, vdev->compat);
> +
> +    ret = vfio_base_device_init(vbasedev);
> +    if (ret) {
> +        return;
> +    }
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        vfio_map_region(vdev, i);
> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
> +    }
> +}
> +
> +static const VMStateDescription vfio_platform_vmstate = {
> +    .name = TYPE_VFIO_PLATFORM,
> +    .unmigratable = 1,
> +};
> +
> +static Property vfio_platform_dev_properties[] = {
> +    DEFINE_PROP_STRING("vfio_device", VFIOPlatformDevice, vbasedev.name),
> +    DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
> +    DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
> +                       mmap_timeout, 1100),
> +    DEFINE_PROP_BOOL("irqfd", VFIOPlatformDevice, irqfd_allowed, true),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void vfio_platform_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->realize = vfio_platform_realize;
> +    dc->props = vfio_platform_dev_properties;
> +    dc->vmsd = &vfio_platform_vmstate;
> +    dc->desc = "VFIO-based platform device assignment";
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +}
> +
> +static const TypeInfo vfio_platform_dev_info = {
> +    .name = TYPE_VFIO_PLATFORM,
> +    .parent = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(VFIOPlatformDevice),
> +    .class_init = vfio_platform_class_init,
> +    .class_size = sizeof(VFIOPlatformDeviceClass),

This should be an abstract class. People must never instantiate a
generic "vfio-platform" device. Only specific devices such as
"vfio-xgmac", "vfio-etsec", etc. should be exposed to the user.
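
A minimal sketch of the idea, written as a standalone toy model rather than
against QEMU's QOM headers: a base type flagged abstract is refused at
instantiation time, while the concrete subtypes are accepted. The names
ToyTypeInfo, toy_type_find and toy_can_instantiate are hypothetical and only
serve the illustration; in QEMU itself the equivalent would presumably be
setting .abstract = true in the TypeInfo of the base type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a QOM-like type description (hypothetical, not QEMU's). */
typedef struct ToyTypeInfo {
    const char *name;
    const char *parent;   /* NULL for a root type */
    bool abstract;        /* abstract types can be derived from, not created */
} ToyTypeInfo;

static const ToyTypeInfo toy_types[] = {
    { "vfio-platform", NULL,            true  },
    { "vfio-xgmac",    "vfio-platform", false },
    { "vfio-etsec",    "vfio-platform", false },
};

/* Look a type up by name; returns NULL when it is not registered. */
static const ToyTypeInfo *toy_type_find(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(toy_types) / sizeof(toy_types[0]); i++) {
        if (strcmp(toy_types[i].name, name) == 0) {
            return &toy_types[i];
        }
    }
    return NULL;
}

/* A user-visible "-device <name>" is only honoured for concrete types. */
static bool toy_can_instantiate(const char *name)
{
    const ToyTypeInfo *ti = toy_type_find(name);

    return ti != NULL && !ti->abstract;
}
```

With this model, asking for "vfio-platform" is rejected while "vfio-xgmac"
and "vfio-etsec" are accepted.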


Alex


* Re: [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd Eric Auger
@ 2014-08-11  9:37   ` Alexander Graf
  2014-08-11 12:04     ` Eric Auger
  0 siblings, 1 reply; 50+ messages in thread
From: Alexander Graf @ 2014-08-11  9:37 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, joel.schopp, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm


On 09.08.14 16:25, Eric Auger wrote:
> set kvm_irqfds_allowed
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>   hw/intc/arm_gic_kvm.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
> index 5038885..08b7bf9 100644
> --- a/hw/intc/arm_gic_kvm.c
> +++ b/hw/intc/arm_gic_kvm.c
> @@ -576,6 +576,8 @@ static void kvm_arm_gic_realize(DeviceState *dev, Error **errp)
>                               KVM_DEV_ARM_VGIC_GRP_ADDR,
>                               KVM_VGIC_V2_ADDR_TYPE_CPU,
>                               s->dev_fd);
> +
> +    kvm_irqfds_allowed = true;

Is this always true? If it is, why not enable it separately while making 
vhost-net work for example?


Alex


* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation Eric Auger
@ 2014-08-11  9:40   ` Alexander Graf
  2014-08-11 11:55     ` Eric Auger
  2014-08-18 21:54   ` Joel Schopp
  1 sibling, 1 reply; 50+ messages in thread
From: Alexander Graf @ 2014-08-11  9:40 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, joel.schopp, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm


On 09.08.14 16:25, Eric Auger wrote:
> Generates the device node of VFIO devices, if any are invoked in
> -device option. In case VFIO devices require more complex node
> generation, they can be handled before.
>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>   hw/arm/dyn_sysbus_devtree.c | 138 ++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 138 insertions(+)
>
> diff --git a/hw/arm/dyn_sysbus_devtree.c b/hw/arm/dyn_sysbus_devtree.c
> index 56af62f..ac34f07 100644
> --- a/hw/arm/dyn_sysbus_devtree.c
> +++ b/hw/arm/dyn_sysbus_devtree.c
> @@ -1,6 +1,139 @@
>   #include "hw/arm/dyn_sysbus_devtree.h"
>   #include "qemu/error-report.h"
>   #include "sysemu/device_tree.h"
> +#include "hw/vfio/vfio-platform.h"
> +
> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque);
> +
> +static char *format_compat(char * compat)
> +{
> +    char *str_ptr, *corrected_compat;
> +    /*
> +     * process compatibility property string passed by end-user
> +     * replaces / by , and ; by NUL character
> +     */
> +    corrected_compat = g_strdup(compat);
> +    /*
> +     * the total length of the string has to include also the last
> +     * NUL char.
> +     */
> +
> +    str_ptr = corrected_compat;
> +    while ((str_ptr = strchr(str_ptr, '/')) != NULL) {
> +        *str_ptr = ',';
> +    }
> +
> +    /* substitute ";" with the NUL char */
> +    str_ptr = corrected_compat;
> +    while ((str_ptr = strchr(str_ptr, ';')) != NULL) {
> +        *str_ptr = '\0';
> +    }
> +
> +    return corrected_compat;
> +}
> +
> +static void wrap_vfio_fdt_add_node(SysBusDevice *sbdev, void *opaque)
> +{
> +    PlatformDevtreeData *data = opaque;
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    gchar irq_number_prop[8];
> +    Object *obj = OBJECT(sbdev);
> +    char *corrected_compat;
> +    uint64_t irq_number;
> +    int compat_str_len = strlen(vdev->compat)+1;
> +    int i;
> +
> +    corrected_compat = format_compat(vdev->compat);
> +    snprintf(vdev->compat, compat_str_len, "%s", corrected_compat);
> +    g_free(corrected_compat);
> +
> +    vfio_fdt_add_device_node(sbdev, opaque);
> +
> +    for (i = 0; i < vbasedev->num_irqs; i++) {
> +        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
> +        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
> +                                                 + data->irq_start;
> +        /*
> +         * for setting irqfd up we must provide the virtual IRQ number
> +         * which is the sum of irq_start and actual platform bus irq
> +         * index. At realize point we do not have this info.
> +         */
> +        if (vdev->irqfd_allowed) {
> +            vfio_setup_irqfd(sbdev, i, irq_number);
> +        }
> +    }
> +}
> +
> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
> +{
> +    PlatformDevtreeData *data = opaque;
> +    void *fdt = data->fdt;
> +    const char *parent_node = data->node;
> +    int compat_str_len;
> +    char *nodename;
> +    int i, ret;
> +    uint32_t *irq_attr;
> +    uint64_t *reg_attr;
> +    uint64_t mmio_base;
> +    uint64_t irq_number;
> +    gchar mmio_base_prop[8];
> +    gchar irq_number_prop[8];
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    Object *obj = OBJECT(sbdev);
> +
> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
> +
> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
> +                               vbasedev->name,
> +                               mmio_base);
> +
> +    qemu_fdt_add_subnode(fdt, nodename);
> +
> +    compat_str_len = strlen(vdev->compat) + 1;
> +    qemu_fdt_setprop(fdt, nodename, "compatible",
> +                            vdev->compat, compat_str_len);
> +
> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
> +        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
> +        reg_attr[2*i] = 1;
> +        reg_attr[2*i+1] = mmio_base;
> +        reg_attr[2*i+2] = 1;
> +        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
> +    }
> +
> +    ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
> +                     vbasedev->num_regions*2, reg_attr);
> +    if (ret < 0) {
> +        error_report("could not set reg property of node %s", nodename);
> +    }
> +
> +    irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
> +
> +    for (i = 0; i < vbasedev->num_irqs; i++) {
> +        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
> +        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
> +                                                 + data->irq_start;
> +        irq_attr[3*i] = cpu_to_be32(0);
> +        irq_attr[3*i+1] = cpu_to_be32(irq_number);
> +        irq_attr[3*i+2] = cpu_to_be32(0x4);
> +    }
> +
> +   ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
> +                     irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
> +    if (ret < 0) {
> +        error_report("could not set interrupts property of node %s",
> +                     nodename);
> +    }
> +
> +    g_free(nodename);
> +    g_free(irq_attr);
> +    g_free(reg_attr);
> +}
>   
>   int sysbus_device_create_devtree(Object *obj, void *opaque)
>   {
> @@ -17,6 +150,11 @@ int sysbus_device_create_devtree(Object *obj, void *opaque)
>           return object_child_foreach(obj, sysbus_device_create_devtree, data);
>       }
>   
> +    if (object_dynamic_cast(obj, TYPE_VFIO_PLATFORM)) {

You should only ever check for specific VFIO device types. A generic 
"vfio-platform" device type will not work, since you won't have enough 
auxiliary information once devices become more complicated.
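
One way to picture this, as a standalone sketch and not as QEMU code: keep a
table that binds each specific device type to its own node-creation hook, and
deliberately leave the generic type out of it. ToyNodeBinding,
toy_find_add_node and the two placeholder hooks are hypothetical names; the
type strings are the examples used earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device hook that would emit the device tree node. */
typedef int (*ToyAddNodeFn)(void);

/* Placeholder bodies: a real hook would fill in device-specific properties. */
static int toy_add_xgmac_node(void) { return 1; }
static int toy_add_etsec_node(void) { return 2; }

typedef struct ToyNodeBinding {
    const char *type_name;
    ToyAddNodeFn add_node;
} ToyNodeBinding;

/* Only specific device types are listed; "vfio-platform" is absent on purpose. */
static const ToyNodeBinding toy_bindings[] = {
    { "vfio-xgmac", toy_add_xgmac_node },
    { "vfio-etsec", toy_add_etsec_node },
};

/* Returns the hook for a specific type, or NULL if the type is not handled. */
static ToyAddNodeFn toy_find_add_node(const char *type_name)
{
    size_t i;

    assert(type_name != NULL);
    for (i = 0; i < sizeof(toy_bindings) / sizeof(toy_bindings[0]); i++) {
        if (strcmp(toy_bindings[i].type_name, type_name) == 0) {
            return toy_bindings[i].add_node;
        }
    }
    return NULL;
}
```

Each hook then has room for whatever auxiliary information its device needs,
and an unknown or generic type simply yields no hook.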


Alex


* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-11  9:40   ` Alexander Graf
@ 2014-08-11 11:55     ` Eric Auger
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-11 11:55 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, joel.schopp, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 08/11/2014 11:40 AM, Alexander Graf wrote:
> 
> On 09.08.14 16:25, Eric Auger wrote:
>> Generates the device node of VFIO devices, if any are invoked in
>> -device option. In case VFIO devices require more complex node
>> generation, they can be handled before.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>>   hw/arm/dyn_sysbus_devtree.c | 138
>> ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 138 insertions(+)
>>
>> diff --git a/hw/arm/dyn_sysbus_devtree.c b/hw/arm/dyn_sysbus_devtree.c
>> index 56af62f..ac34f07 100644
>> --- a/hw/arm/dyn_sysbus_devtree.c
>> +++ b/hw/arm/dyn_sysbus_devtree.c
>> @@ -1,6 +1,139 @@
>>   #include "hw/arm/dyn_sysbus_devtree.h"
>>   #include "qemu/error-report.h"
>>   #include "sysemu/device_tree.h"
>> +#include "hw/vfio/vfio-platform.h"
>> +
>> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque);
>> +
>> +static char *format_compat(char * compat)
>> +{
>> +    char *str_ptr, *corrected_compat;
>> +    /*
>> +     * process compatibility property string passed by end-user
>> +     * replaces / by , and ; by NUL character
>> +     */
>> +    corrected_compat = g_strdup(compat);
>> +    /*
>> +     * the total length of the string has to include also the last
>> +     * NUL char.
>> +     */
>> +
>> +    str_ptr = corrected_compat;
>> +    while ((str_ptr = strchr(str_ptr, '/')) != NULL) {
>> +        *str_ptr = ',';
>> +    }
>> +
>> +    /* substitute ";" with the NUL char */
>> +    str_ptr = corrected_compat;
>> +    while ((str_ptr = strchr(str_ptr, ';')) != NULL) {
>> +        *str_ptr = '\0';
>> +    }
>> +
>> +    return corrected_compat;
>> +}
>> +
>> +static void wrap_vfio_fdt_add_node(SysBusDevice *sbdev, void *opaque)
>> +{
>> +    PlatformDevtreeData *data = opaque;
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    gchar irq_number_prop[8];
>> +    Object *obj = OBJECT(sbdev);
>> +    char *corrected_compat;
>> +    uint64_t irq_number;
>> +    int compat_str_len = strlen(vdev->compat)+1;
>> +    int i;
>> +
>> +    corrected_compat = format_compat(vdev->compat);
>> +    snprintf(vdev->compat, compat_str_len, "%s", corrected_compat);
>> +    g_free(corrected_compat);
>> +
>> +    vfio_fdt_add_device_node(sbdev, opaque);
>> +
>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>> +        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]",
>> i);
>> +        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
>> +                                                 + data->irq_start;
>> +        /*
>> +         * for setting irqfd up we must provide the virtual IRQ number
>> +         * which is the sum of irq_start and actual platform bus irq
>> +         * index. At realize point we do not have this info.
>> +         */
>> +        if (vdev->irqfd_allowed) {
>> +            vfio_setup_irqfd(sbdev, i, irq_number);
>> +        }
>> +    }
>> +}
>> +
>> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
>> +{
>> +    PlatformDevtreeData *data = opaque;
>> +    void *fdt = data->fdt;
>> +    const char *parent_node = data->node;
>> +    int compat_str_len;
>> +    char *nodename;
>> +    int i, ret;
>> +    uint32_t *irq_attr;
>> +    uint64_t *reg_attr;
>> +    uint64_t mmio_base;
>> +    uint64_t irq_number;
>> +    gchar mmio_base_prop[8];
>> +    gchar irq_number_prop[8];
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    Object *obj = OBJECT(sbdev);
>> +
>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>> +
>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>> +                               vbasedev->name,
>> +                               mmio_base);
>> +
>> +    qemu_fdt_add_subnode(fdt, nodename);
>> +
>> +    compat_str_len = strlen(vdev->compat) + 1;
>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>> +                            vdev->compat, compat_str_len);
>> +
>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
>> +        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
>> +        reg_attr[2*i] = 1;
>> +        reg_attr[2*i+1] = mmio_base;
>> +        reg_attr[2*i+2] = 1;
>> +        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
>> +    }
>> +
>> +    ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
>> +                     vbasedev->num_regions*2, reg_attr);
>> +    if (ret < 0) {
>> +        error_report("could not set reg property of node %s", nodename);
>> +    }
>> +
>> +    irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
>> +
>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>> +        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]",
>> i);
>> +        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
>> +                                                 + data->irq_start;
>> +        irq_attr[3*i] = cpu_to_be32(0);
>> +        irq_attr[3*i+1] = cpu_to_be32(irq_number);
>> +        irq_attr[3*i+2] = cpu_to_be32(0x4);
>> +    }
>> +
>> +   ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
>> +                     irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
>> +    if (ret < 0) {
>> +        error_report("could not set interrupts property of node %s",
>> +                     nodename);
>> +    }
>> +
>> +    g_free(nodename);
>> +    g_free(irq_attr);
>> +    g_free(reg_attr);
>> +}
>>     int sysbus_device_create_devtree(Object *obj, void *opaque)
>>   {
>> @@ -17,6 +150,11 @@ int sysbus_device_create_devtree(Object *obj, void
>> *opaque)
>>           return object_child_foreach(obj,
>> sysbus_device_create_devtree, data);
>>       }
>>   +    if (object_dynamic_cast(obj, TYPE_VFIO_PLATFORM)) {
> 
> You should only ever check for specific VFIO device types. A generic
> "vfio-platform" device type will not work, since you won't have enough
> auxiliary information once devices become more complicated.

Hi Alex,

I will resubmit this week, bringing back the calxeda-xgmac derived device
and adding an abstract base class too.

Thanks

Eric
> 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd
  2014-08-11  9:37   ` Alexander Graf
@ 2014-08-11 12:04     ` Eric Auger
  2014-08-11 12:05       ` Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-08-11 12:04 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, joel.schopp, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 08/11/2014 11:37 AM, Alexander Graf wrote:
> 
> On 09.08.14 16:25, Eric Auger wrote:
>> set kvm_irqfds_allowed
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>>   hw/intc/arm_gic_kvm.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
>> index 5038885..08b7bf9 100644
>> --- a/hw/intc/arm_gic_kvm.c
>> +++ b/hw/intc/arm_gic_kvm.c
>> @@ -576,6 +576,8 @@ static void kvm_arm_gic_realize(DeviceState *dev,
>> Error **errp)
>>                               KVM_DEV_ARM_VGIC_GRP_ADDR,
>>                               KVM_VGIC_V2_ADDR_TYPE_CPU,
>>                               s->dev_fd);
>> +
>> +    kvm_irqfds_allowed = true;
> 
> Is this always true? If it is, why not enable it separately while making
> vhost-net work for example?

Hi Alex,

Yes, I think so. As soon as KVM is enabled, KVM/arm would enable
injection through irqfd. It definitely makes sense to test it with
vhost-net too. Well, a matter of priority ;-)

Best Regards

Eric

> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd
  2014-08-11 12:04     ` Eric Auger
@ 2014-08-11 12:05       ` Alexander Graf
  2014-08-11 12:27         ` Eric Auger
  0 siblings, 1 reply; 50+ messages in thread
From: Alexander Graf @ 2014-08-11 12:05 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, joel.schopp, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm


On 11.08.14 14:04, Eric Auger wrote:
> On 08/11/2014 11:37 AM, Alexander Graf wrote:
>> On 09.08.14 16:25, Eric Auger wrote:
>>> set kvm_irqfds_allowed
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>> ---
>>>    hw/intc/arm_gic_kvm.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
>>> index 5038885..08b7bf9 100644
>>> --- a/hw/intc/arm_gic_kvm.c
>>> +++ b/hw/intc/arm_gic_kvm.c
>>> @@ -576,6 +576,8 @@ static void kvm_arm_gic_realize(DeviceState *dev,
>>> Error **errp)
>>>                                KVM_DEV_ARM_VGIC_GRP_ADDR,
>>>                                KVM_VGIC_V2_ADDR_TYPE_CPU,
>>>                                s->dev_fd);
>>> +
>>> +    kvm_irqfds_allowed = true;
>> Is this always true? If it is, why not enable it separately while making
>> vhost-net work for example?
> Hi Alex,
>
> yes I think so. As soon as KVM is enabled, KVM/arm would enable
> injection through irqfd. Definitely makes sense to test it with
> vhost-net too. Well a matter of priority ;-)

More a matter of accuracy. What if you use new QEMU on old KVM which 
does have in-kernel GIC support, but no irqfd support?


Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd
  2014-08-11 12:05       ` Alexander Graf
@ 2014-08-11 12:27         ` Eric Auger
  2014-08-11 12:29           ` Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-08-11 12:27 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, joel.schopp, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 08/11/2014 02:05 PM, Alexander Graf wrote:
> 
> On 11.08.14 14:04, Eric Auger wrote:
>> On 08/11/2014 11:37 AM, Alexander Graf wrote:
>>> On 09.08.14 16:25, Eric Auger wrote:
>>>> set kvm_irqfds_allowed
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>> ---
>>>>    hw/intc/arm_gic_kvm.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
>>>> index 5038885..08b7bf9 100644
>>>> --- a/hw/intc/arm_gic_kvm.c
>>>> +++ b/hw/intc/arm_gic_kvm.c
>>>> @@ -576,6 +576,8 @@ static void kvm_arm_gic_realize(DeviceState *dev,
>>>> Error **errp)
>>>>                                KVM_DEV_ARM_VGIC_GRP_ADDR,
>>>>                                KVM_VGIC_V2_ADDR_TYPE_CPU,
>>>>                                s->dev_fd);
>>>> +
>>>> +    kvm_irqfds_allowed = true;
>>> Is this always true? If it is, why not enable it separately while making
>>> vhost-net work for example?
>> Hi Alex,
>>
>> yes I think so. As soon as KVM is enabled, KVM/arm would enable
>> injection through irqfd. Definitely makes sense to test it with
>> vhost-net too. Well a matter of priority ;-)
> 
> More a matter of accuracy. What if you use new QEMU on old KVM which
> does have in-kernel GIC support, but no irqfd support?

Hi Alex,

VFIO device code also calls kvm_check_extension(kvm_state,
KVM_CAP_IRQFD_RESAMPLE) which would return false if IRQFD is not enabled
in old kernels. But with respect to vhost-net irqfd usage I cannot
comment yet and you may be right ;-)

Best Regards

Eric
> 
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd
  2014-08-11 12:27         ` Eric Auger
@ 2014-08-11 12:29           ` Alexander Graf
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Graf @ 2014-08-11 12:29 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, joel.schopp, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm


On 11.08.14 14:27, Eric Auger wrote:
> On 08/11/2014 02:05 PM, Alexander Graf wrote:
>> On 11.08.14 14:04, Eric Auger wrote:
>>> On 08/11/2014 11:37 AM, Alexander Graf wrote:
>>>> On 09.08.14 16:25, Eric Auger wrote:
>>>>> set kvm_irqfds_allowed
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>> ---
>>>>>     hw/intc/arm_gic_kvm.c | 2 ++
>>>>>     1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
>>>>> index 5038885..08b7bf9 100644
>>>>> --- a/hw/intc/arm_gic_kvm.c
>>>>> +++ b/hw/intc/arm_gic_kvm.c
>>>>> @@ -576,6 +576,8 @@ static void kvm_arm_gic_realize(DeviceState *dev,
>>>>> Error **errp)
>>>>>                                 KVM_DEV_ARM_VGIC_GRP_ADDR,
>>>>>                                 KVM_VGIC_V2_ADDR_TYPE_CPU,
>>>>>                                 s->dev_fd);
>>>>> +
>>>>> +    kvm_irqfds_allowed = true;
>>>> Is this always true? If it is, why not enable it separately while making
>>>> vhost-net work for example?
>>> Hi Alex,
>>>
>>> yes I think so. As soon as KVM is enabled, KVM/arm would enable
>>> injection through irqfd. Definitely makes sense to test it with
>>> vhost-net too. Well a matter of priority ;-)
>> More a matter of accuracy. What if you use new QEMU on old KVM which
>> does have in-kernel GIC support, but no irqfd support?
> Hi Alex,
>
> VFIO device code also calls kvm_check_extension(kvm_state,
> KVM_CAP_IRQFD_RESAMPLE) which would return false if IRQFD is not enabled
> in old kernels. But with respect to vhost-net irqfd usage I cannot
> comment yet and you may be right ;-)

Yeah, please only set it when the kernel exposes that it supports irqfd ;).


Alex
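
A minimal sketch of the gating requested above, under stated assumptions:
probe_extension() is a hypothetical stand-in for QEMU's
kvm_check_extension(kvm_state, KVM_CAP_IRQFD) so that the fragment builds
outside the QEMU tree, and the CAP_* constants are illustrative ids, not the
real KVM capability numbers. The only point is that the flag follows what the
kernel reports instead of being set unconditionally.

```c
#include <stdbool.h>

/* Illustrative capability ids; NOT the real KVM_CAP_* values. */
enum { CAP_IRQCHIP = 0, CAP_IRQFD = 1, CAP_MAX = 2 };

/* Hypothetical model of what a given host kernel reports. */
struct fake_kernel {
    bool caps[CAP_MAX];
};

/* Stand-in for kvm_check_extension(): nonzero when the cap is present. */
static int probe_extension(const struct fake_kernel *k, int cap)
{
    return (cap >= 0 && cap < CAP_MAX && k->caps[cap]) ? 1 : 0;
}

/* Mirrors the global flag that the patch sets unconditionally. */
static bool irqfds_allowed;

/* Realize-time hook: advertise irqfd only if the kernel exposes it. */
static void gic_realize_sketch(const struct fake_kernel *k)
{
    irqfds_allowed = probe_extension(k, CAP_IRQFD) > 0;
}
```

With this shape, a new QEMU on an old kernel that has the in-kernel GIC but no
irqfd support leaves the flag false, which is the case raised earlier in the
thread.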

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module Eric Auger
@ 2014-08-11 19:20   ` Alex Williamson
  2014-08-12  5:57     ` Eric Auger
  2014-08-11 19:25   ` Alex Williamson
  2014-08-20 19:12   ` Joel Schopp
  2 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2014-08-11 19:20 UTC (permalink / raw)
  To: Eric Auger
  Cc: agraf, kim.phillips, eric.auger, peter.maydell, Kim Phillips,
	patches, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	stuart.yoder, joel.schopp, a.motakis, kvmarm, christoffer.dall

On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
> A new common module is created. It implements all functions
> that have no device specificity (PCI, Platform).
> 
> This patch only consists in move (no functional changes)
> 
> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v4 -> v5:
> - integrate "sPAPR/IOMMU: Fix TCE entry permission"
> - VFIOdevice .name dealloc removed from vfio_put_base_device
> - add some includes according to vfio inclusion policy
> 
> v3 -> v4:
> [Eric Auger]
> move done after all PCI modifications to anticipate for
> VFIO Platform needs. Purpose is to alleviate the whole
> review process.
> 
> <= v3
> First split done by Kim Phillips
> ---
>  hw/vfio/Makefile.objs         |    1 +
>  hw/vfio/common.c              |  990 ++++++++++++++++++++++++++++++++++++++
>  hw/vfio/pci.c                 | 1070 +----------------------------------------
>  include/hw/vfio/vfio-common.h |  151 ++++++
>  4 files changed, 1147 insertions(+), 1065 deletions(-)
>  create mode 100644 hw/vfio/common.c
>  create mode 100644 include/hw/vfio/vfio-common.h
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index 31c7dab..e31f30e 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,3 +1,4 @@
>  ifeq ($(CONFIG_LINUX), y)
> +obj-$(CONFIG_SOFTMMU) += common.o
>  obj-$(CONFIG_PCI) += pci.o
>  endif
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> new file mode 100644
> index 0000000..297c508
> --- /dev/null
> +++ b/hw/vfio/common.c
> @@ -0,0 +1,990 @@
> +/*
> + * generic functions used by VFIO devices
> + *
> + * Copyright Red Hat, Inc. 2012
> + *
> + * Authors:
> + *  Alex Williamson <alex.williamson@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Based on qemu-kvm device-assignment:
> + *  Adapted for KVM by Qumranet.
> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
> + */
> +
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <linux/vfio.h>
> +
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/vfio.h"
> +#include "exec/address-spaces.h"
> +#include "exec/memory.h"
> +#include "hw/hw.h"
> +#include "qemu/error-report.h"
> +#include "sysemu/kvm.h"
> +
> +QLIST_HEAD(, VFIOGroup)
> +    group_list = QLIST_HEAD_INITIALIZER(group_list);
> +
> +QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
> +
> +#ifdef CONFIG_KVM
> +/*
> + * We have a single VFIO pseudo device per KVM VM.  Once created it lives
> + * for the life of the VM.  Closing the file descriptor only drops our
> + * reference to it and the device's reference to kvm.  Therefore once
> + * initialized, this file descriptor is only released on QEMU exit and
> + * we'll re-use it should another vfio device be attached before then.
> + */
> +static int vfio_kvm_device_fd = -1;
> +#endif
> +
> +/*
> + * Common VFIO interrupt disable
> + */
> +void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
> +{
> +    struct vfio_irq_set irq_set = {
> +        .argsz = sizeof(irq_set),
> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
> +        .index = index,
> +        .start = 0,
> +        .count = 0,
> +    };
> +
> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +}
> +
> +void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
> +{
> +    struct vfio_irq_set irq_set = {
> +        .argsz = sizeof(irq_set),
> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
> +        .index = index,
> +        .start = 0,
> +        .count = 1,
> +    };
> +
> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +}
> +
> +#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */

Can we remove the ifdef here and in the common header now?  I'm hoping
the compiler won't complain once it's no longer static.

...
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 5f218b7..d2ccb3b 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -39,27 +39,12 @@
>  #include "qemu/range.h"
>  #include "sysemu/kvm.h"
>  #include "sysemu/sysemu.h"
> -#include "hw/vfio/vfio.h"
> +#include "hw/vfio/vfio-common.h"
>  
> -/* #define DEBUG_VFIO */
> -#ifdef DEBUG_VFIO
> -#define DPRINTF(fmt, ...) \
> -    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> -#else
> -#define DPRINTF(fmt, ...) \
> -    do { } while (0)
> -#endif
> -
> -/* Extra debugging, trap acceleration paths for more logging */
> -#define VFIO_ALLOW_MMAP 1
> -#define VFIO_ALLOW_KVM_INTX 1
> -#define VFIO_ALLOW_KVM_MSI 1
> -#define VFIO_ALLOW_KVM_MSIX 1
> -
> -enum {
> -    VFIO_DEVICE_TYPE_PCI = 0,
> -    VFIO_DEVICE_TYPE_PLATFORM = 1,
> -};
> +extern const MemoryRegionOps vfio_region_ops;
> +extern const MemoryListener vfio_memory_listener;
> +extern QLIST_HEAD(, VFIOGroup) group_list;
> +extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;

This seems odd, why doesn't the common header provide these for us?  We
should also rename group_list to vfio_group_list to be polite to the
rest of the namespace.  Thanks,

Alex
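
One possible shape for the header-provided declaration, as a sketch only: it
uses the LIST_* macros from <sys/queue.h> as a stand-in for QEMU's QLIST_*
macros so that it compiles on its own, and vfio_find_group() is a hypothetical
helper added for illustration. Giving the list head a tag (VFIOGroupList here,
an assumed name) lets the common header carry a single extern declaration that
every user shares, and the vfio_ prefix keeps the symbol out of the way of the
rest of the namespace.

```c
#include <stddef.h>
#include <sys/queue.h>

/* ---- would live in the common header (sketch) ---- */
typedef struct VFIOGroup {
    int groupid;
    LIST_ENTRY(VFIOGroup) next;
} VFIOGroup;

/* Naming the head type lets every file share one declaration. */
LIST_HEAD(VFIOGroupList, VFIOGroup);
extern struct VFIOGroupList vfio_group_list;

/* ---- would live in common.c (sketch) ---- */
struct VFIOGroupList vfio_group_list =
    LIST_HEAD_INITIALIZER(vfio_group_list);

/* Hypothetical helper: look a group up by id, NULL when absent. */
static VFIOGroup *vfio_find_group(int groupid)
{
    VFIOGroup *g;

    LIST_FOREACH(g, &vfio_group_list, next) {
        if (g->groupid == groupid) {
            return g;
        }
    }
    return NULL;
}
```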

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module Eric Auger
  2014-08-11 19:20   ` Alex Williamson
@ 2014-08-11 19:25   ` Alex Williamson
  2014-08-12  6:09     ` Eric Auger
  2014-08-20 19:12   ` Joel Schopp
  2 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2014-08-11 19:25 UTC (permalink / raw)
  To: Eric Auger
  Cc: agraf, kim.phillips, eric.auger, peter.maydell, Kim Phillips,
	patches, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	stuart.yoder, joel.schopp, a.motakis, kvmarm, christoffer.dall

On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> new file mode 100644
> index 0000000..4684ee5
> --- /dev/null
> +++ b/include/hw/vfio/vfio-common.h
> @@ -0,0 +1,151 @@
> +/*
> + * common header for vfio based device assignment support
> + *
> + * Copyright Red Hat, Inc. 2012
> + *
> + * Authors:
> + *  Alex Williamson <alex.williamson@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Based on qemu-kvm device-assignment:
> + *  Adapted for KVM by Qumranet.
> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
> + */
> +#ifndef HW_VFIO_VFIO_COMMON_H
> +#define HW_VFIO_VFIO_COMMON_H
> +
> +#include "qemu-common.h"
> +#include "exec/address-spaces.h"
> +#include "exec/memory.h"
> +#include "qemu/queue.h"
> +#include "qemu/notify.h"
> +
> +/*#define DEBUG_VFIO*/
> +#ifdef DEBUG_VFIO
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif


DPRINTF also needs to be renamed to avoid namespace conflicts.
Thanks,

Alex
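
A sketch of what a renamed macro could look like. The names VFIO_DPRINTF and
VFIO_DEBUG_ENABLED are assumptions made for illustration, not names taken from
the series. The if-guarded form is a swapped-in variant of the #ifdef pattern
in the patch: it keeps the format string and arguments type-checked even when
tracing is compiled out, while the arguments are still not evaluated at run
time.

```c
#include <stdio.h>

/* Set to 1 to enable tracing; the name is illustrative. */
#ifndef VFIO_DEBUG_ENABLED
#define VFIO_DEBUG_ENABLED 0
#endif

/*
 * Prefixed macro: it cannot collide with another device's DPRINTF
 * when the header is included alongside other device headers.
 * "## __VA_ARGS__" is the GNU extension already used by the patch.
 */
#define VFIO_DPRINTF(fmt, ...) \
    do { \
        if (VFIO_DEBUG_ENABLED) { \
            fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); \
        } \
    } while (0)
```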

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support Eric Auger
  2014-08-11  9:36   ` Alexander Graf
@ 2014-08-11 20:13   ` Alex Williamson
  2014-08-12  5:51     ` Eric Auger
  1 sibling, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2014-08-11 20:13 UTC (permalink / raw)
  To: Eric Auger
  Cc: agraf, kim.phillips, eric.auger, peter.maydell, Kim Phillips,
	patches, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	stuart.yoder, joel.schopp, a.motakis, kvmarm, christoffer.dall

On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
> Minimal VFIO platform implementation supporting
> - register space user mapping,
> - IRQ assignment based on eventfds handled on qemu side.
> 
> irqfd kernel acceleration comes in a subsequent patch.
> 
> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v4 -> v5:
> - vfio-platform.h included first
> - cleanup error handling in *populate*, vfio_get_device,
>   vfio_enable_intp
> - vfio_put_device not called anymore
> - add some includes to follow vfio policy
> 
> v3 -> v4:
> [Eric Auger]
> - merge of "vfio: Add initial IRQ support in platform device"
>   to get a full functional patch although perfs are limited.
> - removal of unrealize function since I currently understand
>   it is only used with device hot-plug feature.
> 
> v2 -> v3:
> [Eric Auger]
> - further factorization between PCI and platform (VFIORegion,
>   VFIODevice). same level of functionality.
> 
> <= v2:
> [Kim Phillips]
> - Initial Creation of the device supporting register space mapping
> ---
>  hw/vfio/Makefile.objs           |   1 +
>  hw/vfio/platform.c              | 517 ++++++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio-platform.h |  77 ++++++
>  3 files changed, 595 insertions(+)
>  create mode 100644 hw/vfio/platform.c
>  create mode 100644 include/hw/vfio/vfio-platform.h
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index e31f30e..c5c76fe 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,4 +1,5 @@
>  ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
>  obj-$(CONFIG_PCI) += pci.o
> +obj-$(CONFIG_SOFTMMU) += platform.o
>  endif
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> new file mode 100644
> index 0000000..f1a1b55
> --- /dev/null
> +++ b/hw/vfio/platform.c
> @@ -0,0 +1,517 @@
> +/*
> + * vfio based device assignment support - platform devices
> + *
> + * Copyright Linaro Limited, 2014
> + *
> + * Authors:
> + *  Kim Phillips <kim.phillips@linaro.org>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Based on vfio based PCI device assignment support:
> + *  Copyright Red Hat, Inc. 2012
> + */
> +
> +#include <linux/vfio.h>
> +#include <sys/ioctl.h>
> +
> +#include "hw/vfio/vfio-platform.h"
> +#include "qemu/error-report.h"
> +#include "qemu/range.h"
> +#include "sysemu/sysemu.h"
> +#include "exec/memory.h"
> +#include "qemu/queue.h"
> +#include "hw/sysbus.h"
> +
> +extern const MemoryRegionOps vfio_region_ops;
> +extern const MemoryListener vfio_memory_listener;
> +extern QLIST_HEAD(, VFIOGroup) group_list;
> +extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
> +void vfio_put_device(VFIOPlatformDevice *vdev);
> +
> +/*
> + * It is mandatory to pass a VFIOPlatformDevice since VFIODevice
> + * is not a QOM Object and cannot be passed to memory region functions
> +*/
> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
> +{
> +    VFIORegion *region = vdev->regions[nr];
> +    unsigned size = region->size;
> +    char name[64];
> +
> +    if (!size) {
> +        return;
> +    }
> +
> +    snprintf(name, sizeof(name), "VFIO %s region %d",
> +             vdev->vbasedev.name, nr);
> +
> +    /* A "slow" read/write mapping underlies all regions */
> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
> +                          region, name, size);
> +
> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
> +
> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
> +        error_report("%s unsupported. Performance may be slow", name);
> +    }
> +}
> +
> +static void print_regions(VFIOPlatformDevice *vdev)
> +{
> +    int i;
> +
> +    DPRINTF("Device \"%s\" counts %d region(s):\n",
> +             vdev->vbasedev.name, vdev->vbasedev.num_regions);
> +
> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
> +        DPRINTF("- region %d flags = 0x%lx, size = 0x%lx, "
> +                "fd= %d, offset = 0x%lx\n",
> +                vdev->regions[i]->nr,
> +                (unsigned long)vdev->regions[i]->flags,
> +                (unsigned long)vdev->regions[i]->size,
> +                vdev->regions[i]->vbasedev->fd,
> +                (unsigned long)vdev->regions[i]->fd_offset);
> +    }
> +}
> +
> +static int vfio_populate_regions(VFIODevice *vbasedev)
> +{
> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> +    int i, ret = 0;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +
> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
> +        reg_info.index = i;
> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> +        if (ret) {
> +            error_report("vfio: Error getting region %d info: %m", i);
> +            goto error;
> +        }
> +
> +        vdev->regions[i]->flags = reg_info.flags;
> +        vdev->regions[i]->size = reg_info.size;
> +        vdev->regions[i]->fd_offset = reg_info.offset;
> +        vdev->regions[i]->nr = i;
> +        vdev->regions[i]->vbasedev = vbasedev;
> +    }
> +    print_regions(vdev);
> +error:
> +    return ret;
> +}
> +
> +/* not implemented yet */
> +static int vfio_platform_check_device(VFIODevice *vdev)
> +{
> +    return 0;
> +}
> +
> +/* not implemented yet */
> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
> +{
> +    return false;
> +}
> +
> +/* not implemented yet */
> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
> +{
> +    return 0;
> +}
> +
> +/*
> + * eoi function is called on the first access to any MMIO region
> + * after an IRQ was triggered. It is assumed this access corresponds
> + * to the IRQ status register reset.
> + * With such a mechanism, a single IRQ can be handled at a time since
> + * there is no way to know which IRQ was completed by the guest.
> + * (we would need additional details about the IRQ status register mask)
> + */
> +static void vfio_platform_eoi(VFIODevice *vbasedev)
> +{
> +    VFIOINTp *intp;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +
> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
> +        if (intp->state == VFIO_IRQ_ACTIVE) {
> +            DPRINTF("EOI IRQ #%d fd=%d\n",
> +                    intp->pin, event_notifier_get_fd(&intp->interrupt));
> +            intp->state = VFIO_IRQ_INACTIVE;
> +
> +            /* deassert the virtual IRQ and unmask physical one */
> +            qemu_set_irq(intp->qemuirq, 0);
> +            vfio_unmask_irqindex(vbasedev, intp->pin);
> +
> +            /* a single IRQ can be active at a time */
> +            break;
> +        }
> +    }
> +
> +    /* in case there are pending IRQs, handle them one at a time */
> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
> +        vfio_intp_interrupt(intp);
> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
> +    }
> +}
> +
> +/*
> + * enable/disable the fast path mode
> + * fast path = MMIO region is mmaped (no KVM TRAP)
> + * slow path = MMIO region is trapped and region callbacks are called
> + * slow path enables to trap the IRQ status register guest reset
> +*/
> +
> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
> +{
> +    VFIORegion *region;
> +    int i;
> +
> +    DPRINTF("fast path = %d\n", enabled);
> +
> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
> +        region = vdev->regions[i];
> +
> +        /* register space is unmapped to trap EOI */
> +        memory_region_set_enabled(&region->mmap_mem, enabled);
> +    }
> +}
> +
> +/*
> + * Checks whether the IRQ is still pending. In the negative
> + * the fast path mode (where reg space is mmaped) can be restored.
> + * if the IRQ is still pending, we must keep on trapping IRQ status
> + * register reset with mmap disabled (slow path).
> + * the function is called on mmap_timer event.
> + * by construction a single fd is handled at a time. See EOI comment
> + * for additional details.
> + */
> +static void vfio_intp_mmap_enable(void *opaque)
> +{
> +    VFIOINTp *tmp;
> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
> +
> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
> +            DPRINTF("IRQ #%d still active, stay in slow path\n",
> +                    tmp->pin);
> +            timer_mod(vdev->mmap_timer,
> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                          vdev->mmap_timeout);
> +            return;
> +        }
> +    }
> +    DPRINTF("no active IRQ, restore fast path\n");
> +    vfio_mmap_set_enabled(vdev, true);
> +}
> +
> +/*
> + * The fd handler
> + */
> +void vfio_intp_interrupt(void *opaque)
> +{
> +    int ret;
> +    VFIOINTp *tmp, *intp = (VFIOINTp *)opaque;
> +    VFIOPlatformDevice *vdev = intp->vdev;
> +    bool one_active_irq = false;
> +
> +    /*
> +     * first check whether there is a pending IRQ
> +     * in the positive the new IRQ cannot be handled until the
> +     * active one is not completed.
> +     * by construction the same IRQ as the pending one cannot hit
> +     * since the physical IRQ was disabled by the VFIO driver
> +     */
> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
> +            one_active_irq = true;
> +            break;
> +        }
> +    }
> +    if (one_active_irq) {
> +        /*
> +         * the new IRQ gets a pending status and is pushed in
> +         * the pending queue
> +         */
> +        intp->state = VFIO_IRQ_PENDING;
> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
> +                             intp, pqnext);
> +        return;
> +    }
> +
> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
> +    DPRINTF("Handle IRQ #%d (fd = %d)\n",
> +            intp->pin, event_notifier_get_fd(&intp->interrupt));
> +
> +    ret = event_notifier_test_and_clear(&intp->interrupt);
> +    if (!ret) {
> +        DPRINTF("Error when clearing fd=%d\n",
> +                event_notifier_get_fd(&intp->interrupt));
> +    }
> +
> +    intp->state = VFIO_IRQ_ACTIVE;
> +
> +    /* sets slow path */
> +    vfio_mmap_set_enabled(vdev, false);
> +
> +    /* trigger the virtual IRQ */
> +    qemu_set_irq(intp->qemuirq, 1);
> +
> +    /* schedule the mmap timer which will restore mmap path after EOI*/
> +    if (vdev->mmap_timeout) {
> +        timer_mod(vdev->mmap_timer,
> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                      vdev->mmap_timeout);
> +    }
> +}
> +
> +static int vfio_enable_intp(VFIODevice *vbasedev, unsigned int index)
> +{
> +    struct vfio_irq_set *irq_set;
> +    int32_t *pfd;
> +    int ret, argsz;
> +    int device = vbasedev->fd;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
> +    VFIOINTp *intp;
> +
> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
> +    intp = g_malloc0(sizeof(*intp));
> +    intp->vdev = vdev;
> +    intp->pin = index;
> +    intp->state = VFIO_IRQ_INACTIVE;
> +    sysbus_init_irq(sbdev, &intp->qemuirq);
> +
> +    ret = event_notifier_init(&intp->interrupt, 0);
> +    if (ret) {
> +        g_free(intp);
> +        error_report("vfio: Error: event_notifier_init failed ");
> +        return ret;
> +    }
> +
> +    /* build the irq_set to be passed to the vfio kernel driver */
> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
> +
> +    irq_set = g_malloc0(argsz);
> +    irq_set->argsz = argsz;
> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
> +    irq_set->index = index;
> +    irq_set->start = 0;
> +    irq_set->count = 1;
> +    pfd = (int32_t *)&irq_set->data;
> +
> +    *pfd = event_notifier_get_fd(&intp->interrupt);
> +
> +    DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
> +
> +    qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
> +
> +    /*
> +     * pass the index/fd binding to the kernel driver so that it
> +     * triggers this fd on HW IRQ
> +     */
> +    ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_set);
> +    g_free(irq_set);
> +    if (ret) {
> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
> +        event_notifier_cleanup(&intp->interrupt);
> +        return -errno;
> +    }
> +
> +    /* store the new intp in qlist */
> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
> +    return 0;
> +}
> +
> +static int vfio_populate_interrupts(VFIODevice *vbasedev)
> +{
> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> +    int i, ret;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +
> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
> +                                    vfio_intp_mmap_enable, vdev);
> +
> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
> +
> +    for (i = 0; i < vbasedev->num_irqs; i++) {
> +        irq.index = i;
> +
> +        DPRINTF("Retrieve IRQ info from vfio platform driver ...\n");
> +
> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
> +        if (ret) {
> +            /* This can fail for an old kernel or legacy PCI dev */
> +            error_printf("vfio: error getting device %s irq info",
> +                         vbasedev->name);

Strange comment for a platform device.  On PCI this comment only applied
to the virtual error IRQ since it may or may not be supported per
device.  For PCI, the number of IRQs and regions is really more of a
highest index, so it can be sparsely populated.  We know about the error
IRQ, so probe for it, but it may not be present.  Likewise, we know
about the VGA region, but it may not be supported by this device and
will return error on the info call.

> +        } else {
> +            DPRINTF("- IRQ index %d: count %d, flags=0x%x\n",
> +                    irq.index, irq.count, irq.flags);
> +
> +            ret = vfio_enable_intp(vbasedev, irq.index);
> +            if (ret) {
> +                error_report("vfio: Error setting IRQ %d up", i);
> +                return ret;
> +            }
> +        }
> +    }
> +    return 0;
> +}
> +
> +static VFIODeviceOps vfio_platform_ops = {
> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
> +    .vfio_eoi = vfio_platform_eoi,
> +    .vfio_check_device = vfio_platform_check_device,
> +    .vfio_populate_regions = vfio_populate_regions,
> +    .vfio_populate_interrupts = vfio_populate_interrupts,
> +};
> +
> +static int vfio_base_device_init(VFIODevice *vbasedev)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev_iter;
> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
> +    ssize_t len;
> +    struct stat st;
> +    int groupid;
> +    int ret;
> +
> +    /* name must be set prior to the call */
> +    if (!vbasedev->name) {
> +        return -EINVAL;
> +    }
> +
> +    /* Check that the host device exists */
> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
> +             vbasedev->name);
> +
> +    if (stat(path, &st) < 0) {
> +        error_report("vfio: error: no such host device: %s", path);
> +        return -errno;
> +    }
> +
> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
> +    len = readlink(path, iommu_group_path, sizeof(path));
> +    if (len <= 0 || len >= sizeof(path)) {
> +        error_report("vfio: error no iommu_group for device");
> +        return len < 0 ? -errno : -ENAMETOOLONG;
> +    }
> +
> +    iommu_group_path[len] = 0;
> +    group_name = basename(iommu_group_path);
> +
> +    if (sscanf(group_name, "%d", &groupid) != 1) {
> +        error_report("vfio: error reading %s: %m", path);
> +        return -errno;
> +    }
> +
> +    DPRINTF("%s(%s) group %d\n", __func__, vbasedev->name, groupid);
> +
> +    group = vfio_get_group(groupid, &address_space_memory);
> +    if (!group) {
> +        error_report("vfio: failed to get group %d", groupid);
> +        return -ENOENT;
> +    }
> +
> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
> +
> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
> +            error_report("vfio: error: device %s is already attached", path);
> +            vfio_put_group(group);
> +            return -EBUSY;
> +        }
> +    }
> +    ret = vfio_get_device(group, path, vbasedev);
> +    if (ret) {
> +        error_report("vfio: failed to get device %s", path);
> +        vfio_put_group(group);
> +    }
> +    return ret;
> +}
> +
> +void vfio_put_device(VFIOPlatformDevice *vdev)
> +{
> +    unsigned int i;
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +            g_free(vdev->regions[i]);
> +    }
> +    g_free(vdev->regions);
> +    g_free(vdev->vbasedev.name);
> +    vfio_put_base_device(&vdev->vbasedev);
> +}
> +
> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
> +{
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    int i, ret;
> +
> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
> +    vbasedev->ops = &vfio_platform_ops;
> +
> +    DPRINTF("vfio device %s, compat = %s\n", vbasedev->name, vdev->compat);
> +
> +    ret = vfio_base_device_init(vbasedev);
> +    if (ret) {
> +        return;
> +    }
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        vfio_map_region(vdev, i);
> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
> +    }
> +}
> +
> +static const VMStateDescription vfio_platform_vmstate = {
> +    .name = TYPE_VFIO_PLATFORM,
> +    .unmigratable = 1,
> +};
> +
> +static Property vfio_platform_dev_properties[] = {
> +    DEFINE_PROP_STRING("vfio_device", VFIOPlatformDevice, vbasedev.name),

Hmm, is this really a good name for this option?  "host" would give you
some consistency with vfio-pci.

> +    DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
> +    DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
> +                       mmap_timeout, 1100),
> +    DEFINE_PROP_BOOL("irqfd", VFIOPlatformDevice, irqfd_allowed, true),

Should some of these be x- options or do you plan to support them long
term and support users twiddling them?

> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void vfio_platform_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->realize = vfio_platform_realize;
> +    dc->props = vfio_platform_dev_properties;
> +    dc->vmsd = &vfio_platform_vmstate;
> +    dc->desc = "VFIO-based platform device assignment";
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +}
> +
> +static const TypeInfo vfio_platform_dev_info = {
> +    .name = TYPE_VFIO_PLATFORM,
> +    .parent = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(VFIOPlatformDevice),
> +    .class_init = vfio_platform_class_init,
> +    .class_size = sizeof(VFIOPlatformDeviceClass),
> +};
> +
> +static void register_vfio_platform_dev_type(void)
> +{
> +    type_register_static(&vfio_platform_dev_info);
> +}
> +
> +type_init(register_vfio_platform_dev_type)
> diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
> new file mode 100644
> index 0000000..1ee072a
> --- /dev/null
> +++ b/include/hw/vfio/vfio-platform.h
> @@ -0,0 +1,77 @@
> +/*
> + * vfio based device assignment support - platform devices
> + *
> + * Copyright Linaro Limited, 2014
> + *
> + * Authors:
> + *  Kim Phillips <kim.phillips@linaro.org>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Based on vfio based PCI device assignment support:
> + *  Copyright Red Hat, Inc. 2012
> + */
> +
> +#ifndef HW_VFIO_VFIO_PLATFORM_H
> +#define HW_VFIO_VFIO_PLATFORM_H
> +
> +#include "hw/sysbus.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "qemu/event_notifier.h"
> +#include "qemu/queue.h"
> +#include "hw/irq.h"
> +
> +#define TYPE_VFIO_PLATFORM "vfio-platform"
> +
> +enum {
> +    VFIO_IRQ_INACTIVE = 0,
> +    VFIO_IRQ_PENDING = 1,
> +    VFIO_IRQ_ACTIVE = 2,
> +    /* VFIO_IRQ_ACTIVE_AND_PENDING cannot happen with VFIO */
> +};
> +
> +typedef struct VFIOINTp {
> +    QLIST_ENTRY(VFIOINTp) next; /* entry for IRQ list */
> +    QSIMPLEQ_ENTRY(VFIOINTp) pqnext; /* entry for pending IRQ queue */
> +    EventNotifier interrupt; /* eventfd triggered on interrupt */
> +    EventNotifier unmask; /* eventfd for unmask on QEMU bypass */
> +    qemu_irq qemuirq;
> +    struct VFIOPlatformDevice *vdev; /* back pointer to device */
> +    int state; /* inactive, pending, active */
> +    bool kvm_accel; /* set when QEMU bypass through KVM enabled */
> +    uint8_t pin; /* index */
> +    uint8_t virtualID; /* virtual IRQ */
> +} VFIOINTp;
> +
> +typedef struct VFIOPlatformDevice {
> +    SysBusDevice sbdev;
> +    VFIODevice vbasedev; /* not a QOM object */
> +    VFIORegion **regions;
> +    QLIST_HEAD(, VFIOINTp) intp_list; /* list of IRQ */
> +    /* queue of pending IRQ */
> +    QSIMPLEQ_HEAD(pending_intp_queue, VFIOINTp) pending_intp_queue;
> +    char *compat; /* compatibility string */
> +    bool irqfd_allowed;
> +    uint32_t mmap_timeout; /* delay to re-enable mmaps after interrupt */
> +    QEMUTimer *mmap_timer; /* enable mmaps after periods w/o interrupts */
> +} VFIOPlatformDevice;
> +
> +
> +typedef struct VFIOPlatformDeviceClass {
> +    /*< private >*/
> +    SysBusDeviceClass parent_class;
> +    /*< public >*/
> +} VFIOPlatformDeviceClass;
> +
> +#define VFIO_PLATFORM_DEVICE(obj) \
> +     OBJECT_CHECK(VFIOPlatformDevice, (obj), TYPE_VFIO_PLATFORM)
> +#define VFIO_PLATFORM_DEVICE_CLASS(klass) \
> +     OBJECT_CLASS_CHECK(VFIOPlatformDeviceClass, (klass), TYPE_VFIO_PLATFORM)
> +#define VFIO_PLATFORM_DEVICE_GET_CLASS(obj) \
> +     OBJECT_GET_CLASS(VFIOPlatformDeviceClass, (obj), TYPE_VFIO_PLATFORM)
> +
> +void vfio_intp_interrupt(void *opaque);
> +void vfio_setup_irqfd(SysBusDevice *dev, int index, int virq);

This was never defined.  Thanks,

Alex


* Re: [Qemu-devel] [PATCH v5 03/10] hw/vfio/pci: introduce VFIODevice
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 03/10] hw/vfio/pci: introduce VFIODevice Eric Auger
@ 2014-08-12  2:34   ` David Gibson
  0 siblings, 0 replies; 50+ messages in thread
From: David Gibson @ 2014-08-12  2:34 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kim.phillips, eric.auger, patches, joel.schopp,
	will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, agraf,
	alex.williamson, stuart.yoder, a.motakis, kvmarm,
	christoffer.dall


On Sat, Aug 09, 2014 at 03:25:42PM +0100, Eric Auger wrote:
> Introduce the VFIODevice struct that is going to be shared by
> VFIOPCIDevice and VFIOPlatformDevice.
> 
> Additional fields will be added there later on for review
> convenience.
> 
> the group's device_list becomes a list of VFIODevice
> 
> This obliges to rework the reset_handler which becomes generic and
> calls VFIODevice ops that are specialized in each parent object.
> Also functions that iterate on this list must take care that the
> devices can be something else than VFIOPCIDevice. The type is used
> to discriminate them.
> 
> we profit from this step to change the prototype of
> vfio_unmask_intx, vfio_mask_intx, vfio_disable_irqindex which now
> apply to VFIODevice. They are renamed as *_irqindex.
> The index is passed as parameter to anticipate their usage for
> platform IRQs
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v4->v5:
> - fix style issues
> - in vfio_initfn, rework allocation of vdev->vbasedev.name and
>   replace snprintf by g_strdup_printf
> ---
>  hw/vfio/pci.c | 239 +++++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 146 insertions(+), 93 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c2cdd73..ae827c5 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -56,6 +56,11 @@
>  #define VFIO_ALLOW_KVM_MSI 1
>  #define VFIO_ALLOW_KVM_MSIX 1
>  
> +enum {
> +    VFIO_DEVICE_TYPE_PCI = 0,
> +    VFIO_DEVICE_TYPE_PLATFORM = 1,
> +};
> +
>  struct VFIOPCIDevice;
>  
>  typedef struct VFIOQuirk {
> @@ -193,9 +198,27 @@ typedef struct VFIOMSIXInfo {
>      void *mmap;
>  } VFIOMSIXInfo;
>  
> +typedef struct VFIODeviceOps VFIODeviceOps;
> +
> +typedef struct VFIODevice {
> +    QLIST_ENTRY(VFIODevice) next;
> +    struct VFIOGroup *group;
> +    char *name;
> +    int fd;
> +    int type;

I'm assuming this takes values from the enum above.  Is this actually
necessary as a field, or could the same information be derived from
the device's QOM class information?

> +    bool reset_works;
> +    bool needs_reset;
> +    VFIODeviceOps *ops;
> +} VFIODevice;
> +
> +struct VFIODeviceOps {
> +    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
> +    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
> +};

Shouldn't these be methods in the QOM class, rather than a separate
ops structure?
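
For readers following the two questions above, here is a minimal sketch of
what "methods in the class" could look like. It is an illustration only: the
Sketch* names are hypothetical, it is plain C rather than QEMU's QOM API, and
it is not taken from the patch. The class pointer plays the role of both the
"type" field and the separate ops structure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; this is NOT QEMU's QOM API. */
typedef struct SketchDevice SketchDevice;

typedef struct SketchDeviceClass {
    const char *type_name;  /* identifies the subclass, no int "type" field */
    bool (*compute_needs_reset)(SketchDevice *dev);
    int (*hot_reset_multi)(SketchDevice *dev);
} SketchDeviceClass;

struct SketchDevice {
    const SketchDeviceClass *klass; /* one class pointer per instance */
    bool needs_reset;
    unsigned int reset_count;       /* bookkeeping for the sketch only */
};

/* "PCI-like" subclass: always asks for a reset and counts it. */
static bool pci_compute_needs_reset(SketchDevice *dev)
{
    (void)dev;
    return true;
}

static int pci_hot_reset_multi(SketchDevice *dev)
{
    dev->reset_count++;
    dev->needs_reset = false;
    return 0;
}

/* "Platform-like" subclass: mirrors the "not implemented yet" stubs. */
static bool platform_compute_needs_reset(SketchDevice *dev)
{
    (void)dev;
    return false;
}

static int platform_hot_reset_multi(SketchDevice *dev)
{
    (void)dev;
    return 0;
}

static const SketchDeviceClass sketch_pci_class = {
    .type_name = "sketch-pci",
    .compute_needs_reset = pci_compute_needs_reset,
    .hot_reset_multi = pci_hot_reset_multi,
};

static const SketchDeviceClass sketch_platform_class = {
    .type_name = "sketch-platform",
    .compute_needs_reset = platform_compute_needs_reset,
    .hot_reset_multi = platform_hot_reset_multi,
};

/* Generic reset path: dispatches through the class, no type switch. */
static int sketch_reset(SketchDevice *dev)
{
    dev->needs_reset = dev->klass->compute_needs_reset(dev);
    if (dev->needs_reset) {
        return dev->klass->hot_reset_multi(dev);
    }
    return 0;
}
```

A caller would only set dev->klass once at creation time; the generic reset
handler then never needs to know which kind of device it is iterating over.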

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device Eric Auger
@ 2014-08-12  2:41   ` David Gibson
  2014-08-12  6:54     ` Eric Auger
  0 siblings, 1 reply; 50+ messages in thread
From: David Gibson @ 2014-08-12  2:41 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kim.phillips, eric.auger, patches, joel.schopp,
	will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, agraf,
	alex.williamson, stuart.yoder, a.motakis, kvmarm,
	christoffer.dall


On Sat, Aug 09, 2014 at 03:25:44PM +0100, Eric Auger wrote:
> vfio_get_device now takes a VFIODevice as argument. The function is split
> into 4 functional parts: dev_info query, device check, region populate
> and interrupt populate. the last 3 are specialized by parent device and
> are added into DeviceOps.

Why is splitting these up into 4 stages useful, rather than having a
single sub-class specific callback?
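
To make the shape under discussion concrete, the following is a rough sketch
of the four-stage split described in the commit message: one generic info
query followed by three per-bus hooks, stopping at the first failure. All
Staged* names and the hook_* helpers are hypothetical and exist only for this
illustration; the real code queries the device through an ioctl.

```c
#include <errno.h>
#include <stddef.h>

typedef struct StagedDevice StagedDevice;

/* Hypothetical per-bus hooks, one per specialized stage. */
typedef struct StagedDeviceOps {
    int (*check_device)(StagedDevice *dev);
    int (*populate_regions)(StagedDevice *dev);
    int (*populate_interrupts)(StagedDevice *dev);
} StagedDeviceOps;

struct StagedDevice {
    const StagedDeviceOps *ops;
    unsigned int num_regions;
    unsigned int num_irqs;
    unsigned int stages_run;    /* bookkeeping for the sketch only */
};

/* Stand-in for the generic dev_info query (an ioctl in the real code). */
static int staged_query_info(StagedDevice *dev)
{
    dev->num_regions = 1;
    dev->num_irqs = 1;
    return 0;
}

/* Generic driver: runs the stages in order and stops at the first error. */
static int staged_get_device(StagedDevice *dev)
{
    int ret;

    dev->stages_run = 0;

    ret = staged_query_info(dev);
    if (ret) {
        return ret;
    }
    dev->stages_run++;

    ret = dev->ops->check_device(dev);
    if (ret) {
        return ret;
    }
    dev->stages_run++;

    ret = dev->ops->populate_regions(dev);
    if (ret) {
        return ret;
    }
    dev->stages_run++;

    ret = dev->ops->populate_interrupts(dev);
    if (ret) {
        return ret;
    }
    dev->stages_run++;

    return 0;
}

/* Trivial hooks used to exercise the sketch. */
static int hook_ok(StagedDevice *dev)
{
    (void)dev;
    return 0;
}

static int hook_fail(StagedDevice *dev)
{
    (void)dev;
    return -EINVAL;
}
```

The single-callback alternative would collapse the three hooks into one
function pointer and leave the ordering to each subclass; the sketch above
keeps the ordering in the generic code instead.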

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support
  2014-08-11 20:13   ` Alex Williamson
@ 2014-08-12  5:51     ` Eric Auger
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-12  5:51 UTC (permalink / raw)
  To: Alex Williamson
  Cc: agraf, kim.phillips, eric.auger, peter.maydell, Kim Phillips,
	patches, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	stuart.yoder, joel.schopp, a.motakis, kvmarm, christoffer.dall

On 08/11/2014 10:13 PM, Alex Williamson wrote:
> On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
>> Minimal VFIO platform implementation supporting
>> - register space user mapping,
>> - IRQ assignment based on eventfds handled on qemu side.
>>
>> irqfd kernel acceleration comes in a subsequent patch.
>>
>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v4 -> v5:
>> - vfio-plaform.h included first
>> - cleanup error handling in *populate*, vfio_get_device,
>>   vfio_enable_intp
>> - vfio_put_device not called anymore
>> - add some includes to follow vfio policy
>>
>> v3 -> v4:
>> [Eric Auger]
>> - merge of "vfio: Add initial IRQ support in platform device"
>>   to get a full functional patch although perfs are limited.
>> - removal of unrealize function since I currently understand
>>   it is only used with device hot-plug feature.
>>
>> v2 -> v3:
>> [Eric Auger]
>> - further factorization between PCI and platform (VFIORegion,
>>   VFIODevice). same level of functionality.
>>
>> <= v2:
>> [Kim Philipps]
>> - Initial Creation of the device supporting register space mapping
>> ---
>>  hw/vfio/Makefile.objs           |   1 +
>>  hw/vfio/platform.c              | 517 ++++++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio-platform.h |  77 ++++++
>>  3 files changed, 595 insertions(+)
>>  create mode 100644 hw/vfio/platform.c
>>  create mode 100644 include/hw/vfio/vfio-platform.h
>>
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index e31f30e..c5c76fe 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,4 +1,5 @@
>>  ifeq ($(CONFIG_LINUX), y)
>>  obj-$(CONFIG_SOFTMMU) += common.o
>>  obj-$(CONFIG_PCI) += pci.o
>> +obj-$(CONFIG_SOFTMMU) += platform.o
>>  endif
>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>> new file mode 100644
>> index 0000000..f1a1b55
>> --- /dev/null
>> +++ b/hw/vfio/platform.c
>> @@ -0,0 +1,517 @@
>> +/*
>> + * vfio based device assignment support - platform devices
>> + *
>> + * Copyright Linaro Limited, 2014
>> + *
>> + * Authors:
>> + *  Kim Phillips <kim.phillips@linaro.org>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Based on vfio based PCI device assignment support:
>> + *  Copyright Red Hat, Inc. 2012
>> + */
>> +
>> +#include <linux/vfio.h>
>> +#include <sys/ioctl.h>
>> +
>> +#include "hw/vfio/vfio-platform.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/range.h"
>> +#include "sysemu/sysemu.h"
>> +#include "exec/memory.h"
>> +#include "qemu/queue.h"
>> +#include "hw/sysbus.h"
>> +
>> +extern const MemoryRegionOps vfio_region_ops;
>> +extern const MemoryListener vfio_memory_listener;
>> +extern QLIST_HEAD(, VFIOGroup) group_list;
>> +extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
>> +void vfio_put_device(VFIOPlatformDevice *vdev);
>> +
>> +/*
>> + * It is mandatory to pass a VFIOPlatformDevice since VFIODevice
>> + * is not a QOM Object and cannot be passed to memory region functions
>> +*/
>> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
>> +{
>> +    VFIORegion *region = vdev->regions[nr];
>> +    unsigned size = region->size;
>> +    char name[64];
>> +
>> +    if (!size) {
>> +        return;
>> +    }
>> +
>> +    snprintf(name, sizeof(name), "VFIO %s region %d",
>> +             vdev->vbasedev.name, nr);
>> +
>> +    /* A "slow" read/write mapping underlies all regions */
>> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
>> +                          region, name, size);
>> +
>> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
>> +
>> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
>> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
>> +        error_report("%s unsupported. Performance may be slow", name);
>> +    }
>> +}
>> +
>> +static void print_regions(VFIOPlatformDevice *vdev)
>> +{
>> +    int i;
>> +
>> +    DPRINTF("Device \"%s\" counts %d region(s):\n",
>> +             vdev->vbasedev.name, vdev->vbasedev.num_regions);
>> +
>> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
>> +        DPRINTF("- region %d flags = 0x%lx, size = 0x%lx, "
>> +                "fd= %d, offset = 0x%lx\n",
>> +                vdev->regions[i]->nr,
>> +                (unsigned long)vdev->regions[i]->flags,
>> +                (unsigned long)vdev->regions[i]->size,
>> +                vdev->regions[i]->vbasedev->fd,
>> +                (unsigned long)vdev->regions[i]->fd_offset);
>> +    }
>> +}
>> +
>> +static int vfio_populate_regions(VFIODevice *vbasedev)
>> +{
>> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>> +    int i, ret = 0;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +
>> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
>> +        reg_info.index = i;
>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> +        if (ret) {
>> +            error_report("vfio: Error getting region %d info: %m", i);
>> +            goto error;
>> +        }
>> +
>> +        vdev->regions[i]->flags = reg_info.flags;
>> +        vdev->regions[i]->size = reg_info.size;
>> +        vdev->regions[i]->fd_offset = reg_info.offset;
>> +        vdev->regions[i]->nr = i;
>> +        vdev->regions[i]->vbasedev = vbasedev;
>> +    }
>> +    print_regions(vdev);
>> +error:
>> +    return ret;
>> +}
>> +
>> +/* not implemented yet */
>> +static int vfio_platform_check_device(VFIODevice *vdev)
>> +{
>> +    return 0;
>> +}
>> +
>> +/* not implemented yet */
>> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
>> +{
>> +return false;
>> +}
>> +
>> +/* not implemented yet */
>> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
>> +{
>> +return 0;
>> +}
>> +
>> +/*
>> + * eoi function is called on the first access to any MMIO region
>> + * after an IRQ was triggered. It is assumed this access corresponds
>> + * to the IRQ status register reset.
>> + * With such a mechanism, a single IRQ can be handled at a time since
>> + * there is no way to know which IRQ was completed by the guest.
>> + * (we would need additional details about the IRQ status register mask)
>> + */
>> +static void vfio_platform_eoi(VFIODevice *vbasedev)
>> +{
>> +    VFIOINTp *intp;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +
>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>> +        if (intp->state == VFIO_IRQ_ACTIVE) {
>> +            DPRINTF("EOI IRQ #%d fd=%d\n",
>> +                    intp->pin, event_notifier_get_fd(&intp->interrupt));
>> +            intp->state = VFIO_IRQ_INACTIVE;
>> +
>> +            /* deassert the virtual IRQ and unmask physical one */
>> +            qemu_set_irq(intp->qemuirq, 0);
>> +            vfio_unmask_irqindex(vbasedev, intp->pin);
>> +
>> +            /* a single IRQ can be active at a time */
>> +            break;
>> +        }
>> +    }
>> +
>> +    /* in case there are pending IRQs, handle them one at a time */
>> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
>> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
>> +        vfio_intp_interrupt(intp);
>> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
>> +    }
>> +}
>> +
>> +/*
>> + * enable/disable the fast path mode
>> + * fast path = MMIO region is mmaped (no KVM TRAP)
>> + * slow path = MMIO region is trapped and region callbacks are called
>> + * slow path enables to trap the IRQ status register guest reset
>> +*/
>> +
>> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
>> +{
>> +    VFIORegion *region;
>> +    int i;
>> +
>> +    DPRINTF("fast path = %d\n", enabled);
>> +
>> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
>> +        region = vdev->regions[i];
>> +
>> +        /* register space is unmapped to trap EOI */
>> +        memory_region_set_enabled(&region->mmap_mem, enabled);
>> +    }
>> +}
>> +
>> +/*
>> + * Checks whether the IRQ is still pending. In the negative
>> + * the fast path mode (where reg space is mmaped) can be restored.
>> + * if the IRQ is still pending, we must keep on trapping IRQ status
>> + * register reset with mmap disabled (slow path).
>> + * the function is called on mmap_timer event.
>> + * by construction a single fd is handled at a time. See EOI comment
>> + * for additional details.
>> + */
>> +static void vfio_intp_mmap_enable(void *opaque)
>> +{
>> +    VFIOINTp *tmp;
>> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
>> +
>> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
>> +            DPRINTF("IRQ #%d still active, stay in slow path\n",
>> +                    tmp->pin);
>> +            timer_mod(vdev->mmap_timer,
>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> +                          vdev->mmap_timeout);
>> +            return;
>> +        }
>> +    }
>> +    DPRINTF("no active IRQ, restore fast path\n");
>> +    vfio_mmap_set_enabled(vdev, true);
>> +}
>> +
>> +/*
>> + * The fd handler
>> + */
>> +void vfio_intp_interrupt(void *opaque)
>> +{
>> +    int ret;
>> +    VFIOINTp *tmp, *intp = (VFIOINTp *)opaque;
>> +    VFIOPlatformDevice *vdev = intp->vdev;
>> +    bool one_active_irq = false;
>> +
>> +    /*
>> +     * first check whether there is a pending IRQ
>> +     * in the positive the new IRQ cannot be handled until the
>> +     * active one is not completed.
>> +     * by construction the same IRQ as the pending one cannot hit
>> +     * since the physical IRQ was disabled by the VFIO driver
>> +     */
>> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
>> +            one_active_irq = true;
>> +            break;
>> +        }
>> +    }
>> +    if (one_active_irq) {
>> +        /*
>> +         * the new IRQ gets a pending status and is pushed in
>> +         * the pending queue
>> +         */
>> +        intp->state = VFIO_IRQ_PENDING;
>> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
>> +                             intp, pqnext);
>> +        return;
>> +    }
>> +
>> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
>> +    DPRINTF("Handle IRQ #%d (fd = %d)\n",
>> +            intp->pin, event_notifier_get_fd(&intp->interrupt));
>> +
>> +    ret = event_notifier_test_and_clear(&intp->interrupt);
>> +    if (!ret) {
>> +        DPRINTF("Error when clearing fd=%d\n",
>> +                event_notifier_get_fd(&intp->interrupt));
>> +    }
>> +
>> +    intp->state = VFIO_IRQ_ACTIVE;
>> +
>> +    /* sets slow path */
>> +    vfio_mmap_set_enabled(vdev, false);
>> +
>> +    /* trigger the virtual IRQ */
>> +    qemu_set_irq(intp->qemuirq, 1);
>> +
>> +    /* schedule the mmap timer which will restore mmap path after EOI*/
>> +    if (vdev->mmap_timeout) {
>> +        timer_mod(vdev->mmap_timer,
>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> +                      vdev->mmap_timeout);
>> +    }
>> +}
>> +
>> +static int vfio_enable_intp(VFIODevice *vbasedev, unsigned int index)
>> +{
>> +    struct vfio_irq_set *irq_set;
>> +    int32_t *pfd;
>> +    int ret, argsz;
>> +    int device = vbasedev->fd;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
>> +    VFIOINTp *intp;
>> +
>> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
>> +    intp = g_malloc0(sizeof(*intp));
>> +    intp->vdev = vdev;
>> +    intp->pin = index;
>> +    intp->state = VFIO_IRQ_INACTIVE;
>> +    sysbus_init_irq(sbdev, &intp->qemuirq);
>> +
>> +    ret = event_notifier_init(&intp->interrupt, 0);
>> +    if (ret) {
>> +        g_free(intp);
>> +        error_report("vfio: Error: event_notifier_init failed ");
>> +        return ret;
>> +    }
>> +
>> +    /* build the irq_set to be passed to the vfio kernel driver */
>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>> +
>> +    irq_set = g_malloc0(argsz);
>> +    irq_set->argsz = argsz;
>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
>> +    irq_set->index = index;
>> +    irq_set->start = 0;
>> +    irq_set->count = 1;
>> +    pfd = (int32_t *)&irq_set->data;
>> +
>> +    *pfd = event_notifier_get_fd(&intp->interrupt);
>> +
>> +    DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
>> +
>> +    qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
>> +
>> +    /*
>> +     * pass the index/fd binding to the kernel driver so that it
>> +     * triggers this fd on HW IRQ
>> +     */
>> +    ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    g_free(irq_set);
>> +    if (ret) {
>> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
>> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
>> +        event_notifier_cleanup(&intp->interrupt);
>> +        return -errno;
>> +    }
>> +
>> +    /* store the new intp in qlist */
>> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
>> +    return 0;
>> +}
>> +
>> +static int vfio_populate_interrupts(VFIODevice *vbasedev)
>> +{
>> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
>> +    int i, ret;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +
>> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> +                                    vfio_intp_mmap_enable, vdev);
>> +
>> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
>> +
>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>> +        irq.index = i;
>> +
>> +        DPRINTF("Retrieve IRQ info from vfio platform driver ...\n");
>> +
>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>> +        if (ret) {
>> +            /* This can fail for an old kernel or legacy PCI dev */
>> +            error_printf("vfio: error getting device %s irq info",
>> +                         vbasedev->name);
> 
> Strange comment for a platform device.  On PCI this comment only applied
> to the virtual error IRQ since it may or may not be supported per
> device.  For PCI, the number of IRQs and regions is really more of a
> highest index, so it can be sparsely populated.  We know about the error
> IRQ, so probe for it, but it may not be present.  Likewise, we know
> about the VGA region, but it may not be supported by this device and
> will return error on the info call.

Hi Alex,

Thanks for explaining the legacy behaviour. I will treat a failure of the
IRQ info query as an error, then.
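
A minimal sketch of that stricter behaviour, under stated assumptions: the
function and callback names below are hypothetical, the callbacks stand in
for the VFIO_DEVICE_GET_IRQ_INFO query and for vfio_enable_intp, and they are
assumed to return 0 or a negative errno value directly (the real ioctl
returns -1 and sets errno).

```c
#include <errno.h>

/* Hypothetical callback type: returns 0 or a negative errno value. */
typedef int (*irq_step_fn)(void *opaque, unsigned int index);

/*
 * Walk every IRQ index; any failure, including a failed info query,
 * aborts the whole population instead of being skipped.
 */
static int populate_interrupts_strict(void *opaque, unsigned int num_irqs,
                                      irq_step_fn query_info,
                                      irq_step_fn enable)
{
    unsigned int i;
    int ret;

    for (i = 0; i < num_irqs; i++) {
        ret = query_info(opaque, i);
        if (ret) {
            return ret;     /* platform devices: no sparse indexes expected */
        }

        ret = enable(opaque, i);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Fake query for the sketch: fails at the index stored in *opaque. */
static int fake_query_info(void *opaque, unsigned int index)
{
    const unsigned int *bad_index = opaque;

    return index == *bad_index ? -ENODEV : 0;
}

/* Fake enable step for the sketch: always succeeds. */
static int fake_enable(void *opaque, unsigned int index)
{
    (void)opaque;
    (void)index;
    return 0;
}
```

With a fake query that fails at index 2, a device reporting four IRQs fails
with -ENODEV, while a device reporting two IRQs is populated normally.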
> 
>> +        } else {
>> +            DPRINTF("- IRQ index %d: count %d, flags=0x%x\n",
>> +                    irq.index, irq.count, irq.flags);
>> +
>> +            ret = vfio_enable_intp(vbasedev, irq.index);
>> +            if (ret) {
>> +                error_report("vfio: Error setting IRQ %d up", i);
>> +                return ret;
>> +            }
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +static VFIODeviceOps vfio_platform_ops = {
>> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
>> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
>> +    .vfio_eoi = vfio_platform_eoi,
>> +    .vfio_check_device = vfio_platform_check_device,
>> +    .vfio_populate_regions = vfio_populate_regions,
>> +    .vfio_populate_interrupts = vfio_populate_interrupts,
>> +};
>> +
>> +static int vfio_base_device_init(VFIODevice *vbasedev)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev_iter;
>> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>> +    ssize_t len;
>> +    struct stat st;
>> +    int groupid;
>> +    int ret;
>> +
>> +    /* name must be set prior to the call */
>> +    if (!vbasedev->name) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* Check that the host device exists */
>> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
>> +             vbasedev->name);
>> +
>> +    if (stat(path, &st) < 0) {
>> +        error_report("vfio: error: no such host device: %s", path);
>> +        return -errno;
>> +    }
>> +
>> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>> +    len = readlink(path, iommu_group_path, sizeof(path));
>> +    if (len <= 0 || len >= sizeof(path)) {
>> +        error_report("vfio: error no iommu_group for device");
>> +        return len < 0 ? -errno : ENAMETOOLONG;
>> +    }
>> +
>> +    iommu_group_path[len] = 0;
>> +    group_name = basename(iommu_group_path);
>> +
>> +    if (sscanf(group_name, "%d", &groupid) != 1) {
>> +        error_report("vfio: error reading %s: %m", path);
>> +        return -errno;
>> +    }
>> +
>> +    DPRINTF("%s(%s) group %d\n", __func__, vbasedev->name, groupid);
>> +
>> +    group = vfio_get_group(groupid, &address_space_memory);
>> +    if (!group) {
>> +        error_report("vfio: failed to get group %d", groupid);
>> +        return -ENOENT;
>> +    }
>> +
>> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
>> +
>> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
>> +            error_report("vfio: error: device %s is already attached", path);
>> +            vfio_put_group(group);
>> +            return -EBUSY;
>> +        }
>> +    }
>> +    ret = vfio_get_device(group, path, vbasedev);
>> +    if (ret) {
>> +        error_report("vfio: failed to get device %s", path);
>> +        vfio_put_group(group);
>> +    }
>> + return ret;
>> +}
>> +
>> +void vfio_put_device(VFIOPlatformDevice *vdev)
>> +{
>> +    unsigned int i;
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +            g_free(vdev->regions[i]);
>> +    }
>> +    g_free(vdev->regions);
>> +    g_free(vdev->vbasedev.name);
>> +    vfio_put_base_device(&vdev->vbasedev);
>> +}
>> +
>> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
>> +{
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    int i, ret;
>> +
>> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
>> +    vbasedev->ops = &vfio_platform_ops;
>> +
>> +    DPRINTF("vfio device %s, compat = %s\n", vbasedev->name, vdev->compat);
>> +
>> +    ret = vfio_base_device_init(vbasedev);
>> +    if (ret) {
>> +        return;
>> +    }
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        vfio_map_region(vdev, i);
>> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
>> +    }
>> +}
>> +
>> +static const VMStateDescription vfio_platform_vmstate = {
>> +    .name = TYPE_VFIO_PLATFORM,
>> +    .unmigratable = 1,
>> +};
>> +
>> +static Property vfio_platform_dev_properties[] = {
>> +    DEFINE_PROP_STRING("vfio_device", VFIOPlatformDevice, vbasedev.name),
> 
> Hmm, is this really a good name for this option?  "host" would give you
> some consistency with vfio-pci.
OK, I will rename it.
> 
>> +    DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
>> +    DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
>> +                       mmap_timeout, 1100),
>> +    DEFINE_PROP_BOOL("irqfd", VFIOPlatformDevice, irqfd_allowed, true),
> 
> Should some of these be x- options or do you plan to support them long
> term and support users twiddling them?
- compat should disappear if we turn the vfio-platform class into an
  abstract class
- irqfd is currently here for testing only
- mmap-timeout-ms will stay
> 
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void vfio_platform_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +
>> +    dc->realize = vfio_platform_realize;
>> +    dc->props = vfio_platform_dev_properties;
>> +    dc->vmsd = &vfio_platform_vmstate;
>> +    dc->desc = "VFIO-based platform device assignment";
>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>> +}
>> +
>> +static const TypeInfo vfio_platform_dev_info = {
>> +    .name = TYPE_VFIO_PLATFORM,
>> +    .parent = TYPE_SYS_BUS_DEVICE,
>> +    .instance_size = sizeof(VFIOPlatformDevice),
>> +    .class_init = vfio_platform_class_init,
>> +    .class_size = sizeof(VFIOPlatformDeviceClass),
>> +};
>> +
>> +static void register_vfio_platform_dev_type(void)
>> +{
>> +    type_register_static(&vfio_platform_dev_info);
>> +}
>> +
>> +type_init(register_vfio_platform_dev_type)
>> diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
>> new file mode 100644
>> index 0000000..1ee072a
>> --- /dev/null
>> +++ b/include/hw/vfio/vfio-platform.h
>> @@ -0,0 +1,77 @@
>> +/*
>> + * vfio based device assignment support - platform devices
>> + *
>> + * Copyright Linaro Limited, 2014
>> + *
>> + * Authors:
>> + *  Kim Phillips <kim.phillips@linaro.org>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Based on vfio based PCI device assignment support:
>> + *  Copyright Red Hat, Inc. 2012
>> + */
>> +
>> +#ifndef HW_VFIO_VFIO_PLATFORM_H
>> +#define HW_VFIO_VFIO_PLATFORM_H
>> +
>> +#include "hw/sysbus.h"
>> +#include "hw/vfio/vfio-common.h"
>> +#include "qemu/event_notifier.h"
>> +#include "qemu/queue.h"
>> +#include "hw/irq.h"
>> +
>> +#define TYPE_VFIO_PLATFORM "vfio-platform"
>> +
>> +enum {
>> +    VFIO_IRQ_INACTIVE = 0,
>> +    VFIO_IRQ_PENDING = 1,
>> +    VFIO_IRQ_ACTIVE = 2,
>> +    /* VFIO_IRQ_ACTIVE_AND_PENDING cannot happen with VFIO */
>> +};
>> +
>> +typedef struct VFIOINTp {
>> +    QLIST_ENTRY(VFIOINTp) next; /* entry for IRQ list */
>> +    QSIMPLEQ_ENTRY(VFIOINTp) pqnext; /* entry for pending IRQ queue */
>> +    EventNotifier interrupt; /* eventfd triggered on interrupt */
>> +    EventNotifier unmask; /* eventfd for unmask on QEMU bypass */
>> +    qemu_irq qemuirq;
>> +    struct VFIOPlatformDevice *vdev; /* back pointer to device */
>> +    int state; /* inactive, pending, active */
>> +    bool kvm_accel; /* set when QEMU bypass through KVM enabled */
>> +    uint8_t pin; /* index */
>> +    uint8_t virtualID; /* virtual IRQ */
>> +} VFIOINTp;
>> +
>> +typedef struct VFIOPlatformDevice {
>> +    SysBusDevice sbdev;
>> +    VFIODevice vbasedev; /* not a QOM object */
>> +    VFIORegion **regions;
>> +    QLIST_HEAD(, VFIOINTp) intp_list; /* list of IRQ */
>> +    /* queue of pending IRQ */
>> +    QSIMPLEQ_HEAD(pending_intp_queue, VFIOINTp) pending_intp_queue;
>> +    char *compat; /* compatibility string */
>> +    bool irqfd_allowed;
>> +    uint32_t mmap_timeout; /* delay to re-enable mmaps after interrupt */
>> +    QEMUTimer *mmap_timer; /* enable mmaps after periods w/o interrupts */
>> +} VFIOPlatformDevice;
>> +
>> +
>> +typedef struct VFIOPlatformDeviceClass {
>> +    /*< private >*/
>> +    SysBusDeviceClass parent_class;
>> +    /*< public >*/
>> +} VFIOPlatformDeviceClass;
>> +
>> +#define VFIO_PLATFORM_DEVICE(obj) \
>> +     OBJECT_CHECK(VFIOPlatformDevice, (obj), TYPE_VFIO_PLATFORM)
>> +#define VFIO_PLATFORM_DEVICE_CLASS(klass) \
>> +     OBJECT_CLASS_CHECK(VFIOPlatformDeviceClass, (klass), TYPE_VFIO_PLATFORM)
>> +#define VFIO_PLATFORM_DEVICE_GET_CLASS(obj) \
>> +     OBJECT_GET_CLASS(VFIOPlatformDeviceClass, (obj), TYPE_VFIO_PLATFORM)
>> +
>> +void vfio_intp_interrupt(void *opaque);
>> +void vfio_setup_irqfd(SysBusDevice *dev, int index, int virq);
Indeed, this belongs to the irqfd patch file!

Thanks

Best Regards

Eric
> 
> This was never defined.  Thanks,
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-11 19:20   ` Alex Williamson
@ 2014-08-12  5:57     ` Eric Auger
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-12  5:57 UTC (permalink / raw)
  To: Alex Williamson
  Cc: agraf, kim.phillips, eric.auger, peter.maydell, Kim Phillips,
	patches, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	stuart.yoder, joel.schopp, a.motakis, kvmarm, christoffer.dall

On 08/11/2014 09:20 PM, Alex Williamson wrote:
> On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
>> A new common module is created. It implements all functions
>> that have no device specificity (PCI, Platform).
>>
>> This patch only consists in move (no functional changes)
>>
>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v4 -> v5:
>> - integrate "sPAPR/IOMMU: Fix TCE entry permission"
>> - VFIOdevice .name dealloc removed from vfio_put_base_device
>> - add some includes according to vfio inclusion policy
>>
>> v3 -> v4:
>> [Eric Auger]
>> move done after all PCI modifications to anticipate for
>> VFIO Platform needs. Purpose is to alleviate the whole
>> review process.
>>
>> <= v3
>> First split done by Kim Phillips
>> ---
>>  hw/vfio/Makefile.objs         |    1 +
>>  hw/vfio/common.c              |  990 ++++++++++++++++++++++++++++++++++++++
>>  hw/vfio/pci.c                 | 1070 +----------------------------------------
>>  include/hw/vfio/vfio-common.h |  151 ++++++
>>  4 files changed, 1147 insertions(+), 1065 deletions(-)
>>  create mode 100644 hw/vfio/common.c
>>  create mode 100644 include/hw/vfio/vfio-common.h
>>
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index 31c7dab..e31f30e 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,3 +1,4 @@
>>  ifeq ($(CONFIG_LINUX), y)
>> +obj-$(CONFIG_SOFTMMU) += common.o
>>  obj-$(CONFIG_PCI) += pci.o
>>  endif
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> new file mode 100644
>> index 0000000..297c508
>> --- /dev/null
>> +++ b/hw/vfio/common.c
>> @@ -0,0 +1,990 @@
>> +/*
>> + * generic functions used by VFIO devices
>> + *
>> + * Copyright Red Hat, Inc. 2012
>> + *
>> + * Authors:
>> + *  Alex Williamson <alex.williamson@redhat.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Based on qemu-kvm device-assignment:
>> + *  Adapted for KVM by Qumranet.
>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>> + */
>> +
>> +#include <sys/ioctl.h>
>> +#include <sys/mman.h>
>> +#include <linux/vfio.h>
>> +
>> +#include "hw/vfio/vfio-common.h"
>> +#include "hw/vfio/vfio.h"
>> +#include "exec/address-spaces.h"
>> +#include "exec/memory.h"
>> +#include "hw/hw.h"
>> +#include "qemu/error-report.h"
>> +#include "sysemu/kvm.h"
>> +
>> +QLIST_HEAD(, VFIOGroup)
>> +    group_list = QLIST_HEAD_INITIALIZER(group_list);
>> +
>> +QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
>> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
>> +
>> +#ifdef CONFIG_KVM
>> +/*
>> + * We have a single VFIO pseudo device per KVM VM.  Once created it lives
>> + * for the life of the VM.  Closing the file descriptor only drops our
>> + * reference to it and the device's reference to kvm.  Therefore once
>> + * initialized, this file descriptor is only released on QEMU exit and
>> + * we'll re-use it should another vfio device be attached before then.
>> + */
>> +static int vfio_kvm_device_fd = -1;
>> +#endif
>> +
>> +/*
>> + * Common VFIO interrupt disable
>> + */
>> +void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
>> +{
>> +    struct vfio_irq_set irq_set = {
>> +        .argsz = sizeof(irq_set),
>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
>> +        .index = index,
>> +        .start = 0,
>> +        .count = 0,
>> +    };
>> +
>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> +}
>> +
>> +void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
>> +{
>> +    struct vfio_irq_set irq_set = {
>> +        .argsz = sizeof(irq_set),
>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
>> +        .index = index,
>> +        .start = 0,
>> +        .count = 1,
>> +    };
>> +
>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> +}
>> +
>> +#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
> 
> Can we remove the ifdef here and in the common header now?  I'm hoping
> the compiler won't complain once it's no longer static.
OK
> 
> ...
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 5f218b7..d2ccb3b 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -39,27 +39,12 @@
>>  #include "qemu/range.h"
>>  #include "sysemu/kvm.h"
>>  #include "sysemu/sysemu.h"
>> -#include "hw/vfio/vfio.h"
>> +#include "hw/vfio/vfio-common.h"
>>  
>> -/* #define DEBUG_VFIO */
>> -#ifdef DEBUG_VFIO
>> -#define DPRINTF(fmt, ...) \
>> -    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
>> -#else
>> -#define DPRINTF(fmt, ...) \
>> -    do { } while (0)
>> -#endif
>> -
>> -/* Extra debugging, trap acceleration paths for more logging */
>> -#define VFIO_ALLOW_MMAP 1
>> -#define VFIO_ALLOW_KVM_INTX 1
>> -#define VFIO_ALLOW_KVM_MSI 1
>> -#define VFIO_ALLOW_KVM_MSIX 1
>> -
>> -enum {
>> -    VFIO_DEVICE_TYPE_PCI = 0,
>> -    VFIO_DEVICE_TYPE_PLATFORM = 1,
>> -};
>> +extern const MemoryRegionOps vfio_region_ops;
>> +extern const MemoryListener vfio_memory_listener;
>> +extern QLIST_HEAD(, VFIOGroup) group_list;
>> +extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
> 
> This seems odd, why doesn't the common header provide these for us?  We
> should also rename group_list to vfio_group_list to be polite to the
> rest of the namespace.  Thanks,

OK, I will rework that.
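The rework requested above could look roughly like the sketch below. It is only an illustration under stated assumptions: the list is hand-rolled instead of using QEMU's QLIST macros so that the example compiles on its own, the type name VFIOGroupList and the helpers vfio_group_list_add() and vfio_group_list_find() are hypothetical, and the three sections stand in for the common header, common.c and a user such as pci.c or platform.c.

```c
#include <assert.h>
#include <stddef.h>

/* ---- what the common header might declare (illustrative only) ---- */

typedef struct VFIOGroup {
    int groupid;
    struct VFIOGroup *next;   /* stand-in for QLIST_ENTRY(VFIOGroup) */
} VFIOGroup;

/* A named head type: every file including the header sees the same type */
typedef struct VFIOGroupList {
    VFIOGroup *first;
} VFIOGroupList;

/* Declared once, with a vfio_ prefix to stay out of the global namespace */
extern VFIOGroupList vfio_group_list;

/* ---- what the common module would define ---- */

VFIOGroupList vfio_group_list = { NULL };

/* ---- what a user (PCI or platform code) can do with the header alone ---- */

/* Hypothetical helper: push a group at the head of the shared list */
void vfio_group_list_add(VFIOGroup *group)
{
    group->next = vfio_group_list.first;
    vfio_group_list.first = group;
}

/* Hypothetical helper: look a group up by id, NULL when absent */
VFIOGroup *vfio_group_list_find(int groupid)
{
    VFIOGroup *g;

    for (g = vfio_group_list.first; g; g = g->next) {
        if (g->groupid == groupid) {
            return g;
        }
    }
    return NULL;
}
```

With the declaration living in the header, neither pci.c nor platform.c needs its own extern lines, and the prefixed name cannot collide with an unrelated group_list elsewhere in the tree.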

Thanks

Eric
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-11 19:25   ` Alex Williamson
@ 2014-08-12  6:09     ` Eric Auger
  2014-08-13 19:59       ` Alex Williamson
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-08-12  6:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: agraf, kim.phillips, eric.auger, peter.maydell, Kim Phillips,
	patches, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	stuart.yoder, joel.schopp, a.motakis, kvmarm, christoffer.dall

On 08/11/2014 09:25 PM, Alex Williamson wrote:
> On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> new file mode 100644
>> index 0000000..4684ee5
>> --- /dev/null
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -0,0 +1,151 @@
>> +/*
>> + * common header for vfio based device assignment support
>> + *
>> + * Copyright Red Hat, Inc. 2012
>> + *
>> + * Authors:
>> + *  Alex Williamson <alex.williamson@redhat.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Based on qemu-kvm device-assignment:
>> + *  Adapted for KVM by Qumranet.
>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>> + */
>> +#ifndef HW_VFIO_VFIO_COMMON_H
>> +#define HW_VFIO_VFIO_COMMON_H
>> +
>> +#include "qemu-common.h"
>> +#include "exec/address-spaces.h"
>> +#include "exec/memory.h"
>> +#include "qemu/queue.h"
>> +#include "qemu/notify.h"
>> +
>> +/*#define DEBUG_VFIO*/
>> +#ifdef DEBUG_VFIO
>> +#define DPRINTF(fmt, ...) \
>> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
>> +#else
>> +#define DPRINTF(fmt, ...) \
>> +    do { } while (0)
>> +#endif
> 
> 
> DPRINTF also need to be renamed to avoid conflicting namespace issues.
Hi Alex,

OK.

As I am going to touch the traces anyway:
- are you OK if I use the new .name field to simplify format strings such as this one?

    DPRINTF("%s(%04x:%02x:%02x.%x) Pin %c\n", __func__, vdev->host.domain,
            vdev->host.bus, vdev->host.slot, vdev->host.function,
            'A' + vdev->intx.pin);
- Also, Alex was suggesting the use of trace points. What is your position
on that? I am also not 100% sure what this consists of: is it the trace
events documented in docs/tracing.txt?
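To make the first question concrete, here is a minimal sketch of a namespaced debug macro combined with a name-based message. The macro name VFIO_DPRINTF, the reduced VFIODevice struct and the helper vfio_format_pin_msg() are assumptions made for illustration; none of them comes from the series. The sketch does not cover trace events.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical VFIO-prefixed replacement for the generic DPRINTF name */
/* #define DEBUG_VFIO */
#ifdef DEBUG_VFIO
#define VFIO_DPRINTF(fmt, ...) \
    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
#else
#define VFIO_DPRINTF(fmt, ...) \
    do { } while (0)
#endif

/* Reduced stand-in for the common device struct: only the name is kept */
typedef struct VFIODevice {
    const char *name;   /* already formatted by the bus-specific code */
} VFIODevice;

/*
 * Hypothetical helper: builds the message from the name string instead of
 * printing domain, bus, slot and function separately.
 */
int vfio_format_pin_msg(char *buf, size_t len, const char *func,
                        const VFIODevice *vbasedev, int pin)
{
    return snprintf(buf, len, "%s(%s) Pin %c\n",
                    func, vbasedev->name, 'A' + pin);
}

/* Typical call site: format once, then hand over to the debug macro */
void vfio_debug_pin(const VFIODevice *vbasedev, int pin)
{
    char buf[128];

    vfio_format_pin_msg(buf, sizeof(buf), __func__, vbasedev, pin);
    VFIO_DPRINTF("%s", buf);   /* expands to nothing unless DEBUG_VFIO is set */
}
```

The same format string then works for PCI and platform devices, since each fills in the name in its own way.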

Thanks

Eric



> Thanks,
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device
  2014-08-12  2:41   ` David Gibson
@ 2014-08-12  6:54     ` Eric Auger
  2014-08-13  3:32       ` David Gibson
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-08-12  6:54 UTC (permalink / raw)
  To: David Gibson
  Cc: peter.maydell, kim.phillips, eric.auger, patches, joel.schopp,
	will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, agraf,
	alex.williamson, stuart.yoder, a.motakis, kvmarm,
	christoffer.dall

On 08/12/2014 04:41 AM, David Gibson wrote:
> On Sat, Aug 09, 2014 at 03:25:44PM +0100, Eric Auger wrote:
>> vfio_get_device now takes a VFIODevice as argument. The function is split
>> into 4 functional parts: dev_info query, device check, region populate
>> and interrupt populate. the last 3 are specialized by parent device and
>> are added into DeviceOps.
> 
> Why is splitting these up into 4 stages useful, rather than having a
> single sub-class specific callback?

Hi David,

VFIOPlatformDevice already inherits from SysBusDevice and hence cannot
inherit from another VFIODevice. Same for VFIOPCIDevice that inherits
from PCIDevice. This is why I created this non QOM struct. But did you
mean something else?

As for splitting into 4: this was to share some code between platform and
PCI (the dev_info query), and vfio_get_device was already quite big, so I
thought it made sense to split it into functional parts.
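The arrangement described above can be sketched as follows. This is a simplified model, not the code of the series: the member names in the ops table are shortened, DemoPlatformDevice and the demo_* functions are hypothetical, and the dev_info query is faked with plain arguments instead of an ioctl.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct VFIODevice VFIODevice;

/* The 3 steps that are specialized per parent device */
typedef struct VFIODeviceOps {
    int (*check_device)(VFIODevice *vbasedev);
    int (*populate_regions)(VFIODevice *vbasedev);
    int (*populate_interrupts)(VFIODevice *vbasedev);
} VFIODeviceOps;

/* Plain struct, not a QOM object: it is embedded, not inherited from */
struct VFIODevice {
    const VFIODeviceOps *ops;
    int num_regions;
    int num_irqs;
};

/* Stand-in for a device that already has a bus-specific QOM parent */
typedef struct DemoPlatformDevice {
    int sysbus_parent;      /* placeholder for SysBusDevice */
    VFIODevice vbasedev;    /* embedded common part */
    int regions_ready;
    int irqs_ready;
} DemoPlatformDevice;

static int demo_check_device(VFIODevice *vbasedev)
{
    (void)vbasedev;
    return 0;
}

static int demo_populate_regions(VFIODevice *vbasedev)
{
    /* recover the enclosing device from the embedded member */
    DemoPlatformDevice *vdev =
        container_of(vbasedev, DemoPlatformDevice, vbasedev);

    vdev->regions_ready = vbasedev->num_regions;
    return 0;
}

static int demo_populate_interrupts(VFIODevice *vbasedev)
{
    DemoPlatformDevice *vdev =
        container_of(vbasedev, DemoPlatformDevice, vbasedev);

    vdev->irqs_ready = vbasedev->num_irqs;
    return 0;
}

const VFIODeviceOps demo_platform_ops = {
    .check_device = demo_check_device,
    .populate_regions = demo_populate_regions,
    .populate_interrupts = demo_populate_interrupts,
};

/* Shared part: common dev_info query, then the 3 delegated steps */
int demo_get_device(VFIODevice *vbasedev, int num_regions, int num_irqs)
{
    int ret;

    /* step 1: dev_info query, faked here with plain arguments */
    vbasedev->num_regions = num_regions;
    vbasedev->num_irqs = num_irqs;

    ret = vbasedev->ops->check_device(vbasedev);
    if (ret) {
        return ret;
    }
    ret = vbasedev->ops->populate_regions(vbasedev);
    if (ret) {
        return ret;
    }
    return vbasedev->ops->populate_interrupts(vbasedev);
}
```

A single sub-class callback would work too; the split mainly keeps the shared query in one place and each callback small.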

Best Regards

Eric
> 


* Re: [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support
  2014-08-11  9:36   ` Alexander Graf
@ 2014-08-12  7:59     ` Bharat.Bhushan
  2014-08-12 16:34       ` Eric Auger
  0 siblings, 1 reply; 50+ messages in thread
From: Bharat.Bhushan @ 2014-08-12  7:59 UTC (permalink / raw)
  To: Alexander Graf, Eric Auger, eric.auger, christoffer.dall,
	qemu-devel, Kim Phillips, a.rigo
  Cc: peter.maydell, patches, Kim Phillips, joel.schopp, will.deacon,
	Stuart Yoder, alex.williamson, a.motakis, kvmarm



> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Monday, August 11, 2014 3:06 PM
> To: Eric Auger; eric.auger@st.com; christoffer.dall@linaro.org;
> qemu-devel@nongnu.org; Phillips Kim-R1AAHA; a.rigo@virtualopensystems.com
> Cc: will.deacon@arm.com; kvmarm@lists.cs.columbia.edu;
> alex.williamson@redhat.com; Bhushan Bharat-R65777; peter.maydell@linaro.org;
> Yoder Stuart-B08248; a.motakis@virtualopensystems.com; patches@linaro.org;
> joel.schopp@amd.com; Kim Phillips
> Subject: Re: [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support
> 
> 
> On 09.08.14 16:25, Eric Auger wrote:
> > Minimal VFIO platform implementation supporting
> > - register space user mapping,
> > - IRQ assignment based on eventfds handled on qemu side.
> >
> > irqfd kernel acceleration comes in a subsequent patch.
> >
> > Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
> > Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >
> > ---
> >
> > v4 -> v5:
> > - vfio-plaform.h included first
> > - cleanup error handling in *populate*, vfio_get_device,
> >    vfio_enable_intp
> > - vfio_put_device not called anymore
> > - add some includes to follow vfio policy
> >
> > v3 -> v4:
> > [Eric Auger]
> > - merge of "vfio: Add initial IRQ support in platform device"
> >    to get a full functional patch although perfs are limited.
> > - removal of unrealize function since I currently understand
> >    it is only used with device hot-plug feature.
> >
> > v2 -> v3:
> > [Eric Auger]
> > - further factorization between PCI and platform (VFIORegion,
> >    VFIODevice). same level of functionality.
> >
> > <= v2:
> > [Kim Philipps]
> > - Initial Creation of the device supporting register space mapping
> > ---
> >   hw/vfio/Makefile.objs           |   1 +
> >   hw/vfio/platform.c              | 517 ++++++++++++++++++++++++++++++++++++++++
> >   include/hw/vfio/vfio-platform.h |  77 ++++++
> >   3 files changed, 595 insertions(+)
> >   create mode 100644 hw/vfio/platform.c
> >   create mode 100644 include/hw/vfio/vfio-platform.h
> >
> > diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> > index e31f30e..c5c76fe 100644
> > --- a/hw/vfio/Makefile.objs
> > +++ b/hw/vfio/Makefile.objs
> > @@ -1,4 +1,5 @@
> >   ifeq ($(CONFIG_LINUX), y)
> >   obj-$(CONFIG_SOFTMMU) += common.o
> >   obj-$(CONFIG_PCI) += pci.o
> > +obj-$(CONFIG_SOFTMMU) += platform.o
> >   endif
> > diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> > new file mode 100644
> > index 0000000..f1a1b55
> > --- /dev/null
> > +++ b/hw/vfio/platform.c
> > @@ -0,0 +1,517 @@
> > +/*
> > + * vfio based device assignment support - platform devices
> > + *
> > + * Copyright Linaro Limited, 2014
> > + *
> > + * Authors:
> > + *  Kim Phillips <kim.phillips@linaro.org>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > + * the COPYING file in the top-level directory.
> > + *
> > + * Based on vfio based PCI device assignment support:
> > + *  Copyright Red Hat, Inc. 2012
> > + */
> > +
> > +#include <linux/vfio.h>
> > +#include <sys/ioctl.h>
> > +
> > +#include "hw/vfio/vfio-platform.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/range.h"
> > +#include "sysemu/sysemu.h"
> > +#include "exec/memory.h"
> > +#include "qemu/queue.h"
> > +#include "hw/sysbus.h"
> > +
> > +extern const MemoryRegionOps vfio_region_ops;
> > +extern const MemoryListener vfio_memory_listener;
> > +extern QLIST_HEAD(, VFIOGroup) group_list;
> > +extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
> > +void vfio_put_device(VFIOPlatformDevice *vdev);
> > +
> > +/*
> > + * It is mandatory to pass a VFIOPlatformDevice since VFIODevice
> > + * is not a QOM Object and cannot be passed to memory region functions
> > +*/
> > +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
> > +{
> > +    VFIORegion *region = vdev->regions[nr];
> > +    unsigned size = region->size;
> > +    char name[64];
> > +
> > +    if (!size) {
> > +        return;
> > +    }
> > +
> > +    snprintf(name, sizeof(name), "VFIO %s region %d",
> > +             vdev->vbasedev.name, nr);
> > +
> > +    /* A "slow" read/write mapping underlies all regions */
> > +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
> > +                          region, name, size);
> > +
> > +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
> > +
> > +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
> > +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
> > +        error_report("%s unsupported. Performance may be slow", name);
> > +    }
> > +}
> > +
> > +static void print_regions(VFIOPlatformDevice *vdev)
> > +{
> > +    int i;
> > +
> > +    DPRINTF("Device \"%s\" counts %d region(s):\n",
> > +             vdev->vbasedev.name, vdev->vbasedev.num_regions);
> > +
> > +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
> > +        DPRINTF("- region %d flags = 0x%lx, size = 0x%lx, "
> > +                "fd= %d, offset = 0x%lx\n",
> > +                vdev->regions[i]->nr,
> > +                (unsigned long)vdev->regions[i]->flags,
> > +                (unsigned long)vdev->regions[i]->size,
> > +                vdev->regions[i]->vbasedev->fd,
> > +                (unsigned long)vdev->regions[i]->fd_offset);
> > +    }
> > +}
> > +
> > +static int vfio_populate_regions(VFIODevice *vbasedev)
> > +{
> > +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> > +    int i, ret = 0;
> > +    VFIOPlatformDevice *vdev =
> > +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> > +
> > +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
> > +
> > +    for (i = 0; i < vbasedev->num_regions; i++) {
> > +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
> > +        reg_info.index = i;
> > +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> > +        if (ret) {
> > +            error_report("vfio: Error getting region %d info: %m", i);
> > +            goto error;
> > +        }
> > +
> > +        vdev->regions[i]->flags = reg_info.flags;
> > +        vdev->regions[i]->size = reg_info.size;
> > +        vdev->regions[i]->fd_offset = reg_info.offset;
> > +        vdev->regions[i]->nr = i;
> > +        vdev->regions[i]->vbasedev = vbasedev;
> > +    }
> > +    print_regions(vdev);
> > +error:
> > +    return ret;
> > +}
> > +
> > +/* not implemented yet */
> > +static int vfio_platform_check_device(VFIODevice *vdev)
> > +{
> > +    return 0;
> > +}
> > +
> > +/* not implemented yet */
> > +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
> > +{
> > +return false;
> > +}
> > +
> > +/* not implemented yet */
> > +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
> > +{
> > +return 0;
> > +}
> > +
> > +/*
> > + * eoi function is called on the first access to any MMIO region
> > + * after an IRQ was triggered. It is assumed this access corresponds
> > + * to the IRQ status register reset.
> > + * With such a mechanism, a single IRQ can be handled at a time since
> > + * there is no way to know which IRQ was completed by the guest.
> > + * (we would need additional details about the IRQ status register mask)
> > + */
> > +static void vfio_platform_eoi(VFIODevice *vbasedev)
> > +{
> > +    VFIOINTp *intp;
> > +    VFIOPlatformDevice *vdev =
> > +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> > +
> > +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
> > +        if (intp->state == VFIO_IRQ_ACTIVE) {
> > +            DPRINTF("EOI IRQ #%d fd=%d\n",
> > +                    intp->pin, event_notifier_get_fd(&intp->interrupt));
> > +            intp->state = VFIO_IRQ_INACTIVE;
> > +
> > +            /* deassert the virtual IRQ and unmask physical one */
> > +            qemu_set_irq(intp->qemuirq, 0);
> > +            vfio_unmask_irqindex(vbasedev, intp->pin);
> > +
> > +            /* a single IRQ can be active at a time */
> > +            break;
> > +        }
> > +    }
> > +
> > +    /* in case there are pending IRQs, handle them one at a time */
> > +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
> > +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
> > +        vfio_intp_interrupt(intp);

Here vfio_intp_interrupt() is called with the physical interrupt enabled, whereas a comment in vfio_intp_interrupt() says the physical interrupt has been disabled by the VFIO driver.

> > +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
> > +    }
> > +}
> > +
> > +/*
> > + * enable/disable the fast path mode
> > + * fast path = MMIO region is mmaped (no KVM TRAP)
> > + * slow path = MMIO region is trapped and region callbacks are called
> > + * slow path enables to trap the IRQ status register guest reset
> > +*/
> > +
> > +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
> > +{
> > +    VFIORegion *region;
> > +    int i;
> > +
> > +    DPRINTF("fast path = %d\n", enabled);
> > +
> > +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
> > +        region = vdev->regions[i];
> > +
> > +        /* register space is unmapped to trap EOI */
> > +        memory_region_set_enabled(&region->mmap_mem, enabled);
> > +    }
> > +}
> > +
> > +/*
> > + * Checks whether the IRQ is still pending. In the negative
> > + * the fast path mode (where reg space is mmaped) can be restored.
> > + * if the IRQ is still pending, we must keep on trapping IRQ status
> > + * register reset with mmap disabled (slow path).
> > + * the function is called on mmap_timer event.
> > + * by construction a single fd is handled at a time. See EOI comment
> > + * for additional details.
> > + */
> > +static void vfio_intp_mmap_enable(void *opaque)
> > +{
> > +    VFIOINTp *tmp;
> > +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
> > +
> > +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
> > +        if (tmp->state == VFIO_IRQ_ACTIVE) {
> > +            DPRINTF("IRQ #%d still active, stay in slow path\n",
> > +                    tmp->pin);
> > +            timer_mod(vdev->mmap_timer,
> > +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > +                          vdev->mmap_timeout);
> > +            return;
> > +        }
> > +    }
> > +    DPRINTF("no active IRQ, restore fast path\n");
> > +    vfio_mmap_set_enabled(vdev, true);
> > +}
> > +
> > +/*
> > + * The fd handler
> > + */
> > +void vfio_intp_interrupt(void *opaque)
> > +{
> > +    int ret;
> > +    VFIOINTp *tmp, *intp = (VFIOINTp *)opaque;
> > +    VFIOPlatformDevice *vdev = intp->vdev;
> > +    bool one_active_irq = false;
> > +
> > +    /*
> > +     * first check whether there is a pending IRQ
> > +     * in the positive the new IRQ cannot be handled until the
> > +     * active one is not completed.
> > +     * by construction the same IRQ as the pending one cannot hit
> > +     * since the physical IRQ was disabled by the VFIO driver
> > +     */

Here we assume the physical interrupt is disabled.
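The two remarks can be checked against a small model of the queueing logic. The sketch below is a simplification with hypothetical names (DemoDev, demo_fire, demo_interrupt, demo_eoi), and it relies on one assumption that the patch itself does not state: a host line is masked when it fires and stays masked until it is explicitly unmasked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { IRQ_INACTIVE = 0, IRQ_PENDING = 1, IRQ_ACTIVE = 2 };

#define MAX_IRQS 4

typedef struct DemoIrq {
    int state;
    bool masked;   /* assumption: set when the host line fires */
} DemoIrq;

typedef struct DemoDev {
    DemoIrq irq[MAX_IRQS];
    int pending[MAX_IRQS];   /* FIFO of pending IRQ indexes */
    int head, count;
} DemoDev;

/* index of the active IRQ, or -1 when none is active */
static int demo_active_irq(const DemoDev *d)
{
    for (int i = 0; i < MAX_IRQS; i++) {
        if (d->irq[i].state == IRQ_ACTIVE) {
            return i;
        }
    }
    return -1;
}

/* models the fd handler: queue behind an active IRQ, else go active */
void demo_interrupt(DemoDev *d, int n)
{
    if (demo_active_irq(d) >= 0) {
        d->irq[n].state = IRQ_PENDING;
        d->pending[(d->head + d->count) % MAX_IRQS] = n;
        d->count++;
        return;
    }
    d->irq[n].state = IRQ_ACTIVE;
}

/* models a host IRQ: the line is masked, then the handler runs */
void demo_fire(DemoDev *d, int n)
{
    d->irq[n].masked = true;
    demo_interrupt(d, n);
}

/* models EOI: complete the active IRQ, unmask it, replay one pending IRQ */
void demo_eoi(DemoDev *d)
{
    int n = demo_active_irq(d);

    if (n >= 0) {
        d->irq[n].state = IRQ_INACTIVE;
        d->irq[n].masked = false;
    }
    if (d->count > 0) {
        int next = d->pending[d->head];

        d->head = (d->head + 1) % MAX_IRQS;
        d->count--;
        demo_interrupt(d, next);   /* the call questioned above */
    }
}
```

In this model the replayed IRQ is still masked when the handler runs from the EOI path, while the IRQ that was just completed is already unmasked, so whether the comment holds depends on which line it refers to.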

> > +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
> > +        if (tmp->state == VFIO_IRQ_ACTIVE) {
> > +            one_active_irq = true;
> > +            break;
> > +        }
> > +    }
> > +    if (one_active_irq) {
> > +        /*
> > +         * the new IRQ gets a pending status and is pushed in
> > +         * the pending queue
> > +         */
> > +        intp->state = VFIO_IRQ_PENDING;
> > +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
> > +                             intp, pqnext);
> > +        return;
> > +    }
> > +
> > +    /* no active IRQ, the new IRQ can be forwarded to the guest */
> > +    DPRINTF("Handle IRQ #%d (fd = %d)\n",
> > +            intp->pin, event_notifier_get_fd(&intp->interrupt));
> > +
> > +    ret = event_notifier_test_and_clear(&intp->interrupt);
> > +    if (!ret) {
> > +        DPRINTF("Error when clearing fd=%d\n",
> > +                event_notifier_get_fd(&intp->interrupt));
> > +    }
> > +
> > +    intp->state = VFIO_IRQ_ACTIVE;
> > +
> > +    /* sets slow path */
> > +    vfio_mmap_set_enabled(vdev, false);
> > +
> > +    /* trigger the virtual IRQ */
> > +    qemu_set_irq(intp->qemuirq, 1);
> > +
> > +    /* schedule the mmap timer which will restore mmap path after EOI*/
> > +    if (vdev->mmap_timeout) {
> > +        timer_mod(vdev->mmap_timer,
> > +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > +                      vdev->mmap_timeout);
> > +    }
> > +}
> > +
> > +static int vfio_enable_intp(VFIODevice *vbasedev, unsigned int index)
> > +{
> > +    struct vfio_irq_set *irq_set;
> > +    int32_t *pfd;
> > +    int ret, argsz;
> > +    int device = vbasedev->fd;
> > +    VFIOPlatformDevice *vdev =
> > +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> > +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
> > +    VFIOINTp *intp;
> > +
> > +    /* allocate and populate a new VFIOINTp structure put in a queue list */
> > +    intp = g_malloc0(sizeof(*intp));
> > +    intp->vdev = vdev;
> > +    intp->pin = index;
> > +    intp->state = VFIO_IRQ_INACTIVE;
> > +    sysbus_init_irq(sbdev, &intp->qemuirq);
> > +
> > +    ret = event_notifier_init(&intp->interrupt, 0);
> > +    if (ret) {
> > +        g_free(intp);
> > +        error_report("vfio: Error: event_notifier_init failed ");
> > +        return ret;
> > +    }
> > +
> > +    /* build the irq_set to be passed to the vfio kernel driver */
> > +    argsz = sizeof(*irq_set) + sizeof(*pfd);
> > +
> > +    irq_set = g_malloc0(argsz);
> > +    irq_set->argsz = argsz;
> > +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
> > +    irq_set->index = index;
> > +    irq_set->start = 0;
> > +    irq_set->count = 1;
> > +    pfd = (int32_t *)&irq_set->data;
> > +
> > +    *pfd = event_notifier_get_fd(&intp->interrupt);
> > +
> > +    DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
> > +
> > +    qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
> > +
> > +    /*
> > +     * pass the index/fd binding to the kernel driver so that it
> > +     * triggers this fd on HW IRQ
> > +     */
> > +    ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_set);
> > +    g_free(irq_set);
> > +    if (ret) {
> > +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
> > +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
> > +        event_notifier_cleanup(&intp->interrupt);
> > +        return -errno;
> > +    }
> > +
> > +    /* store the new intp in qlist */
> > +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
> > +    return 0;
> > +}
> > +
> > +static int vfio_populate_interrupts(VFIODevice *vbasedev)
> > +{
> > +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> > +    int i, ret;
> > +    VFIOPlatformDevice *vdev =
> > +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> > +
> > +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
> > +                                    vfio_intp_mmap_enable, vdev);
> > +
> > +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
> > +
> > +    for (i = 0; i < vbasedev->num_irqs; i++) {
> > +        irq.index = i;
> > +
> > +        DPRINTF("Retrieve IRQ info from vfio platform driver ...\n");
> > +
> > +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
> > +        if (ret) {
> > +            /* This can fail for an old kernel or legacy PCI dev */
> > +            error_printf("vfio: error getting device %s irq info",
> > +                         vbasedev->name);
> > +        } else {
> > +            DPRINTF("- IRQ index %d: count %d, flags=0x%x\n",
> > +                    irq.index, irq.count, irq.flags);
> > +
> > +            ret = vfio_enable_intp(vbasedev, irq.index);
> > +            if (ret) {
> > +                error_report("vfio: Error setting IRQ %d up", i);
> > +                return ret;
> > +            }
> > +        }
> > +    }
> > +    return 0;
> > +}
> > +
> > +static VFIODeviceOps vfio_platform_ops = {
> > +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
> > +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
> > +    .vfio_eoi = vfio_platform_eoi,
> > +    .vfio_check_device = vfio_platform_check_device,
> > +    .vfio_populate_regions = vfio_populate_regions,
> > +    .vfio_populate_interrupts = vfio_populate_interrupts,
> > +};
> > +
> > +static int vfio_base_device_init(VFIODevice *vbasedev)
> > +{
> > +    VFIOGroup *group;
> > +    VFIODevice *vbasedev_iter;
> > +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
> > +    ssize_t len;
> > +    struct stat st;
> > +    int groupid;
> > +    int ret;
> > +
> > +    /* name must be set prior to the call */
> > +    if (!vbasedev->name) {
> > +        return -EINVAL;
> > +    }
> > +
> > +    /* Check that the host device exists */
> > +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
> > +             vbasedev->name);
> > +
> > +    if (stat(path, &st) < 0) {
> > +        error_report("vfio: error: no such host device: %s", path);
> > +        return -errno;
> > +    }
> > +
> > +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
> > +    len = readlink(path, iommu_group_path, sizeof(path));
> > +    if (len <= 0 || len >= sizeof(path)) {
> > +        error_report("vfio: error no iommu_group for device");
> > +        return len < 0 ? -errno : ENAMETOOLONG;
> > +    }
> > +
> > +    iommu_group_path[len] = 0;
> > +    group_name = basename(iommu_group_path);
> > +
> > +    if (sscanf(group_name, "%d", &groupid) != 1) {
> > +        error_report("vfio: error reading %s: %m", path);
> > +        return -errno;
> > +    }
> > +
> > +    DPRINTF("%s(%s) group %d\n", __func__, vbasedev->name, groupid);
> > +
> > +    group = vfio_get_group(groupid, &address_space_memory);
> > +    if (!group) {
> > +        error_report("vfio: failed to get group %d", groupid);
> > +        return -ENOENT;
> > +    }
> > +
> > +    snprintf(path, sizeof(path), "%s", vbasedev->name);
> > +
> > +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> > +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
> > +            error_report("vfio: error: device %s is already attached", path);
> > +            vfio_put_group(group);
> > +            return -EBUSY;
> > +        }
> > +    }
> > +    ret = vfio_get_device(group, path, vbasedev);
> > +    if (ret) {
> > +        error_report("vfio: failed to get device %s", path);
> > +        vfio_put_group(group);
> > +    }
> > + return ret;
> > +}
> > +
> > +void vfio_put_device(VFIOPlatformDevice *vdev)
> > +{
> > +    unsigned int i;
> > +    VFIODevice *vbasedev = &vdev->vbasedev;
> > +
> > +    for (i = 0; i < vbasedev->num_regions; i++) {
> > +            g_free(vdev->regions[i]);
> > +    }
> > +    g_free(vdev->regions);
> > +    g_free(vdev->vbasedev.name);
> > +    vfio_put_base_device(&vdev->vbasedev);
> > +}
> > +
> > +static void vfio_platform_realize(DeviceState *dev, Error **errp)
> > +{
> > +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
> > +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
> > +    VFIODevice *vbasedev = &vdev->vbasedev;
> > +    int i, ret;
> > +
> > +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
> > +    vbasedev->ops = &vfio_platform_ops;
> > +
> > +    DPRINTF("vfio device %s, compat = %s\n", vbasedev->name, vdev->compat);
> > +
> > +    ret = vfio_base_device_init(vbasedev);
> > +    if (ret) {
> > +        return;
> > +    }
> > +
> > +    for (i = 0; i < vbasedev->num_regions; i++) {
> > +        vfio_map_region(vdev, i);
> > +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
> > +    }
> > +}
> > +
> > +static const VMStateDescription vfio_platform_vmstate = {
> > +    .name = TYPE_VFIO_PLATFORM,
> > +    .unmigratable = 1,
> > +};
> > +
> > +static Property vfio_platform_dev_properties[] = {
> > +    DEFINE_PROP_STRING("vfio_device", VFIOPlatformDevice, vbasedev.name),
> > +    DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
> > +    DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
> > +                       mmap_timeout, 1100),
> > +    DEFINE_PROP_BOOL("irqfd", VFIOPlatformDevice, irqfd_allowed, true),
> > +    DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> > +static void vfio_platform_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +
> > +    dc->realize = vfio_platform_realize;
> > +    dc->props = vfio_platform_dev_properties;
> > +    dc->vmsd = &vfio_platform_vmstate;
> > +    dc->desc = "VFIO-based platform device assignment";
> > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > +}
> > +
> > +static const TypeInfo vfio_platform_dev_info = {
> > +    .name = TYPE_VFIO_PLATFORM,
> > +    .parent = TYPE_SYS_BUS_DEVICE,
> > +    .instance_size = sizeof(VFIOPlatformDevice),
> > +    .class_init = vfio_platform_class_init,
> > +    .class_size = sizeof(VFIOPlatformDeviceClass),
> 
> This should be an abstract class. People must never instantiate a
> generic "vfio-platform" device. Only "vfio-xgmac", "vfio-etsec", etc
> devices should be exposed to the user.
> 
> 
> Alex

* Re: [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support
  2014-08-12  7:59     ` Bharat.Bhushan
@ 2014-08-12 16:34       ` Eric Auger
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-12 16:34 UTC (permalink / raw)
  To: Bharat.Bhushan, Alexander Graf, eric.auger, christoffer.dall,
	qemu-devel, Kim Phillips, a.rigo
  Cc: peter.maydell, patches, Kim Phillips, joel.schopp, will.deacon,
	Stuart Yoder, alex.williamson, a.motakis, kvmarm

On 08/12/2014 09:59 AM, Bharat.Bhushan@freescale.com wrote:
> 
> 
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Monday, August 11, 2014 3:06 PM
>> To: Eric Auger; eric.auger@st.com; christoffer.dall@linaro.org; qemu-
>> devel@nongnu.org; Phillips Kim-R1AAHA; a.rigo@virtualopensystems.com
>> Cc: will.deacon@arm.com; kvmarm@lists.cs.columbia.edu;
>> alex.williamson@redhat.com; Bhushan Bharat-R65777; peter.maydell@linaro.org;
>> Yoder Stuart-B08248; a.motakis@virtualopensystems.com; patches@linaro.org;
>> joel.schopp@amd.com; Kim Phillips
>> Subject: Re: [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support
>>
>>
>> On 09.08.14 16:25, Eric Auger wrote:
>>> Minimal VFIO platform implementation supporting
>>> - register space user mapping,
>>> - IRQ assignment based on eventfds handled on qemu side.
>>>
>>> irqfd kernel acceleration comes in a subsequent patch.
>>>
>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>>
>>> v4 -> v5:
>>> - vfio-platform.h included first
>>> - cleanup error handling in *populate*, vfio_get_device,
>>>    vfio_enable_intp
>>> - vfio_put_device not called anymore
>>> - add some includes to follow vfio policy
>>>
>>> v3 -> v4:
>>> [Eric Auger]
>>> - merge of "vfio: Add initial IRQ support in platform device"
>>>    to get a fully functional patch, although performance is limited.
>>> - removal of the unrealize function since, as I currently understand it,
>>>    it is only used with the device hot-plug feature.
>>>
>>> v2 -> v3:
>>> [Eric Auger]
>>> - further factorization between PCI and platform (VFIORegion,
>>>    VFIODevice). same level of functionality.
>>>
>>> <= v2:
>>> [Kim Phillips]
>>> - Initial Creation of the device supporting register space mapping
>>> ---
>>>   hw/vfio/Makefile.objs           |   1 +
>>>   hw/vfio/platform.c              | 517 ++++++++++++++++++++++++++++++++++++++++
>>>   include/hw/vfio/vfio-platform.h |  77 ++++++
>>>   3 files changed, 595 insertions(+)
>>>   create mode 100644 hw/vfio/platform.c
>>>   create mode 100644 include/hw/vfio/vfio-platform.h
>>>
>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>>> index e31f30e..c5c76fe 100644
>>> --- a/hw/vfio/Makefile.objs
>>> +++ b/hw/vfio/Makefile.objs
>>> @@ -1,4 +1,5 @@
>>>   ifeq ($(CONFIG_LINUX), y)
>>>   obj-$(CONFIG_SOFTMMU) += common.o
>>>   obj-$(CONFIG_PCI) += pci.o
>>> +obj-$(CONFIG_SOFTMMU) += platform.o
>>>   endif
>>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>>> new file mode 100644
>>> index 0000000..f1a1b55
>>> --- /dev/null
>>> +++ b/hw/vfio/platform.c
>>> @@ -0,0 +1,517 @@
>>> +/*
>>> + * vfio based device assignment support - platform devices
>>> + *
>>> + * Copyright Linaro Limited, 2014
>>> + *
>>> + * Authors:
>>> + *  Kim Phillips <kim.phillips@linaro.org>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>> + * the COPYING file in the top-level directory.
>>> + *
>>> + * Based on vfio based PCI device assignment support:
>>> + *  Copyright Red Hat, Inc. 2012
>>> + */
>>> +
>>> +#include <linux/vfio.h>
>>> +#include <sys/ioctl.h>
>>> +
>>> +#include "hw/vfio/vfio-platform.h"
>>> +#include "qemu/error-report.h"
>>> +#include "qemu/range.h"
>>> +#include "sysemu/sysemu.h"
>>> +#include "exec/memory.h"
>>> +#include "qemu/queue.h"
>>> +#include "hw/sysbus.h"
>>> +
>>> +extern const MemoryRegionOps vfio_region_ops;
>>> +extern const MemoryListener vfio_memory_listener;
>>> +extern QLIST_HEAD(, VFIOGroup) group_list;
>>> +extern QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces;
>>> +void vfio_put_device(VFIOPlatformDevice *vdev);
>>> +
>>> +/*
>>> + * It is mandatory to pass a VFIOPlatformDevice since VFIODevice
>>> + * is not a QOM Object and cannot be passed to memory region functions
>>> +*/
>>> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
>>> +{
>>> +    VFIORegion *region = vdev->regions[nr];
>>> +    unsigned size = region->size;
>>> +    char name[64];
>>> +
>>> +    if (!size) {
>>> +        return;
>>> +    }
>>> +
>>> +    snprintf(name, sizeof(name), "VFIO %s region %d",
>>> +             vdev->vbasedev.name, nr);
>>> +
>>> +    /* A "slow" read/write mapping underlies all regions */
>>> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
>>> +                          region, name, size);
>>> +
>>> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
>>> +
>>> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
>>> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
>>> +        error_report("%s unsupported. Performance may be slow", name);
>>> +    }
>>> +}
>>> +
>>> +static void print_regions(VFIOPlatformDevice *vdev)
>>> +{
>>> +    int i;
>>> +
>>> +    DPRINTF("Device \"%s\" counts %d region(s):\n",
>>> +             vdev->vbasedev.name, vdev->vbasedev.num_regions);
>>> +
>>> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
>>> +        DPRINTF("- region %d flags = 0x%lx, size = 0x%lx, "
>>> +                "fd= %d, offset = 0x%lx\n",
>>> +                vdev->regions[i]->nr,
>>> +                (unsigned long)vdev->regions[i]->flags,
>>> +                (unsigned long)vdev->regions[i]->size,
>>> +                vdev->regions[i]->vbasedev->fd,
>>> +                (unsigned long)vdev->regions[i]->fd_offset);
>>> +    }
>>> +}
>>> +
>>> +static int vfio_populate_regions(VFIODevice *vbasedev)
>>> +{
>>> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>>> +    int i, ret = 0;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +
>>> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
>>> +        reg_info.index = i;
>>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>>> +        if (ret) {
>>> +            error_report("vfio: Error getting region %d info: %m", i);
>>> +            goto error;
>>> +        }
>>> +
>>> +        vdev->regions[i]->flags = reg_info.flags;
>>> +        vdev->regions[i]->size = reg_info.size;
>>> +        vdev->regions[i]->fd_offset = reg_info.offset;
>>> +        vdev->regions[i]->nr = i;
>>> +        vdev->regions[i]->vbasedev = vbasedev;
>>> +    }
>>> +    print_regions(vdev);
>>> +error:
>>> +    return ret;
>>> +}
>>> +
>>> +/* not implemented yet */
>>> +static int vfio_platform_check_device(VFIODevice *vdev)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +/* not implemented yet */
>>> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
>>> +{
>>> +return false;
>>> +}
>>> +
>>> +/* not implemented yet */
>>> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
>>> +{
>>> +return 0;
>>> +}
>>> +
>>> +/*
>>> + * eoi function is called on the first access to any MMIO region
>>> + * after an IRQ was triggered. It is assumed this access corresponds
>>> + * to the IRQ status register reset.
>>> + * With such a mechanism, a single IRQ can be handled at a time since
>>> + * there is no way to know which IRQ was completed by the guest.
>>> + * (we would need additional details about the IRQ status register mask)
>>> + */
>>> +static void vfio_platform_eoi(VFIODevice *vbasedev)
>>> +{
>>> +    VFIOINTp *intp;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +
>>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>>> +        if (intp->state == VFIO_IRQ_ACTIVE) {
>>> +            DPRINTF("EOI IRQ #%d fd=%d\n",
>>> +                    intp->pin, event_notifier_get_fd(&intp->interrupt));
>>> +            intp->state = VFIO_IRQ_INACTIVE;
>>> +
>>> +            /* deassert the virtual IRQ and unmask physical one */
>>> +            qemu_set_irq(intp->qemuirq, 0);
>>> +            vfio_unmask_irqindex(vbasedev, intp->pin);
>>> +
>>> +            /* a single IRQ can be active at a time */
>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +    /* in case there are pending IRQs, handle them one at a time */
>>> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
>>> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
>>> +        vfio_intp_interrupt(intp);
> 
> We are calling vfio_intp_interrupt() with the physical interrupt enabled, while a comment in vfio_intp_interrupt() says the physical interrupt is disabled by VFIO.
Hi Bharat,

What I wanted to say is that vfio_intp_interrupt cannot be called several
times for the same IRQ from the eventfd handler while this IRQ is
pending/active (because the VFIO driver masked that IRQ). I also call
vfio_intp_interrupt from the MMIO handler to handle pending IRQs, i.e.
those that hit while the virtual IRQ was active. Nevertheless, after a more
careful review, I foresee 2 problems:
- a lock is needed in vfio_intp_interrupt. It can be called from the eventfd
handler and from the MMIO handler. I am not sure about the threading
model, but I guess both can run concurrently, with the risk that several
IRQs become active at the same time, which is wrong (there is no way to
detect which one completes).
- I should not handle a new IRQ before the pending ones have been handled.

I do not know whether multiple IRQ handling without irqfd support makes
much sense. However, I need to fix that and clarify my comment indeed.
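
To make the two problems concrete, here is a minimal standalone sketch of
one way to serialize both paths. It is not QEMU code and not the planned
fix: struct irq_dev, irq_fire() and irq_eoi() are hypothetical names, a
plain pthread mutex stands in for whatever locking the real threading
model requires, and the virtual IRQ and mmap handling are reduced to
comments.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_IRQS 4

enum irq_state { IRQ_INACTIVE, IRQ_PENDING, IRQ_ACTIVE };

/* Hypothetical stand-in for the per-device IRQ bookkeeping. */
struct irq_dev {
    pthread_mutex_t lock;           /* serializes eventfd and MMIO paths */
    enum irq_state state[NUM_IRQS];
    int pending[NUM_IRQS];          /* FIFO of pending IRQ indexes */
    size_t head, count;
    int active;                     /* index of the active IRQ, or -1 */
};

static void irq_dev_init(struct irq_dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    for (int i = 0; i < NUM_IRQS; i++) {
        d->state[i] = IRQ_INACTIVE;
    }
    d->head = d->count = 0;
    d->active = -1;
}

/* Must be called with the lock held. */
static void irq_activate(struct irq_dev *d, int irq)
{
    d->state[irq] = IRQ_ACTIVE;
    d->active = irq;
    /* real code would set the slow path and raise the virtual IRQ here */
}

/* eventfd path: a physical IRQ fired. */
static void irq_fire(struct irq_dev *d, int irq)
{
    assert(irq >= 0 && irq < NUM_IRQS);
    pthread_mutex_lock(&d->lock);
    if (d->active >= 0 || d->count > 0) {
        /* another IRQ is active or older ones are waiting: queue once */
        if (d->state[irq] == IRQ_INACTIVE) {
            d->state[irq] = IRQ_PENDING;
            d->pending[(d->head + d->count) % NUM_IRQS] = irq;
            d->count++;
        }
    } else {
        irq_activate(d, irq);
    }
    pthread_mutex_unlock(&d->lock);
}

/* MMIO path: the guest completed the active IRQ. */
static void irq_eoi(struct irq_dev *d)
{
    pthread_mutex_lock(&d->lock);
    if (d->active >= 0) {
        d->state[d->active] = IRQ_INACTIVE;
        d->active = -1;
        /* real code would lower the virtual IRQ and unmask the physical one */
    }
    if (d->count > 0) {
        /* oldest pending IRQ goes first, before any newly fired one */
        int next = d->pending[d->head];
        d->head = (d->head + 1) % NUM_IRQS;
        d->count--;
        irq_activate(d, next);
    }
    pthread_mutex_unlock(&d->lock);
}
```

Because the dequeue and the activation happen under the same lock, at most
one IRQ is active at any time, and a newly fired IRQ cannot overtake one
that is already queued.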

Thanks

Best Regards

Eric
> 
>>> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
>>> +    }
>>> +}
>>> +
>>> +/*
>>> + * enable/disable the fast path mode
>>> + * fast path = MMIO region is mmaped (no KVM TRAP)
>>> + * slow path = MMIO region is trapped and region callbacks are called
>>> + * slow path enables to trap the IRQ status register guest reset
>>> +*/
>>> +
>>> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
>>> +{
>>> +    VFIORegion *region;
>>> +    int i;
>>> +
>>> +    DPRINTF("fast path = %d\n", enabled);
>>> +
>>> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
>>> +        region = vdev->regions[i];
>>> +
>>> +        /* register space is unmapped to trap EOI */
>>> +        memory_region_set_enabled(&region->mmap_mem, enabled);
>>> +    }
>>> +}
>>> +
>>> +/*
>>> + * Checks whether the IRQ is still pending. In the negative
>>> + * the fast path mode (where reg space is mmaped) can be restored.
>>> + * if the IRQ is still pending, we must keep on trapping IRQ status
>>> + * register reset with mmap disabled (slow path).
>>> + * the function is called on mmap_timer event.
>>> + * by construction a single fd is handled at a time. See EOI comment
>>> + * for additional details.
>>> + */
>>> +static void vfio_intp_mmap_enable(void *opaque)
>>> +{
>>> +    VFIOINTp *tmp;
>>> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
>>> +
>>> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>>> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
>>> +            DPRINTF("IRQ #%d still active, stay in slow path\n",
>>> +                    tmp->pin);
>>> +            timer_mod(vdev->mmap_timer,
>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>> +                          vdev->mmap_timeout);
>>> +            return;
>>> +        }
>>> +    }
>>> +    DPRINTF("no active IRQ, restore fast path\n");
>>> +    vfio_mmap_set_enabled(vdev, true);
>>> +}
>>> +
>>> +/*
>>> + * The fd handler
>>> + */
>>> +void vfio_intp_interrupt(void *opaque)
>>> +{
>>> +    int ret;
>>> +    VFIOINTp *tmp, *intp = (VFIOINTp *)opaque;
>>> +    VFIOPlatformDevice *vdev = intp->vdev;
>>> +    bool one_active_irq = false;
>>> +
>>> +    /*
>>> +     * first check whether there is a pending IRQ
>>> +     * in the positive the new IRQ cannot be handled until the
>>> +     * active one is not completed.
>>> +     * by construction the same IRQ as the pending one cannot hit
>>> +     * since the physical IRQ was disabled by the VFIO driver
>>> +     */
> 
> Here we assume physical interrupt disabled.
> 
>>> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>>> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
>>> +            one_active_irq = true;
>>> +            break;
>>> +        }
>>> +    }
>>> +    if (one_active_irq) {
>>> +        /*
>>> +         * the new IRQ gets a pending status and is pushed in
>>> +         * the pending queue
>>> +         */
>>> +        intp->state = VFIO_IRQ_PENDING;
>>> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
>>> +                             intp, pqnext);
>>> +        return;
>>> +    }
>>> +
>>> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
>>> +    DPRINTF("Handle IRQ #%d (fd = %d)\n",
>>> +            intp->pin, event_notifier_get_fd(&intp->interrupt));
>>> +
>>> +    ret = event_notifier_test_and_clear(&intp->interrupt);
>>> +    if (!ret) {
>>> +        DPRINTF("Error when clearing fd=%d\n",
>>> +                event_notifier_get_fd(&intp->interrupt));
>>> +    }
>>> +
>>> +    intp->state = VFIO_IRQ_ACTIVE;
>>> +
>>> +    /* sets slow path */
>>> +    vfio_mmap_set_enabled(vdev, false);
>>> +
>>> +    /* trigger the virtual IRQ */
>>> +    qemu_set_irq(intp->qemuirq, 1);
>>> +
>>> +    /* schedule the mmap timer which will restore mmap path after EOI*/
>>> +    if (vdev->mmap_timeout) {
>>> +        timer_mod(vdev->mmap_timer,
>>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>> +                      vdev->mmap_timeout);
>>> +    }
>>> +}
>>> +
>>> +static int vfio_enable_intp(VFIODevice *vbasedev, unsigned int index)
>>> +{
>>> +    struct vfio_irq_set *irq_set;
>>> +    int32_t *pfd;
>>> +    int ret, argsz;
>>> +    int device = vbasedev->fd;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
>>> +    VFIOINTp *intp;
>>> +
>>> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
>>> +    intp = g_malloc0(sizeof(*intp));
>>> +    intp->vdev = vdev;
>>> +    intp->pin = index;
>>> +    intp->state = VFIO_IRQ_INACTIVE;
>>> +    sysbus_init_irq(sbdev, &intp->qemuirq);
>>> +
>>> +    ret = event_notifier_init(&intp->interrupt, 0);
>>> +    if (ret) {
>>> +        g_free(intp);
>>> +        error_report("vfio: Error: event_notifier_init failed ");
>>> +        return ret;
>>> +    }
>>> +
>>> +    /* build the irq_set to be passed to the vfio kernel driver */
>>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>>> +
>>> +    irq_set = g_malloc0(argsz);
>>> +    irq_set->argsz = argsz;
>>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
>>> +    irq_set->index = index;
>>> +    irq_set->start = 0;
>>> +    irq_set->count = 1;
>>> +    pfd = (int32_t *)&irq_set->data;
>>> +
>>> +    *pfd = event_notifier_get_fd(&intp->interrupt);
>>> +
>>> +    DPRINTF("register fd=%d/irq index=%d to kernel\n", *pfd, index);
>>> +
>>> +    qemu_set_fd_handler(*pfd, vfio_intp_interrupt, NULL, intp);
>>> +
>>> +    /*
>>> +     * pass the index/fd binding to the kernel driver so that it
>>> +     * triggers this fd on HW IRQ
>>> +     */
>>> +    ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_set);
>>> +    g_free(irq_set);
>>> +    if (ret) {
>>> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
>>> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
>>> +        event_notifier_cleanup(&intp->interrupt);
>>> +        return -errno;
>>> +    }
>>> +
>>> +    /* store the new intp in qlist */
>>> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
>>> +    return 0;
>>> +}
>>> +
>>> +static int vfio_populate_interrupts(VFIODevice *vbasedev)
>>> +{
>>> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
>>> +    int i, ret;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +
>>> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>>> +                                    vfio_intp_mmap_enable, vdev);
>>> +
>>> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
>>> +
>>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>>> +        irq.index = i;
>>> +
>>> +        DPRINTF("Retrieve IRQ info from vfio platform driver ...\n");
>>> +
>>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>>> +        if (ret) {
>>> +            /* This can fail for an old kernel or legacy PCI dev */
>>> +            error_printf("vfio: error getting device %s irq info",
>>> +                         vbasedev->name);
>>> +        } else {
>>> +            DPRINTF("- IRQ index %d: count %d, flags=0x%x\n",
>>> +                    irq.index, irq.count, irq.flags);
>>> +
>>> +            ret = vfio_enable_intp(vbasedev, irq.index);
>>> +            if (ret) {
>>> +                error_report("vfio: Error setting IRQ %d up", i);
>>> +                return ret;
>>> +            }
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +static VFIODeviceOps vfio_platform_ops = {
>>> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
>>> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
>>> +    .vfio_eoi = vfio_platform_eoi,
>>> +    .vfio_check_device = vfio_platform_check_device,
>>> +    .vfio_populate_regions = vfio_populate_regions,
>>> +    .vfio_populate_interrupts = vfio_populate_interrupts,
>>> +};
>>> +
>>> +static int vfio_base_device_init(VFIODevice *vbasedev)
>>> +{
>>> +    VFIOGroup *group;
>>> +    VFIODevice *vbasedev_iter;
>>> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>>> +    ssize_t len;
>>> +    struct stat st;
>>> +    int groupid;
>>> +    int ret;
>>> +
>>> +    /* name must be set prior to the call */
>>> +    if (!vbasedev->name) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /* Check that the host device exists */
>>> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
>>> +             vbasedev->name);
>>> +
>>> +    if (stat(path, &st) < 0) {
>>> +        error_report("vfio: error: no such host device: %s", path);
>>> +        return -errno;
>>> +    }
>>> +
>>> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>>> +    len = readlink(path, iommu_group_path, sizeof(path));
>>> +    if (len <= 0 || len >= sizeof(path)) {
>>> +        error_report("vfio: error no iommu_group for device");
>>> +        return len < 0 ? -errno : ENAMETOOLONG;
>>> +    }
>>> +
>>> +    iommu_group_path[len] = 0;
>>> +    group_name = basename(iommu_group_path);
>>> +
>>> +    if (sscanf(group_name, "%d", &groupid) != 1) {
>>> +        error_report("vfio: error reading %s: %m", path);
>>> +        return -errno;
>>> +    }
>>> +
>>> +    DPRINTF("%s(%s) group %d\n", __func__, vbasedev->name, groupid);
>>> +
>>> +    group = vfio_get_group(groupid, &address_space_memory);
>>> +    if (!group) {
>>> +        error_report("vfio: failed to get group %d", groupid);
>>> +        return -ENOENT;
>>> +    }
>>> +
>>> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
>>> +
>>> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>>> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
>>> +            error_report("vfio: error: device %s is already attached", path);
>>> +            vfio_put_group(group);
>>> +            return -EBUSY;
>>> +        }
>>> +    }
>>> +    ret = vfio_get_device(group, path, vbasedev);
>>> +    if (ret) {
>>> +        error_report("vfio: failed to get device %s", path);
>>> +        vfio_put_group(group);
>>> +    }
>>> + return ret;
>>> +}
>>> +
>>> +void vfio_put_device(VFIOPlatformDevice *vdev)
>>> +{
>>> +    unsigned int i;
>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +            g_free(vdev->regions[i]);
>>> +    }
>>> +    g_free(vdev->regions);
>>> +    g_free(vdev->vbasedev.name);
>>> +    vfio_put_base_device(&vdev->vbasedev);
>>> +}
>>> +
>>> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
>>> +{
>>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
>>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>> +    int i, ret;
>>> +
>>> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
>>> +    vbasedev->ops = &vfio_platform_ops;
>>> +
>>> +    DPRINTF("vfio device %s, compat = %s\n", vbasedev->name, vdev->compat);
>>> +
>>> +    ret = vfio_base_device_init(vbasedev);
>>> +    if (ret) {
>>> +        return;
>>> +    }
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        vfio_map_region(vdev, i);
>>> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
>>> +    }
>>> +}
>>> +
>>> +static const VMStateDescription vfio_platform_vmstate = {
>>> +    .name = TYPE_VFIO_PLATFORM,
>>> +    .unmigratable = 1,
>>> +};
>>> +
>>> +static Property vfio_platform_dev_properties[] = {
>>> +    DEFINE_PROP_STRING("vfio_device", VFIOPlatformDevice, vbasedev.name),
>>> +    DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
>>> +    DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
>>> +                       mmap_timeout, 1100),
>>> +    DEFINE_PROP_BOOL("irqfd", VFIOPlatformDevice, irqfd_allowed, true),
>>> +    DEFINE_PROP_END_OF_LIST(),
>>> +};
>>> +
>>> +static void vfio_platform_class_init(ObjectClass *klass, void *data)
>>> +{
>>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>>> +
>>> +    dc->realize = vfio_platform_realize;
>>> +    dc->props = vfio_platform_dev_properties;
>>> +    dc->vmsd = &vfio_platform_vmstate;
>>> +    dc->desc = "VFIO-based platform device assignment";
>>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>> +}
>>> +
>>> +static const TypeInfo vfio_platform_dev_info = {
>>> +    .name = TYPE_VFIO_PLATFORM,
>>> +    .parent = TYPE_SYS_BUS_DEVICE,
>>> +    .instance_size = sizeof(VFIOPlatformDevice),
>>> +    .class_init = vfio_platform_class_init,
>>> +    .class_size = sizeof(VFIOPlatformDeviceClass),
>>
>> This should be an abstract class. People must never instantiate a
>> generic "vfio-platform" device. Only "vfio-xgmac", "vfio-etsec", etc
>> devices should be exposed to the user.
>>
>>
>> Alex
> 

* Re: [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device
  2014-08-12  6:54     ` Eric Auger
@ 2014-08-13  3:32       ` David Gibson
  2014-08-29 10:00         ` Eric Auger
  0 siblings, 1 reply; 50+ messages in thread
From: David Gibson @ 2014-08-13  3:32 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kim.phillips, eric.auger, joel.schopp, patches,
	will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, agraf,
	alex.williamson, stuart.yoder, a.motakis, kvmarm,
	christoffer.dall

On Tue, Aug 12, 2014 at 08:54:34AM +0200, Eric Auger wrote:
> On 08/12/2014 04:41 AM, David Gibson wrote:
> > On Sat, Aug 09, 2014 at 03:25:44PM +0100, Eric Auger wrote:
> >> vfio_get_device now takes a VFIODevice as argument. The function is split
> >> into 4 functional parts: dev_info query, device check, region populate
> >> and interrupt populate. the last 3 are specialized by parent device and
> >> are added into DeviceOps.
> > 
> > Why is splitting these up into 4 stages useful, rather than having a
> > single sub-class specific callback?
> 
> Hi David,
> 
> VFIOPlatformDevice already inherits from SysBusDevice and hence cannot
> inherit from another VFIODevice. Same for VFIOPCIDevice that inherits
> from PCIDevice. This is why I created this non QOM struct. But did you
> mean something else?

Ah, yes, sorry, I missed that, though it's obvious now I think about
it.

> Then splitting into 4: This was to share some code between platform and
> PCI (dev_info query) and vfio_get_device was quite big already. I
> thought it makes sense to split it into functional parts.

Hm, ok.  So splitting out dev_info_query certainly makes sense then.
But does splitting the two populate sections make sense?  Is it
plausible that two different VFIO capable busses would share one of
these functions but not the other?
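
For reference, the split under discussion can be sketched as follows. This
is a standalone illustration, not the code from the patch: struct dev,
struct dev_ops and the plat_* functions are hypothetical stand-ins, and the
hard-coded counts only mimic what the shared dev_info query would return.

```c
#include <assert.h>
#include <stddef.h>

struct dev;

/* Bus-specific hooks; the names mirror the ops discussed in this thread. */
struct dev_ops {
    int (*check_device)(struct dev *d);
    int (*populate_regions)(struct dev *d);
    int (*populate_interrupts)(struct dev *d);
};

struct dev {
    const struct dev_ops *ops;
    int num_regions;
    int num_irqs;
    int hooks_run;   /* counts bus-specific hooks, for illustration only */
};

/* Shared step: stands in for the common dev_info query. */
static int query_dev_info(struct dev *d)
{
    d->num_regions = 2;
    d->num_irqs = 1;
    return 0;
}

/* Generic entry point: one shared step, then three bus-specific ones. */
static int get_device(struct dev *d)
{
    int ret;

    assert(d != NULL && d->ops != NULL);
    ret = query_dev_info(d);
    if (ret) {
        return ret;
    }
    ret = d->ops->check_device(d);
    if (ret) {
        return ret;
    }
    ret = d->ops->populate_regions(d);
    if (ret) {
        return ret;
    }
    return d->ops->populate_interrupts(d);
}

/* Hypothetical "platform" implementations that only record being called. */
static int plat_check(struct dev *d)
{
    d->hooks_run++;
    return 0;
}

static int plat_regions(struct dev *d)
{
    d->hooks_run++;
    return d->num_regions > 0 ? 0 : -1;
}

static int plat_interrupts(struct dev *d)
{
    d->hooks_run++;
    return 0;
}

static const struct dev_ops platform_ops = {
    .check_device = plat_check,
    .populate_regions = plat_regions,
    .populate_interrupts = plat_interrupts,
};
```

Merging the two populate hooks would leave the generic function with a
single bus-specific call after the check, which is the alternative the
question above is getting at.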

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-12  6:09     ` Eric Auger
@ 2014-08-13 19:59       ` Alex Williamson
  2014-09-01 16:31         ` Eric Auger
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2014-08-13 19:59 UTC (permalink / raw)
  To: Eric Auger
  Cc: agraf, kim.phillips, eric.auger, peter.maydell, Kim Phillips,
	patches, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	stuart.yoder, joel.schopp, a.motakis, kvmarm, christoffer.dall

On Tue, 2014-08-12 at 08:09 +0200, Eric Auger wrote:
> On 08/11/2014 09:25 PM, Alex Williamson wrote:
> > On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> new file mode 100644
> >> index 0000000..4684ee5
> >> --- /dev/null
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -0,0 +1,151 @@
> >> +/*
> >> + * common header for vfio based device assignment support
> >> + *
> >> + * Copyright Red Hat, Inc. 2012
> >> + *
> >> + * Authors:
> >> + *  Alex Williamson <alex.williamson@redhat.com>
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> >> + * the COPYING file in the top-level directory.
> >> + *
> >> + * Based on qemu-kvm device-assignment:
> >> + *  Adapted for KVM by Qumranet.
> >> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> >> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> >> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> >> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> >> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
> >> + */
> >> +#ifndef HW_VFIO_VFIO_COMMON_H
> >> +#define HW_VFIO_VFIO_COMMON_H
> >> +
> >> +#include "qemu-common.h"
> >> +#include "exec/address-spaces.h"
> >> +#include "exec/memory.h"
> >> +#include "qemu/queue.h"
> >> +#include "qemu/notify.h"
> >> +
> >> +/*#define DEBUG_VFIO*/
> >> +#ifdef DEBUG_VFIO
> >> +#define DPRINTF(fmt, ...) \
> >> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> >> +#else
> >> +#define DPRINTF(fmt, ...) \
> >> +    do { } while (0)
> >> +#endif
> > 
> > 
> > DPRINTF also needs to be renamed to avoid namespace conflicts.
> Hi Alex,
> 
> OK.
> 
> As I am going to touch the traces anyway,
> - are you OK if I use the new .name field to simplify the format strings?

Sure, that's fine.

>     DPRINTF("%s(%04x:%02x:%02x.%x) Pin %c\n", __func__, vdev->host.domain,
>             vdev->host.bus, vdev->host.slot, vdev->host.function,
>             'A' + vdev->intx.pin);
> - Alex was also suggesting using trace points. What is your position on
> that? I am also not 100% sure what that consists of: is it the trace
> events documented in docs/tracing.txt?

I think it would be a great conversion, but it's not required.  Thanks,

Alex

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation Eric Auger
  2014-08-11  9:40   ` Alexander Graf
@ 2014-08-18 21:54   ` Joel Schopp
  2014-08-18 22:11     ` Peter Maydell
  2014-08-19  6:59     ` Eric Auger
  1 sibling, 2 replies; 50+ messages in thread
From: Joel Schopp @ 2014-08-18 21:54 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, will.deacon, agraf, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm


+static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
+{
+    PlatformDevtreeData *data = opaque;
+    void *fdt = data->fdt;
+    const char *parent_node = data->node;
+    int compat_str_len;
+    char *nodename;
+    int i, ret;
+    uint32_t *irq_attr;
+    uint64_t *reg_attr;
+    uint64_t mmio_base;
+    uint64_t irq_number;
+    gchar mmio_base_prop[8];
+    gchar irq_number_prop[8];
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    Object *obj = OBJECT(sbdev);
+
+    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+                               vbasedev->name,
+                               mmio_base);
+
+    qemu_fdt_add_subnode(fdt, nodename);
+
+    compat_str_len = strlen(vdev->compat) + 1;
+    qemu_fdt_setprop(fdt, nodename, "compatible",
+                            vdev->compat, compat_str_len);
+
+    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
+        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
+        reg_attr[2*i] = 1;
+        reg_attr[2*i+1] = mmio_base;
+        reg_attr[2*i+2] = 1;
+        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
+    }

This should be 4 instead of 2. 
Also, to support 64 bit systems I think this should be 2 instead of 1.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-18 21:54   ` Joel Schopp
@ 2014-08-18 22:11     ` Peter Maydell
  2014-08-18 22:26       ` Joel Schopp
  2014-08-19  7:24       ` Eric Auger
  2014-08-19  6:59     ` Eric Auger
  1 sibling, 2 replies; 50+ messages in thread
From: Peter Maydell @ 2014-08-18 22:11 UTC (permalink / raw)
  To: Joel Schopp
  Cc: Alexander Graf, Kim Phillips, eric.auger, Eric Auger,
	Patch Tracking, Will Deacon, Alvise Rigo, QEMU Developers,
	Bharat Bhushan, Alex Williamson, Stuart Yoder, Antonios Motakis,
	kvmarm, Christoffer Dall

On 18 August 2014 22:54, Joel Schopp <joel.schopp@amd.com> wrote:
>
> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
> +{
> +    PlatformDevtreeData *data = opaque;
> +    void *fdt = data->fdt;
> +    const char *parent_node = data->node;
> +    int compat_str_len;
> +    char *nodename;
> +    int i, ret;
> +    uint32_t *irq_attr;
> +    uint64_t *reg_attr;
> +    uint64_t mmio_base;
> +    uint64_t irq_number;
> +    gchar mmio_base_prop[8];
> +    gchar irq_number_prop[8];
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    Object *obj = OBJECT(sbdev);
> +
> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
> +
> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
> +                               vbasedev->name,
> +                               mmio_base);
> +
> +    qemu_fdt_add_subnode(fdt, nodename);
> +
> +    compat_str_len = strlen(vdev->compat) + 1;

At this point you've already substituted the NULs in,
so you can't call strlen(), I think.
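
A minimal sketch of one way around this, assuming the compatible entries arrive in a single string joined by some separator character (the ';' used in the usage note and the helper name compat_to_stringlist are illustrative, not taken from the patch): measure the length while the buffer is still one C string, and only then write the NULs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert a separator-delimited list in place into a device-tree style
 * string list (entries separated by NUL) and return the total property
 * length in bytes, including the final terminating NUL.
 * The length is taken before any NUL is written, because strlen() called
 * afterwards would stop at the end of the first entry.
 * Hypothetical helper, not part of the posted patch.
 */
static size_t compat_to_stringlist(char *compat, char sep)
{
    assert(compat != NULL && sep != '\0');

    size_t len = strlen(compat) + 1;    /* measured while still one string */

    for (size_t i = 0; i + 1 < len; i++) {
        if (compat[i] == sep) {
            compat[i] = '\0';
        }
    }
    return len;
}
```

The returned length is what would be handed to a setprop-style call as the property size, e.g. len = compat_to_stringlist(buf, ';').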

> +    qemu_fdt_setprop(fdt, nodename, "compatible",
> +                            vdev->compat, compat_str_len);
> +
> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
> +        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
> +        reg_attr[2*i] = 1;
> +        reg_attr[2*i+1] = mmio_base;
> +        reg_attr[2*i+2] = 1;
> +        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
> +    }
>
> This should be 4 instead of 2.
> Also, to support 64 bit systems I think this should be 2 instead of 1.

Actually it depends entirely on what the board has done to
create the device tree node that we're inserting this child
node into. For ARM boot.c sets both #address-cells and
#size-cells to 2 regardless of whether the system is 32
or 64 bits, for simplicity. I imagine PPC does something
different. If we're editing a dtb that the user passed in (which
I think would be pretty lunatic so we shouldn't do this)
we'd have to actually walk the dtb to try to figure out what
the semantics of the reg property should be.
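
To illustrate how the parent's #address-cells and #size-cells drive the layout of reg, here is a rough sketch that packs one (address, size) entry as big-endian 32-bit cells. The helper names put_cells and encode_reg are made up for this example and it does not use libfdt or any QEMU helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one value as ncells big-endian 32-bit cells; returns bytes written. */
static size_t put_cells(uint8_t *dst, uint64_t val, unsigned ncells)
{
    assert(ncells == 1 || ncells == 2);
    /* With a single cell the value must fit in 32 bits. */
    assert(ncells == 2 || val <= UINT32_MAX);

    size_t nbytes = 4u * ncells;

    for (size_t i = 0; i < nbytes; i++) {
        dst[i] = (uint8_t)(val >> (8 * (nbytes - 1 - i)));
    }
    return nbytes;
}

/*
 * Encode one (address, size) entry of a "reg" property.
 * address_cells and size_cells are the values the parent node declares.
 */
static size_t encode_reg(uint8_t *dst, uint64_t addr, uint64_t size,
                         unsigned address_cells, unsigned size_cells)
{
    size_t n = put_cells(dst, addr, address_cells);

    n += put_cells(dst + n, size, size_cells);
    return n;
}
```

With both cell counts at 1 an entry takes 8 bytes; with both at 2 it takes 16, so the same base and size produce different property contents depending on the parent node.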

thanks
-- PMM

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-18 22:11     ` Peter Maydell
@ 2014-08-18 22:26       ` Joel Schopp
  2014-08-19  7:32         ` Eric Auger
  2014-08-19 10:59         ` Alexander Graf
  2014-08-19  7:24       ` Eric Auger
  1 sibling, 2 replies; 50+ messages in thread
From: Joel Schopp @ 2014-08-18 22:26 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Alexander Graf, Kim Phillips, eric.auger, Eric Auger,
	Patch Tracking, Will Deacon, Alvise Rigo, QEMU Developers,
	Bharat Bhushan, Alex Williamson, Stuart Yoder, Antonios Motakis,
	kvmarm, Christoffer Dall


On 08/18/2014 05:11 PM, Peter Maydell wrote:
> On 18 August 2014 22:54, Joel Schopp <joel.schopp@amd.com> wrote:
>> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
>> +{
>> +    PlatformDevtreeData *data = opaque;
>> +    void *fdt = data->fdt;
>> +    const char *parent_node = data->node;
>> +    int compat_str_len;
>> +    char *nodename;
>> +    int i, ret;
>> +    uint32_t *irq_attr;
>> +    uint64_t *reg_attr;
>> +    uint64_t mmio_base;
>> +    uint64_t irq_number;
>> +    gchar mmio_base_prop[8];
>> +    gchar irq_number_prop[8];
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    Object *obj = OBJECT(sbdev);
>> +
>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>> +
>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>> +                               vbasedev->name,
>> +                               mmio_base);
>> +
>> +    qemu_fdt_add_subnode(fdt, nodename);
>> +
>> +    compat_str_len = strlen(vdev->compat) + 1;
> At this point you've already substituted the NULs in,
> so you can't call strlen(), I think.
>
>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>> +                            vdev->compat, compat_str_len);
>> +
>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
>> +        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
>> +        reg_attr[2*i] = 1;
>> +        reg_attr[2*i+1] = mmio_base;
>> +        reg_attr[2*i+2] = 1;
>> +        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
>> +    }
>>
>> This should be 4 instead of 2.
>> Also, to support 64 bit systems I think this should be 2 instead of 1.
> Actually it depends entirely on what the board has done to
> create the device tree node that we're inserting this child
> node into. For ARM boot.c sets both #address-cells and
> #size-cells to 2 regardless of whether the system is 32
> or 64 bits, for simplicity. I imagine PPC does something
> different. If we're editing a dtb that the user passed in (which
> I think would be pretty lunatic so we shouldn't do this)
> we'd have to actually walk the dtb to try to figure out what
> the semantics of the reg property should be.
The indexing [2*i], [2*i+1], etc. is clearly a bug: when i = 1 it
overwrites two of the values written for i = 0.  Changing it to [4*i],
[4*i+1], etc. fixes it.
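
A small standalone sketch of the overlap, assuming the array holds four entries per region in the order used by the patch (cell count, base, cell count, size). The function fill_reg_attr and its stride parameter exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill out[] with one 4-entry group per region using the given stride.
 * A stride of 4 gives each region its own slots; a stride of 2 (as in
 * the posted loop) makes region i+1 overwrite half of region i.
 * out[] must hold at least (nregions - 1) * stride + 4 entries.
 */
static void fill_reg_attr(uint64_t *out, size_t stride,
                          const uint64_t *base, const uint64_t *size,
                          size_t nregions)
{
    assert(out != NULL && base != NULL && size != NULL);

    for (size_t i = 0; i < nregions; i++) {
        out[stride * i]     = 1;        /* address cell count */
        out[stride * i + 1] = base[i];
        out[stride * i + 2] = 1;        /* size cell count */
        out[stride * i + 3] = size[i];
    }
}
```

With two regions and stride 2, slot 3 ends up holding the base of region 1 rather than the size of region 0; with stride 4 all eight slots keep their own values.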

I think you are right on the size.  I also wonder whether, when the user
doesn't pass in a dtb, qemu should try to recreate the device-tree entry
from the platform device entry in the host kernel.  If so, would that best
be done by recreating the values from /proc/device-tree?

I also wish that qemu had a flag to output the generated dtb to a file
much like lkvm (kvmtool) has.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-18 21:54   ` Joel Schopp
  2014-08-18 22:11     ` Peter Maydell
@ 2014-08-19  6:59     ` Eric Auger
  1 sibling, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-19  6:59 UTC (permalink / raw)
  To: Joel Schopp, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, will.deacon, agraf, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 08/18/2014 11:54 PM, Joel Schopp wrote:
> 
> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
> +{
> +    PlatformDevtreeData *data = opaque;
> +    void *fdt = data->fdt;
> +    const char *parent_node = data->node;
> +    int compat_str_len;
> +    char *nodename;
> +    int i, ret;
> +    uint32_t *irq_attr;
> +    uint64_t *reg_attr;
> +    uint64_t mmio_base;
> +    uint64_t irq_number;
> +    gchar mmio_base_prop[8];
> +    gchar irq_number_prop[8];
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    Object *obj = OBJECT(sbdev);
> +
> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
> +
> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
> +                               vbasedev->name,
> +                               mmio_base);
> +
> +    qemu_fdt_add_subnode(fdt, nodename);
> +
> +    compat_str_len = strlen(vdev->compat) + 1;
> +    qemu_fdt_setprop(fdt, nodename, "compatible",
> +                            vdev->compat, compat_str_len);
> +
> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
> +        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
> +        reg_attr[2*i] = 1;
> +        reg_attr[2*i+1] = mmio_base;
> +        reg_attr[2*i+2] = 1;
> +        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
> +    }
> 
> This should be 4 instead of 2. 
Hi Joel,

Yes, definitely! I forgot to restore the original value after trying
different qemu_fdt_setprop_* functions. Sorry for that.

Best Regards

Eric
> Also, to support 64 bit systems I think this should be 2 instead of 1.
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-18 22:11     ` Peter Maydell
  2014-08-18 22:26       ` Joel Schopp
@ 2014-08-19  7:24       ` Eric Auger
  2014-08-19  8:17         ` Peter Maydell
  1 sibling, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-08-19  7:24 UTC (permalink / raw)
  To: Peter Maydell, Joel Schopp
  Cc: Alexander Graf, Kim Phillips, eric.auger, Patch Tracking,
	Will Deacon, QEMU Developers, Alvise Rigo, Bharat Bhushan,
	Alex Williamson, Stuart Yoder, Antonios Motakis, kvmarm,
	Christoffer Dall

On 08/19/2014 12:11 AM, Peter Maydell wrote:
> On 18 August 2014 22:54, Joel Schopp <joel.schopp@amd.com> wrote:
>>
>> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
>> +{
>> +    PlatformDevtreeData *data = opaque;
>> +    void *fdt = data->fdt;
>> +    const char *parent_node = data->node;
>> +    int compat_str_len;
>> +    char *nodename;
>> +    int i, ret;
>> +    uint32_t *irq_attr;
>> +    uint64_t *reg_attr;
>> +    uint64_t mmio_base;
>> +    uint64_t irq_number;
>> +    gchar mmio_base_prop[8];
>> +    gchar irq_number_prop[8];
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    Object *obj = OBJECT(sbdev);
>> +
>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>> +
>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>> +                               vbasedev->name,
>> +                               mmio_base);
>> +
>> +    qemu_fdt_add_subnode(fdt, nodename);
>> +
>> +    compat_str_len = strlen(vdev->compat) + 1;
> 
> At this point you've already substituted the NULs in,
> so you can't call strlen(), I think.
Hi Peter,

yes you're right. Thanks
> 
>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>> +                            vdev->compat, compat_str_len);
>> +
>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
>> +        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
>> +        reg_attr[2*i] = 1;
>> +        reg_attr[2*i+1] = mmio_base;
>> +        reg_attr[2*i+2] = 1;
>> +        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
>> +    }
>>
>> This should be 4 instead of 2.
>> Also, to support 64 bit systems I think this should be 2 instead of 1.
> 
> Actually it depends entirely on what the board has done to
> create the device tree node that we're inserting this child
> node into. For ARM boot.c sets both #address-cells and
> #size-cells to 2 regardless of whether the system is 32
> or 64 bits, for simplicity. I imagine PPC does something
> different. If we're editing a dtb that the user passed in (which
> I think would be pretty lunatic so we shouldn't do this)
> we'd have to actually walk the dtb to try to figure out what
> the semantics of the reg property should be.

Putting size=1 was the only solution I found to use an offset relative
to the parent bus instead of an absolute base address. I think the reason
is that platform_bus_create_devtree, the function that creates the
"platform bus" node, currently sets #address-cells and #size-cells to 1.
I assume the motivation was that the bus size was supposed to be smaller
than 4GB. I guess the problem then shifts to how the platform bus is
included in any given ARM platform.

Thanks

Eric
> 
> thanks
> -- PMM
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-18 22:26       ` Joel Schopp
@ 2014-08-19  7:32         ` Eric Auger
  2014-08-19 10:59         ` Alexander Graf
  1 sibling, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-19  7:32 UTC (permalink / raw)
  To: Joel Schopp, Peter Maydell
  Cc: Alexander Graf, Kim Phillips, eric.auger, Patch Tracking,
	Will Deacon, QEMU Developers, Alvise Rigo, Bharat Bhushan,
	Alex Williamson, Stuart Yoder, Antonios Motakis, kvmarm,
	Christoffer Dall

On 08/19/2014 12:26 AM, Joel Schopp wrote:
> 
> On 08/18/2014 05:11 PM, Peter Maydell wrote:
>> On 18 August 2014 22:54, Joel Schopp <joel.schopp@amd.com> wrote:
>>> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
>>> +{
>>> +    PlatformDevtreeData *data = opaque;
>>> +    void *fdt = data->fdt;
>>> +    const char *parent_node = data->node;
>>> +    int compat_str_len;
>>> +    char *nodename;
>>> +    int i, ret;
>>> +    uint32_t *irq_attr;
>>> +    uint64_t *reg_attr;
>>> +    uint64_t mmio_base;
>>> +    uint64_t irq_number;
>>> +    gchar mmio_base_prop[8];
>>> +    gchar irq_number_prop[8];
>>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>> +    Object *obj = OBJECT(sbdev);
>>> +
>>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>>> +
>>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>>> +                               vbasedev->name,
>>> +                               mmio_base);
>>> +
>>> +    qemu_fdt_add_subnode(fdt, nodename);
>>> +
>>> +    compat_str_len = strlen(vdev->compat) + 1;
>> At this point you've already substituted the NULs in,
>> so you can't call strlen(), I think.
>>
>>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>>> +                            vdev->compat, compat_str_len);
>>> +
>>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
>>> +        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
>>> +        reg_attr[2*i] = 1;
>>> +        reg_attr[2*i+1] = mmio_base;
>>> +        reg_attr[2*i+2] = 1;
>>> +        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
>>> +    }
>>>
>>> This should be 4 instead of 2.
>>> Also, to support 64 bit systems I think this should be 2 instead of 1.
>> Actually it depends entirely on what the board has done to
>> create the device tree node that we're inserting this child
>> node into. For ARM boot.c sets both #address-cells and
>> #size-cells to 2 regardless of whether the system is 32
>> or 64 bits, for simplicity. I imagine PPC does something
>> different. If we're editing a dtb that the user passed in (which
>> I think would be pretty lunatic so we shouldn't do this)
>> we'd have to actually walk the dtb to try to figure out what
>> the semantics of the reg property should be.
> For the index [2*i],[2*i+1], etc is clearly a bug as when i = 1 it will
> overwrite two of the values.  Changing that to [4*i],[4*i+1],etc fixes it.
> 
> I think you are right on the size.  I also wonder if the user doesn't
> pass in a dtb if qemu should try to recreate the device-tree entry from
> the platform device entry in the host kernel?  If so would that best be
> done by recreating the values from /proc/device-tree ?
Antonios recently submitted a patch to retrieve dt info from the vfio
platform device.
[RFC 0/4] VFIO: PLATFORM: Return device tree info for a platform device node
https://www.mail-archive.com/kvm@vger.kernel.org/msg106282.html

Best Regards

Eric
> 
> I also wish that qemu had a flag to output the generated dtb to a file
> much like lkvm (kvmtool) has.
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-19  7:24       ` Eric Auger
@ 2014-08-19  8:17         ` Peter Maydell
  0 siblings, 0 replies; 50+ messages in thread
From: Peter Maydell @ 2014-08-19  8:17 UTC (permalink / raw)
  To: Eric Auger
  Cc: Joel Schopp, Kim Phillips, eric.auger, Patch Tracking,
	Alexander Graf, Will Deacon, QEMU Developers, Alvise Rigo,
	Bharat Bhushan, Alex Williamson, Stuart Yoder, Antonios Motakis,
	kvmarm, Christoffer Dall

On 19 August 2014 08:24, Eric Auger <eric.auger@linaro.org> wrote:
> Putting size=1 was the only solution I found to use an offset relative
> to the parent bus instead of an absolute base address. I would explain
> this because, in platform_bus_create_devtree, the function that creates
> the "platform bus" node, #address-cells and #size-cells currently are
> set to 1. I assume the motivation was that bus size was supposed to be
> smaller than 4GB. Then I guess the problem is shifted to the inclusion
> of the platform bus in any ARM platform.

Ah, I see. Yes, if the containing node is setting addr/size to 1
then 1 is correct, and the limitation then is just the 4GB max.
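
As a rough illustration of that limit, assuming one 32-bit cell each for address and size, a check along these lines would tell whether a region can be described at all. The function name region_fits_single_cell is invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when base and size are each representable in one 32-bit cell and
 * the region ends at or below the 4GB boundary.
 * Hypothetical helper, not taken from the patch series.
 */
static bool region_fits_single_cell(uint64_t base, uint64_t size)
{
    const uint64_t limit = UINT64_C(1) << 32;

    return base < limit && size <= limit - base && size <= UINT32_MAX;
}
```

A region at offset 0xfffff000 with size 0x1000 still fits, while the same offset with size 0x2000 crosses the boundary and does not.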

thanks
-- PMM

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-18 22:26       ` Joel Schopp
  2014-08-19  7:32         ` Eric Auger
@ 2014-08-19 10:59         ` Alexander Graf
  2014-08-19 14:15           ` Joel Schopp
  1 sibling, 1 reply; 50+ messages in thread
From: Alexander Graf @ 2014-08-19 10:59 UTC (permalink / raw)
  To: Joel Schopp, Peter Maydell
  Cc: Kim Phillips, eric.auger, Eric Auger, Patch Tracking,
	Will Deacon, Alvise Rigo, QEMU Developers, Bharat Bhushan,
	Alex Williamson, Stuart Yoder, Antonios Motakis, kvmarm,
	Christoffer Dall



On 19.08.14 00:26, Joel Schopp wrote:
> 
> On 08/18/2014 05:11 PM, Peter Maydell wrote:
>> On 18 August 2014 22:54, Joel Schopp <joel.schopp@amd.com> wrote:
>>> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
>>> +{
>>> +    PlatformDevtreeData *data = opaque;
>>> +    void *fdt = data->fdt;
>>> +    const char *parent_node = data->node;
>>> +    int compat_str_len;
>>> +    char *nodename;
>>> +    int i, ret;
>>> +    uint32_t *irq_attr;
>>> +    uint64_t *reg_attr;
>>> +    uint64_t mmio_base;
>>> +    uint64_t irq_number;
>>> +    gchar mmio_base_prop[8];
>>> +    gchar irq_number_prop[8];
>>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>> +    Object *obj = OBJECT(sbdev);
>>> +
>>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>>> +
>>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>>> +                               vbasedev->name,
>>> +                               mmio_base);
>>> +
>>> +    qemu_fdt_add_subnode(fdt, nodename);
>>> +
>>> +    compat_str_len = strlen(vdev->compat) + 1;
>> At this point you've already substituted the NULs in,
>> so you can't call strlen(), I think.
>>
>>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>>> +                            vdev->compat, compat_str_len);
>>> +
>>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
>>> +        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
>>> +        reg_attr[2*i] = 1;
>>> +        reg_attr[2*i+1] = mmio_base;
>>> +        reg_attr[2*i+2] = 1;
>>> +        reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
>>> +    }
>>>
>>> This should be 4 instead of 2.
>>> Also, to support 64 bit systems I think this should be 2 instead of 1.
>> Actually it depends entirely on what the board has done to
>> create the device tree node that we're inserting this child
>> node into. For ARM boot.c sets both #address-cells and
>> #size-cells to 2 regardless of whether the system is 32
>> or 64 bits, for simplicity. I imagine PPC does something
>> different. If we're editing a dtb that the user passed in (which
>> I think would be pretty lunatic so we shouldn't do this)
>> we'd have to actually walk the dtb to try to figure out what
>> the semantics of the reg property should be.
> For the index [2*i],[2*i+1], etc is clearly a bug as when i = 1 it will
> overwrite two of the values.  Changing that to [4*i],[4*i+1],etc fixes it.
> 
> I think you are right on the size.  I also wonder if the user doesn't
> pass in a dtb if qemu should try to recreate the device-tree entry from
> the platform device entry in the host kernel?  If so would that best be
> done by recreating the values from /proc/device-tree ?
> 
> I also wish that qemu had a flag to output the generated dtb to a file
> much like lkvm (kvmtool) has.

It does. "qemu-system-foo -machine dumpdtb=mydtb.dtb" should dump the
generated dtb into a file called mydtb.dtb.


Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-19 10:59         ` Alexander Graf
@ 2014-08-19 14:15           ` Joel Schopp
  2014-08-19 14:29             ` Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Joel Schopp @ 2014-08-19 14:15 UTC (permalink / raw)
  To: Alexander Graf, Peter Maydell
  Cc: Kim Phillips, eric.auger, Eric Auger, Patch Tracking,
	Will Deacon, Alvise Rigo, QEMU Developers, Bharat Bhushan,
	Alex Williamson, Stuart Yoder, Antonios Motakis, kvmarm,
	Christoffer Dall


>> For the index [2*i],[2*i+1], etc is clearly a bug as when i = 1 it will
>> overwrite two of the values.  Changing that to [4*i],[4*i+1],etc fixes it.
>>
>> I think you are right on the size.  I also wonder if the user doesn't
>> pass in a dtb if qemu should try to recreate the device-tree entry from
>> the platform device entry in the host kernel?  If so would that best be
>> done by recreating the values from /proc/device-tree ?
>>
>> I also wish that qemu had a flag to output the generated dtb to a file
>> much like lkvm (kvmtool) has.
> It does. "qemu-system-foo -machine dumpdtb=mydtb.dtb" should dump the
> generated dtb into a file called mydtb.dtb.

Would a patch that adds this output to --help be welcomed?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation
  2014-08-19 14:15           ` Joel Schopp
@ 2014-08-19 14:29             ` Alexander Graf
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Graf @ 2014-08-19 14:29 UTC (permalink / raw)
  To: Joel Schopp, Peter Maydell
  Cc: Kim Phillips, eric.auger, Eric Auger, Patch Tracking,
	Will Deacon, Alvise Rigo, QEMU Developers, Bharat Bhushan,
	Alex Williamson, Stuart Yoder, Antonios Motakis, kvmarm,
	Christoffer Dall



On 19.08.14 16:15, Joel Schopp wrote:
> 
>>> For the index [2*i],[2*i+1], etc is clearly a bug as when i = 1 it will
>>> overwrite two of the values.  Changing that to [4*i],[4*i+1],etc fixes it.
>>>
>>> I think you are right on the size.  I also wonder if the user doesn't
>>> pass in a dtb if qemu should try to recreate the device-tree entry from
>>> the platform device entry in the host kernel?  If so would that best be
>>> done by recreating the values from /proc/device-tree ?
>>>
>>> I also wish that qemu had a flag to output the generated dtb to a file
>>> much like lkvm (kvmtool) has.
>> It does. "qemu-system-foo -machine dumpdtb=mydtb.dtb" should dump the
>> generated dtb into a file called mydtb.dtb.
> 
> Would a patch that adds this output to --help be welcomed?

Not sure -help is the right place for these debugging options. But I
would definitely love to see all -machine options properly documented in
the man page!


Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module Eric Auger
  2014-08-11 19:20   ` Alex Williamson
  2014-08-11 19:25   ` Alex Williamson
@ 2014-08-20 19:12   ` Joel Schopp
  2014-08-20 19:41     ` Alex Williamson
  2 siblings, 1 reply; 50+ messages in thread
From: Joel Schopp @ 2014-08-20 19:12 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel,
	kim.phillips, a.rigo
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, agraf,
	stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm


> +int vfio_get_device(VFIOGroup *group, const char *name,
> +                       VFIODevice *vbasedev)
> +{
> +    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> +    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> +    int ret;
> +
> +    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> +    if (ret < 0) {
Should be:
if(ret) {
instead of:
if (ret < 0) {

The ioctl can, and sometimes does, return positive values in case of
errors.  This should also be fixed in vfio_container_do_ioctl()

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-20 19:12   ` Joel Schopp
@ 2014-08-20 19:41     ` Alex Williamson
  2014-08-20 20:08       ` Joel Schopp
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Williamson @ 2014-08-20 19:41 UTC (permalink / raw)
  To: Joel Schopp
  Cc: agraf, kim.phillips, eric.auger, Eric Auger, peter.maydell,
	patches, will.deacon, a.rigo, qemu-devel, Bharat.Bhushan,
	Kim Phillips, stuart.yoder, a.motakis, kvmarm, christoffer.dall

On Wed, 2014-08-20 at 14:12 -0500, Joel Schopp wrote:
> > +int vfio_get_device(VFIOGroup *group, const char *name,
> > +                       VFIODevice *vbasedev)
> > +{
> > +    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
> > +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> > +    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> > +    int ret;
> > +
> > +    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> > +    if (ret < 0) {
> Should be:
> if(ret) {
> instead of:
> if (ret < 0) {
> 
> The ioctl can, and sometimes does, return positive values in case of
> errors.  This should also be fixed in vfio_container_do_ioctl()

This particular ioctl usually does return a positive value, the file
descriptor for the device, so I think it's correct as written.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-20 19:41     ` Alex Williamson
@ 2014-08-20 20:08       ` Joel Schopp
  0 siblings, 0 replies; 50+ messages in thread
From: Joel Schopp @ 2014-08-20 20:08 UTC (permalink / raw)
  To: Alex Williamson
  Cc: agraf, kim.phillips, eric.auger, Eric Auger, peter.maydell,
	patches, will.deacon, a.rigo, qemu-devel, Bharat.Bhushan,
	Kim Phillips, stuart.yoder, a.motakis, kvmarm, christoffer.dall


On 08/20/2014 02:41 PM, Alex Williamson wrote:
> On Wed, 2014-08-20 at 14:12 -0500, Joel Schopp wrote:
>>> +int vfio_get_device(VFIOGroup *group, const char *name,
>>> +                       VFIODevice *vbasedev)
>>> +{
>>> +    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
>>> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>>> +    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
>>> +    int ret;
>>> +
>>> +    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
>>> +    if (ret < 0) {
>> Should be:
>> if(ret) {
>> instead of:
>> if (ret < 0) {
>>
>> The ioctl can, and sometimes does, return positive values in case of
>> errors.  This should also be fixed in vfio_container_do_ioctl()
> This particular ioctl usually does return a positive value, the file
> descriptor for the device, so I think it's correct as written.
> Thanks,
Thanks for the catch, I stand corrected.  The kernel I am running
against contains corresponding patches that are spitting out an
erroneous pr_err() on if(ret).  In retrospect it looks like the kernel
patches and not the qemu patches are in the wrong.
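
The difference between the two checks can be sketched without VFIO at all. Below, open() on /dev/null stands in for an fd-returning ioctl such as VFIO_GROUP_GET_DEVICE_FD; the helper names are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Error test for calls that return a new file descriptor on success:
 * only negative values signal failure.
 */
static bool fd_call_failed(int ret)
{
    return ret < 0;
}

/*
 * Error test for calls that return 0 on success. Applied to an
 * fd-returning call it misreports every valid descriptor above 0.
 */
static bool zero_on_success_call_failed(int ret)
{
    return ret != 0;
}

/* Stand-in for an fd-returning ioctl: open() also returns a descriptor. */
static int get_some_fd(void)
{
    return open("/dev/null", O_RDONLY);
}
```

Feeding the result of get_some_fd() to both predicates shows the problem: a descriptor such as 3 passes the ret < 0 test but would be treated as an error by if (ret).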

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device
  2014-08-13  3:32       ` David Gibson
@ 2014-08-29 10:00         ` Eric Auger
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Auger @ 2014-08-29 10:00 UTC (permalink / raw)
  To: David Gibson
  Cc: peter.maydell, kim.phillips, eric.auger, joel.schopp, patches,
	will.deacon, qemu-devel, a.rigo, Bharat.Bhushan, agraf,
	alex.williamson, stuart.yoder, a.motakis, kvmarm,
	christoffer.dall

On 08/13/2014 05:32 AM, David Gibson wrote:
> On Tue, Aug 12, 2014 at 08:54:34AM +0200, Eric Auger wrote:
>> On 08/12/2014 04:41 AM, David Gibson wrote:
>>> On Sat, Aug 09, 2014 at 03:25:44PM +0100, Eric Auger wrote:
>>>> vfio_get_device now takes a VFIODevice as argument. The function is split
>>>> into 4 functional parts: dev_info query, device check, region populate
>>>> and interrupt populate. the last 3 are specialized by parent device and
>>>> are added into DeviceOps.
>>>
>>> Why is splitting these up into 4 stages useful, rather than having a
>>> single sub-class specific callback?
>>
>> Hi David,
>>
>> VFIOPlatformDevice already inherits from SysBusDevice and hence cannot
>> inherit from another VFIODevice. Same for VFIOPCIDevice that inherits
>> from PCIDevice. This is why I created this non QOM struct. But did you
>> mean something else?
> 
> Ah, yes, sorry, I missed that, though it's obvious now I think about
> it.
> 
>> Then splitting into 4: This was to share some code between platform and
>> PCI (dev_info query) and vfio_get_device was quite big already. I
>> thought it makes sense to split it into functional parts.
> 
> Hm, ok.  So splitting out dev_info_query certainly makes sense then.
> But does splitting the two populate sections make sense?  Is it
> plausible that two different VFIO capable busses would share one of
> these functions but not the other?

Hi David,

Coming back to you on that topic. There is no justification for
splitting the code into 3 functions other than having shorter functions
with reduced functionality. But I acknowledge that a single function
would simplify the diff between the original code and the new one, so I
intend to keep a single specialized function instead of 3.
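
For illustration only, the resulting shape could look roughly like the sketch below: a common query step followed by one bus-specific hook. The type and function names (Device, DeviceOps, populate, get_device) are placeholders for this example and not the names used in the series.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Device Device;

/* Hypothetical ops table: one bus-specific hook instead of three. */
typedef struct DeviceOps {
    int (*populate)(Device *dev);   /* check + regions + interrupts */
} DeviceOps;

struct Device {
    const DeviceOps *ops;
    int num_regions;
    int num_irqs;
};

/* Common part shared by every bus type, then the specialized hook. */
static int get_device(Device *dev, int num_regions, int num_irqs)
{
    assert(dev != NULL && dev->ops != NULL && dev->ops->populate != NULL);

    dev->num_regions = num_regions;   /* stands in for the dev_info query */
    dev->num_irqs = num_irqs;
    return dev->ops->populate(dev);
}

/* Example specialization: reject devices that expose no region. */
static int platform_populate(Device *dev)
{
    return dev->num_regions > 0 ? 0 : -1;
}

static const DeviceOps platform_ops = { .populate = platform_populate };
```

A PCI flavour would supply its own populate hook through a second ops table while reusing get_device unchanged.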

Best Regards

Eric

> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-08-13 19:59       ` Alex Williamson
@ 2014-09-01 16:31         ` Eric Auger
  2014-09-01 17:41           ` Alexander Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-09-01 16:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: agraf, kim.phillips, eric.auger, peter.maydell, Kim Phillips,
	patches, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	stuart.yoder, joel.schopp, a.motakis, kvmarm, christoffer.dall

On 08/13/2014 09:59 PM, Alex Williamson wrote:
> On Tue, 2014-08-12 at 08:09 +0200, Eric Auger wrote:
>> On 08/11/2014 09:25 PM, Alex Williamson wrote:
>>> On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>> new file mode 100644
>>>> index 0000000..4684ee5
>>>> --- /dev/null
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -0,0 +1,151 @@
>>>> +/*
>>>> + * common header for vfio based device assignment support
>>>> + *
>>>> + * Copyright Red Hat, Inc. 2012
>>>> + *
>>>> + * Authors:
>>>> + *  Alex Williamson <alex.williamson@redhat.com>
>>>> + *
>>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>>> + * the COPYING file in the top-level directory.
>>>> + *
>>>> + * Based on qemu-kvm device-assignment:
>>>> + *  Adapted for KVM by Qumranet.
>>>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
>>>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
>>>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
>>>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
>>>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>>>> + */
>>>> +#ifndef HW_VFIO_VFIO_COMMON_H
>>>> +#define HW_VFIO_VFIO_COMMON_H
>>>> +
>>>> +#include "qemu-common.h"
>>>> +#include "exec/address-spaces.h"
>>>> +#include "exec/memory.h"
>>>> +#include "qemu/queue.h"
>>>> +#include "qemu/notify.h"
>>>> +
>>>> +/*#define DEBUG_VFIO*/
>>>> +#ifdef DEBUG_VFIO
>>>> +#define DPRINTF(fmt, ...) \
>>>> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
>>>> +#else
>>>> +#define DPRINTF(fmt, ...) \
>>>> +    do { } while (0)
>>>> +#endif
>>>
>>>
>>> DPRINTF also needs to be renamed to avoid namespace conflicts.
>> Hi Alex,
>>
>> OK.
>>
>> As I am going to touch the traces,
>> - are you OK if I use the new .name field to simplify format strings?
> 
> Sure, that's fine.
> 
>>     DPRINTF("%s(%04x:%02x:%02x.%x) Pin %c\n", __func__, vdev->host.domain,
>>             vdev->host.bus, vdev->host.slot, vdev->host.function,
>>             'A' + vdev->intx.pin);
>> - Also, Alex suggested using trace points. What is your position on
>> that? I am not 100% sure what that consists of: is it trace events as
>> documented in docs/tracing.txt?
> 
> I think it would be a great conversion, but it's not required.  Thanks,

Hi Alex,

I am currently converting the DPRINTFs to trace points (done for
platform and common, PCI is in progress). Does it make sense to convert
every DPRINTF, or only a subset (state transitions, ...)? Would you
accept a mixture of DPRINTFs and trace points, or do you advise
converting everything?

Also, the tracing.txt doc says the name of the function should be used
as the event prefix. That said, it could be useful to trace all pci* or
all platform* events, and wildcards seem to work fine for selecting
trace events. So my second question: would you accept
pci_<function_name>_* as a generic naming pattern?
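
To illustrate what a common prefix buys when selecting events by
wildcard, here is a minimal sketch. It uses POSIX fnmatch() as a
stand-in for the pattern matching done by the tracing code (an
assumption, not the real implementation), and the event names are
made up for the example.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical event names following the pci_<function_name>_* idea. */
static const char *const event_names[] = {
    "pci_vfio_enable_intx",
    "pci_vfio_update_irq",
    "platform_vfio_populate_regions",
    "vfio_region_write",
};

/* Count how many of the event names above a glob pattern selects. */
size_t count_matching_events(const char *pattern)
{
    size_t i, n = 0;

    assert(pattern != NULL);
    for (i = 0; i < sizeof(event_names) / sizeof(event_names[0]); i++) {
        /* fnmatch() returns 0 when the name matches the pattern */
        if (fnmatch(pattern, event_names[i], 0) == 0) {
            n++;
        }
    }
    return n;
}
```

With these names, "pci_*" selects the two PCI events and "platform_*"
the single platform one, while the common event carries neither prefix.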

Thanks in advance

Best Regards

Eric
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-09-01 16:31         ` Eric Auger
@ 2014-09-01 17:41           ` Alexander Graf
  2014-09-02  7:13             ` Eric Auger
  0 siblings, 1 reply; 50+ messages in thread
From: Alexander Graf @ 2014-09-01 17:41 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kim.phillips, eric.auger, patches, Kim Phillips,
	joel.schopp, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	Alex Williamson, stuart.yoder, a.motakis, kvmarm,
	christoffer.dall



> On 01.09.2014 at 18:31, Eric Auger <eric.auger@linaro.org> wrote:
> 
>> On 08/13/2014 09:59 PM, Alex Williamson wrote:
>>> On Tue, 2014-08-12 at 08:09 +0200, Eric Auger wrote:
>>>> On 08/11/2014 09:25 PM, Alex Williamson wrote:
>>>>> On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>> new file mode 100644
>>>>> index 0000000..4684ee5
>>>>> --- /dev/null
>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>> @@ -0,0 +1,151 @@
>>>>> +/*
>>>>> + * common header for vfio based device assignment support
>>>>> + *
>>>>> + * Copyright Red Hat, Inc. 2012
>>>>> + *
>>>>> + * Authors:
>>>>> + *  Alex Williamson <alex.williamson@redhat.com>
>>>>> + *
>>>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>>>> + * the COPYING file in the top-level directory.
>>>>> + *
>>>>> + * Based on qemu-kvm device-assignment:
>>>>> + *  Adapted for KVM by Qumranet.
>>>>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
>>>>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
>>>>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
>>>>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
>>>>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>>>>> + */
>>>>> +#ifndef HW_VFIO_VFIO_COMMON_H
>>>>> +#define HW_VFIO_VFIO_COMMON_H
>>>>> +
>>>>> +#include "qemu-common.h"
>>>>> +#include "exec/address-spaces.h"
>>>>> +#include "exec/memory.h"
>>>>> +#include "qemu/queue.h"
>>>>> +#include "qemu/notify.h"
>>>>> +
>>>>> +/*#define DEBUG_VFIO*/
>>>>> +#ifdef DEBUG_VFIO
>>>>> +#define DPRINTF(fmt, ...) \
>>>>> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
>>>>> +#else
>>>>> +#define DPRINTF(fmt, ...) \
>>>>> +    do { } while (0)
>>>>> +#endif
>>>> 
>>>> 
>>>> DPRINTF also needs to be renamed to avoid namespace conflicts.
>>> Hi Alex,
>>> 
>>> OK.
>>> 
>>> As I am going to touch the traces,
>>> - are you OK if I use the new .name field to simplify format strings?
>> 
>> Sure, that's fine.
>> 
>>>    DPRINTF("%s(%04x:%02x:%02x.%x) Pin %c\n", __func__, vdev->host.domain,
>>>            vdev->host.bus, vdev->host.slot, vdev->host.function,
>>>            'A' + vdev->intx.pin);
>>> - Also Alex was suggesting to use trace points. What is your position
>>> about that? Also I am not 100% sure of what it consists in? is it trace
>>> events as documented in docs/tracing.txt
>> 
>> I think it would be a great conversion, but it's not required.  Thanks,
> 
> Hi Alex,
> 
> I am currently progressing on the conversion to trace points (I did it
> for platform and common and now do the job for PCI). I wonder whether it
> makes sense I convert all DPRINTF into trace-points or only convert a
> subset (state transitions, ...). Would you accept a mixture of DPRINTFs
> and trace-points or do you advise to convert everything?

Yeah, it's perfectly fine to even just not introduce new DPRINTFs.

> 
> Also the tracing.txt doc says we should use the name of the function as
> prefix. That being said it could be interesting to trace all pci* or all
> platform* and wildcard seems to work fine to select the trace-events. So
> my second question is would you accept using pci_<function_name>_* as a
> generic pattern.

Not sure - maybe be more explicit and call it vfio_pci_...?


Alex

> 
> Thanks in advance
> 
> Best Regards
> 
> Eric
>> 
>> Alex
> 


* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-09-01 17:41           ` Alexander Graf
@ 2014-09-02  7:13             ` Eric Auger
  2014-09-02 21:13               ` Alex Williamson
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Auger @ 2014-09-02  7:13 UTC (permalink / raw)
  To: Alexander Graf
  Cc: peter.maydell, kim.phillips, eric.auger, patches, Kim Phillips,
	joel.schopp, will.deacon, qemu-devel, a.rigo, Bharat.Bhushan,
	Alex Williamson, stuart.yoder, a.motakis, kvmarm,
	christoffer.dall

On 09/01/2014 07:41 PM, Alexander Graf wrote:
> 
> 
>> On 01.09.2014 at 18:31, Eric Auger <eric.auger@linaro.org> wrote:
>>
>>> On 08/13/2014 09:59 PM, Alex Williamson wrote:
>>>> On Tue, 2014-08-12 at 08:09 +0200, Eric Auger wrote:
>>>>> On 08/11/2014 09:25 PM, Alex Williamson wrote:
>>>>>> On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
>>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>>>>> new file mode 100644
>>>>>> index 0000000..4684ee5
>>>>>> --- /dev/null
>>>>>> +++ b/include/hw/vfio/vfio-common.h
>>>>>> @@ -0,0 +1,151 @@
>>>>>> +/*
>>>>>> + * common header for vfio based device assignment support
>>>>>> + *
>>>>>> + * Copyright Red Hat, Inc. 2012
>>>>>> + *
>>>>>> + * Authors:
>>>>>> + *  Alex Williamson <alex.williamson@redhat.com>
>>>>>> + *
>>>>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>>>>> + * the COPYING file in the top-level directory.
>>>>>> + *
>>>>>> + * Based on qemu-kvm device-assignment:
>>>>>> + *  Adapted for KVM by Qumranet.
>>>>>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
>>>>>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
>>>>>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
>>>>>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
>>>>>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>>>>>> + */
>>>>>> +#ifndef HW_VFIO_VFIO_COMMON_H
>>>>>> +#define HW_VFIO_VFIO_COMMON_H
>>>>>> +
>>>>>> +#include "qemu-common.h"
>>>>>> +#include "exec/address-spaces.h"
>>>>>> +#include "exec/memory.h"
>>>>>> +#include "qemu/queue.h"
>>>>>> +#include "qemu/notify.h"
>>>>>> +
>>>>>> +/*#define DEBUG_VFIO*/
>>>>>> +#ifdef DEBUG_VFIO
>>>>>> +#define DPRINTF(fmt, ...) \
>>>>>> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
>>>>>> +#else
>>>>>> +#define DPRINTF(fmt, ...) \
>>>>>> +    do { } while (0)
>>>>>> +#endif
>>>>>
>>>>>
>>>>> DPRINTF also needs to be renamed to avoid namespace conflicts.
>>>> Hi Alex,
>>>>
>>>> OK.
>>>>
>>>> As I am going to touch the traces,
>>>> - are you OK if I use the new .name field to simplify format strings?
>>>
>>> Sure, that's fine.
>>>
>>>>    DPRINTF("%s(%04x:%02x:%02x.%x) Pin %c\n", __func__, vdev->host.domain,
>>>>            vdev->host.bus, vdev->host.slot, vdev->host.function,
>>>>            'A' + vdev->intx.pin);
>>>> - Also Alex was suggesting to use trace points. What is your position
>>>> about that? Also I am not 100% sure of what it consists in? is it trace
>>>> events as documented in docs/tracing.txt
>>>
>>> I think it would be a great conversion, but it's not required.  Thanks,
>>
>> Hi Alex,
>>
>> I am currently progressing on the conversion to trace points (I did it
>> for platform and common and now do the job for PCI). I wonder whether it
>> makes sense I convert all DPRINTF into trace-points or only convert a
>> subset (state transitions, ...). Would you accept a mixture of DPRINTFs
>> and trace-points or do you advise to convert everything?
> 
> Yeah, it's perfectly fine to even just not introduce new DPRINTFs.
OK, thanks.
> 
>>
>> Also the tracing.txt doc says we should use the name of the function as
>> prefix. That being said it could be interesting to trace all pci* or all
>> platform* and wildcard seems to work fine to select the trace-events. So
>> my second question is would you accept using pci_<function_name>_* as a
>> generic pattern.
> 
> Not sure - maybe be more explicit and call it vfio_pci_...?
Well, as a first draft I will follow the tracing.txt guideline, and you
can both tell me what you think of the outcome. In any case it will not
be a big deal to change afterwards.

Thanks

Eric
> 
> 
> Alex
> 
>>
>> Thanks in advance
>>
>> Best Regards
>>
>> Eric
>>>
>>> Alex
>>


* Re: [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module
  2014-09-02  7:13             ` Eric Auger
@ 2014-09-02 21:13               ` Alex Williamson
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Williamson @ 2014-09-02 21:13 UTC (permalink / raw)
  To: Eric Auger
  Cc: peter.maydell, kim.phillips, eric.auger, patches, qemu-devel,
	joel.schopp, will.deacon, a.rigo, Alexander Graf, Bharat.Bhushan,
	Kim Phillips, stuart.yoder, a.motakis, kvmarm, christoffer.dall

On Tue, 2014-09-02 at 09:13 +0200, Eric Auger wrote:
> On 09/01/2014 07:41 PM, Alexander Graf wrote:
> > 
> > 
> >> On 01.09.2014 at 18:31, Eric Auger <eric.auger@linaro.org> wrote:
> >>
> >>> On 08/13/2014 09:59 PM, Alex Williamson wrote:
> >>>> On Tue, 2014-08-12 at 08:09 +0200, Eric Auger wrote:
> >>>>> On 08/11/2014 09:25 PM, Alex Williamson wrote:
> >>>>>> On Sat, 2014-08-09 at 15:25 +0100, Eric Auger wrote:
> >>>>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >>>>>> new file mode 100644
> >>>>>> index 0000000..4684ee5
> >>>>>> --- /dev/null
> >>>>>> +++ b/include/hw/vfio/vfio-common.h
> >>>>>> @@ -0,0 +1,151 @@
> >>>>>> +/*
> >>>>>> + * common header for vfio based device assignment support
> >>>>>> + *
> >>>>>> + * Copyright Red Hat, Inc. 2012
> >>>>>> + *
> >>>>>> + * Authors:
> >>>>>> + *  Alex Williamson <alex.williamson@redhat.com>
> >>>>>> + *
> >>>>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> >>>>>> + * the COPYING file in the top-level directory.
> >>>>>> + *
> >>>>>> + * Based on qemu-kvm device-assignment:
> >>>>>> + *  Adapted for KVM by Qumranet.
> >>>>>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> >>>>>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> >>>>>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> >>>>>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> >>>>>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
> >>>>>> + */
> >>>>>> +#ifndef HW_VFIO_VFIO_COMMON_H
> >>>>>> +#define HW_VFIO_VFIO_COMMON_H
> >>>>>> +
> >>>>>> +#include "qemu-common.h"
> >>>>>> +#include "exec/address-spaces.h"
> >>>>>> +#include "exec/memory.h"
> >>>>>> +#include "qemu/queue.h"
> >>>>>> +#include "qemu/notify.h"
> >>>>>> +
> >>>>>> +/*#define DEBUG_VFIO*/
> >>>>>> +#ifdef DEBUG_VFIO
> >>>>>> +#define DPRINTF(fmt, ...) \
> >>>>>> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> >>>>>> +#else
> >>>>>> +#define DPRINTF(fmt, ...) \
> >>>>>> +    do { } while (0)
> >>>>>> +#endif
> >>>>>
> >>>>>
> >>>>> DPRINTF also needs to be renamed to avoid namespace conflicts.
> >>>> Hi Alex,
> >>>>
> >>>> OK.
> >>>>
> >>>> As I am going to touch the traces,
> >>>> - are you OK if I use the new .name field to simplify format strings?
> >>>
> >>> Sure, that's fine.
> >>>
> >>>>    DPRINTF("%s(%04x:%02x:%02x.%x) Pin %c\n", __func__, vdev->host.domain,
> >>>>            vdev->host.bus, vdev->host.slot, vdev->host.function,
> >>>>            'A' + vdev->intx.pin);
> >>>> - Also Alex was suggesting to use trace points. What is your position
> >>>> about that? Also I am not 100% sure of what it consists in? is it trace
> >>>> events as documented in docs/tracing.txt
> >>>
> >>> I think it would be a great conversion, but it's not required.  Thanks,
> >>
> >> Hi Alex,
> >>
> >> I am currently progressing on the conversion to trace points (I did it
> >> for platform and common and now do the job for PCI). I wonder whether it
> >> makes sense I convert all DPRINTF into trace-points or only convert a
> >> subset (state transitions, ...). Would you accept a mixture of DPRINTFs
> >> and trace-points or do you advise to convert everything?
> > 
> > Yeah, it's perfectly fine to even just not introduce new DPRINTFs.
> OK, thanks.
> > 
> >>
> >> Also the tracing.txt doc says we should use the name of the function as
> >> prefix. That being said it could be interesting to trace all pci* or all
> >> platform* and wildcard seems to work fine to select the trace-events. So
> >> my second question is would you accept using pci_<function_name>_* as a
> >> generic pattern.
> > 
> > Not sure - maybe be more explicit and call it vfio_pci_...?
> Well, as a first draft I will follow the tracing.txt guideline, and you
> can both tell me what you think of the outcome. In any case it will not
> be a big deal to change afterwards.

I haven't touched tracing yet, so I'll defer to you and agraf for now ;)
Thanks,

Alex


end of thread, other threads:[~2014-09-02 21:13 UTC | newest]

Thread overview: 50+ messages
2014-08-09 14:25 [Qemu-devel] [PATCH v5 00/10] KVM platform device passthrough Eric Auger
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 01/10] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 02/10] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 03/10] hw/vfio/pci: introduce VFIODevice Eric Auger
2014-08-12  2:34   ` David Gibson
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 04/10] hw/vfio/pci: Introduce VFIORegion Eric Auger
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 05/10] hw/vfio/pci: split vfio_get_device Eric Auger
2014-08-12  2:41   ` David Gibson
2014-08-12  6:54     ` Eric Auger
2014-08-13  3:32       ` David Gibson
2014-08-29 10:00         ` Eric Auger
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 06/10] hw/vfio: create common module Eric Auger
2014-08-11 19:20   ` Alex Williamson
2014-08-12  5:57     ` Eric Auger
2014-08-11 19:25   ` Alex Williamson
2014-08-12  6:09     ` Eric Auger
2014-08-13 19:59       ` Alex Williamson
2014-09-01 16:31         ` Eric Auger
2014-09-01 17:41           ` Alexander Graf
2014-09-02  7:13             ` Eric Auger
2014-09-02 21:13               ` Alex Williamson
2014-08-20 19:12   ` Joel Schopp
2014-08-20 19:41     ` Alex Williamson
2014-08-20 20:08       ` Joel Schopp
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 07/10] hw/vfio/platform: add vfio-platform support Eric Auger
2014-08-11  9:36   ` Alexander Graf
2014-08-12  7:59     ` Bharat.Bhushan
2014-08-12 16:34       ` Eric Auger
2014-08-11 20:13   ` Alex Williamson
2014-08-12  5:51     ` Eric Auger
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 08/10] hw/intc/arm_gic_kvm: advertise irqfd Eric Auger
2014-08-11  9:37   ` Alexander Graf
2014-08-11 12:04     ` Eric Auger
2014-08-11 12:05       ` Alexander Graf
2014-08-11 12:27         ` Eric Auger
2014-08-11 12:29           ` Alexander Graf
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 09/10] hw/vfio/platform: Add irqfd support Eric Auger
2014-08-09 14:25 ` [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation Eric Auger
2014-08-11  9:40   ` Alexander Graf
2014-08-11 11:55     ` Eric Auger
2014-08-18 21:54   ` Joel Schopp
2014-08-18 22:11     ` Peter Maydell
2014-08-18 22:26       ` Joel Schopp
2014-08-19  7:32         ` Eric Auger
2014-08-19 10:59         ` Alexander Graf
2014-08-19 14:15           ` Joel Schopp
2014-08-19 14:29             ` Alexander Graf
2014-08-19  7:24       ` Eric Auger
2014-08-19  8:17         ` Peter Maydell
2014-08-19  6:59     ` Eric Auger
