* [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
@ 2014-09-09  7:31 Eric Auger
From: Eric Auger @ 2014-09-09  7:31 UTC
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

This RFC series aims to enable KVM platform device passthrough.
It implements a VFIO platform device, derived from the VFIO PCI device.

The VFIO platform device uses the host VFIO platform driver, which must
be bound to the assigned device before QEMU is started.

- the guest can directly access the device register space
- assigned device IRQs are transparently routed to the guest by
  QEMU/KVM (three methods are currently supported: user-level eventfd
  handling, irqfd, forwarded IRQs)
- the IOMMU is transparently programmed to prevent the device from
  accessing physical pages outside of the guest address space
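
To make the first of these methods concrete: with user-level eventfd
handling, an eventfd is signalled when the physical IRQ fires, and user
space reads the fd to learn how many times it was signalled since the
last read. Below is a minimal sketch using only Linux eventfd(2). The
sketch_irq_*() names are hypothetical, chosen for illustration, and are
not functions from this series; the trigger helper merely stands in for
whoever signals the fd.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking eventfd with a zero counter. */
static int sketch_irq_open(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/*
 * Hypothetical helper: signal the eventfd once, as the signalling side
 * would when the IRQ hits. Returns 0 on success, -1 on failure.
 */
static int sketch_irq_trigger(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: consume all pending signals. Returns the number
 * of triggers since the last drain, 0 if none are pending, -1 on error.
 */
static int64_t sketch_irq_drain(int fd)
{
    uint64_t count = 0;
    ssize_t n;

    assert(fd >= 0);
    n = read(fd, &count, sizeof(count));
    if (n == (ssize_t)sizeof(count)) {
        return (int64_t)count;   /* the read resets the counter to 0 */
    }
    return (n < 0 && errno == EAGAIN) ? 0 : -1;
}
```

In such a scheme the fd handler would call sketch_irq_drain() and, for a
non-zero result, raise the corresponding virtual IRQ. Several triggers
that arrive before a drain are reported as one count, which is why the
counter value is returned rather than a boolean.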

The series is made up of the following patches:

1-7) Modifications to PCI code to prepare for VFIO platform device
8) split of PCI specific code and generic code (move)
9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
      support (MMIO direct access and IRQ assignment).
12) fake injection test modality (to test multiple IRQs)
13) addition of irqfd/virqfd support
14-16) forwarded IRQ

Dependency List:

QEMU dependencies:
[1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
    http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
[2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
[3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
    Eric Auger
    http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
[4] [RFC] vfio: migration to trace points, Eric Auger
    http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html

Kernel Dependencies:
[5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
    https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
[6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
    https://lkml.org/lkml/2014/9/1/141
[7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
    http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
[8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
    https://lkml.org/lkml/2014/9/1/344
[9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
    Marc Zyngier
    http://lwn.net/Articles/603514/

kernel pieces can be found at:
http://git.linaro.org/people/eric.auger/linux.git
(branch 3.17rc3_irqfd_forward_integ_v2)
QEMU pieces can be found at:
http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)

The patch series was tested on Calxeda Midway (ARMv7), with one xgmac
assigned to the KVM host and the second one assigned to the guest.
The reworked PCI device code is not tested.

Wiki for Calxeda Midway setup:
https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway

History:

v5->v6:
- rebase on 2.1rc5 PCI code
- first integration of forwarded IRQs
- vfio_device property renamed to host property
- split IRQ setup in different functions that match the 3 supported
  injection techniques (user handled eventfd, irqfd, forwarded IRQ):
  removes dynamic switch between injection methods
- introduce fake interrupts as a test modality:
  x makes it possible to test user-side handling of multiple IRQs.
  x this is a test feature only: it allows an fd to be triggered as if
    the real physical IRQ had hit. No virtual IRQ is injected into the
    guest, but handling is simulated so that the state machine can be
    tested
- user handled eventfd:
  x add mutex to protect IRQ state & list manipulation,
  x correct misleading comment in vfio_intp_interrupt.
  x Fix bugs using fake interrupt modality
- irqfd is no longer advertised in this patchset (handled in [3])
- VFIOPlatformDeviceClass becomes abstract and the Calxeda xgmac device
  and class are re-introduced (as per v4)
- all DPRINTF removed in platform and replaced by trace-points
- fix compilation with configure --disable-kvm
- simplify the split of vfio_get_device and introduce a single
  specialized function named vfio_populate_device
- group_list renamed to vfio_group_list
- hw/arm/dyn_sysbus_devtree.c currently only supports vfio-calxeda-xgmac
  instantiation. Needs to be specialized for other VFIO devices
- fix 2 bugs in dyn_sysbus_devtree (reg_attr index and compat)

v4->v5:
- rebase on v2.1.0 PCI code
- take into account Alex Williamson's comments on PCI code rework
  - trace updates in vfio_region_write/read
  - remove fd from VFIORegion
  - get/put cleanup
- bug fix: the BAR region's vbasedev field is now properly initialized
- misc cleanups in platform device
- device tree node generation removed from device and handled in
  hw/arm/dyn_sysbus_devtree.c
- remove "hw/vfio: add an example calxeda_xgmac": with removal of
  device tree node generation we do not have so many things to
  implement in that derived device yet. May be re-introduced later
  on if needed typically for reset/migration.
- no GSI routing table anymore

v3->v4 changes (Eric Auger, Alvise Rigo)
- rebase on latest VFIO PCI code (v2.1.0-rc0)
- full git history rework to ease PCI code change review
- move include files to hw/vfio
- DPRINTF reformatting temporarily moved out
- support of VFIO virq (removal of resamplefd handler on user-side)
- integration with sysbus dynamic instantiation framework
- removal of unrealize and cleanup routines until it is better
  understood what is really needed
- Support of VFIO for Amba devices should be handled in an inherited
  device to specialize the device tree generation (clock handle currently
  missing in framework however)
- "Always use eventfd as notifying mechanism" temporarily moved out
- static instantiation is not the mainstream path (although it remains
  possible). Note that if static instantiation is used, irqfd must be set
  up in the machine file once the virtual IRQ is known
- create the GSI routing table on qemu side

v2->v3 changes (Alvise Rigo, Eric Auger):
- Following Alex W's recommendations, further efforts to factor out the
  code shared between PCI and platform: introduction of VFIODevice and
  VFIORegion as base classes
- unique reset handler for platform and PCI
- cleanup following Kim's comments
- multiple IRQ support mechanics should be in place although not
  tested
- Better handling of MMIO multiple regions
- New features and fixes by Alvise (multiple compat string, exec
  flag, force eventfd usage, amba device tree support)
- irqfd support
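
As an illustration of the base-class idea in the first item above: in C,
a common base struct can be embedded in each bus-specific device struct,
and container_of() recovers the outer struct from a pointer to the base.
The sketch below assumes that pattern; every Sketch* and sketch_* name is
hypothetical, and only the vbasedev field name is borrowed from this
changelog. It is not the layout used by the patches.

```c
#include <assert.h>
#include <stddef.h>

#ifndef container_of
/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#endif

/* Hypothetical base: state that bus-agnostic code needs. */
typedef struct SketchDevice {
    int fd;
    const char *name;
} SketchDevice;

/* Hypothetical PCI flavour: the base need not be the first member. */
typedef struct SketchPCIDevice {
    int num_bars;
    SketchDevice vbasedev;
} SketchPCIDevice;

/* Hypothetical platform flavour embedding the same base. */
typedef struct SketchPlatformDevice {
    SketchDevice vbasedev;
    int num_irqs;
} SketchPlatformDevice;

/* Common code only sees the base and works for both flavours. */
static int sketch_device_fd(const SketchDevice *vbasedev)
{
    assert(vbasedev != NULL);
    return vbasedev->fd;
}

/* PCI-specific code converts back; valid only for a PCI-embedded base. */
static SketchPCIDevice *sketch_to_pci(SketchDevice *vbasedev)
{
    assert(vbasedev != NULL);
    return container_of(vbasedev, SketchPCIDevice, vbasedev);
}
```

The benefit of this arrangement is that helpers written against the base
struct need no knowledge of the bus type, while bus-specific callbacks
can still reach their own fields through the conversion helper.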

v1->v2 changes (Kim Phillips, Eric Auger):
- IRQ initial support (legacy mode where eventfds are handled on
  user side)
- hacked dynamic instantiation

v1 (Kim Phillips):
- initial split between PCI and platform
- MMIO support only
- static instantiation

Best Regards

Eric


Eric Auger (15):
  hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
  hw/vfio/pci: introduce VFIODevice
  hw/vfio/pci: Introduce VFIORegion
  hw/vfio/pci: split vfio_get_device
  hw/vfio/pci: rename group_list into vfio_group_list
  hw/vfio/pci: use name field in format strings
  hw/vfio: create common module
  hw/vfio/platform: add vfio-platform support
  hw/vfio: calxeda xgmac device
  hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic
    instantiation
  vfio/platform: add fake injection modality
  hw/vfio/platform: Add irqfd support
  linux-headers: Update KVM headers from linux-next tag ToBeFilled
  VFIO: COMMON: vfio_kvm_device_fd moved in the common header
  VFIO: PLATFORM: add forwarded irq support

Kim Phillips (1):
  vfio: move hw/misc/vfio.c to hw/vfio/pci.c     Move vfio.h into
    include/hw/vfio

 LICENSE                              |    2 +-
 MAINTAINERS                          |    2 +-
 hw/Makefile.objs                     |    1 +
 hw/arm/dyn_sysbus_devtree.c          |  141 +++
 hw/misc/Makefile.objs                |    1 -
 hw/ppc/spapr_pci_vfio.c              |    2 +-
 hw/vfio/Makefile.objs                |    6 +
 hw/vfio/calxeda_xgmac.c              |   57 ++
 hw/vfio/common.c                     |  959 +++++++++++++++++++
 hw/{misc/vfio.c => vfio/pci.c}       | 1670 +++++++---------------------------
 hw/vfio/platform.c                   |  874 ++++++++++++++++++
 include/hw/vfio/vfio-calxeda-xgmac.h |   41 +
 include/hw/vfio/vfio-common.h        |  157 ++++
 include/hw/vfio/vfio-platform.h      |   95 ++
 include/hw/{misc => vfio}/vfio.h     |    0
 linux-headers/linux/kvm.h            |    9 +
 trace-events                         |  136 +--
 17 files changed, 2739 insertions(+), 1414 deletions(-)
 create mode 100644 hw/vfio/Makefile.objs
 create mode 100644 hw/vfio/calxeda_xgmac.c
 create mode 100644 hw/vfio/common.c
 rename hw/{misc/vfio.c => vfio/pci.c} (65%)
 create mode 100644 hw/vfio/platform.c
 create mode 100644 include/hw/vfio/vfio-calxeda-xgmac.h
 create mode 100644 include/hw/vfio/vfio-common.h
 create mode 100644 include/hw/vfio/vfio-platform.h
 rename include/hw/{misc => vfio}/vfio.h (100%)

-- 
1.8.3.2


* [Qemu-devel] [PATCH v6 01/16] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio
From: Eric Auger @ 2014-09-09  7:31 UTC
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, Kim Phillips, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

From: Kim Phillips <kim.phillips@linaro.org>

This is done in preparation for the addition of VFIO platform
device support.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
---
 LICENSE                          | 2 +-
 MAINTAINERS                      | 2 +-
 hw/Makefile.objs                 | 1 +
 hw/misc/Makefile.objs            | 1 -
 hw/ppc/spapr_pci_vfio.c          | 2 +-
 hw/vfio/Makefile.objs            | 3 +++
 hw/{misc/vfio.c => vfio/pci.c}   | 2 +-
 include/hw/{misc => vfio}/vfio.h | 0
 8 files changed, 8 insertions(+), 5 deletions(-)
 create mode 100644 hw/vfio/Makefile.objs
 rename hw/{misc/vfio.c => vfio/pci.c} (99%)
 rename include/hw/{misc => vfio}/vfio.h (100%)

diff --git a/LICENSE b/LICENSE
index da70e94..0e0b4b9 100644
--- a/LICENSE
+++ b/LICENSE
@@ -11,7 +11,7 @@ option) any later version.
 
 As of July 2013, contributions under version 2 of the GNU General Public
 License (and no later version) are only accepted for the following files
-or directories: bsd-user/, linux-user/, hw/misc/vfio.c, hw/xen/xen_pt*.
+or directories: bsd-user/, linux-user/, hw/vfio/, hw/xen/xen_pt*.
 
 3) The Tiny Code Generator (TCG) is released under the BSD license
    (see license headers in files).
diff --git a/MAINTAINERS b/MAINTAINERS
index 206bf7e..8683f62 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -625,7 +625,7 @@ F: tests/usb-*-test.c
 VFIO
 M: Alex Williamson <alex.williamson@redhat.com>
 S: Supported
-F: hw/misc/vfio.c
+F: hw/vfio/*
 
 vhost
 M: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 52a1464..73afa41 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,6 +26,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += ssi/
 devices-dirs-$(CONFIG_SOFTMMU) += timer/
 devices-dirs-$(CONFIG_TPM) += tpm/
 devices-dirs-$(CONFIG_SOFTMMU) += usb/
+devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 86f6243..9b77554 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -21,7 +21,6 @@ common-obj-$(CONFIG_MACIO) += macio/
 
 ifeq ($(CONFIG_PCI), y)
 obj-$(CONFIG_KVM) += ivshmem.o
-obj-$(CONFIG_LINUX) += vfio.o
 endif
 
 obj-$(CONFIG_REALVIEW) += arm_sysctl.o
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index d3bddf2..144912b 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -20,7 +20,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/pci-host/spapr.h"
 #include "linux/vfio.h"
-#include "hw/misc/vfio.h"
+#include "hw/vfio/vfio.h"
 
 static Property spapr_phb_vfio_properties[] = {
     DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
new file mode 100644
index 0000000..31c7dab
--- /dev/null
+++ b/hw/vfio/Makefile.objs
@@ -0,0 +1,3 @@
+ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_PCI) += pci.o
+endif
diff --git a/hw/misc/vfio.c b/hw/vfio/pci.c
similarity index 99%
rename from hw/misc/vfio.c
rename to hw/vfio/pci.c
index 3d32657..7e6a1bc 100644
--- a/hw/misc/vfio.c
+++ b/hw/vfio/pci.c
@@ -39,8 +39,8 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
-#include "hw/misc/vfio.h"
 #include "trace.h"
+#include "hw/vfio/vfio.h"
 
 /* Extra debugging, trap acceleration paths for more logging */
 #define VFIO_ALLOW_MMAP 1
diff --git a/include/hw/misc/vfio.h b/include/hw/vfio/vfio.h
similarity index 100%
rename from include/hw/misc/vfio.h
rename to include/hw/vfio/vfio.h
-- 
1.8.3.2


* [Qemu-devel] [PATCH v6 02/16] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
From: Eric Auger @ 2014-09-09  7:31 UTC
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

This prepares for the introduction of VFIOPlatformDevice

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 209 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 105 insertions(+), 104 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7e6a1bc..ad5da4b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -48,11 +48,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
-struct VFIODevice;
+struct VFIOPCIDevice;
 
 typedef struct VFIOQuirk {
     MemoryRegion mem;
-    struct VFIODevice *vdev;
+    struct VFIOPCIDevice *vdev;
     QLIST_ENTRY(VFIOQuirk) next;
     struct {
         uint32_t base_offset:TARGET_PAGE_BITS;
@@ -123,7 +123,7 @@ typedef struct VFIOMSIVector {
      */
     EventNotifier interrupt;
     EventNotifier kvm_interrupt;
-    struct VFIODevice *vdev; /* back pointer to device */
+    struct VFIOPCIDevice *vdev; /* back pointer to device */
     int virq;
     bool use;
 } VFIOMSIVector;
@@ -185,7 +185,7 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
-typedef struct VFIODevice {
+typedef struct VFIOPCIDevice {
     PCIDevice pdev;
     int fd;
     VFIOINTx intx;
@@ -203,7 +203,7 @@ typedef struct VFIODevice {
     VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
     VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
     PCIHostDeviceAddress host;
-    QLIST_ENTRY(VFIODevice) next;
+    QLIST_ENTRY(VFIOPCIDevice) next;
     struct VFIOGroup *group;
     EventNotifier err_notifier;
     uint32_t features;
@@ -218,13 +218,13 @@ typedef struct VFIODevice {
     bool has_pm_reset;
     bool needs_reset;
     bool rom_read_failed;
-} VFIODevice;
+} VFIOPCIDevice;
 
 typedef struct VFIOGroup {
     int fd;
     int groupid;
     VFIOContainer *container;
-    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_HEAD(, VFIOPCIDevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
     QLIST_ENTRY(VFIOGroup) container_next;
 } VFIOGroup;
@@ -268,16 +268,16 @@ static QLIST_HEAD(, VFIOGroup)
 static int vfio_kvm_device_fd = -1;
 #endif
 
-static void vfio_disable_interrupts(VFIODevice *vdev);
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled);
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 
 /*
  * Common VFIO interrupt disable
  */
-static void vfio_disable_irqindex(VFIODevice *vdev, int index)
+static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -293,7 +293,7 @@ static void vfio_disable_irqindex(VFIODevice *vdev, int index)
 /*
  * INTx
  */
-static void vfio_unmask_intx(VFIODevice *vdev)
+static void vfio_unmask_intx(VFIOPCIDevice *vdev)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -307,7 +307,7 @@ static void vfio_unmask_intx(VFIODevice *vdev)
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIODevice *vdev)
+static void vfio_mask_intx(VFIOPCIDevice *vdev)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -338,7 +338,7 @@ static void vfio_mask_intx(VFIODevice *vdev)
  */
 static void vfio_intx_mmap_enable(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (vdev->intx.pending) {
         timer_mod(vdev->intx.mmap_timer,
@@ -351,7 +351,7 @@ static void vfio_intx_mmap_enable(void *opaque)
 
 static void vfio_intx_interrupt(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (!event_notifier_test_and_clear(&vdev->intx.interrupt)) {
         return;
@@ -370,7 +370,7 @@ static void vfio_intx_interrupt(void *opaque)
     }
 }
 
-static void vfio_eoi(VFIODevice *vdev)
+static void vfio_eoi(VFIOPCIDevice *vdev)
 {
     if (!vdev->intx.pending) {
         return;
@@ -384,7 +384,7 @@ static void vfio_eoi(VFIODevice *vdev)
     vfio_unmask_intx(vdev);
 }
 
-static void vfio_enable_intx_kvm(VFIODevice *vdev)
+static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 {
 #ifdef CONFIG_KVM
     struct kvm_irqfd irqfd = {
@@ -462,7 +462,7 @@ fail:
 #endif
 }
 
-static void vfio_disable_intx_kvm(VFIODevice *vdev)
+static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
 {
 #ifdef CONFIG_KVM
     struct kvm_irqfd irqfd = {
@@ -506,7 +506,7 @@ static void vfio_disable_intx_kvm(VFIODevice *vdev)
 
 static void vfio_update_irq(PCIDevice *pdev)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     PCIINTxRoute route;
 
     if (vdev->interrupt != VFIO_INT_INTx) {
@@ -537,7 +537,7 @@ static void vfio_update_irq(PCIDevice *pdev)
     vfio_eoi(vdev);
 }
 
-static int vfio_enable_intx(VFIODevice *vdev)
+static int vfio_enable_intx(VFIOPCIDevice *vdev)
 {
     uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
     int ret, argsz;
@@ -602,7 +602,7 @@ static int vfio_enable_intx(VFIODevice *vdev)
     return 0;
 }
 
-static void vfio_disable_intx(VFIODevice *vdev)
+static void vfio_disable_intx(VFIOPCIDevice *vdev)
 {
     int fd;
 
@@ -629,7 +629,7 @@ static void vfio_disable_intx(VFIODevice *vdev)
 static void vfio_msi_interrupt(void *opaque)
 {
     VFIOMSIVector *vector = opaque;
-    VFIODevice *vdev = vector->vdev;
+    VFIOPCIDevice *vdev = vector->vdev;
     int nr = vector - vdev->msi_vectors;
 
     if (!event_notifier_test_and_clear(&vector->interrupt)) {
@@ -661,7 +661,7 @@ static void vfio_msi_interrupt(void *opaque)
     }
 }
 
-static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
+static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 {
     struct vfio_irq_set *irq_set;
     int ret = 0, i, argsz;
@@ -752,7 +752,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg)
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOMSIVector *vector;
     int ret;
 
@@ -841,7 +841,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
 
 static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
     trace_vfio_msix_vector_release(vdev->host.domain, vdev->host.bus,
@@ -880,7 +880,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
     }
 }
 
-static void vfio_enable_msix(VFIODevice *vdev)
+static void vfio_enable_msix(VFIOPCIDevice *vdev)
 {
     vfio_disable_interrupts(vdev);
 
@@ -913,7 +913,7 @@ static void vfio_enable_msix(VFIODevice *vdev)
                            vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_enable_msi(VFIODevice *vdev)
+static void vfio_enable_msi(VFIOPCIDevice *vdev)
 {
     int ret, i;
 
@@ -991,7 +991,7 @@ retry:
                           vdev->nr_vectors);
 }
 
-static void vfio_disable_msi_common(VFIODevice *vdev)
+static void vfio_disable_msi_common(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1015,7 +1015,7 @@ static void vfio_disable_msi_common(VFIODevice *vdev)
     vfio_enable_intx(vdev);
 }
 
-static void vfio_disable_msix(VFIODevice *vdev)
+static void vfio_disable_msix(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1042,7 +1042,7 @@ static void vfio_disable_msix(VFIODevice *vdev)
                             vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_disable_msi(VFIODevice *vdev)
+static void vfio_disable_msi(VFIOPCIDevice *vdev)
 {
     vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
     vfio_disable_msi_common(vdev);
@@ -1051,7 +1051,7 @@ static void vfio_disable_msi(VFIODevice *vdev)
                            vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_update_msi(VFIODevice *vdev)
+static void vfio_update_msi(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1104,7 +1104,7 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
 
 #ifdef DEBUG_VFIO
     {
-        VFIODevice *vdev = container_of(bar, VFIODevice, bars[bar->nr]);
+        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
 
         trace_vfio_bar_write(vdev->host.domain, vdev->host.bus,
                              vdev->host.slot, vdev->host.function,
@@ -1120,7 +1120,7 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
      * which access will service the interrupt, so we're potentially
      * getting quite a few host interrupts per guest interrupt.
      */
-    vfio_eoi(container_of(bar, VFIODevice, bars[bar->nr]));
+    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
 }
 
 static uint64_t vfio_bar_read(void *opaque,
@@ -1158,7 +1158,7 @@ static uint64_t vfio_bar_read(void *opaque,
 
 #ifdef DEBUG_VFIO
     {
-        VFIODevice *vdev = container_of(bar, VFIODevice, bars[bar->nr]);
+        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
 
         trace_vfio_bar_read(vdev->host.domain, vdev->host.bus,
                             vdev->host.slot, vdev->host.function,
@@ -1167,7 +1167,7 @@ static uint64_t vfio_bar_read(void *opaque,
 #endif
 
     /* Same as write above */
-    vfio_eoi(container_of(bar, VFIODevice, bars[bar->nr]));
+    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
 
     return data;
 }
@@ -1178,7 +1178,7 @@ static const MemoryRegionOps vfio_bar_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static void vfio_pci_load_rom(VFIODevice *vdev)
+static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
     struct vfio_region_info reg_info = {
         .argsz = sizeof(reg_info),
@@ -1236,7 +1236,7 @@ static void vfio_pci_load_rom(VFIODevice *vdev)
 
 static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1286,7 +1286,7 @@ static const MemoryRegionOps vfio_rom_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static bool vfio_blacklist_opt_rom(VFIODevice *vdev)
+static bool vfio_blacklist_opt_rom(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t vendor_id, device_id;
@@ -1306,7 +1306,7 @@ static bool vfio_blacklist_opt_rom(VFIODevice *vdev)
     return false;
 }
 
-static void vfio_pci_size_rom(VFIODevice *vdev)
+static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 {
     uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
     off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
@@ -1484,7 +1484,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
                                                hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     uint64_t data;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.read_flags) &&
@@ -1520,7 +1520,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
                                             uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     if (ranges_overlap(addr, size,
                        quirk->data.address_offset, quirk->data.address_size)) {
@@ -1578,7 +1578,7 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
                                         hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
     hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
     uint64_t data;
@@ -1611,7 +1611,7 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
                                      uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
     hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
 
@@ -1659,7 +1659,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
                                         hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     uint64_t data = vfio_pci_read_config(&vdev->pdev,
                                          PCI_BASE_ADDRESS_0 + (4 * 4) + 1,
                                          size);
@@ -1673,7 +1673,7 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
+static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1715,7 +1715,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
  * that only read-only access is provided, but we drop writes when the window
  * is enabled to config space nonetheless.
  */
-static void vfio_probe_ati_bar4_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1778,7 +1778,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
                                                hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     switch (addr) {
     case 4: /* address */
@@ -1824,7 +1824,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
                                             uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     switch (addr) {
     case 4: /* address */
@@ -1873,7 +1873,7 @@ static const MemoryRegionOps vfio_rtl8168_window_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_probe_rtl8168_bar2_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1902,7 +1902,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIODevice *vdev, int nr)
 /*
  * Trap the BAR2 MMIO window to config space as well.
  */
-static void vfio_probe_ati_bar2_4000_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1971,7 +1971,7 @@ static uint64_t vfio_nvidia_3d0_quirk_read(void *opaque,
                                            hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
     uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
                                   addr + quirk->data.base_offset, size);
@@ -1990,7 +1990,7 @@ static void vfio_nvidia_3d0_quirk_write(void *opaque, hwaddr addr,
                                         uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
 
     switch (quirk->data.flags) {
@@ -2037,7 +2037,7 @@ static const MemoryRegionOps vfio_nvidia_3d0_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_vga_probe_nvidia_3d0_quirk(VFIODevice *vdev)
+static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2130,7 +2130,7 @@ static const MemoryRegionOps vfio_nvidia_bar5_window_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_probe_nvidia_bar5_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2166,7 +2166,7 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
                                           uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
 
@@ -2199,7 +2199,7 @@ static const MemoryRegionOps vfio_nvidia_88000_quirk = {
  *
  * Here's offset 0x88000...
  */
-static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2238,7 +2238,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
 /*
  * And here's the same for BAR0 offset 0x1800...
  */
-static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2283,13 +2283,13 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice *vdev, int nr)
 /*
  * Common quirk probe entry points.
  */
-static void vfio_vga_quirk_setup(VFIODevice *vdev)
+static void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
 {
     vfio_vga_probe_ati_3c3_quirk(vdev);
     vfio_vga_probe_nvidia_3d0_quirk(vdev);
 }
 
-static void vfio_vga_quirk_teardown(VFIODevice *vdev)
+static void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -2304,7 +2304,7 @@ static void vfio_vga_quirk_teardown(VFIODevice *vdev)
     }
 }
 
-static void vfio_bar_quirk_setup(VFIODevice *vdev, int nr)
+static void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
 {
     vfio_probe_ati_bar4_window_quirk(vdev, nr);
     vfio_probe_ati_bar2_4000_quirk(vdev, nr);
@@ -2314,7 +2314,7 @@ static void vfio_bar_quirk_setup(VFIODevice *vdev, int nr)
     vfio_probe_rtl8168_bar2_window_quirk(vdev, nr);
 }
 
-static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
+static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
@@ -2332,7 +2332,7 @@ static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
  */
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
 
     memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
@@ -2367,7 +2367,7 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     uint32_t val_le = cpu_to_le32(val);
 
     trace_vfio_pci_write_config(vdev->host.domain, vdev->host.bus,
@@ -2722,7 +2722,7 @@ static void vfio_listener_release(VFIOContainer *container)
 /*
  * Interrupt setup
  */
-static void vfio_disable_interrupts(VFIODevice *vdev)
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
 {
     switch (vdev->interrupt) {
     case VFIO_INT_INTx:
@@ -2737,7 +2737,7 @@ static void vfio_disable_interrupts(VFIODevice *vdev)
     }
 }
 
-static int vfio_setup_msi(VFIODevice *vdev, int pos)
+static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
 {
     uint16_t ctrl;
     bool msi_64bit, msi_maskbit;
@@ -2777,7 +2777,7 @@ static int vfio_setup_msi(VFIODevice *vdev, int pos)
  * need to first look for where the MSI-X table lives.  So we
  * unfortunately split MSI-X setup across two functions.
  */
-static int vfio_early_setup_msix(VFIODevice *vdev)
+static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
 {
     uint8_t pos;
     uint16_t ctrl;
@@ -2823,7 +2823,7 @@ static int vfio_early_setup_msix(VFIODevice *vdev)
     return 0;
 }
 
-static int vfio_setup_msix(VFIODevice *vdev, int pos)
+static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
 {
     int ret;
 
@@ -2843,7 +2843,7 @@ static int vfio_setup_msix(VFIODevice *vdev, int pos)
     return 0;
 }
 
-static void vfio_teardown_msi(VFIODevice *vdev)
+static void vfio_teardown_msi(VFIOPCIDevice *vdev)
 {
     msi_uninit(&vdev->pdev);
 
@@ -2856,7 +2856,7 @@ static void vfio_teardown_msi(VFIODevice *vdev)
 /*
  * Resource setup
  */
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled)
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
 {
     int i;
 
@@ -2874,7 +2874,7 @@ static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled)
     }
 }
 
-static void vfio_unmap_bar(VFIODevice *vdev, int nr)
+static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
@@ -2893,7 +2893,7 @@ static void vfio_unmap_bar(VFIODevice *vdev, int nr)
     }
 }
 
-static int vfio_mmap_bar(VFIODevice *vdev, VFIOBAR *bar,
+static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
                          MemoryRegion *mem, MemoryRegion *submem,
                          void **map, size_t size, off_t offset,
                          const char *name)
@@ -2931,7 +2931,7 @@ empty_region:
     return ret;
 }
 
-static void vfio_map_bar(VFIODevice *vdev, int nr)
+static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
     unsigned size = bar->size;
@@ -3000,7 +3000,7 @@ static void vfio_map_bar(VFIODevice *vdev, int nr)
     vfio_bar_quirk_setup(vdev, nr);
 }
 
-static void vfio_map_bars(VFIODevice *vdev)
+static void vfio_map_bars(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -3032,7 +3032,7 @@ static void vfio_map_bars(VFIODevice *vdev)
     }
 }
 
-static void vfio_unmap_bars(VFIODevice *vdev)
+static void vfio_unmap_bars(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -3068,7 +3068,7 @@ static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask)
     pci_set_word(buf, (pci_get_word(buf) & ~mask) | val);
 }
 
-static void vfio_add_emulated_word(VFIODevice *vdev, int pos,
+static void vfio_add_emulated_word(VFIOPCIDevice *vdev, int pos,
                                    uint16_t val, uint16_t mask)
 {
     vfio_set_word_bits(vdev->pdev.config + pos, val, mask);
@@ -3081,7 +3081,7 @@ static void vfio_set_long_bits(uint8_t *buf, uint32_t val, uint32_t mask)
     pci_set_long(buf, (pci_get_long(buf) & ~mask) | val);
 }
 
-static void vfio_add_emulated_long(VFIODevice *vdev, int pos,
+static void vfio_add_emulated_long(VFIOPCIDevice *vdev, int pos,
                                    uint32_t val, uint32_t mask)
 {
     vfio_set_long_bits(vdev->pdev.config + pos, val, mask);
@@ -3089,7 +3089,7 @@ static void vfio_add_emulated_long(VFIODevice *vdev, int pos,
     vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask);
 }
 
-static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size)
+static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size)
 {
     uint16_t flags;
     uint8_t type;
@@ -3181,7 +3181,7 @@ static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size)
     return pos;
 }
 
-static void vfio_check_pcie_flr(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_pcie_flr(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint32_t cap = pci_get_long(vdev->pdev.config + pos + PCI_EXP_DEVCAP);
 
@@ -3192,7 +3192,7 @@ static void vfio_check_pcie_flr(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static void vfio_check_pm_reset(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_pm_reset(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint16_t csr = pci_get_word(vdev->pdev.config + pos + PCI_PM_CTRL);
 
@@ -3203,7 +3203,7 @@ static void vfio_check_pm_reset(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static void vfio_check_af_flr(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint8_t cap = pci_get_byte(vdev->pdev.config + pos + PCI_AF_CAP);
 
@@ -3214,7 +3214,7 @@ static void vfio_check_af_flr(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos)
+static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint8_t cap_id, next, size;
@@ -3289,7 +3289,7 @@ static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos)
     return 0;
 }
 
-static int vfio_add_capabilities(VFIODevice *vdev)
+static int vfio_add_capabilities(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
 
@@ -3301,7 +3301,7 @@ static int vfio_add_capabilities(VFIODevice *vdev)
     return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
 }
 
-static void vfio_pci_pre_reset(VFIODevice *vdev)
+static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t cmd;
@@ -3338,7 +3338,7 @@ static void vfio_pci_pre_reset(VFIODevice *vdev)
     vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
 }
 
-static void vfio_pci_post_reset(VFIODevice *vdev)
+static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
 {
     vfio_enable_intx(vdev);
 }
@@ -3350,7 +3350,7 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
             host1->slot == host2->slot && host1->function == host2->function);
 }
 
-static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
+static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 {
     VFIOGroup *group;
     struct vfio_pci_hot_reset_info *info;
@@ -3401,7 +3401,7 @@ static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
     /* Verify that we have all the groups required */
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
-        VFIODevice *tmp;
+        VFIOPCIDevice *tmp;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3495,7 +3495,7 @@ out:
     /* Re-enable INTx on affected devices */
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
-        VFIODevice *tmp;
+        VFIOPCIDevice *tmp;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3545,12 +3545,12 @@ out_single:
  * _one() will only do a hot reset for the one in-use devices case, calling
  * _multi() will do nothing if a _one() would have been sufficient.
  */
-static int vfio_pci_hot_reset_one(VFIODevice *vdev)
+static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
 {
     return vfio_pci_hot_reset(vdev, true);
 }
 
-static int vfio_pci_hot_reset_multi(VFIODevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
 {
     return vfio_pci_hot_reset(vdev, false);
 }
@@ -3558,7 +3558,7 @@ static int vfio_pci_hot_reset_multi(VFIODevice *vdev)
 static void vfio_pci_reset_handler(void *opaque)
 {
     VFIOGroup *group;
-    VFIODevice *vdev;
+    VFIOPCIDevice *vdev;
 
     QLIST_FOREACH(group, &group_list, next) {
         QLIST_FOREACH(vdev, &group->device_list, next) {
@@ -3896,7 +3896,8 @@ static void vfio_put_group(VFIOGroup *group)
     }
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev)
+static int vfio_get_device(VFIOGroup *group, const char *name,
+                           VFIOPCIDevice *vdev)
 {
     struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
     struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
@@ -4049,7 +4050,7 @@ error:
     return ret;
 }
 
-static void vfio_put_device(VFIODevice *vdev)
+static void vfio_put_device(VFIOPCIDevice *vdev)
 {
     QLIST_REMOVE(vdev, next);
     vdev->group = NULL;
@@ -4063,7 +4064,7 @@ static void vfio_put_device(VFIODevice *vdev)
 
 static void vfio_err_notifier_handler(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
         return;
@@ -4092,7 +4093,7 @@ static void vfio_err_notifier_handler(void *opaque)
  * and continue after disabling error recovery support for the
  * device.
  */
-static void vfio_register_err_notifier(VFIODevice *vdev)
+static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
 {
     int ret;
     int argsz;
@@ -4133,7 +4134,7 @@ static void vfio_register_err_notifier(VFIODevice *vdev)
     g_free(irq_set);
 }
 
-static void vfio_unregister_err_notifier(VFIODevice *vdev)
+static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
 {
     int argsz;
     struct vfio_irq_set *irq_set;
@@ -4168,7 +4169,7 @@ static void vfio_unregister_err_notifier(VFIODevice *vdev)
 
 static int vfio_initfn(PCIDevice *pdev)
 {
-    VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOGroup *group;
     char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
     ssize_t len;
@@ -4322,7 +4323,7 @@ out_put:
 
 static void vfio_exitfn(PCIDevice *pdev)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOGroup *group = vdev->group;
 
     vfio_unregister_err_notifier(vdev);
@@ -4342,7 +4343,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 static void vfio_pci_reset(DeviceState *dev)
 {
     PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 
     trace_vfio_pci_reset(vdev->host.domain, vdev->host.bus,
                          vdev->host.slot, vdev->host.function);
@@ -4374,16 +4375,16 @@ post_reset:
 }
 
 static Property vfio_pci_dev_properties[] = {
-    DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIODevice, host),
-    DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIODevice,
+    DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
+    DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
                        intx.mmap_timeout, 1100),
-    DEFINE_PROP_BIT("x-vga", VFIODevice, features,
+    DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
                     VFIO_FEATURE_ENABLE_VGA_BIT, false),
-    DEFINE_PROP_INT32("bootindex", VFIODevice, bootindex, -1),
+    DEFINE_PROP_INT32("bootindex", VFIOPCIDevice, bootindex, -1),
     /*
      * TODO - support passed fds... is this necessary?
-     * DEFINE_PROP_STRING("vfiofd", VFIODevice, vfiofd_name),
-     * DEFINE_PROP_STRING("vfiogroupfd, VFIODevice, vfiogroupfd_name),
+     * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
+     * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
      */
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -4413,7 +4414,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 static const TypeInfo vfio_pci_dev_info = {
     .name = "vfio-pci",
     .parent = TYPE_PCI_DEVICE,
-    .instance_size = sizeof(VFIODevice),
+    .instance_size = sizeof(VFIOPCIDevice),
     .class_init = vfio_pci_dev_class_init,
 };
 
-- 
1.8.3.2


* [Qemu-devel] [PATCH v6 03/16] hw/vfio/pci: introduce VFIODevice
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 01/16] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 02/16] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 04/16] hw/vfio/pci: Introduce VFIORegion Eric Auger
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

Introduce the VFIODevice struct that is going to be shared by
VFIOPCIDevice and VFIOPlatformDevice.

Additional fields will be added to it later on, to keep each step
easy to review.

The group's device_list becomes a list of VFIODevice.

This requires reworking the reset handler, which becomes generic and
calls VFIODevice ops that each parent object specializes.
Functions that iterate over this list must also take into account
that the devices may be something other than a VFIOPCIDevice. The
type field is used to discriminate them.
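
The pattern described above can be sketched in isolation as follows.
This is only an illustration, not code from the patch: all Demo* and
demo_* names are hypothetical, a plain next pointer stands in for
QLIST_ENTRY, the platform policy is invented for the example, and
demo_container_of is a local stand-in for container_of. Only the PCI
needs_reset condition mirrors the one added by this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the container_of() macro used in the patch. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum {
    DEMO_TYPE_PCI = 0,
    DEMO_TYPE_PLATFORM = 1,
};

typedef struct DemoDevice DemoDevice;

typedef struct DemoDeviceOps {
    bool (*compute_needs_reset)(DemoDevice *vbasedev);
} DemoDeviceOps;

/* Bus-agnostic part; this is what the group's device list links. */
struct DemoDevice {
    DemoDevice *next;           /* plain pointer instead of QLIST_ENTRY */
    int type;
    bool reset_works;
    bool needs_reset;
    const DemoDeviceOps *ops;
};

/* Parent object embedding the generic part. */
typedef struct DemoPCIDevice {
    int slot;                   /* stands in for PCI-only state */
    DemoDevice vbasedev;
    bool has_flr;
    bool has_pm_reset;
} DemoPCIDevice;

/* Same condition as vfio_pci_compute_needs_reset in the patch. */
static bool demo_pci_compute_needs_reset(DemoDevice *vbasedev)
{
    DemoPCIDevice *vdev =
        demo_container_of(vbasedev, DemoPCIDevice, vbasedev);

    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
        vbasedev->needs_reset = true;
    }
    return vbasedev->needs_reset;
}

/* Hypothetical platform policy, invented for this sketch. */
static bool demo_platform_compute_needs_reset(DemoDevice *vbasedev)
{
    vbasedev->needs_reset = !vbasedev->reset_works;
    return vbasedev->needs_reset;
}

static const DemoDeviceOps demo_pci_ops = {
    .compute_needs_reset = demo_pci_compute_needs_reset,
};

static const DemoDeviceOps demo_platform_ops = {
    .compute_needs_reset = demo_platform_compute_needs_reset,
};

/* Generic walk: only the ops are used, never the concrete type. */
static int demo_count_needing_reset(DemoDevice *head)
{
    int count = 0;

    for (DemoDevice *dev = head; dev; dev = dev->next) {
        if (dev->ops->compute_needs_reset(dev)) {
            count++;
        }
    }
    return count;
}

/* PCI-only walk: check the type before downcasting. */
static DemoPCIDevice *demo_find_pci_slot(DemoDevice *head, int slot)
{
    for (DemoDevice *dev = head; dev; dev = dev->next) {
        DemoPCIDevice *pci;

        if (dev->type != DEMO_TYPE_PCI) {
            continue;
        }
        pci = demo_container_of(dev, DemoPCIDevice, vbasedev);
        if (pci->slot == slot) {
            return pci;
        }
    }
    return NULL;
}
```

The type check before the downcast matters because container_of on a
non-PCI entry would yield a pointer to memory that is not a
DemoPCIDevice at all.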

We take this opportunity to change the prototypes of
vfio_unmask_intx, vfio_mask_intx and vfio_disable_irqindex, which now
operate on a VFIODevice. The first two are renamed
vfio_unmask_irqindex and vfio_mask_irqindex.
The index is passed as a parameter in anticipation of their use for
platform IRQs.
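
As a rough sketch of what the index parameter buys, the helpers below
build the same struct vfio_irq_set as the patch but for any IRQ
index. The demo_* names are hypothetical. Unlike the functions in the
patch, which return void, these return the ioctl result; that is a
choice made for this example only.

```c
#include <linux/vfio.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Fill a data-less SET_IRQS request for one IRQ at the given index. */
static void demo_fill_irq_set(struct vfio_irq_set *irq_set, int index,
                              uint32_t action)
{
    irq_set->argsz = sizeof(*irq_set);
    irq_set->flags = VFIO_IRQ_SET_DATA_NONE | action;
    irq_set->index = index;
    irq_set->start = 0;
    irq_set->count = 1;
}

/* Mask the IRQ at 'index' on the VFIO device file descriptor 'fd'. */
static int demo_mask_irqindex(int fd, int index)
{
    struct vfio_irq_set irq_set;

    demo_fill_irq_set(&irq_set, index, VFIO_IRQ_SET_ACTION_MASK);
    return ioctl(fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}

/* Unmask the IRQ at 'index' on the VFIO device file descriptor 'fd'. */
static int demo_unmask_irqindex(int fd, int index)
{
    struct vfio_irq_set irq_set;

    demo_fill_irq_set(&irq_set, index, VFIO_IRQ_SET_ACTION_UNMASK);
    return ioctl(fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}
```

A PCI caller would pass VFIO_PCI_INTX_IRQ_INDEX, as the call sites in
this patch do, while a platform caller could pass whatever index its
device exposes.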

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4->v5:
- fix style issues
- in vfio_initfn, rework allocation of vdev->vbasedev.name and
  replace snprintf by g_strdup_printf
---
 hw/vfio/pci.c | 241 +++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 147 insertions(+), 94 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ad5da4b..e2caa08 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -48,6 +48,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
+enum {
+    VFIO_DEVICE_TYPE_PCI = 0,
+    VFIO_DEVICE_TYPE_PLATFORM = 1,
+};
+
 struct VFIOPCIDevice;
 
 typedef struct VFIOQuirk {
@@ -185,9 +190,27 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
+typedef struct VFIODeviceOps VFIODeviceOps;
+
+typedef struct VFIODevice {
+    QLIST_ENTRY(VFIODevice) next;
+    struct VFIOGroup *group;
+    char *name;
+    int fd;
+    int type;
+    bool reset_works;
+    bool needs_reset;
+    VFIODeviceOps *ops;
+} VFIODevice;
+
+struct VFIODeviceOps {
+    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
+    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+};
+
 typedef struct VFIOPCIDevice {
     PCIDevice pdev;
-    int fd;
+    VFIODevice vbasedev;
     VFIOINTx intx;
     unsigned int config_size;
     uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
@@ -203,20 +226,16 @@ typedef struct VFIOPCIDevice {
     VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
     VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
     PCIHostDeviceAddress host;
-    QLIST_ENTRY(VFIOPCIDevice) next;
-    struct VFIOGroup *group;
     EventNotifier err_notifier;
     uint32_t features;
 #define VFIO_FEATURE_ENABLE_VGA_BIT 0
 #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
     int32_t bootindex;
     uint8_t pm_cap;
-    bool reset_works;
     bool has_vga;
     bool pci_aer;
     bool has_flr;
     bool has_pm_reset;
-    bool needs_reset;
     bool rom_read_failed;
 } VFIOPCIDevice;
 
@@ -224,7 +243,7 @@ typedef struct VFIOGroup {
     int fd;
     int groupid;
     VFIOContainer *container;
-    QLIST_HEAD(, VFIOPCIDevice) device_list;
+    QLIST_HEAD(, VFIODevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
     QLIST_ENTRY(VFIOGroup) container_next;
 } VFIOGroup;
@@ -277,7 +296,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 /*
  * Common VFIO interrupt disable
  */
-static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
+static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -287,37 +306,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
         .count = 0,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
 /*
  * INTx
  */
-static void vfio_unmask_intx(VFIOPCIDevice *vdev)
+static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
         .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
-        .index = VFIO_PCI_INTX_IRQ_INDEX,
+        .index = index,
         .start = 0,
         .count = 1,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIOPCIDevice *vdev)
+static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
         .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
-        .index = VFIO_PCI_INTX_IRQ_INDEX,
+        .index = index,
         .start = 0,
         .count = 1,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 #endif
 
@@ -381,7 +400,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -404,7 +423,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
     /* Get to a known interrupt state */
     qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
-    vfio_mask_intx(vdev);
+    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
 
@@ -434,7 +453,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
     *pfd = irqfd.resamplefd;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     g_free(irq_set);
     if (ret) {
         error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
@@ -442,7 +461,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
     }
 
     /* Let'em rip */
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
     vdev->intx.kvm_accel = true;
 
@@ -458,7 +477,7 @@ fail_irqfd:
     event_notifier_cleanup(&vdev->intx.unmask);
 fail:
     qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 #endif
 }
 
@@ -479,7 +498,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
      * Get to a known state, hardware masked, QEMU ready to accept new
      * interrupts, QEMU IRQ de-asserted.
      */
-    vfio_mask_intx(vdev);
+    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
 
@@ -497,7 +516,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
     vdev->intx.kvm_accel = false;
 
     /* If we've missed an event, let it re-fire through QEMU */
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
     trace_vfio_disable_intx_kvm(vdev->host.domain, vdev->host.bus,
                                 vdev->host.slot, vdev->host.function);
@@ -583,7 +602,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
     *pfd = event_notifier_get_fd(&vdev->intx.interrupt);
     qemu_set_fd_handler(*pfd, vfio_intx_interrupt, NULL, vdev);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     g_free(irq_set);
     if (ret) {
         error_report("vfio: Error: Failed to setup INTx fd: %m");
@@ -608,7 +627,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
 
     timer_del(vdev->intx.mmap_timer);
     vfio_disable_intx_kvm(vdev);
-    vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
     vfio_mmap_set_enabled(vdev, true);
@@ -698,7 +717,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
         fds[i] = fd;
     }
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
     g_free(irq_set);
 
@@ -795,7 +814,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
      * increase them as needed.
      */
     if (vdev->nr_vectors < nr + 1) {
-        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
+        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
         vdev->nr_vectors = nr + 1;
         ret = vfio_enable_vectors(vdev, true);
         if (ret) {
@@ -823,7 +842,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             *pfd = event_notifier_get_fd(&vector->interrupt);
         }
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
         g_free(irq_set);
         if (ret) {
             error_report("vfio: failed to modify vector, %d", ret);
@@ -874,7 +893,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 
         *pfd = event_notifier_get_fd(&vector->interrupt);
 
-        ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+        ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
         g_free(irq_set);
     }
@@ -1033,7 +1052,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
     }
 
     if (vdev->nr_vectors) {
-        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
+        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
     }
 
     vfio_disable_msi_common(vdev);
@@ -1044,7 +1063,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
 
 static void vfio_disable_msi(VFIOPCIDevice *vdev)
 {
-    vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
+    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
     vfio_disable_msi_common(vdev);
 
     trace_vfio_disable_msi(vdev->host.domain, vdev->host.bus,
@@ -1188,7 +1207,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     off_t off = 0;
     size_t bytes;
 
-    if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
+    if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
         error_report("vfio: Error getting ROM info: %m");
         return;
     }
@@ -1218,7 +1237,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     memset(vdev->rom, 0xff, size);
 
     while (size) {
-        bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off);
+        bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
+                      size, vdev->rom_offset + off);
         if (bytes == 0) {
             break;
         } else if (bytes > 0) {
@@ -1312,6 +1332,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
     off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
     DeviceState *dev = DEVICE(vdev);
     char name[32];
+    int fd = vdev->vbasedev.fd;
 
     if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
         /* Since pci handles romfile, just print a message and return */
@@ -1330,10 +1351,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
      * Use the same size ROM BAR as the physical device.  The contents
      * will get filled in later when the guest tries to read it.
      */
-    if (pread(vdev->fd, &orig, 4, offset) != 4 ||
-        pwrite(vdev->fd, &size, 4, offset) != 4 ||
-        pread(vdev->fd, &size, 4, offset) != 4 ||
-        pwrite(vdev->fd, &orig, 4, offset) != 4) {
+    if (pread(fd, &orig, 4, offset) != 4 ||
+        pwrite(fd, &size, 4, offset) != 4 ||
+        pread(fd, &size, 4, offset) != 4 ||
+        pwrite(fd, &orig, 4, offset) != 4) {
         error_report("%s(%04x:%02x:%02x.%x) failed: %m",
                      __func__, vdev->host.domain, vdev->host.bus,
                      vdev->host.slot, vdev->host.function);
@@ -2345,7 +2366,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
     if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
         ssize_t ret;
 
-        ret = pread(vdev->fd, &phys_val, len, vdev->config_offset + addr);
+        ret = pread(vdev->vbasedev.fd, &phys_val, len,
+                    vdev->config_offset + addr);
         if (ret != len) {
             error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
                          __func__, vdev->host.domain, vdev->host.bus,
@@ -2375,7 +2397,8 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                 addr, val, len);
 
     /* Write everything to VFIO, let it filter out what we can't write */
-    if (pwrite(vdev->fd, &val_le, len, vdev->config_offset + addr) != len) {
+    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
+                != len) {
         error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
                      __func__, vdev->host.domain, vdev->host.bus,
                      vdev->host.slot, vdev->host.function, addr, val, len);
@@ -2743,7 +2766,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
     bool msi_64bit, msi_maskbit;
     int ret, entries;
 
-    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
+    if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
               vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
         return -errno;
     }
@@ -2782,23 +2805,24 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
     uint8_t pos;
     uint16_t ctrl;
     uint32_t table, pba;
+    int fd = vdev->vbasedev.fd;
 
     pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
     if (!pos) {
         return 0;
     }
 
-    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
+    if (pread(fd, &ctrl, sizeof(ctrl),
               vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
         return -errno;
     }
 
-    if (pread(vdev->fd, &table, sizeof(table),
+    if (pread(fd, &table, sizeof(table),
               vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
         return -errno;
     }
 
-    if (pread(vdev->fd, &pba, sizeof(pba),
+    if (pread(fd, &pba, sizeof(pba),
               vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
         return -errno;
     }
@@ -2950,7 +2974,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
              vdev->host.function, nr);
 
     /* Determine what type of BAR this is for registration */
-    ret = pread(vdev->fd, &pci_bar, sizeof(pci_bar),
+    ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
                 vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
     if (ret != sizeof(pci_bar)) {
         error_report("vfio: Failed to read BAR %d (%m)", nr);
@@ -3365,12 +3389,12 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
                              single ? "one" : "multi");
 
     vfio_pci_pre_reset(vdev);
-    vdev->needs_reset = false;
+    vdev->vbasedev.needs_reset = false;
 
     info = g_malloc0(sizeof(*info));
     info->argsz = sizeof(*info);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret && errno != ENOSPC) {
         ret = -errno;
         if (!vdev->has_pm_reset) {
@@ -3386,7 +3410,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     info->argsz = sizeof(*info) + (count * sizeof(*devices));
     devices = &info->devices[0];
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret) {
         ret = -errno;
         error_report("vfio: hot reset info failed: %m");
@@ -3402,6 +3426,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
         VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3433,7 +3458,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
         }
 
         /* Prep dependent devices for reset and clear our marker. */
-        QLIST_FOREACH(tmp, &group->device_list, next) {
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
             if (vfio_pci_host_match(&host, &tmp->host)) {
                 if (single) {
                     error_report("vfio: found another in-use device "
@@ -3443,7 +3472,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
                     goto out_single;
                 }
                 vfio_pci_pre_reset(tmp);
-                tmp->needs_reset = false;
+                tmp->vbasedev.needs_reset = false;
                 multi = true;
                 break;
             }
@@ -3482,7 +3511,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     }
 
     /* Bus reset! */
-    ret = ioctl(vdev->fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
     g_free(reset);
 
     trace_vfio_pci_hot_reset_result(vdev->host.domain,
@@ -3496,6 +3525,7 @@ out:
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
         VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3516,7 +3546,11 @@ out:
             break;
         }
 
-        QLIST_FOREACH(tmp, &group->device_list, next) {
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
             if (vfio_pci_host_match(&host, &tmp->host)) {
                 vfio_pci_post_reset(tmp);
                 break;
@@ -3550,28 +3584,41 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
     return vfio_pci_hot_reset(vdev, true);
 }
 
-static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
 {
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     return vfio_pci_hot_reset(vdev, false);
 }
 
-static void vfio_pci_reset_handler(void *opaque)
+static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
+        vbasedev->needs_reset = true;
+    }
+    return vbasedev->needs_reset;
+}
+
+static VFIODeviceOps vfio_pci_ops = {
+    .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
+    .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
+};
+
+static void vfio_reset_handler(void *opaque)
 {
     VFIOGroup *group;
-    VFIOPCIDevice *vdev;
+    VFIODevice *vbasedev;
 
     QLIST_FOREACH(group, &group_list, next) {
-        QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
-                vdev->needs_reset = true;
-            }
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
         }
     }
 
     QLIST_FOREACH(group, &group_list, next) {
-        QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (vdev->needs_reset) {
-                vfio_pci_hot_reset_multi(vdev);
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->needs_reset) {
+                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
             }
         }
     }
@@ -3860,7 +3907,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
     }
 
     if (QLIST_EMPTY(&group_list)) {
-        qemu_register_reset(vfio_pci_reset_handler, NULL);
+        qemu_register_reset(vfio_reset_handler, NULL);
     }
 
     QLIST_INSERT_HEAD(&group_list, group, next);
@@ -3892,7 +3939,7 @@ static void vfio_put_group(VFIOGroup *group)
     g_free(group);
 
     if (QLIST_EMPTY(&group_list)) {
-        qemu_unregister_reset(vfio_pci_reset_handler, NULL);
+        qemu_unregister_reset(vfio_reset_handler, NULL);
     }
 }
 
@@ -3913,12 +3960,12 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         return ret;
     }
 
-    vdev->fd = ret;
-    vdev->group = group;
-    QLIST_INSERT_HEAD(&group->device_list, vdev, next);
+    vdev->vbasedev.fd = ret;
+    vdev->vbasedev.group = group;
+    QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
 
     /* Sanity check device */
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
     if (ret) {
         error_report("vfio: error getting device info: %m");
         goto error;
@@ -3932,7 +3979,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         goto error;
     }
 
-    vdev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+    vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
 
     if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
         error_report("vfio: unexpected number of io regions %u",
@@ -3948,7 +3995,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
         reg_info.index = i;
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
         if (ret) {
             error_report("vfio: Error getting region %d info: %m", i);
             goto error;
@@ -3962,14 +4009,14 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         vdev->bars[i].flags = reg_info.flags;
         vdev->bars[i].size = reg_info.size;
         vdev->bars[i].fd_offset = reg_info.offset;
-        vdev->bars[i].fd = vdev->fd;
+        vdev->bars[i].fd = vdev->vbasedev.fd;
         vdev->bars[i].nr = i;
         QLIST_INIT(&vdev->bars[i].quirks);
     }
 
     reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
     if (ret) {
         error_report("vfio: Error getting config info: %m");
         goto error;
@@ -3992,7 +4039,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
             .index = VFIO_PCI_VGA_REGION_INDEX,
          };
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
         if (ret) {
             error_report(
                 "vfio: Device does not support requested feature x-vga");
@@ -4009,7 +4056,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         }
 
         vdev->vga.fd_offset = vga_info.offset;
-        vdev->vga.fd = vdev->fd;
+        vdev->vga.fd = vdev->vbasedev.fd;
 
         vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
         vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
@@ -4027,7 +4074,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     }
     irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
     if (ret) {
         /* This can fail for an old kernel or legacy PCI dev */
         trace_vfio_get_device_get_irq_info_failure();
@@ -4043,19 +4090,20 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
 
 error:
     if (ret) {
-        QLIST_REMOVE(vdev, next);
-        vdev->group = NULL;
-        close(vdev->fd);
+        QLIST_REMOVE(&vdev->vbasedev, next);
+        vdev->vbasedev.group = NULL;
+        close(vdev->vbasedev.fd);
     }
     return ret;
 }
 
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
-    QLIST_REMOVE(vdev, next);
-    vdev->group = NULL;
-    trace_vfio_put_device(vdev->fd);
-    close(vdev->fd);
+    QLIST_REMOVE(&vdev->vbasedev, next);
+    vdev->vbasedev.group = NULL;
+    trace_vfio_put_device(vdev->vbasedev.fd);
+    close(vdev->vbasedev.fd);
+    g_free(vdev->vbasedev.name);
     if (vdev->msix) {
         g_free(vdev->msix);
         vdev->msix = NULL;
@@ -4124,7 +4172,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
     *pfd = event_notifier_get_fd(&vdev->err_notifier);
     qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     if (ret) {
         error_report("vfio: Failed to set up error notification");
         qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
@@ -4157,7 +4205,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
     pfd = (int32_t *)&irq_set->data;
     *pfd = -1;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     if (ret) {
         error_report("vfio: Failed to de-assign error fd: %m");
     }
@@ -4169,7 +4217,8 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
 
 static int vfio_initfn(PCIDevice *pdev)
 {
-    VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIODevice *vbasedev_iter;
     VFIOGroup *group;
     char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
     ssize_t len;
@@ -4187,6 +4236,13 @@ static int vfio_initfn(PCIDevice *pdev)
         return -errno;
     }
 
+    vdev->vbasedev.ops = &vfio_pci_ops;
+
+    vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
+    vdev->vbasedev.name = g_strdup_printf("%04x:%02x:%02x.%01x",
+            vdev->host.domain, vdev->host.bus, vdev->host.slot,
+            vdev->host.function);
+
     strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
 
     len = readlink(path, iommu_group_path, sizeof(path));
@@ -4216,12 +4272,8 @@ static int vfio_initfn(PCIDevice *pdev)
             vdev->host.domain, vdev->host.bus, vdev->host.slot,
             vdev->host.function);
 
-    QLIST_FOREACH(pvdev, &group->device_list, next) {
-        if (pvdev->host.domain == vdev->host.domain &&
-            pvdev->host.bus == vdev->host.bus &&
-            pvdev->host.slot == vdev->host.slot &&
-            pvdev->host.function == vdev->host.function) {
-
+    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+        if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
             error_report("vfio: error: device %s is already attached", path);
             vfio_put_group(group);
             return -EBUSY;
@@ -4236,7 +4288,7 @@ static int vfio_initfn(PCIDevice *pdev)
     }
 
     /* Get a copy of config space */
-    ret = pread(vdev->fd, vdev->pdev.config,
+    ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
                 MIN(pci_config_size(&vdev->pdev), vdev->config_size),
                 vdev->config_offset);
     if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
@@ -4324,7 +4376,7 @@ out_put:
 static void vfio_exitfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
-    VFIOGroup *group = vdev->group;
+    VFIOGroup *group = vdev->vbasedev.group;
 
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
@@ -4350,8 +4402,9 @@ static void vfio_pci_reset(DeviceState *dev)
 
     vfio_pci_pre_reset(vdev);
 
-    if (vdev->reset_works && (vdev->has_flr || !vdev->has_pm_reset) &&
-        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
+    if (vdev->vbasedev.reset_works &&
+        (vdev->has_flr || !vdev->has_pm_reset) &&
+        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
         trace_vfio_pci_reset_flr(vdev->host.domain, vdev->host.bus,
                                   vdev->host.slot, vdev->host.function);
         goto post_reset;
@@ -4363,8 +4416,8 @@ static void vfio_pci_reset(DeviceState *dev)
     }
 
     /* If nothing else works and the device supports PM reset, use it */
-    if (vdev->reset_works && vdev->has_pm_reset &&
-        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
+    if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
+        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
         trace_vfio_pci_reset_pm(vdev->host.domain, vdev->host.bus,
                                 vdev->host.slot, vdev->host.function);
         goto post_reset;
-- 
1.8.3.2
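
To illustrate the reset rework in the patch above: vfio_reset_handler no
longer knows about PCI. It makes two passes over every device and calls
the per-type ops. What follows is a minimal standalone sketch of that
two-pass dispatch, not the QEMU code itself. The sketch_* names, the flat
array standing in for the QLIST-based group and device lists, and the
hot_resets counter are assumptions made for illustration; the PCI-specific
has_flr/has_pm_reset checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct VFIODevice VFIODevice;

/* Reduced version of the ops table introduced by the patch. */
typedef struct VFIODeviceOps {
    bool (*vfio_compute_needs_reset)(VFIODevice *vbasedev);
    int (*vfio_hot_reset_multi)(VFIODevice *vbasedev);
} VFIODeviceOps;

struct VFIODevice {
    const VFIODeviceOps *ops;
    bool reset_works;
    bool needs_reset;
    int hot_resets; /* hypothetical counter, only here to observe calls */
};

/* Assumption: a device needs a hot reset when VFIO_DEVICE_RESET is absent. */
static bool sketch_compute_needs_reset(VFIODevice *vbasedev)
{
    if (!vbasedev->reset_works) {
        vbasedev->needs_reset = true;
    }
    return vbasedev->needs_reset;
}

/* Stand-in for the bus reset: count the call and clear the marker. */
static int sketch_hot_reset_multi(VFIODevice *vbasedev)
{
    vbasedev->hot_resets++;
    vbasedev->needs_reset = false;
    return 0;
}

static const VFIODeviceOps sketch_ops = {
    .vfio_compute_needs_reset = sketch_compute_needs_reset,
    .vfio_hot_reset_multi = sketch_hot_reset_multi,
};

/* Pass 1 marks devices, pass 2 resets the ones still marked. */
static void sketch_reset_handler(VFIODevice **devs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        assert(devs[i]->ops != NULL);
        devs[i]->ops->vfio_compute_needs_reset(devs[i]);
    }

    for (i = 0; i < n; i++) {
        if (devs[i]->needs_reset) {
            devs[i]->ops->vfio_hot_reset_multi(devs[i]);
        }
    }
}
```

The two passes are kept separate because, in the real handler, a hot reset
of one device may clear needs_reset on dependent devices before the second
loop reaches them. The sketch only clears the marker of the device being
reset.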


* [Qemu-devel] [PATCH v6 04/16] hw/vfio/pci: Introduce VFIORegion
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (2 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 03/16] hw/vfio/pci: introduce VFIODevice Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 05/16] hw/vfio/pci: split vfio_get_device Eric Auger
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

VFIORegion is going to be shared by VFIOPCIDevice and
VFIOPlatformDevice. VFIOBAR now embeds it.

vfio_eoi becomes an op of VFIODevice, specialized by the parent device.
This makes it possible to transform vfio_bar_write/read into generic
vfio_region_write/read that will be used by VFIOPlatformDevice too.

vfio_mmap_bar becomes vfio_mmap_region.
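
As a rough illustration of the layering described above, the sketch below
shows a generic region carrying a back-pointer to its VFIODevice, a VFIOBAR
embedding that region, and the access path reaching the PCI-specific EOI
through the ops table and container_of. It is a simplified standalone
model under stated assumptions, not the code of this patch: the sketch_*
functions, the reduced field sets, the flat intx_pending flag and the
offsetof-based container_of are all stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for QEMU's container_of (no typeof check). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define SKETCH_NR_BARS 6

typedef struct VFIODevice VFIODevice;

typedef struct VFIODeviceOps {
    void (*vfio_eoi)(VFIODevice *vbasedev);
} VFIODeviceOps;

struct VFIODevice {
    int fd;
    const char *name;
    const VFIODeviceOps *ops;
};

/* Generic part, usable by PCI and platform devices alike. */
typedef struct VFIORegion {
    VFIODevice *vbasedev; /* owning device, replaces the old per-BAR fd */
    size_t size;
    uint8_t nr;
} VFIORegion;

/* PCI-only wrapper: the generic region plus BAR-specific state. */
typedef struct VFIOBAR {
    VFIORegion region;
    bool ioport;
} VFIOBAR;

typedef struct VFIOPCIDevice {
    VFIODevice vbasedev;
    VFIOBAR bars[SKETCH_NR_BARS];
    bool intx_pending; /* simplified from vdev->intx.pending */
} VFIOPCIDevice;

/* PCI specialization: recover the container, then acknowledge INTx. */
static void sketch_pci_eoi(VFIODevice *vbasedev)
{
    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);

    vdev->intx_pending = false;
}

static const VFIODeviceOps sketch_pci_ops = {
    .vfio_eoi = sketch_pci_eoi,
};

/* Generic tail of a region read/write: it only sees the VFIORegion. */
static void sketch_region_access_done(VFIORegion *region)
{
    VFIODevice *vbasedev = region->vbasedev;

    assert(vbasedev != NULL && vbasedev->ops != NULL);
    vbasedev->ops->vfio_eoi(vbasedev);
}

/* Every region must get its back-pointer before it is used. */
static void sketch_pci_init(VFIOPCIDevice *vdev, const char *name)
{
    size_t i;

    memset(vdev, 0, sizeof(*vdev));
    vdev->vbasedev.fd = -1;
    vdev->vbasedev.name = name;
    vdev->vbasedev.ops = &sketch_pci_ops;
    for (i = 0; i < SKETCH_NR_BARS; i++) {
        vdev->bars[i].region.vbasedev = &vdev->vbasedev;
        vdev->bars[i].region.nr = (uint8_t)i;
    }
}
```

With this layout the generic access path never needs the index arithmetic
of container_of(bar, VFIOPCIDevice, bars[bar->nr]); a platform device could
reuse sketch_region_access_done unchanged by providing its own vfio_eoi.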

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4->v5:
- remove fd field from VFIORegion
- change error_report format string in vfio_region_write/read
- remove #ifdef DEBUG_VFIO in the same function
- correct missing initialization of bar region's vbasedev field
- change Object * parameter name of vfio_mmap_region and remove
  useless OBJECT()
---
 hw/vfio/pci.c | 193 ++++++++++++++++++++++++++++++----------------------------
 trace-events  |   4 +-
 2 files changed, 103 insertions(+), 94 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e2caa08..5e34504 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -78,15 +78,19 @@ typedef struct VFIOQuirk {
     } data;
 } VFIOQuirk;
 
-typedef struct VFIOBAR {
-    off_t fd_offset; /* offset of BAR within device fd */
-    int fd; /* device fd, allows us to pass VFIOBAR as opaque data */
+typedef struct VFIORegion {
+    struct VFIODevice *vbasedev;
+    off_t fd_offset; /* offset of region within device fd */
     MemoryRegion mem; /* slow, read/write access */
     MemoryRegion mmap_mem; /* direct mapped access */
     void *mmap;
     size_t size;
     uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
-    uint8_t nr; /* cache the BAR number for debug */
+    uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOBAR {
+    VFIORegion region;
     bool ioport;
     bool mem64;
     QLIST_HEAD(, VFIOQuirk) quirks;
@@ -206,6 +210,7 @@ typedef struct VFIODevice {
 struct VFIODeviceOps {
     bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+    void (*vfio_eoi)(VFIODevice *vdev);
 };
 
 typedef struct VFIOPCIDevice {
@@ -389,8 +394,10 @@ static void vfio_intx_interrupt(void *opaque)
     }
 }
 
-static void vfio_eoi(VFIOPCIDevice *vdev)
+static void vfio_eoi(VFIODevice *vbasedev)
 {
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
     if (!vdev->intx.pending) {
         return;
     }
@@ -400,7 +407,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
-    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
+    vfio_unmask_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -553,7 +560,7 @@ static void vfio_update_irq(PCIDevice *pdev)
     vfio_enable_intx_kvm(vdev);
 
     /* Re-enable the interrupt in cased we missed an EOI */
-    vfio_eoi(vdev);
+    vfio_eoi(&vdev->vbasedev);
 }
 
 static int vfio_enable_intx(VFIOPCIDevice *vdev)
@@ -1090,10 +1097,11 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
 /*
  * IO Port/MMIO - Beware of the endians, VFIO is always little endian
  */
-static void vfio_bar_write(void *opaque, hwaddr addr,
-                           uint64_t data, unsigned size)
+static void vfio_region_write(void *opaque, hwaddr addr,
+                              uint64_t data, unsigned size)
 {
-    VFIOBAR *bar = opaque;
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1116,20 +1124,14 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
         break;
     }
 
-    if (pwrite(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
-        error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
-                     __func__, addr, data, size);
+    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+                     ",%d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, data, size);
     }
 
-#ifdef DEBUG_VFIO
-    {
-        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
-        trace_vfio_bar_write(vdev->host.domain, vdev->host.bus,
-                             vdev->host.slot, vdev->host.function,
-                             region->nr, addr, data, size);
-    }
-#endif
+    trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
 
     /*
      * A read or write to a BAR always signals an INTx EOI.  This will
@@ -1139,13 +1141,14 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
      * which access will service the interrupt, so we're potentially
      * getting quite a few host interrupts per guest interrupt.
      */
-    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+    vbasedev->ops->vfio_eoi(vbasedev);
 }
 
-static uint64_t vfio_bar_read(void *opaque,
-                              hwaddr addr, unsigned size)
+static uint64_t vfio_region_read(void *opaque,
+                                 hwaddr addr, unsigned size)
 {
-    VFIOBAR *bar = opaque;
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1154,9 +1157,10 @@ static uint64_t vfio_bar_read(void *opaque,
     } buf;
     uint64_t data = 0;
 
-    if (pread(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
-        error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
-                     __func__, addr, size);
+    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, size);
         return (uint64_t)-1;
     }
 
@@ -1175,25 +1179,17 @@ static uint64_t vfio_bar_read(void *opaque,
         break;
     }
 
-#ifdef DEBUG_VFIO
-    {
-        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
-        trace_vfio_bar_read(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function,
-                            region->nr, addr, size, data);
-    }
-#endif
+    trace_vfio_region_read(vbasedev->name, region->nr, addr, size, data);
 
     /* Same as write above */
-    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+    vbasedev->ops->vfio_eoi(vbasedev);
 
     return data;
 }
 
-static const MemoryRegionOps vfio_bar_ops = {
-    .read = vfio_bar_read,
-    .write = vfio_bar_write,
+static const MemoryRegionOps vfio_region_ops = {
+    .read = vfio_region_read,
+    .write = vfio_region_write,
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
@@ -1530,8 +1526,8 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
                                              quirk->data.bar,
                                              addr, size, data);
     } else {
-        data = vfio_bar_read(&vdev->bars[quirk->data.bar],
-                             addr + quirk->data.base_offset, size);
+        data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                                addr + quirk->data.base_offset, size);
     }
 
     return data;
@@ -1585,7 +1581,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
         return;
     }
 
-    vfio_bar_write(&vdev->bars[quirk->data.bar],
+    vfio_region_write(&vdev->bars[quirk->data.bar].region,
                    addr + quirk->data.base_offset, data, size);
 }
 
@@ -1622,7 +1618,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
                                       quirk->data.bar,
                                       addr + base, size, data);
     } else {
-        data = vfio_bar_read(&vdev->bars[quirk->data.bar], addr + base, size);
+        data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                                addr + base, size);
     }
 
     return data;
@@ -1654,7 +1651,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
                                        quirk->data.bar,
                                        addr + base, data, size);
     } else {
-        vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+        vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                          addr + base, data, size);
     }
 }
 
@@ -1707,7 +1705,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
      * As long as the BAR is >= 256 bytes it will be aligned such that the
      * lower byte is always zero.  Filter out anything else, if it exists.
      */
-    if (!vdev->bars[4].ioport || vdev->bars[4].size < 256) {
+    if (!vdev->bars[4].ioport || vdev->bars[4].region.size < 256) {
         return;
     }
 
@@ -1759,7 +1757,7 @@ static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev),
                           &vfio_generic_window_quirk, quirk,
                           "vfio-ati-bar4-window-quirk", 8);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.base_offset, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1838,7 +1836,8 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
                         vdev->host.domain, vdev->host.bus,
                         vdev->host.slot, vdev->host.function);
 
-    return vfio_bar_read(&vdev->bars[quirk->data.bar], addr + 0x70, size);
+    return vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                            addr + 0x70, size);
 }
 
 static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
@@ -1880,7 +1879,8 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
             vdev->host.domain, vdev->host.bus,
             vdev->host.slot, vdev->host.function);
 
-    vfio_bar_write(&vdev->bars[quirk->data.bar], addr + 0x70, data, size);
+    vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                      addr + 0x70, data, size);
 }
 
 static const MemoryRegionOps vfio_rtl8168_window_quirk = {
@@ -1910,7 +1910,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_rtl8168_window_quirk,
                           quirk, "vfio-rtl8168-window-quirk", 8);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                                         0x70, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1944,7 +1944,7 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-ati-bar2-4000-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2064,7 +2064,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
     VFIOQuirk *quirk;
 
     if (pci_get_word(pdev->config + PCI_VENDOR_ID) != PCI_VENDOR_ID_NVIDIA ||
-        !vdev->bars[1].size) {
+        !vdev->bars[1].region.size) {
         return;
     }
 
@@ -2173,7 +2173,8 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev),
                           &vfio_nvidia_bar5_window_quirk, quirk,
                           "vfio-nvidia-bar5-window-quirk", 16);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 1);
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+                                        0, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
@@ -2201,7 +2202,8 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
      */
     if ((pdev->cap_present & QEMU_PCI_CAP_MSI) &&
         vfio_range_contained(addr, size, pdev->msi_cap, PCI_MSI_FLAGS)) {
-        vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+        vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                          addr + base, data, size);
     }
 }
 
@@ -2244,7 +2246,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
                           quirk, "vfio-nvidia-bar0-88000-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2271,7 +2273,8 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 
     /* Log the chipset ID */
     trace_vfio_probe_nvidia_bar0_1800_quirk_id(
-            (unsigned int)(vfio_bar_read(&vdev->bars[0], 0, 4) >> 20) & 0xff);
+            (unsigned int)(vfio_region_read(&vdev->bars[0].region, 0, 4) >> 20)
+            & 0xff);
 
     quirk = g_malloc0(sizeof(*quirk));
     quirk->vdev = vdev;
@@ -2283,7 +2286,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-nvidia-bar0-1800-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2341,7 +2344,7 @@ static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
 
     while (!QLIST_EMPTY(&bar->quirks)) {
         VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
-        memory_region_del_subregion(&bar->mem, &quirk->mem);
+        memory_region_del_subregion(&bar->region.mem, &quirk->mem);
         object_unparent(OBJECT(&quirk->mem));
         QLIST_REMOVE(quirk, next);
         g_free(quirk);
@@ -2852,9 +2855,9 @@ static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
     int ret;
 
     ret = msix_init(&vdev->pdev, vdev->msix->entries,
-                    &vdev->bars[vdev->msix->table_bar].mem,
+                    &vdev->bars[vdev->msix->table_bar].region.mem,
                     vdev->msix->table_bar, vdev->msix->table_offset,
-                    &vdev->bars[vdev->msix->pba_bar].mem,
+                    &vdev->bars[vdev->msix->pba_bar].region.mem,
                     vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
     if (ret < 0) {
         if (ret == -ENOTSUP) {
@@ -2872,8 +2875,9 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
     msi_uninit(&vdev->pdev);
 
     if (vdev->msix) {
-        msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
-                    &vdev->bars[vdev->msix->pba_bar].mem);
+        msix_uninit(&vdev->pdev,
+                    &vdev->bars[vdev->msix->table_bar].region.mem,
+                    &vdev->bars[vdev->msix->pba_bar].region.mem);
     }
 }
 
@@ -2887,11 +2891,11 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
     for (i = 0; i < PCI_ROM_SLOT; i++) {
         VFIOBAR *bar = &vdev->bars[i];
 
-        if (!bar->size) {
+        if (!bar->region.size) {
             continue;
         }
 
-        memory_region_set_enabled(&bar->mmap_mem, enabled);
+        memory_region_set_enabled(&bar->region.mmap_mem, enabled);
         if (vdev->msix && vdev->msix->table_bar == i) {
             memory_region_set_enabled(&vdev->msix->mmap_mem, enabled);
         }
@@ -2902,52 +2906,54 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
-    if (!bar->size) {
+    if (!bar->region.size) {
         return;
     }
 
     vfio_bar_quirk_teardown(vdev, nr);
 
-    memory_region_del_subregion(&bar->mem, &bar->mmap_mem);
-    munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
+    memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
+    munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
 
     if (vdev->msix && vdev->msix->table_bar == nr) {
-        memory_region_del_subregion(&bar->mem, &vdev->msix->mmap_mem);
+        memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
         munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
     }
 }
 
-static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
-                         MemoryRegion *mem, MemoryRegion *submem,
-                         void **map, size_t size, off_t offset,
-                         const char *name)
+static int vfio_mmap_region(Object *obj, VFIORegion *region,
+                            MemoryRegion *mem, MemoryRegion *submem,
+                            void **map, size_t size, off_t offset,
+                            const char *name)
 {
     int ret = 0;
+    VFIODevice *vbasedev = region->vbasedev;
 
-    if (VFIO_ALLOW_MMAP && size && bar->flags & VFIO_REGION_INFO_FLAG_MMAP) {
+    if (VFIO_ALLOW_MMAP && size &&
+        (region->flags & VFIO_REGION_INFO_FLAG_MMAP)) {
         int prot = 0;
 
-        if (bar->flags & VFIO_REGION_INFO_FLAG_READ) {
+        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
             prot |= PROT_READ;
         }
 
-        if (bar->flags & VFIO_REGION_INFO_FLAG_WRITE) {
+        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
             prot |= PROT_WRITE;
         }
 
         *map = mmap(NULL, size, prot, MAP_SHARED,
-                    bar->fd, bar->fd_offset + offset);
+                    vbasedev->fd, region->fd_offset + offset);
         if (*map == MAP_FAILED) {
             *map = NULL;
             ret = -errno;
             goto empty_region;
         }
 
-        memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);
+        memory_region_init_ram_ptr(submem, obj, name, size, *map);
     } else {
 empty_region:
         /* Create a zero sized sub-region to make cleanup easy. */
-        memory_region_init(submem, OBJECT(vdev), name, 0);
+        memory_region_init(submem, obj, name, 0);
     }
 
     memory_region_add_subregion(mem, offset, submem);
@@ -2958,7 +2964,7 @@ empty_region:
 static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
-    unsigned size = bar->size;
+    unsigned size = bar->region.size;
     char name[64];
     uint32_t pci_bar;
     uint8_t type;
@@ -2988,9 +2994,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
                                     ~PCI_BASE_ADDRESS_MEM_MASK);
 
     /* A "slow" read/write mapping underlies all BARs */
-    memory_region_init_io(&bar->mem, OBJECT(vdev), &vfio_bar_ops,
+    memory_region_init_io(&bar->region.mem, OBJECT(vdev), &vfio_region_ops,
                           bar, name, size);
-    pci_register_bar(&vdev->pdev, nr, type, &bar->mem);
+    pci_register_bar(&vdev->pdev, nr, type, &bar->region.mem);
 
     /*
      * We can't mmap areas overlapping the MSIX vector table, so we
@@ -3001,8 +3007,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
     }
 
     strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
-    if (vfio_mmap_bar(vdev, bar, &bar->mem,
-                      &bar->mmap_mem, &bar->mmap, size, 0, name)) {
+    if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
+                      &bar->region.mmap_mem, &bar->region.mmap,
+                      size, 0, name)) {
         error_report("%s unsupported. Performance may be slow", name);
     }
 
@@ -3012,10 +3019,11 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
         start = HOST_PAGE_ALIGN(vdev->msix->table_offset +
                                 (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE));
 
-        size = start < bar->size ? bar->size - start : 0;
+        size = start < bar->region.size ? bar->region.size - start : 0;
         strncat(name, " msix-hi", sizeof(name) - strlen(name) - 1);
         /* VFIOMSIXInfo contains another MemoryRegion for this mapping */
-        if (vfio_mmap_bar(vdev, bar, &bar->mem, &vdev->msix->mmap_mem,
+        if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
+                          &vdev->msix->mmap_mem,
                           &vdev->msix->mmap, size, start, name)) {
             error_report("%s unsupported. Performance may be slow", name);
         }
@@ -3602,6 +3610,7 @@ static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
 static VFIODeviceOps vfio_pci_ops = {
     .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
+    .vfio_eoi = vfio_eoi,
 };
 
 static void vfio_reset_handler(void *opaque)
@@ -4006,11 +4015,11 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
                                      (unsigned long)reg_info.offset,
                                      (unsigned long)reg_info.flags);
 
-        vdev->bars[i].flags = reg_info.flags;
-        vdev->bars[i].size = reg_info.size;
-        vdev->bars[i].fd_offset = reg_info.offset;
-        vdev->bars[i].fd = vdev->vbasedev.fd;
-        vdev->bars[i].nr = i;
+        vdev->bars[i].region.vbasedev = &vdev->vbasedev;
+        vdev->bars[i].region.flags = reg_info.flags;
+        vdev->bars[i].region.size = reg_info.size;
+        vdev->bars[i].region.fd_offset = reg_info.offset;
+        vdev->bars[i].region.nr = i;
         QLIST_INIT(&vdev->bars[i].quirks);
     }
 
diff --git a/trace-events b/trace-events
index 7931760..0b46a42 100644
--- a/trace-events
+++ b/trace-events
@@ -1365,8 +1365,8 @@ vfio_pci_reset(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
 vfio_pci_reset_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET"
 vfio_pci_reset_pm(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x PCI PM Reset"
 
-vfio_bar_write(int domain, int bus, int slot, int fn, int index, uint64_t addr, uint64_t data, unsigned size) " (%04x:%02x:%02x.%x:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
-vfio_bar_read(int domain, int bus, int slot, int fn, int index, uint64_t addr, unsigned size, uint64_t data) " (%04x:%02x:%02x.%x:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
+vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
+vfio_region_read(const char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
 vfio_iommu_map_notify(uint64_t iova_start, uint64_t iova_end) "iommu map @ %"PRIx64" - %"PRIx64
 vfio_listener_region_add_skip(uint64_t start, uint64_t end) "SKIPPING region_add %"PRIx64" - %"PRIx64
 vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add [iommu] %"PRIx64" - %"PRIx64
-- 
1.8.3.2

* [Qemu-devel] [PATCH v6 05/16] hw/vfio/pci: split vfio_get_device
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (3 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 04/16] hw/vfio/pci: Introduce VFIORegion Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 06/16] hw/vfio/pci: rename group_list into vfio_group_list Eric Auger
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

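vfio_get_device now takes a VFIODevice as argument. The function is split
into two parts: vfio_get_device, which is generic, and vfio_populate_device,
which is bus specific and is called through VFIODeviceOps.

Three new fields (num_irqs, num_regions and flags) are introduced in
VFIODevice to store the contents of dev_info.

vfio_put_base_device is created to undo the generic part of
vfio_get_device.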
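
For readers who want the shape of the split without reading the whole
diff, here is a minimal standalone sketch. It is not the QEMU code: the
Device and DeviceOps types, SKETCH_FLAG_PCI and all function names are
simplified, hypothetical stand-ins, and the device info is passed in as
arguments instead of being fetched with an ioctl.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-ins for VFIODevice and VFIODeviceOps. */
typedef struct Device Device;

typedef struct DeviceOps {
    int (*populate_device)(Device *dev);    /* bus-specific hook */
} DeviceOps;

struct Device {
    const char *name;
    int fd;                     /* -1 while the device is not held */
    unsigned int num_irqs;      /* cached copies of the device info */
    unsigned int num_regions;
    unsigned int flags;
    const DeviceOps *ops;
};

#define SKETCH_FLAG_PCI 0x2u    /* illustrative value, not the kernel ABI */

/* Bus-specific part: only validates what the generic part cached. */
int pci_populate_device(Device *dev)
{
    if (!(dev->flags & SKETCH_FLAG_PCI)) {
        fprintf(stderr, "%s: not a PCI device\n", dev->name);
        return -1;
    }
    if (dev->num_regions == 0 || dev->num_irqs == 0) {
        fprintf(stderr, "%s: unexpected region/irq count\n", dev->name);
        return -1;
    }
    return 0;
}

const DeviceOps pci_ops = { .populate_device = pci_populate_device };

/* Undo the generic part of get_device(). */
void put_base_device(Device *dev)
{
    dev->fd = -1;
}

/*
 * Generic part: record the handle and device info, then defer to the
 * bus-specific hook. The info is passed in here; the real code obtains
 * it from the kernel.
 */
int get_device(Device *dev, int fd, unsigned int flags,
               unsigned int num_regions, unsigned int num_irqs)
{
    int ret;

    dev->fd = fd;
    dev->flags = flags;
    dev->num_regions = num_regions;
    dev->num_irqs = num_irqs;

    ret = dev->ops->populate_device(dev);
    if (ret) {
        put_base_device(dev);
    }
    return ret;
}
```

A caller would set .ops = &pci_ops on a Device and call get_device(); a
second bus type would only need its own populate hook and ops table, and
the generic function stays untouched.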

---

v5->v6:
- simplify the split of vfio_get_device:
  vfio_check_device, vfio_populate_regions and vfio_populate_interrupts
  are now gathered into a single specialization function named
  vfio_populate_device

v4->v5:
- cleanup of error handling and get/put operations in
  vfio_check_device, vfio_populate_regions, vfio_populate_interrupts and
  vfio_get_device.
  - correct misuse of errno
  - vfio_populate_regions always returns 0
  - VFIODevice .name deallocation done in vfio_put_device instead of
    vfio_put_base_device
  - vfio_put_base_device done at vfio_get_device level.
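
The changelog does not say what the errno misuse was. One common pitfall,
shown here purely as an assumption about the kind of problem meant, is
reading errno after another library call has had a chance to overwrite it.
A minimal sketch (open_or_neg_errno is a hypothetical helper, not part of
this series):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: open a path read-only and return the fd,
 * or a negative errno value on failure.
 */
int open_or_neg_errno(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int saved = errno;  /* capture before any other call can change it */

        /* Reporting may itself modify errno, hence the saved copy. */
        fprintf(stderr, "open(%s) failed\n", path);
        return -saved;
    }
    return fd;
}
```

This mirrors the "ret = -errno" convention used right after the failing
mmap() call in vfio_mmap_region.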

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 130 +++++++++++++++++++++++++++++++++++-----------------------
 trace-events  |  10 ++---
 2 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5e34504..d48ca04 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -205,12 +205,16 @@ typedef struct VFIODevice {
     bool reset_works;
     bool needs_reset;
     VFIODeviceOps *ops;
+    unsigned int num_irqs;
+    unsigned int num_regions;
+    unsigned int flags;
 } VFIODevice;
 
 struct VFIODeviceOps {
     bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
     void (*vfio_eoi)(VFIODevice *vdev);
+    int (*vfio_populate_device)(VFIODevice *vdev);
 };
 
 typedef struct VFIOPCIDevice {
@@ -297,6 +301,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_put_base_device(VFIODevice *vbasedev);
+static int vfio_populate_device(VFIODevice *vbasedev);
 
 /*
  * Common VFIO interrupt disable
@@ -3611,6 +3617,7 @@ static VFIODeviceOps vfio_pci_ops = {
     .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
     .vfio_eoi = vfio_eoi,
+    .vfio_populate_device = vfio_populate_device,
 };
 
 static void vfio_reset_handler(void *opaque)
@@ -3952,70 +3959,45 @@ static void vfio_put_group(VFIOGroup *group)
     }
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name,
-                           VFIOPCIDevice *vdev)
+static int vfio_populate_device(VFIODevice *vbasedev)
 {
-    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
     struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
-    int ret, i;
-
-    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
-    if (ret < 0) {
-        error_report("vfio: error getting device %s from group %d: %m",
-                     name, group->groupid);
-        error_printf("Verify all devices in group %d are bound to vfio-pci "
-                     "or pci-stub and not already in use\n", group->groupid);
-        return ret;
-    }
-
-    vdev->vbasedev.fd = ret;
-    vdev->vbasedev.group = group;
-    QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
+    int i, ret = -1;
 
     /* Sanity check device */
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
-    if (ret) {
-        error_report("vfio: error getting device info: %m");
-        goto error;
-    }
-
-    trace_vfio_get_device_irq(name, dev_info.flags,
-                              dev_info.num_regions, dev_info.num_irqs);
-
-    if (!(dev_info.flags & VFIO_DEVICE_FLAGS_PCI)) {
+    if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
         error_report("vfio: Um, this isn't a PCI device");
         goto error;
     }
 
-    vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
-
-    if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
+    if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
         error_report("vfio: unexpected number of io regions %u",
-                     dev_info.num_regions);
+                     vbasedev->num_regions);
         goto error;
     }
 
-    if (dev_info.num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
-        error_report("vfio: unexpected number of irqs %u", dev_info.num_irqs);
+    if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
+        error_report("vfio: unexpected number of irqs %u", vbasedev->num_irqs);
         goto error;
     }
 
     for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
         reg_info.index = i;
 
-        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
         if (ret) {
             error_report("vfio: Error getting region %d info: %m", i);
             goto error;
         }
 
-        trace_vfio_get_device_region(name, i,
-                                     (unsigned long)reg_info.size,
-                                     (unsigned long)reg_info.offset,
-                                     (unsigned long)reg_info.flags);
+        trace_vfio_populate_device_region(vbasedev->name, i,
+                                          (unsigned long)reg_info.size,
+                                          (unsigned long)reg_info.offset,
+                                          (unsigned long)reg_info.flags);
 
-        vdev->bars[i].region.vbasedev = &vdev->vbasedev;
+        vdev->bars[i].region.vbasedev = vbasedev;
         vdev->bars[i].region.flags = reg_info.flags;
         vdev->bars[i].region.size = reg_info.size;
         vdev->bars[i].region.fd_offset = reg_info.offset;
@@ -4031,9 +4013,10 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         goto error;
     }
 
-    trace_vfio_get_device_config(name, (unsigned long)reg_info.size,
-                                 (unsigned long)reg_info.offset,
-                                 (unsigned long)reg_info.flags);
+    trace_vfio_populate_device_config(vdev->vbasedev.name,
+                                      (unsigned long)reg_info.size,
+                                      (unsigned long)reg_info.offset,
+                                      (unsigned long)reg_info.flags);
 
     vdev->config_size = reg_info.size;
     if (vdev->config_size == PCI_CONFIG_SPACE_SIZE) {
@@ -4042,7 +4025,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     vdev->config_offset = reg_info.offset;
 
     if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
-        dev_info.num_regions > VFIO_PCI_VGA_REGION_INDEX) {
+        vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
         struct vfio_region_info vga_info = {
             .argsz = sizeof(vga_info),
             .index = VFIO_PCI_VGA_REGION_INDEX,
@@ -4086,7 +4069,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
     if (ret) {
         /* This can fail for an old kernel or legacy PCI dev */
-        trace_vfio_get_device_get_irq_info_failure();
+        trace_vfio_populate_device_get_irq_info_failure();
         ret = 0;
     } else if (irq_info.count == 1) {
         vdev->pci_aer = true;
@@ -4098,25 +4081,68 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     }
 
 error:
+    return ret;
+}
+
+static int vfio_get_device(VFIOGroup *group, const char *name,
+                           VFIODevice *vbasedev)
+{
+    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    int ret;
+
+    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    if (ret < 0) {
+        error_report("vfio: error getting device %s from group %d: %m",
+                     name, group->groupid);
+        error_printf("Verify all devices in group %d are bound to vfio-<bus> "
+                     "or pci-stub and not already in use\n", group->groupid);
+        return ret;
+    }
+
+    vbasedev->fd = ret;
+    vbasedev->group = group;
+    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+    if (ret) {
+        error_report("vfio: error getting device info: %m");
+        goto error;
+    }
+
+    vbasedev->num_irqs = dev_info.num_irqs;
+    vbasedev->num_regions = dev_info.num_regions;
+    vbasedev->flags = dev_info.flags;
+
+    trace_vfio_get_device(name, dev_info.flags,
+                          dev_info.num_regions, dev_info.num_irqs);
+
+    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+    ret = vbasedev->ops->vfio_populate_device(vbasedev);
+
+error:
     if (ret) {
-        QLIST_REMOVE(&vdev->vbasedev, next);
-        vdev->vbasedev.group = NULL;
-        close(vdev->vbasedev.fd);
+        vfio_put_base_device(vbasedev);
     }
     return ret;
 }
 
+void vfio_put_base_device(VFIODevice *vbasedev)
+{
+    QLIST_REMOVE(vbasedev, next);
+    vbasedev->group = NULL;
+    trace_vfio_put_base_device(vbasedev->fd);
+    close(vbasedev->fd);
+}
+
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
-    QLIST_REMOVE(&vdev->vbasedev, next);
-    vdev->vbasedev.group = NULL;
-    trace_vfio_put_device(vdev->vbasedev.fd);
-    close(vdev->vbasedev.fd);
     g_free(vdev->vbasedev.name);
     if (vdev->msix) {
         g_free(vdev->msix);
         vdev->msix = NULL;
     }
+    vfio_put_base_device(&vdev->vbasedev);
 }
 
 static void vfio_err_notifier_handler(void *opaque)
@@ -4289,7 +4315,7 @@ static int vfio_initfn(PCIDevice *pdev)
         }
     }
 
-    ret = vfio_get_device(group, path, vdev);
+    ret = vfio_get_device(group, path, &vdev->vbasedev);
     if (ret) {
         error_report("vfio: failed to get device %s", path);
         vfio_put_group(group);
diff --git a/trace-events b/trace-events
index 0b46a42..bcdffac 100644
--- a/trace-events
+++ b/trace-events
@@ -1356,10 +1356,10 @@ vfio_pci_hot_reset(int domain, int bus, int slot, int fn, const char *type) " (%
 vfio_pci_hot_reset_has_dep_devices(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x: hot reset dependent devices:"
 vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int group_id) "\t%04x:%02x:%02x.%x group %d"
 vfio_pci_hot_reset_result(int domain, int bus, int slot, int fn, const char *result) "%04x:%02x:%02x.%x hot reset: %s"
-vfio_get_device_region(const char *region_name, int index, unsigned long size, unsigned long offset, unsigned long flags) "Device %s region %d:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
-vfio_get_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
-vfio_get_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
-vfio_get_device_irq(const char *name, unsigned flags, unsigned num_regions, unsigned num_irqs) "Device %s flags: %u, regions: %u, irgs: %u"
+vfio_populate_device_region(const char *region_name, int index, unsigned long size, unsigned long offset, unsigned long flags) "Device %s region %d:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
+vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
+vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
+vfio_get_device(const char *name, unsigned flags, unsigned num_regions, unsigned num_irqs) "Device %s flags: %u, regions: %u, irgs: %u"
 vfio_initfn(int domain, int bus, int slot, int fn, int group_id) " (%04x:%02x:%02x.%x) group %d"
 vfio_pci_reset(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
 vfio_pci_reset_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET"
@@ -1375,7 +1375,7 @@ vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del %"PRIx64" - %"PRIx64
 vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
-vfio_put_device(int fd) "close vdev->fd=%d"
+vfio_put_base_device(int fd) "close vdev->fd=%d"
 
 #hw/acpi/memory_hotplug.c
 mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32
-- 
1.8.3.2

* [Qemu-devel] [PATCH v6 06/16] hw/vfio/pci: rename group_list into vfio_group_list
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (4 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 05/16] hw/vfio/pci: split vfio_get_device Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 07/16] hw/vfio/pci: use name field in format strings Eric Auger
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

Rename group_list to vfio_group_list so that it fits better with the rest
of the namespace.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d48ca04..5623539 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -283,7 +283,7 @@ static const VFIORomBlacklistEntry romblacklist[] = {
 #define MSIX_CAP_LENGTH 12
 
 static QLIST_HEAD(, VFIOGroup)
-    group_list = QLIST_HEAD_INITIALIZER(group_list);
+    vfio_group_list = QLIST_HEAD_INITIALIZER(vfio_group_list);
 
 #ifdef CONFIG_KVM
 /*
@@ -3454,7 +3454,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
             continue;
         }
 
-        QLIST_FOREACH(group, &group_list, next) {
+        QLIST_FOREACH(group, &vfio_group_list, next) {
             if (group->groupid == devices[i].group_id) {
                 break;
             }
@@ -3501,7 +3501,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 
     /* Determine how many group fds need to be passed */
     count = 0;
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         for (i = 0; i < info->count; i++) {
             if (group->groupid == devices[i].group_id) {
                 count++;
@@ -3515,7 +3515,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     fds = &reset->group_fds[0];
 
     /* Fill in group fds */
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         for (i = 0; i < info->count; i++) {
             if (group->groupid == devices[i].group_id) {
                 fds[reset->count++] = group->fd;
@@ -3550,7 +3550,7 @@ out:
             continue;
         }
 
-        QLIST_FOREACH(group, &group_list, next) {
+        QLIST_FOREACH(group, &vfio_group_list, next) {
             if (group->groupid == devices[i].group_id) {
                 break;
             }
@@ -3625,13 +3625,13 @@ static void vfio_reset_handler(void *opaque)
     VFIOGroup *group;
     VFIODevice *vbasedev;
 
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         QLIST_FOREACH(vbasedev, &group->device_list, next) {
             vbasedev->ops->vfio_compute_needs_reset(vbasedev);
         }
     }
 
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         QLIST_FOREACH(vbasedev, &group->device_list, next) {
             if (vbasedev->needs_reset) {
                 vbasedev->ops->vfio_hot_reset_multi(vbasedev);
@@ -3880,7 +3880,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
     char path[32];
     struct vfio_group_status status = { .argsz = sizeof(status) };
 
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         if (group->groupid == groupid) {
             /* Found it.  Now is it already in the right context? */
             if (group->container->space->as == as) {
@@ -3922,11 +3922,11 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
         goto close_fd_exit;
     }
 
-    if (QLIST_EMPTY(&group_list)) {
+    if (QLIST_EMPTY(&vfio_group_list)) {
         qemu_register_reset(vfio_reset_handler, NULL);
     }
 
-    QLIST_INSERT_HEAD(&group_list, group, next);
+    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
     vfio_kvm_device_add_group(group);
 
@@ -3954,7 +3954,7 @@ static void vfio_put_group(VFIOGroup *group)
     close(group->fd);
     g_free(group);
 
-    if (QLIST_EMPTY(&group_list)) {
+    if (QLIST_EMPTY(&vfio_group_list)) {
         qemu_unregister_reset(vfio_reset_handler, NULL);
     }
 }
-- 
1.8.3.2

* [Qemu-devel] [PATCH v6 07/16] hw/vfio/pci: use name field in format strings
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (5 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 06/16] hw/vfio/pci: rename group_list into vfio_group_list Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module Eric Auger
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 213 ++++++++++++++++------------------------------------------
 trace-events  | 105 ++++++++++++++---------------
 2 files changed, 111 insertions(+), 207 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5623539..c617b79 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -387,9 +387,7 @@ static void vfio_intx_interrupt(void *opaque)
         return;
     }
 
-    trace_vfio_intx_interrupt(vdev->host.domain, vdev->host.bus,
-                              vdev->host.slot, vdev->host.function,
-                              'A' + vdev->intx.pin);
+    trace_vfio_intx_interrupt(vdev->vbasedev.name, 'A' + vdev->intx.pin);
 
     vdev->intx.pending = true;
     pci_irq_assert(&vdev->pdev);
@@ -408,8 +406,7 @@ static void vfio_eoi(VFIODevice *vbasedev)
         return;
     }
 
-    trace_vfio_eoi(vdev->host.domain, vdev->host.bus,
-                   vdev->host.slot, vdev->host.function);
+    trace_vfio_eoi(vbasedev->name);
 
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
@@ -478,8 +475,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
     vdev->intx.kvm_accel = true;
 
-    trace_vfio_enable_intx_kvm(vdev->host.domain, vdev->host.bus,
-                               vdev->host.slot, vdev->host.function);
+    trace_vfio_enable_intx_kvm(vdev->vbasedev.name);
 
     return;
 
@@ -531,8 +527,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
     /* If we've missed an event, let it re-fire through QEMU */
     vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
-    trace_vfio_disable_intx_kvm(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function);
+    trace_vfio_disable_intx_kvm(vdev->vbasedev.name);
 #endif
 }
 
@@ -551,8 +546,7 @@ static void vfio_update_irq(PCIDevice *pdev)
         return; /* Nothing changed */
     }
 
-    trace_vfio_update_irq(vdev->host.domain, vdev->host.bus,
-                          vdev->host.slot, vdev->host.function,
+    trace_vfio_update_irq(vdev->vbasedev.name,
                           vdev->intx.route.irq, route.irq);
 
     vfio_disable_intx_kvm(vdev);
@@ -628,8 +622,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
 
     vdev->interrupt = VFIO_INT_INTx;
 
-    trace_vfio_enable_intx(vdev->host.domain, vdev->host.bus,
-                           vdev->host.slot, vdev->host.function);
+    trace_vfio_enable_intx(vdev->vbasedev.name);
 
     return 0;
 }
@@ -651,8 +644,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
 
     vdev->interrupt = VFIO_INT_NONE;
 
-    trace_vfio_disable_intx(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function);
+    trace_vfio_disable_intx(vdev->vbasedev.name);
 }
 
 /*
@@ -679,9 +671,7 @@ static void vfio_msi_interrupt(void *opaque)
         abort();
     }
 
-    trace_vfio_msi_interrupt(vdev->host.domain, vdev->host.bus,
-                             vdev->host.slot, vdev->host.function,
-                             nr, msg.address, msg.data);
+    trace_vfio_msi_interrupt(vdev->vbasedev.name, nr, msg.address, msg.data);
 #endif
 
     if (vdev->interrupt == VFIO_INT_MSIX) {
@@ -788,9 +778,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
     VFIOMSIVector *vector;
     int ret;
 
-    trace_vfio_msix_vector_do_use(vdev->host.domain, vdev->host.bus,
-                                  vdev->host.slot, vdev->host.function,
-                                  nr);
+    trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);
 
     vector = &vdev->msi_vectors[nr];
 
@@ -876,9 +864,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
-    trace_vfio_msix_vector_release(vdev->host.domain, vdev->host.bus,
-                                   vdev->host.slot, vdev->host.function,
-                                   nr);
+    trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
 
     /*
      * There are still old guests that mask and unmask vectors on every
@@ -941,8 +927,7 @@ static void vfio_enable_msix(VFIOPCIDevice *vdev)
         error_report("vfio: msix_set_vector_notifiers failed");
     }
 
-    trace_vfio_enable_msix(vdev->host.domain, vdev->host.bus,
-                           vdev->host.slot, vdev->host.function);
+    trace_vfio_enable_msix(vdev->vbasedev.name);
 }
 
 static void vfio_enable_msi(VFIOPCIDevice *vdev)
@@ -1018,9 +1003,7 @@ retry:
         return;
     }
 
-    trace_vfio_enable_msi(vdev->host.domain, vdev->host.bus,
-                          vdev->host.slot, vdev->host.function,
-                          vdev->nr_vectors);
+    trace_vfio_enable_msi(vdev->vbasedev.name, vdev->nr_vectors);
 }
 
 static void vfio_disable_msi_common(VFIOPCIDevice *vdev)
@@ -1070,8 +1053,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
 
     vfio_disable_msi_common(vdev);
 
-    trace_vfio_disable_msix(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function);
+    trace_vfio_disable_msix(vdev->vbasedev.name);
 }
 
 static void vfio_disable_msi(VFIOPCIDevice *vdev)
@@ -1079,8 +1061,7 @@ static void vfio_disable_msi(VFIOPCIDevice *vdev)
     vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
     vfio_disable_msi_common(vdev);
 
-    trace_vfio_disable_msi(vdev->host.domain, vdev->host.bus,
-                           vdev->host.slot, vdev->host.function);
+    trace_vfio_disable_msi(vdev->vbasedev.name);
 }
 
 static void vfio_update_msi(VFIOPCIDevice *vdev)
@@ -1214,9 +1195,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
         return;
     }
 
-    trace_vfio_pci_load_rom(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function,
-                            (unsigned long)reg_info.size,
+    trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info.size,
                             (unsigned long)reg_info.offset,
                             (unsigned long)reg_info.flags);
 
@@ -1226,9 +1205,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     if (!vdev->rom_size) {
         vdev->rom_read_failed = true;
         error_report("vfio-pci: Cannot read device rom at "
-                    "%04x:%02x:%02x.%x",
-                    vdev->host.domain, vdev->host.bus, vdev->host.slot,
-                    vdev->host.function);
+                    "%s", vdev->vbasedev.name);
         error_printf("Device option ROM contents are probably invalid "
                     "(check dmesg).\nSkip option ROM probe with rombar=0, "
                     "or load from file with romfile=\n");
@@ -1290,9 +1267,7 @@ static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
         break;
     }
 
-    trace_vfio_rom_read(vdev->host.domain, vdev->host.bus,
-                        vdev->host.slot, vdev->host.function,
-                        addr, size, data);
+    trace_vfio_rom_read(vdev->vbasedev.name, addr, size, data);
 
     return data;
 }
@@ -1389,9 +1364,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
         }
     }
 
-    trace_vfio_pci_size_rom(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function,
-                            size);
+    trace_vfio_pci_size_rom(vdev->vbasedev.name, size);
 
     snprintf(name, sizeof(name), "vfio[%04x:%02x:%02x.%x].rom",
              vdev->host.domain, vdev->host.bus, vdev->host.slot,
@@ -1525,10 +1498,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
                                     quirk->data.address_val + offset, size);
 
         trace_vfio_generic_window_quirk_read(memory_region_name(&quirk->mem),
-                                             vdev->host.domain,
-                                             vdev->host.bus,
-                                             vdev->host.slot,
-                                             vdev->host.function,
+                                             vdev->vbasedev.name,
                                              quirk->data.bar,
                                              addr, size, data);
     } else {
@@ -1576,14 +1546,10 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
 
         vfio_pci_write_config(&vdev->pdev,
                               quirk->data.address_val + offset, data, size);
-
         trace_vfio_generic_window_quirk_write(memory_region_name(&quirk->mem),
-                                             vdev->host.domain,
-                                             vdev->host.bus,
-                                             vdev->host.slot,
-                                             vdev->host.function,
-                                             quirk->data.bar,
-                                             addr, data, size);
+                                              vdev->vbasedev.name,
+                                              quirk->data.bar,
+                                              addr, data, size);
         return;
     }
 
@@ -1617,11 +1583,7 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
         data = vfio_pci_read_config(&vdev->pdev, addr - offset, size);
 
         trace_vfio_generic_quirk_read(memory_region_name(&quirk->mem),
-                                      vdev->host.domain,
-                                      vdev->host.bus,
-                                      vdev->host.slot,
-                                      vdev->host.function,
-                                      quirk->data.bar,
+                                      vdev->vbasedev.name, quirk->data.bar,
                                       addr + base, size, data);
     } else {
         data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
@@ -1650,11 +1612,7 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
         vfio_pci_write_config(&vdev->pdev, addr - offset, data, size);
 
         trace_vfio_generic_quirk_write(memory_region_name(&quirk->mem),
-                                       vdev->host.domain,
-                                       vdev->host.bus,
-                                       vdev->host.slot,
-                                       vdev->host.function,
-                                       quirk->data.bar,
+                                       vdev->vbasedev.name, quirk->data.bar,
                                        addr + base, data, size);
     } else {
         vfio_region_write(&vdev->bars[quirk->data.bar].region,
@@ -1726,8 +1684,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
     QLIST_INSERT_HEAD(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks,
                       quirk, next);
 
-    trace_vfio_vga_probe_ati_3c3_quirk(vdev->host.domain, vdev->host.bus,
-                                       vdev->host.slot, vdev->host.function);
+    trace_vfio_vga_probe_ati_3c3_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -1768,10 +1725,7 @@ static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_ati_bar4_window_quirk(vdev->host.domain,
-                                           vdev->host.bus,
-                                           vdev->host.slot,
-                                           vdev->host.function);
+    trace_vfio_probe_ati_bar4_window_quirk(vdev->vbasedev.name);
 }
 
 #define PCI_VENDOR_ID_REALTEK 0x10ec
@@ -1810,8 +1764,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
         if (quirk->data.flags) {
             trace_vfio_rtl8168_window_quirk_read_fake(
                     memory_region_name(&quirk->mem),
-                    vdev->host.domain, vdev->host.bus,
-                    vdev->host.slot, vdev->host.function);
+                    vdev->vbasedev.name);
 
             return quirk->data.address_match ^ 0x10000000U;
         }
@@ -1822,9 +1775,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
 
             trace_vfio_rtl8168_window_quirk_read_table(
                     memory_region_name(&quirk->mem),
-                    vdev->host.domain, vdev->host.bus,
-                    vdev->host.slot, vdev->host.function
-               );
+                    vdev->vbasedev.name);
 
             if (!(vdev->pdev.cap_present & QEMU_PCI_CAP_MSIX)) {
                 return 0;
@@ -1837,10 +1788,8 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
         }
     }
 
-    trace_vfio_rtl8168_window_quirk_read_direct(
-                        memory_region_name(&quirk->mem),
-                        vdev->host.domain, vdev->host.bus,
-                        vdev->host.slot, vdev->host.function);
+    trace_vfio_rtl8168_window_quirk_read_direct(memory_region_name(&quirk->mem),
+                                                vdev->vbasedev.name);
 
     return vfio_region_read(&vdev->bars[quirk->data.bar].region,
                             addr + 0x70, size);
@@ -1860,8 +1809,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
 
                 trace_vfio_rtl8168_window_quirk_write_table(
                         memory_region_name(&quirk->mem),
-                        vdev->host.domain, vdev->host.bus,
-                        vdev->host.slot, vdev->host.function);
+                        vdev->vbasedev.name);
 
                 io_mem_write(&vdev->pdev.msix_table_mmio,
                              (hwaddr)(quirk->data.address_match & 0xfff),
@@ -1882,8 +1830,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
 
     trace_vfio_rtl8168_window_quirk_write_direct(
             memory_region_name(&quirk->mem),
-            vdev->host.domain, vdev->host.bus,
-            vdev->host.slot, vdev->host.function);
+            vdev->vbasedev.name);
 
     vfio_region_write(&vdev->bars[quirk->data.bar].region,
                       addr + 0x70, data, size);
@@ -1921,10 +1868,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_rtl8168_bar2_window_quirk(vdev->host.domain,
-                                               vdev->host.bus,
-                                               vdev->host.slot,
-                                               vdev->host.function);
+    trace_vfio_probe_rtl8168_bar2_window_quirk(vdev->vbasedev.name);
 }
 /*
  * Trap the BAR2 MMIO window to config space as well.
@@ -1956,10 +1900,7 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_ati_bar2_4000_quirk(vdev->host.domain,
-                                         vdev->host.bus,
-                                         vdev->host.slot,
-                                         vdev->host.function);
+    trace_vfio_probe_ati_bar2_4000_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -2092,10 +2033,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
     QLIST_INSERT_HEAD(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks,
                       quirk, next);
 
-    trace_vfio_vga_probe_nvidia_3d0_quirk(vdev->host.domain,
-                                          vdev->host.bus,
-                                          vdev->host.slot,
-                                          vdev->host.function);
+    trace_vfio_vga_probe_nvidia_3d0_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -2184,10 +2122,7 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_nvidia_bar5_window_quirk(vdev->host.domain,
-                                              vdev->host.bus,
-                                              vdev->host.slot,
-                                              vdev->host.function);
+    trace_vfio_probe_nvidia_bar5_window_quirk(vdev->vbasedev.name);
 }
 
 static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
@@ -2258,10 +2193,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_nvidia_bar0_88000_quirk(vdev->host.domain,
-                                             vdev->host.bus,
-                                             vdev->host.slot,
-                                             vdev->host.function);
+    trace_vfio_probe_nvidia_bar0_88000_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -2298,10 +2230,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_nvidia_bar0_1800_quirk(vdev->host.domain,
-                                            vdev->host.bus,
-                                            vdev->host.slot,
-                                            vdev->host.function);
+    trace_vfio_probe_nvidia_bar0_1800_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -2388,9 +2317,7 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 
     val = (emu_val & emu_bits) | (phys_val & ~emu_bits);
 
-    trace_vfio_pci_read_config(vdev->host.domain, vdev->host.bus,
-                               vdev->host.slot, vdev->host.function,
-                               addr, len, val);
+    trace_vfio_pci_read_config(vdev->vbasedev.name, addr, len, val);
 
     return val;
 }
@@ -2401,9 +2328,7 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     uint32_t val_le = cpu_to_le32(val);
 
-    trace_vfio_pci_write_config(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function,
-                                addr, val, len);
+    trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
     /* Write everything to VFIO, let it filter out what we can't write */
     if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
@@ -2540,7 +2465,7 @@ static void vfio_iommu_map_notify(Notifier *n, void *data)
                                  &xlat, &len, iotlb->perm & IOMMU_WO);
     if (!memory_region_is_ram(mr)) {
         error_report("iommu map to non memory area %"HWADDR_PRIx"\n",
-                xlat);
+                     xlat);
         return;
     }
     /*
@@ -2785,8 +2710,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
     msi_maskbit = !!(ctrl & PCI_MSI_FLAGS_MASKBIT);
     entries = 1 << ((ctrl & PCI_MSI_FLAGS_QMASK) >> 1);
 
-    trace_vfio_setup_msi(vdev->host.domain, vdev->host.bus,
-                         vdev->host.slot, vdev->host.function, pos);
+    trace_vfio_setup_msi(vdev->vbasedev.name, pos);
 
     ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit);
     if (ret < 0) {
@@ -2847,9 +2771,8 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
     vdev->msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
     vdev->msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
 
-    trace_vfio_early_setup_msix(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function,
-                                pos, vdev->msix->table_bar,
+    trace_vfio_early_setup_msix(vdev->vbasedev.name, pos,
+                                vdev->msix->table_bar,
                                 vdev->msix->table_offset,
                                 vdev->msix->entries);
 
@@ -3224,8 +3147,7 @@ static void vfio_check_pcie_flr(VFIOPCIDevice *vdev, uint8_t pos)
     uint32_t cap = pci_get_long(vdev->pdev.config + pos + PCI_EXP_DEVCAP);
 
     if (cap & PCI_EXP_DEVCAP_FLR) {
-        trace_vfio_check_pcie_flr(vdev->host.domain, vdev->host.bus,
-                                  vdev->host.slot, vdev->host.function);
+        trace_vfio_check_pcie_flr(vdev->vbasedev.name);
         vdev->has_flr = true;
     }
 }
@@ -3235,8 +3157,7 @@ static void vfio_check_pm_reset(VFIOPCIDevice *vdev, uint8_t pos)
     uint16_t csr = pci_get_word(vdev->pdev.config + pos + PCI_PM_CTRL);
 
     if (!(csr & PCI_PM_CTRL_NO_SOFT_RESET)) {
-        trace_vfio_check_pm_reset(vdev->host.domain, vdev->host.bus,
-                                  vdev->host.slot, vdev->host.function);
+        trace_vfio_check_pm_reset(vdev->vbasedev.name);
         vdev->has_pm_reset = true;
     }
 }
@@ -3246,8 +3167,7 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
     uint8_t cap = pci_get_byte(vdev->pdev.config + pos + PCI_AF_CAP);
 
     if ((cap & PCI_AF_CAP_TP) && (cap & PCI_AF_CAP_FLR)) {
-        trace_vfio_check_af_flr(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function);
+        trace_vfio_check_af_flr(vdev->vbasedev.name);
         vdev->has_flr = true;
     }
 }
@@ -3398,9 +3318,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     int ret, i, count;
     bool multi = false;
 
-    trace_vfio_pci_hot_reset(vdev->host.domain, vdev->host.bus,
-                             vdev->host.slot, vdev->host.function,
-                             single ? "one" : "multi");
+    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
 
     vfio_pci_pre_reset(vdev);
     vdev->vbasedev.needs_reset = false;
@@ -3431,10 +3349,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
         goto out_single;
     }
 
-    trace_vfio_pci_hot_reset_has_dep_devices(vdev->host.domain,
-                                             vdev->host.bus,
-                                             vdev->host.slot,
-                                             vdev->host.function);
+    trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
 
     /* Verify that we have all the groups required */
     for (i = 0; i < info->count; i++) {
@@ -3462,10 +3377,9 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 
         if (!group) {
             if (!vdev->has_pm_reset) {
-                error_report("vfio: Cannot reset device %04x:%02x:%02x.%x, "
+                error_report("vfio: Cannot reset device %s, "
                              "depends on group %d which is not owned.",
-                             vdev->host.domain, vdev->host.bus, vdev->host.slot,
-                             vdev->host.function, devices[i].group_id);
+                             vdev->vbasedev.name, devices[i].group_id);
             }
             ret = -EPERM;
             goto out;
@@ -3480,8 +3394,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
             if (vfio_pci_host_match(&host, &tmp->host)) {
                 if (single) {
                     error_report("vfio: found another in-use device "
-                            "%04x:%02x:%02x.%x\n", host.domain, host.bus,
-                            host.slot, host.function);
+                            "%s\n", vbasedev_iter->name);
                     ret = -EINVAL;
                     goto out_single;
                 }
@@ -3528,10 +3441,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
     g_free(reset);
 
-    trace_vfio_pci_hot_reset_result(vdev->host.domain,
-                                    vdev->host.bus,
-                                    vdev->host.slot,
-                                    vdev->host.function,
+    trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
                                     ret ? "%m" : "Success");
 
 out:
@@ -4074,10 +3984,9 @@ static int vfio_populate_device(VFIODevice *vbasedev)
     } else if (irq_info.count == 1) {
         vdev->pci_aer = true;
     } else {
-        error_report("vfio: %04x:%02x:%02x.%x "
+        error_report("vfio: %s "
                      "Could not enable error recovery for the device",
-                     vdev->host.domain, vdev->host.bus, vdev->host.slot,
-                     vdev->host.function);
+                     vbasedev->name);
     }
 
 error:
@@ -4294,8 +4203,7 @@ static int vfio_initfn(PCIDevice *pdev)
         return -errno;
     }
 
-    trace_vfio_initfn(vdev->host.domain, vdev->host.bus,
-                      vdev->host.slot, vdev->host.function, groupid);
+    trace_vfio_initfn(vdev->vbasedev.name, groupid);
 
     group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev));
     if (!group) {
@@ -4432,16 +4340,14 @@ static void vfio_pci_reset(DeviceState *dev)
     PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 
-    trace_vfio_pci_reset(vdev->host.domain, vdev->host.bus,
-                         vdev->host.slot, vdev->host.function);
+    trace_vfio_pci_reset(vdev->vbasedev.name);
 
     vfio_pci_pre_reset(vdev);
 
     if (vdev->vbasedev.reset_works &&
         (vdev->has_flr || !vdev->has_pm_reset) &&
         !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
-        trace_vfio_pci_reset_flr(vdev->host.domain, vdev->host.bus,
-                                  vdev->host.slot, vdev->host.function);
+        trace_vfio_pci_reset_flr(vdev->vbasedev.name);
         goto post_reset;
     }
 
@@ -4453,8 +4359,7 @@ static void vfio_pci_reset(DeviceState *dev)
     /* If nothing else works and the device supports PM reset, use it */
     if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
         !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
-        trace_vfio_pci_reset_pm(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function);
+        trace_vfio_pci_reset_pm(vdev->vbasedev.name);
         goto post_reset;
     }
 
diff --git a/trace-events b/trace-events
index bcdffac..151d5bd 100644
--- a/trace-events
+++ b/trace-events
@@ -1299,74 +1299,72 @@ pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsi
 pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"
 
 # hw/vfio/vfio-pci.c
-#forced to add a white space before ( due to parsing error
-vfio_intx_interrupt(int domain, int bus, int slot, int fn, char line) " (%04x:%02x:%02x.%x) Pin %c"
-vfio_eoi(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x) EOI"
-vfio_enable_intx_kvm(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x) KVM INTx accel enabled"
-vfio_disable_intx_kvm(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x) KVM INTx accel disabled"
-vfio_update_irq(int domain, int bus, int slot, int fn, int new_irq, int target_irq) " (%04x:%02x:%02x.%x) IRQ moved %d -> %d"
-vfio_enable_intx(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
-vfio_disable_intx(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
-vfio_msi_interrupt(int domain, int bus, int slot, int fn, int index, uint64_t addr, int data) " (%04x:%02x:%02x.%x) vector %d 0x%"PRIx64"/0x%x"
-vfio_msix_vector_do_use(int domain, int bus, int slot, int fn, int index) " (%04x:%02x:%02x.%x) vector %d used"
-vfio_msix_vector_release(int domain, int bus, int slot, int fn, int index) " (%04x:%02x:%02x.%x) vector %d released"
-vfio_enable_msix(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
-vfio_enable_msi(int domain, int bus, int slot, int fn, int nr_vectors) " (%04x:%02x:%02x.%x) Enabled %d MSI vectors"
-vfio_disable_msix(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
-vfio_disable_msi(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
-vfio_pci_load_rom(int domain, int bus, int slot, int fn, unsigned long size, unsigned long offset, unsigned long flags) "Device %04x:%02x:%02x.%x ROM:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
-vfio_rom_read(int domain, int bus, int slot, int fn, uint64_t addr, int size, uint64_t data) " (%04x:%02x:%02x.%x, 0x%"PRIx64", 0x%x) = 0x%"PRIx64
-vfio_pci_size_rom(int domain, int bus, int slot, int fn, int size) "%04x:%02x:%02x.%x ROM size 0x%x"
+vfio_intx_interrupt(const char *name, char line) " (%s) Pin %c"
+vfio_eoi(const char *name) " (%s) EOI"
+vfio_enable_intx_kvm(const char *name) " (%s) KVM INTx accel enabled"
+vfio_disable_intx_kvm(const char *name) " (%s) KVM INTx accel disabled"
+vfio_update_irq(const char *name, int new_irq, int target_irq) " (%s) IRQ moved %d -> %d"
+vfio_enable_intx(const char *name) " (%s)"
+vfio_disable_intx(const char *name) " (%s)"
+vfio_msi_interrupt(const char *name, int index, uint64_t addr, int data) " (%s) vector %d 0x%"PRIx64"/0x%x"
+vfio_msix_vector_do_use(const char *name, int index) " (%s) vector %d used"
+vfio_msix_vector_release(const char *name, int index) " (%s) vector %d released"
+vfio_enable_msix(const char *name) " (%s)"
+vfio_enable_msi(const char *name, int nr_vectors) " (%s) Enabled %d MSI vectors"
+vfio_disable_msix(const char *name) " (%s)"
+vfio_disable_msi(const char *name) " (%s)"
+vfio_pci_load_rom(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s ROM:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
+vfio_rom_read(const char *name, uint64_t addr, int size, uint64_t data) " (%s, 0x%"PRIx64", 0x%x) = 0x%"PRIx64
+vfio_pci_size_rom(const char *name, int size) "%s ROM size 0x%x"
 vfio_vga_write(uint64_t addr, uint64_t data, int size) " (0x%"PRIx64", 0x%"PRIx64", %d)"
 vfio_vga_read(uint64_t addr, int size, uint64_t data) " (0x%"PRIx64", %d) = 0x%"PRIx64
-# remove ) = due to parser error
-vfio_generic_window_quirk_read(const char * region_name, int domain, int bus, int slot, int fn, int index, uint64_t addr, int size, uint64_t data) "%s read(%04x:%02x:%02x.%x:BAR%d+0x%"PRIx64", %d = 0x%"PRIx64
-# remove )
-vfio_generic_window_quirk_write(const char * region_name, int domain, int bus, int slot, int fn, int index, uint64_t addr, uint64_t data, int size) "%s write(%04x:%02x:%02x.%x:BAR%d+0x%"PRIx64", 0x%"PRIx64", %d"
 # remove ) =
-vfio_generic_quirk_read(const char * region_name, int domain, int bus, int slot, int fn, int index, uint64_t addr, int size, uint64_t data) "%s read(%04x:%02x:%02x.%x:BAR%d+0x%"PRIx64", %d = 0x%"PRIx64
+vfio_generic_window_quirk_read(const char * region_name, const char *name, int index, uint64_t addr, int size, uint64_t data) "%s read(%s:BAR%d+0x%"PRIx64", %d = 0x%"PRIx64
+# remove )
+vfio_generic_window_quirk_write(const char * region_name, const char *name, int index, uint64_t addr, uint64_t data, int size) "%s write(%s:BAR%d+0x%"PRIx64", 0x%"PRIx64", %d"
+# remove ) =
+vfio_generic_quirk_read(const char * region_name, const char *name, int index, uint64_t addr, int size, uint64_t data) "%s read(%s:BAR%d+0x%"PRIx64", %d = 0x%"PRIx64
 # remove )
-vfio_generic_quirk_write(const char * region_name, int domain, int bus, int slot, int fn, int index, uint64_t addr, uint64_t data, int size) "%s write(%04x:%02x:%02x.%x:BAR%d+0x%"PRIx64", 0x%"PRIx64", %d"
+vfio_generic_quirk_write(const char * region_name, const char *name, int index, uint64_t addr, uint64_t data, int size) "%s write(%s:BAR%d+0x%"PRIx64", 0x%"PRIx64", %d"
 vfio_ati_3c3_quirk_read(uint64_t data) " (0x3c3, 1) = 0x%"PRIx64
-vfio_vga_probe_ati_3c3_quirk(int domain, int bus, int slot, int fn) "Enabled ATI/AMD quirk 0x3c3 BAR4 for device %04x:%02x:%02x.%x"
-vfio_probe_ati_bar4_window_quirk(int domain, int bus, int slot, int fn) "Enabled ATI/AMD BAR4 window quirk for device %04x:%02x:%02x.%x"
+vfio_vga_probe_ati_3c3_quirk(const char *name) "Enabled ATI/AMD quirk 0x3c3 BAR4 for device %s"
+vfio_probe_ati_bar4_window_quirk(const char *name) "Enabled ATI/AMD BAR4 window quirk for device %s"
 #issue with )
-vfio_rtl8168_window_quirk_read_fake(const char *region_name, int domain, int bus, int slot, int fn) "%s fake read(%04x:%02x:%02x.%d"
-vfio_rtl8168_window_quirk_read_table(const char *region_name, int domain, int bus, int slot, int fn) "%s MSI-X table read(%04x:%02x:%02x.%d"
-vfio_rtl8168_window_quirk_read_direct(const char *region_name, int domain, int bus, int slot, int fn) "%s direct read(%04x:%02x:%02x.%d"
-vfio_rtl8168_window_quirk_write_table(const char *region_name, int domain, int bus, int slot, int fn) "%s MSI-X table write(%04x:%02x:%02x.%d"
-vfio_rtl8168_window_quirk_write_direct(const char *region_name, int domain, int bus, int slot, int fn) "%s direct write(%04x:%02x:%02x.%d"
-vfio_probe_rtl8168_bar2_window_quirk(int domain, int bus, int slot, int fn) "Enabled RTL8168 BAR2 window quirk for device %04x:%02x:%02x.%x"
-vfio_probe_ati_bar2_4000_quirk(int domain, int bus, int slot, int fn) "Enabled ATI/AMD BAR2 0x4000 quirk for device %04x:%02x:%02x.%x"
+vfio_rtl8168_window_quirk_read_fake(const char *region_name, const char *name) "%s fake read(%s"
+vfio_rtl8168_window_quirk_read_table(const char *region_name, const char *name) "%s MSI-X table read(%s"
+vfio_rtl8168_window_quirk_read_direct(const char *region_name, const char *name) "%s direct read(%s"
+vfio_rtl8168_window_quirk_write_table(const char *region_name, const char *name) "%s MSI-X table write(%s"
+vfio_rtl8168_window_quirk_write_direct(const char *region_name, const char *name) "%s direct write(%s"
+vfio_probe_rtl8168_bar2_window_quirk(const char *name) "Enabled RTL8168 BAR2 window quirk for device %s"
+vfio_probe_ati_bar2_4000_quirk(const char *name) "Enabled ATI/AMD BAR2 0x4000 quirk for device %s"
 vfio_nvidia_3d0_quirk_read(int size, uint64_t data) " (0x3d0, %d) = 0x%"PRIx64
 vfio_nvidia_3d0_quirk_write(uint64_t data, int size) " (0x3d0, 0x%"PRIx64", %d)"
-vfio_vga_probe_nvidia_3d0_quirk(int domain, int bus, int slot, int fn) "Enabled NVIDIA VGA 0x3d0 quirk for device %04x:%02x:%02x.%x"
-vfio_probe_nvidia_bar5_window_quirk(int domain, int bus, int slot, int fn) "Enabled NVIDIA BAR5 window quirk for device %04x:%02x:%02x.%x"
-vfio_probe_nvidia_bar0_88000_quirk(int domain, int bus, int slot, int fn) "Enabled NVIDIA BAR0 0x88000 quirk for device %04x:%02x:%02x.%x"
+vfio_vga_probe_nvidia_3d0_quirk(const char *name) "Enabled NVIDIA VGA 0x3d0 quirk for device %s"
+vfio_probe_nvidia_bar5_window_quirk(const char *name) "Enabled NVIDIA BAR5 window quirk for device %s"
+vfio_probe_nvidia_bar0_88000_quirk(const char *name) "Enabled NVIDIA BAR0 0x88000 quirk for device %s"
 vfio_probe_nvidia_bar0_1800_quirk_id(int id) "Nvidia NV%02x"
-vfio_probe_nvidia_bar0_1800_quirk(int domain, int bus, int slot, int fn) "Enabled NVIDIA BAR0 0x1800 quirk for device %04x:%02x:%02x.%x"
-vfio_pci_read_config(int domain, int bus, int slot, int fn, int addr, int len, int val) " (%04x:%02x:%02x.%x, @0x%x, len=0x%x) %x"
-vfio_pci_write_config(int domain, int bus, int slot, int fn, int addr, int val, int len) " (%04x:%02x:%02x.%x, @0x%x, 0x%x, len=0x%x)"
-vfio_setup_msi(int domain, int bus, int slot, int fn, int pos) "%04x:%02x:%02x.%x PCI MSI CAP @0x%x"
-vfio_early_setup_msix(int domain, int bus, int slot, int fn, int pos, int table_bar, int offset, int entries) "%04x:%02x:%02x.%x PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
-vfio_check_pcie_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x Supports FLR via PCIe cap"
-vfio_check_pm_reset(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x Supports PM reset"
-vfio_check_af_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x Supports FLR via AF cap"
-vfio_pci_hot_reset(int domain, int bus, int slot, int fn, const char *type) " (%04x:%02x:%02x.%x) %s"
-vfio_pci_hot_reset_has_dep_devices(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x: hot reset dependent devices:"
+vfio_probe_nvidia_bar0_1800_quirk(const char *name) "Enabled NVIDIA BAR0 0x1800 quirk for device %s"
+vfio_pci_read_config(const char *name, int addr, int len, int val) " (%s, @0x%x, len=0x%x) %x"
+vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s, @0x%x, 0x%x, len=0x%x)"
+vfio_setup_msi(const char *name, int pos) "%s PCI MSI CAP @0x%x"
+vfio_early_setup_msix(const char *name, int pos, int table_bar, int offset, int entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
+vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
+vfio_check_pm_reset(const char *name) "%s Supports PM reset"
+vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"
+vfio_pci_hot_reset(const char *name, const char *type) " (%s) %s"
+vfio_pci_hot_reset_has_dep_devices(const char *name) "%s: hot reset dependent devices:"
 vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int group_id) "\t%04x:%02x:%02x.%x group %d"
-vfio_pci_hot_reset_result(int domain, int bus, int slot, int fn, const char *result) "%04x:%02x:%02x.%x hot reset: %s"
+vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %s"
 vfio_populate_device_region(const char *region_name, int index, unsigned long size, unsigned long offset, unsigned long flags) "Device %s region %d:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
-vfio_get_device(const char *name, unsigned flags, unsigned num_regions, unsigned num_irqs) "Device %s flags: %u, regions: %u, irgs: %u"
-vfio_initfn(int domain, int bus, int slot, int fn, int group_id) " (%04x:%02x:%02x.%x) group %d"
-vfio_pci_reset(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
-vfio_pci_reset_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET"
-vfio_pci_reset_pm(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x PCI PM Reset"
+vfio_initfn(const char *name, int group_id) " (%s) group %d"
+vfio_pci_reset(const char *name) " (%s)"
+vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET"
+vfio_pci_reset_pm(const char *name) "%s PCI PM Reset"
 
 vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
-vfio_region_read(const char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
+vfio_region_read(char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
 vfio_iommu_map_notify(uint64_t iova_start, uint64_t iova_end) "iommu map @ %"PRIx64" - %"PRIx64
 vfio_listener_region_add_skip(uint64_t start, uint64_t end) "SKIPPING region_add %"PRIx64" - %"PRIx64
 vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add [iommu] %"PRIx64" - %"PRIx64
@@ -1375,6 +1373,7 @@ vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del %"PRIx64" - %"PRIx64
 vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
+vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
 vfio_put_base_device(int fd) "close vdev->fd=%d"
 
 #hw/acpi/memory_hotplug.c
-- 
1.8.3.2
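
Note on the conversion above: each trace point that used to take the four
host address fields (domain, bus, slot, function) now takes a single
preformatted name string. Below is a minimal sketch of that idea, not the
code from this series. HostAddr, BaseDev, base_dev_set_name() and
trace_initfn_fmt() are hypothetical stand-ins, and it assumes the name is
filled once with the same "%04x:%02x:%02x.%x" layout the old format
strings used.

```c
#include <stdio.h>

/* Hypothetical stand-in for the host PCI address fields (vdev->host.*). */
typedef struct {
    unsigned domain, bus, slot, function;
} HostAddr;

/* Hypothetical stand-in for the base device carrying the name field. */
typedef struct {
    char name[32];
} BaseDev;

/*
 * Assumption: the name is formatted once, with the same layout the old
 * trace format strings used for the four separate arguments.
 */
static void base_dev_set_name(BaseDev *b, const HostAddr *h)
{
    snprintf(b->name, sizeof(b->name), "%04x:%02x:%02x.%x",
             h->domain, h->bus, h->slot, h->function);
}

/*
 * Stand-in for a converted trace event: it mirrors the new
 * " (%s) group %d" format and takes the name instead of four integers.
 */
static int trace_initfn_fmt(char *out, size_t len,
                            const char *name, int group_id)
{
    return snprintf(out, len, " (%s) group %d", name, group_id);
}
```

With a HostAddr of { 0, 3, 0, 1 } and group 7, the two calls in sequence
produce " (0000:03:00.1) group 7", which is what the old five-argument
form printed. The benefit is that a non-PCI device can supply whatever
name suits it without any change to the trace prototypes.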


* [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (6 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 07/16] hw/vfio/pci: use name field in format strings Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-10 13:09   ` Alexander Graf
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 09/16] hw/vfio/platform: add vfio-platform support Eric Auger
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, Kim Phillips, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

A new common module is created. It implements all functions
that are not device specific (PCI, platform).

This patch consists only of code movement (no functional changes).

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v5 -> v6:
- track all changes made to the original PCI code between v5 and v6
- move declaration of vfio_region_ops, vfio_memory_listener,
  vfio_group_list, vfio_address_spaces into vfio-common.h

v4 -> v5:
- integrate "sPAPR/IOMMU: Fix TCE entry permission"
- VFIOdevice .name dealloc removed from vfio_put_base_device
- add some includes according to vfio inclusion policy

v3 -> v4:
[Eric Auger]
The move is now done after all the PCI modifications made in
anticipation of VFIO platform needs. The purpose is to ease the
review process.

<= v3
First split done by Kim Phillips
---
 hw/vfio/Makefile.objs         |    1 +
 hw/vfio/common.c              |  958 ++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c                 | 1028 +----------------------------------------
 include/hw/vfio/vfio-common.h |  152 ++++++
 trace-events                  |    1 +
 5 files changed, 1113 insertions(+), 1027 deletions(-)
 create mode 100644 hw/vfio/common.c
 create mode 100644 include/hw/vfio/vfio-common.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 31c7dab..e31f30e 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,3 +1,4 @@
 ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 endif
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
new file mode 100644
index 0000000..252c0b8
--- /dev/null
+++ b/hw/vfio/common.c
@@ -0,0 +1,958 @@
+/*
+ * generic functions used by VFIO devices
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ *  Adapted for KVM by Qumranet.
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ */
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/vfio.h>
+
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "hw/hw.h"
+#include "qemu/error-report.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+struct vfio_group_head vfio_group_list =
+    QLIST_HEAD_INITIALIZER(vfio_group_list);
+struct vfio_as_head vfio_address_spaces =
+    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
+
+#ifdef CONFIG_KVM
+/*
+ * We have a single VFIO pseudo device per KVM VM.  Once created it lives
+ * for the life of the VM.  Closing the file descriptor only drops our
+ * reference to it and the device's reference to kvm.  Therefore once
+ * initialized, this file descriptor is only released on QEMU exit and
+ * we'll re-use it should another vfio device be attached before then.
+ */
+static int vfio_kvm_device_fd = -1;
+#endif
+
+/*
+ * Common VFIO interrupt disable
+ */
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
+        .index = index,
+        .start = 0,
+        .count = 0,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
+        .index = index,
+        .start = 0,
+        .count = 1,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+        .index = index,
+        .start = 0,
+        .count = 1,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+/*
+ * IO Port/MMIO - Beware of the endians, VFIO is always little endian
+ */
+void vfio_region_write(void *opaque, hwaddr addr,
+                       uint64_t data, unsigned size)
+{
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
+    union {
+        uint8_t byte;
+        uint16_t word;
+        uint32_t dword;
+        uint64_t qword;
+    } buf;
+
+    switch (size) {
+    case 1:
+        buf.byte = data;
+        break;
+    case 2:
+        buf.word = data;
+        break;
+    case 4:
+        buf.dword = data;
+        break;
+    default:
+        hw_error("vfio: unsupported write size, %d bytes", size);
+        break;
+    }
+
+    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+                     ",%d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, data, size);
+    }
+
+    trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
+
+    /*
+     * A read or write to a BAR always signals an INTx EOI.  This will
+     * do nothing if not pending (including not in INTx mode).  We assume
+     * that a BAR access is in response to an interrupt and that BAR
+     * accesses will service the interrupt.  Unfortunately, we don't know
+     * which access will service the interrupt, so we're potentially
+     * getting quite a few host interrupts per guest interrupt.
+     */
+    vbasedev->ops->vfio_eoi(vbasedev);
+}
+
+uint64_t vfio_region_read(void *opaque,
+                          hwaddr addr, unsigned size)
+{
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
+    union {
+        uint8_t byte;
+        uint16_t word;
+        uint32_t dword;
+        uint64_t qword;
+    } buf;
+    uint64_t data = 0;
+
+    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, size);
+        return (uint64_t)-1;
+    }
+    switch (size) {
+    case 1:
+        data = buf.byte;
+        break;
+    case 2:
+        data = buf.word;
+        break;
+    case 4:
+        data = buf.dword;
+        break;
+    default:
+        hw_error("vfio: unsupported read size, %d bytes", size);
+        break;
+    }
+
+    trace_vfio_region_read(vbasedev->name, region->nr, addr, size, data);
+
+    /* Same as write above */
+    vbasedev->ops->vfio_eoi(vbasedev);
+
+    return data;
+}
+
+const MemoryRegionOps vfio_region_ops = {
+    .read = vfio_region_read,
+    .write = vfio_region_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+/*
+ * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
+ */
+static int vfio_dma_unmap(VFIOContainer *container,
+                          hwaddr iova, ram_addr_t size)
+{
+    struct vfio_iommu_type1_dma_unmap unmap = {
+        .argsz = sizeof(unmap),
+        .flags = 0,
+        .iova = iova,
+        .size = size,
+    };
+
+    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        error_report("VFIO_UNMAP_DMA: %d", -errno);
+        return -errno;
+    }
+
+    return 0;
+}
+
+static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
+                        ram_addr_t size, void *vaddr, bool readonly)
+{
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_READ,
+        .vaddr = (__u64)(uintptr_t)vaddr,
+        .iova = iova,
+        .size = size,
+    };
+
+    if (!readonly) {
+        map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
+    }
+
+    /*
+     * Try the mapping, if it fails with EBUSY, unmap the region and try
+     * again.  This shouldn't be necessary, but we sometimes see it in
+     * the VGA ROM space.
+     */
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
+        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
+         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
+        return 0;
+    }
+
+    error_report("VFIO_MAP_DMA: %d", -errno);
+    return -errno;
+}
+
+static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+{
+    return (!memory_region_is_ram(section->mr) &&
+            !memory_region_is_iommu(section->mr)) ||
+           /*
+            * Sizing an enabled 64-bit BAR can cause spurious mappings to
+            * addresses in the upper part of the 64-bit address space.  These
+            * are never accessed by the CPU and beyond the address width of
+            * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
+            */
+           section->offset_within_address_space & (1ULL << 63);
+}
+
+static void vfio_iommu_map_notify(Notifier *n, void *data)
+{
+    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+    VFIOContainer *container = giommu->container;
+    IOMMUTLBEntry *iotlb = data;
+    MemoryRegion *mr;
+    hwaddr xlat;
+    hwaddr len = iotlb->addr_mask + 1;
+    void *vaddr;
+    int ret;
+
+    trace_vfio_iommu_map_notify(iotlb->iova,
+                                iotlb->iova + iotlb->addr_mask);
+
+    /*
+     * The IOMMU TLB entry we have just covers translation through
+     * this IOMMU to its immediate target.  We need to translate
+     * it the rest of the way through to memory.
+     */
+    mr = address_space_translate(&address_space_memory,
+                                 iotlb->translated_addr,
+                                 &xlat, &len, iotlb->perm & IOMMU_WO);
+    if (!memory_region_is_ram(mr)) {
+        error_report("iommu map to non memory area %"HWADDR_PRIx,
+                     xlat);
+        return;
+    }
+    /*
+     * Translation truncates length to the IOMMU page size,
+     * check that it did not truncate too much.
+     */
+    if (len & iotlb->addr_mask) {
+        error_report("iommu has granularity incompatible with target AS");
+        return;
+    }
+
+    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        vaddr = memory_region_get_ram_ptr(mr) + xlat;
+        ret = vfio_dma_map(container, iotlb->iova,
+                           iotlb->addr_mask + 1, vaddr,
+                           !(iotlb->perm & IOMMU_WO) || mr->readonly);
+        if (ret) {
+            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         container, iotlb->iova,
+                         iotlb->addr_mask + 1, vaddr, ret);
+        }
+    } else {
+        ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
+        if (ret) {
+            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         container, iotlb->iova,
+                         iotlb->addr_mask + 1, ret);
+        }
+    }
+}
+
+static void vfio_listener_region_add(MemoryListener *listener,
+                                     MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            iommu_data.type1.listener);
+    hwaddr iova, end;
+    Int128 llend;
+    void *vaddr;
+    int ret;
+
+    if (vfio_listener_skipped_section(section)) {
+        trace_vfio_listener_region_add_skip(
+                section->offset_within_address_space,
+                section->offset_within_address_space +
+                int128_get64(int128_sub(section->size, int128_one())));
+        return;
+    }
+
+    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+        error_report("%s received unaligned region", __func__);
+        return;
+    }
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+    if (int128_ge(int128_make64(iova), llend)) {
+        return;
+    }
+
+    memory_region_ref(section->mr);
+
+    if (memory_region_is_iommu(section->mr)) {
+        VFIOGuestIOMMU *giommu;
+
+        trace_vfio_listener_region_add_iommu(iova,
+                    int128_get64(int128_sub(llend, int128_one())));
+        /*
+         * FIXME: We should do some checking to see if the
+         * capabilities of the host VFIO IOMMU are adequate to model
+         * the guest IOMMU
+         *
+         * FIXME: For VFIO iommu types which have KVM acceleration to
+         * avoid bouncing all map/unmaps through qemu this way, this
+         * would be the right place to wire that up (tell the KVM
+         * device emulation the VFIO iommu handles to use).
+         */
+        /*
+         * This assumes that the guest IOMMU is empty of
+         * mappings at this point.
+         *
+         * One way of doing this is:
+         * 1. Avoid sharing IOMMUs between emulated devices or different
+         * IOMMU groups.
+         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
+         * there are some mappings in IOMMU.
+         *
+         * VFIO on SPAPR does that. Other IOMMU models may do that differently,
+         * they must make sure there are no existing mappings or
+         * loop through existing mappings to map them into VFIO.
+         */
+        giommu = g_malloc0(sizeof(*giommu));
+        giommu->iommu = section->mr;
+        giommu->container = container;
+        giommu->n.notify = vfio_iommu_map_notify;
+        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+
+        return;
+    }
+
+    /* Here we assume that memory_region_is_ram(section->mr)==true */
+
+    end = int128_get64(llend);
+    vaddr = memory_region_get_ram_ptr(section->mr) +
+            section->offset_within_region +
+            (iova - section->offset_within_address_space);
+
+    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
+
+    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
+    if (ret) {
+        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                     container, iova, end - iova, vaddr, ret);
+
+        /*
+         * On the initfn path, store the first error in the container so we
+         * can gracefully fail.  Runtime, there's not much we can do other
+         * than throw a hardware error.
+         */
+        if (!container->iommu_data.type1.initialized) {
+            if (!container->iommu_data.type1.error) {
+                container->iommu_data.type1.error = ret;
+            }
+        } else {
+            hw_error("vfio: DMA mapping failed, unable to continue");
+        }
+    }
+}
+
+static void vfio_listener_region_del(MemoryListener *listener,
+                                     MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            iommu_data.type1.listener);
+    hwaddr iova, end;
+    int ret;
+
+    if (vfio_listener_skipped_section(section)) {
+        trace_vfio_listener_region_del_skip(
+                section->offset_within_address_space,
+                section->offset_within_address_space +
+                int128_get64(int128_sub(section->size, int128_one())));
+        return;
+    }
+
+    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+        error_report("%s received unaligned region", __func__);
+        return;
+    }
+
+    if (memory_region_is_iommu(section->mr)) {
+        VFIOGuestIOMMU *giommu;
+
+        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+            if (giommu->iommu == section->mr) {
+                memory_region_unregister_iommu_notifier(&giommu->n);
+                QLIST_REMOVE(giommu, giommu_next);
+                g_free(giommu);
+                break;
+            }
+        }
+
+        /*
+         * FIXME: We assume the one big unmap below is adequate to
+         * remove any individual page mappings in the IOMMU which
+         * might have been copied into VFIO. This works for a page table
+         * based IOMMU where a big unmap flattens a large range of IO-PTEs.
+         * That may not be true for all IOMMU types.
+         */
+    }
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    end = (section->offset_within_address_space + int128_get64(section->size)) &
+          TARGET_PAGE_MASK;
+
+    if (iova >= end) {
+        return;
+    }
+
+    trace_vfio_listener_region_del(iova, end - 1);
+
+    ret = vfio_dma_unmap(container, iova, end - iova);
+    memory_region_unref(section->mr);
+    if (ret) {
+        error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                     "0x%"HWADDR_PRIx") = %d (%m)",
+                     container, iova, end - iova, ret);
+    }
+}
+
+const MemoryListener vfio_memory_listener = {
+    .region_add = vfio_listener_region_add,
+    .region_del = vfio_listener_region_del,
+};
+
+void vfio_listener_release(VFIOContainer *container)
+{
+    memory_listener_unregister(&container->iommu_data.type1.listener);
+}
+
+int vfio_mmap_region(Object *obj, VFIORegion *region,
+                     MemoryRegion *mem, MemoryRegion *submem,
+                     void **map, size_t size, off_t offset,
+                     const char *name)
+{
+    int ret = 0;
+    VFIODevice *vbasedev = region->vbasedev;
+
+    if (VFIO_ALLOW_MMAP && size && region->flags &
+        VFIO_REGION_INFO_FLAG_MMAP) {
+        int prot = 0;
+
+        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
+            prot |= PROT_READ;
+        }
+
+        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
+            prot |= PROT_WRITE;
+        }
+
+        *map = mmap(NULL, size, prot, MAP_SHARED,
+                    vbasedev->fd,
+                    region->fd_offset + offset);
+        if (*map == MAP_FAILED) {
+            *map = NULL;
+            ret = -errno;
+            goto empty_region;
+        }
+
+        memory_region_init_ram_ptr(submem, obj, name, size, *map);
+    } else {
+empty_region:
+        /* Create a zero sized sub-region to make cleanup easy. */
+        memory_region_init(submem, obj, name, 0);
+    }
+
+    memory_region_add_subregion(mem, offset, submem);
+
+    return ret;
+}
+
+void vfio_reset_handler(void *opaque)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
+        }
+    }
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->needs_reset) {
+                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
+            }
+        }
+    }
+}
+
+static void vfio_kvm_device_add_group(VFIOGroup *group)
+{
+#ifdef CONFIG_KVM
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_ADD,
+        .addr = (uint64_t)(unsigned long)&group->fd,
+    };
+
+    if (!kvm_enabled()) {
+        return;
+    }
+
+    if (vfio_kvm_device_fd < 0) {
+        struct kvm_create_device cd = {
+            .type = KVM_DEV_TYPE_VFIO,
+        };
+
+        if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
+            error_report("KVM_CREATE_DEVICE: %m");
+            return;
+        }
+
+        vfio_kvm_device_fd = cd.fd;
+    }
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        error_report("Failed to add group %d to KVM VFIO device: %m",
+                     group->groupid);
+    }
+#endif
+}
+
+static void vfio_kvm_device_del_group(VFIOGroup *group)
+{
+#ifdef CONFIG_KVM
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_DEL,
+        .addr = (uint64_t)(unsigned long)&group->fd,
+    };
+
+    if (vfio_kvm_device_fd < 0) {
+        return;
+    }
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        error_report("Failed to remove group %d from KVM VFIO device: %m",
+                     group->groupid);
+    }
+#endif
+}
+
+static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
+{
+    VFIOAddressSpace *space;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        if (space->as == as) {
+            return space;
+        }
+    }
+
+    /* No suitable VFIOAddressSpace, create a new one */
+    space = g_malloc0(sizeof(*space));
+    space->as = as;
+    QLIST_INIT(&space->containers);
+
+    QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
+
+    return space;
+}
+
+static void vfio_put_address_space(VFIOAddressSpace *space)
+{
+    if (QLIST_EMPTY(&space->containers)) {
+        QLIST_REMOVE(space, list);
+        g_free(space);
+    }
+}
+
+static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
+{
+    VFIOContainer *container;
+    int ret, fd;
+    VFIOAddressSpace *space;
+
+    space = vfio_get_address_space(as);
+
+    QLIST_FOREACH(container, &space->containers, next) {
+        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+            group->container = container;
+            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+            return 0;
+        }
+    }
+
+    fd = qemu_open("/dev/vfio/vfio", O_RDWR);
+    if (fd < 0) {
+        ret = -errno;
+        error_report("vfio: failed to open /dev/vfio/vfio: %m");
+        goto put_space_exit;
+    }
+
+    ret = ioctl(fd, VFIO_GET_API_VERSION);
+    if (ret != VFIO_API_VERSION) {
+        error_report("vfio: supported vfio version: %d, "
+                     "reported version: %d", VFIO_API_VERSION, ret);
+        ret = -EINVAL;
+        goto close_fd_exit;
+    }
+
+    container = g_malloc0(sizeof(*container));
+    container->space = space;
+    container->fd = fd;
+    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
+        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
+        if (ret) {
+            ret = -errno;
+            error_report("vfio: failed to set group container: %m");
+            goto free_container_exit;
+        }
+
+        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
+        if (ret) {
+            ret = -errno;
+            error_report("vfio: failed to set iommu for container: %m");
+            goto free_container_exit;
+        }
+
+        container->iommu_data.type1.listener = vfio_memory_listener;
+        container->iommu_data.release = vfio_listener_release;
+
+        memory_listener_register(&container->iommu_data.type1.listener,
+                                 &address_space_memory);
+
+        if (container->iommu_data.type1.error) {
+            ret = container->iommu_data.type1.error;
+            error_report("vfio: memory listener initialization failed for container");
+            goto listener_release_exit;
+        }
+
+        container->iommu_data.type1.initialized = true;
+
+    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
+        if (ret) {
+            ret = -errno;
+            error_report("vfio: failed to set group container: %m");
+            goto free_container_exit;
+        }
+        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+        if (ret) {
+            ret = -errno;
+            error_report("vfio: failed to set iommu for container: %m");
+            goto free_container_exit;
+        }
+
+        /*
+         * The host kernel code implementing VFIO_IOMMU_DISABLE is called
+         * when container fd is closed so we do not call it explicitly
+         * in this file.
+         */
+        ret = ioctl(fd, VFIO_IOMMU_ENABLE);
+        if (ret) {
+            ret = -errno;
+            error_report("vfio: failed to enable container: %m");
+            goto free_container_exit;
+        }
+
+        container->iommu_data.type1.listener = vfio_memory_listener;
+        container->iommu_data.release = vfio_listener_release;
+
+        memory_listener_register(&container->iommu_data.type1.listener,
+                                 container->space->as);
+
+    } else {
+        error_report("vfio: No available IOMMU models");
+        ret = -EINVAL;
+        goto free_container_exit;
+    }
+
+    QLIST_INIT(&container->group_list);
+    QLIST_INSERT_HEAD(&space->containers, container, next);
+
+    group->container = container;
+    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+
+    return 0;
+listener_release_exit:
+    vfio_listener_release(container);
+
+free_container_exit:
+    g_free(container);
+
+close_fd_exit:
+    close(fd);
+
+put_space_exit:
+    vfio_put_address_space(space);
+
+    return ret;
+}
+
+static void vfio_disconnect_container(VFIOGroup *group)
+{
+    VFIOContainer *container = group->container;
+
+    if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
+        error_report("vfio: error disconnecting group %d from container",
+                     group->groupid);
+    }
+
+    QLIST_REMOVE(group, container_next);
+    group->container = NULL;
+
+    if (QLIST_EMPTY(&container->group_list)) {
+        VFIOAddressSpace *space = container->space;
+
+        if (container->iommu_data.release) {
+            container->iommu_data.release(container);
+        }
+        QLIST_REMOVE(container, next);
+        trace_vfio_disconnect_container(container->fd);
+        close(container->fd);
+        g_free(container);
+
+        vfio_put_address_space(space);
+    }
+}
+
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
+{
+    VFIOGroup *group;
+    char path[32];
+    struct vfio_group_status status = { .argsz = sizeof(status) };
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        if (group->groupid == groupid) {
+            /* Found it.  Now is it already in the right context? */
+            if (group->container->space->as == as) {
+                return group;
+            } else {
+                error_report("vfio: group %d used in multiple address spaces",
+                             group->groupid);
+                return NULL;
+            }
+        }
+    }
+
+    group = g_malloc0(sizeof(*group));
+
+    snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
+    group->fd = qemu_open(path, O_RDWR);
+    if (group->fd < 0) {
+        error_report("vfio: error opening %s: %m", path);
+        goto free_group_exit;
+    }
+
+    if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
+        error_report("vfio: error getting group status: %m");
+        goto close_fd_exit;
+    }
+
+    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+        error_report("vfio: error, group %d is not viable, please ensure "
+                     "all devices within the iommu_group are bound to their "
+                     "vfio bus driver.", groupid);
+        goto close_fd_exit;
+    }
+
+    group->groupid = groupid;
+    QLIST_INIT(&group->device_list);
+
+    if (vfio_connect_container(group, as)) {
+        error_report("vfio: failed to setup container for group %d", groupid);
+        goto close_fd_exit;
+    }
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        qemu_register_reset(vfio_reset_handler, NULL);
+    }
+
+    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
+
+    vfio_kvm_device_add_group(group);
+
+    return group;
+
+close_fd_exit:
+    close(group->fd);
+
+free_group_exit:
+    g_free(group);
+
+    return NULL;
+}
+
+void vfio_put_group(VFIOGroup *group)
+{
+    if (!QLIST_EMPTY(&group->device_list)) {
+        return;
+    }
+
+    vfio_kvm_device_del_group(group);
+    vfio_disconnect_container(group);
+    QLIST_REMOVE(group, next);
+    trace_vfio_put_group(group->fd);
+    close(group->fd);
+    g_free(group);
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        qemu_unregister_reset(vfio_reset_handler, NULL);
+    }
+}
+
+int vfio_get_device(VFIOGroup *group, const char *name,
+                    VFIODevice *vbasedev)
+{
+    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    int ret;
+
+    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    if (ret < 0) {
+        error_report("vfio: error getting device %s from group %d: %m",
+                     name, group->groupid);
+        error_printf("Verify all devices in group %d are bound to vfio-<bus> "
+                     "or pci-stub and not already in use\n", group->groupid);
+        return ret;
+    }
+
+    vbasedev->fd = ret;
+    vbasedev->group = group;
+    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+    if (ret) {
+        error_report("vfio: error getting device info: %m");
+        goto error;
+    }
+
+    vbasedev->num_irqs = dev_info.num_irqs;
+    vbasedev->num_regions = dev_info.num_regions;
+    vbasedev->flags = dev_info.flags;
+
+    trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
+                          dev_info.num_irqs);
+
+    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+    ret = vbasedev->ops->vfio_populate_device(vbasedev);
+
+error:
+    if (ret) {
+        vfio_put_base_device(vbasedev);
+    }
+    return ret;
+}
+
+void vfio_put_base_device(VFIODevice *vbasedev)
+{
+    QLIST_REMOVE(vbasedev, next);
+    vbasedev->group = NULL;
+    trace_vfio_put_base_device(vbasedev->fd);
+    close(vbasedev->fd);
+}
+
+static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
+                                   int req, void *param)
+{
+    VFIOGroup *group;
+    VFIOContainer *container;
+    int ret = -1;
+
+    group = vfio_get_group(groupid, as);
+    if (!group) {
+        error_report("vfio: group %d not registered", groupid);
+        return ret;
+    }
+
+    container = group->container;
+    if (group->container) {
+        ret = ioctl(container->fd, req, param);
+        if (ret < 0) {
+            error_report("vfio: failed to ioctl container: ret=%d, %s",
+                         ret, strerror(errno));
+        }
+    }
+
+    vfio_put_group(group);
+
+    return ret;
+}
+
+int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
+                         int req, void *param)
+{
+    /* We allow only certain ioctls to the container */
+    switch (req) {
+    case VFIO_CHECK_EXTENSION:
+    case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+        break;
+    default:
+        /* Return an error on unknown requests */
+        error_report("vfio: unsupported ioctl %X", req);
+        return -1;
+    }
+
+    return vfio_container_do_ioctl(as, groupid, req, param);
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c617b79..fae5f25 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,17 +41,7 @@
 #include "sysemu/sysemu.h"
 #include "trace.h"
 #include "hw/vfio/vfio.h"
-
-/* Extra debugging, trap acceleration paths for more logging */
-#define VFIO_ALLOW_MMAP 1
-#define VFIO_ALLOW_KVM_INTX 1
-#define VFIO_ALLOW_KVM_MSI 1
-#define VFIO_ALLOW_KVM_MSIX 1
-
-enum {
-    VFIO_DEVICE_TYPE_PCI = 0,
-    VFIO_DEVICE_TYPE_PLATFORM = 1,
-};
+#include "hw/vfio/vfio-common.h"
 
 struct VFIOPCIDevice;
 
@@ -78,17 +68,6 @@ typedef struct VFIOQuirk {
     } data;
 } VFIOQuirk;
 
-typedef struct VFIORegion {
-    struct VFIODevice *vbasedev;
-    off_t fd_offset; /* offset of region within device fd */
-    MemoryRegion mem; /* slow, read/write access */
-    MemoryRegion mmap_mem; /* direct mapped access */
-    void *mmap;
-    size_t size;
-    uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
-    uint8_t nr; /* cache the region number for debug */
-} VFIORegion;
-
 typedef struct VFIOBAR {
     VFIORegion region;
     bool ioport;
@@ -144,45 +123,6 @@ enum {
     VFIO_INT_MSIX = 3,
 };
 
-typedef struct VFIOAddressSpace {
-    AddressSpace *as;
-    QLIST_HEAD(, VFIOContainer) containers;
-    QLIST_ENTRY(VFIOAddressSpace) list;
-} VFIOAddressSpace;
-
-static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
-    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
-
-struct VFIOGroup;
-
-typedef struct VFIOType1 {
-    MemoryListener listener;
-    int error;
-    bool initialized;
-} VFIOType1;
-
-typedef struct VFIOContainer {
-    VFIOAddressSpace *space;
-    int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-    struct {
-        /* enable abstraction to support various iommu backends */
-        union {
-            VFIOType1 type1;
-        };
-        void (*release)(struct VFIOContainer *);
-    } iommu_data;
-    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
-    QLIST_HEAD(, VFIOGroup) group_list;
-    QLIST_ENTRY(VFIOContainer) next;
-} VFIOContainer;
-
-typedef struct VFIOGuestIOMMU {
-    VFIOContainer *container;
-    MemoryRegion *iommu;
-    Notifier n;
-    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
-
 /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
 typedef struct VFIOMSIXInfo {
     uint8_t table_bar;
@@ -194,29 +134,6 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
-typedef struct VFIODeviceOps VFIODeviceOps;
-
-typedef struct VFIODevice {
-    QLIST_ENTRY(VFIODevice) next;
-    struct VFIOGroup *group;
-    char *name;
-    int fd;
-    int type;
-    bool reset_works;
-    bool needs_reset;
-    VFIODeviceOps *ops;
-    unsigned int num_irqs;
-    unsigned int num_regions;
-    unsigned int flags;
-} VFIODevice;
-
-struct VFIODeviceOps {
-    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
-    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
-    void (*vfio_eoi)(VFIODevice *vdev);
-    int (*vfio_populate_device)(VFIODevice *vdev);
-};
-
 typedef struct VFIOPCIDevice {
     PCIDevice pdev;
     VFIODevice vbasedev;
@@ -248,15 +165,6 @@ typedef struct VFIOPCIDevice {
     bool rom_read_failed;
 } VFIOPCIDevice;
 
-typedef struct VFIOGroup {
-    int fd;
-    int groupid;
-    VFIOContainer *container;
-    QLIST_HEAD(, VFIODevice) device_list;
-    QLIST_ENTRY(VFIOGroup) next;
-    QLIST_ENTRY(VFIOGroup) container_next;
-} VFIOGroup;
-
 typedef struct VFIORomBlacklistEntry {
     uint16_t vendor_id;
     uint16_t device_id;
@@ -282,76 +190,14 @@ static const VFIORomBlacklistEntry romblacklist[] = {
 
 #define MSIX_CAP_LENGTH 12
 
-static QLIST_HEAD(, VFIOGroup)
-    vfio_group_list = QLIST_HEAD_INITIALIZER(vfio_group_list);
-
-#ifdef CONFIG_KVM
-/*
- * We have a single VFIO pseudo device per KVM VM.  Once created it lives
- * for the life of the VM.  Closing the file descriptor only drops our
- * reference to it and the device's reference to kvm.  Therefore once
- * initialized, this file descriptor is only released on QEMU exit and
- * we'll re-use it should another vfio device be attached before then.
- */
-static int vfio_kvm_device_fd = -1;
-#endif
-
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
-static void vfio_put_base_device(VFIODevice *vbasedev);
 static int vfio_populate_device(VFIODevice *vbasedev);
 
 /*
- * Common VFIO interrupt disable
- */
-static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
-        .index = index,
-        .start = 0,
-        .count = 0,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-/*
- * INTx
- */
-static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
-        .index = index,
-        .start = 0,
-        .count = 1,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
-        .index = index,
-        .start = 0,
-        .count = 1,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-#endif
-
-/*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
  * also be a huge overhead.  We try to get the best of both worlds by
  * waiting until an interrupt to disable mmaps (subsequent transitions
@@ -1081,105 +927,6 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
     }
 }
 
-/*
- * IO Port/MMIO - Beware of the endians, VFIO is always little endian
- */
-static void vfio_region_write(void *opaque, hwaddr addr,
-                              uint64_t data, unsigned size)
-{
-    VFIORegion *region = opaque;
-    VFIODevice *vbasedev = region->vbasedev;
-    union {
-        uint8_t byte;
-        uint16_t word;
-        uint32_t dword;
-        uint64_t qword;
-    } buf;
-
-    switch (size) {
-    case 1:
-        buf.byte = data;
-        break;
-    case 2:
-        buf.word = data;
-        break;
-    case 4:
-        buf.dword = data;
-        break;
-    default:
-        hw_error("vfio: unsupported write size, %d bytes", size);
-        break;
-    }
-
-    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
-        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
-                     ",%d) failed: %m",
-                     __func__, vbasedev->name, region->nr,
-                     addr, data, size);
-    }
-
-    trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
-
-    /*
-     * A read or write to a BAR always signals an INTx EOI.  This will
-     * do nothing if not pending (including not in INTx mode).  We assume
-     * that a BAR access is in response to an interrupt and that BAR
-     * accesses will service the interrupt.  Unfortunately, we don't know
-     * which access will service the interrupt, so we're potentially
-     * getting quite a few host interrupts per guest interrupt.
-     */
-    vbasedev->ops->vfio_eoi(vbasedev);
-}
-
-static uint64_t vfio_region_read(void *opaque,
-                                 hwaddr addr, unsigned size)
-{
-    VFIORegion *region = opaque;
-    VFIODevice *vbasedev = region->vbasedev;
-    union {
-        uint8_t byte;
-        uint16_t word;
-        uint32_t dword;
-        uint64_t qword;
-    } buf;
-    uint64_t data = 0;
-
-    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
-        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
-                     __func__, vbasedev->name, region->nr,
-                     addr, size);
-        return (uint64_t)-1;
-    }
-
-    switch (size) {
-    case 1:
-        data = buf.byte;
-        break;
-    case 2:
-        data = buf.word;
-        break;
-    case 4:
-        data = buf.dword;
-        break;
-    default:
-        hw_error("vfio: unsupported read size, %d bytes", size);
-        break;
-    }
-
-    trace_vfio_region_read(vbasedev->name, region->nr, addr, size, data);
-
-    /* Same as write above */
-    vbasedev->ops->vfio_eoi(vbasedev);
-
-    return data;
-}
-
-static const MemoryRegionOps vfio_region_ops = {
-    .read = vfio_region_read,
-    .write = vfio_region_write,
-    .endianness = DEVICE_NATIVE_ENDIAN,
-};
-
 static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
     struct vfio_region_info reg_info = {
@@ -2378,305 +2125,6 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
 }
 
 /*
- * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
- */
-static int vfio_dma_unmap(VFIOContainer *container,
-                          hwaddr iova, ram_addr_t size)
-{
-    struct vfio_iommu_type1_dma_unmap unmap = {
-        .argsz = sizeof(unmap),
-        .flags = 0,
-        .iova = iova,
-        .size = size,
-    };
-
-    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-        error_report("VFIO_UNMAP_DMA: %d\n", -errno);
-        return -errno;
-    }
-
-    return 0;
-}
-
-static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
-                        ram_addr_t size, void *vaddr, bool readonly)
-{
-    struct vfio_iommu_type1_dma_map map = {
-        .argsz = sizeof(map),
-        .flags = VFIO_DMA_MAP_FLAG_READ,
-        .vaddr = (__u64)(uintptr_t)vaddr,
-        .iova = iova,
-        .size = size,
-    };
-
-    if (!readonly) {
-        map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
-    }
-
-    /*
-     * Try the mapping, if it fails with EBUSY, unmap the region and try
-     * again.  This shouldn't be necessary, but we sometimes see it in
-     * the the VGA ROM space.
-     */
-    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
-         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
-        return 0;
-    }
-
-    error_report("VFIO_MAP_DMA: %d\n", -errno);
-    return -errno;
-}
-
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
-{
-    return (!memory_region_is_ram(section->mr) &&
-            !memory_region_is_iommu(section->mr)) ||
-           /*
-            * Sizing an enabled 64-bit BAR can cause spurious mappings to
-            * addresses in the upper part of the 64-bit address space.  These
-            * are never accessed by the CPU and beyond the address width of
-            * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
-            */
-           section->offset_within_address_space & (1ULL << 63);
-}
-
-static void vfio_iommu_map_notify(Notifier *n, void *data)
-{
-    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
-    VFIOContainer *container = giommu->container;
-    IOMMUTLBEntry *iotlb = data;
-    MemoryRegion *mr;
-    hwaddr xlat;
-    hwaddr len = iotlb->addr_mask + 1;
-    void *vaddr;
-    int ret;
-
-    trace_vfio_iommu_map_notify(iotlb->iova,
-                                iotlb->iova + iotlb->addr_mask);
-
-    /*
-     * The IOMMU TLB entry we have just covers translation through
-     * this IOMMU to its immediate target.  We need to translate
-     * it the rest of the way through to memory.
-     */
-    mr = address_space_translate(&address_space_memory,
-                                 iotlb->translated_addr,
-                                 &xlat, &len, iotlb->perm & IOMMU_WO);
-    if (!memory_region_is_ram(mr)) {
-        error_report("iommu map to non memory area %"HWADDR_PRIx"\n",
-                     xlat);
-        return;
-    }
-    /*
-     * Translation truncates length to the IOMMU page size,
-     * check that it did not truncate too much.
-     */
-    if (len & iotlb->addr_mask) {
-        error_report("iommu has granularity incompatible with target AS\n");
-        return;
-    }
-
-    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
-        vaddr = memory_region_get_ram_ptr(mr) + xlat;
-
-        ret = vfio_dma_map(container, iotlb->iova,
-                           iotlb->addr_mask + 1, vaddr,
-                           !(iotlb->perm & IOMMU_WO) || mr->readonly);
-        if (ret) {
-            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                         container, iotlb->iova,
-                         iotlb->addr_mask + 1, vaddr, ret);
-        }
-    } else {
-        ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
-        if (ret) {
-            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
-                         container, iotlb->iova,
-                         iotlb->addr_mask + 1, ret);
-        }
-    }
-}
-
-static void vfio_listener_region_add(MemoryListener *listener,
-                                     MemoryRegionSection *section)
-{
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
-    hwaddr iova, end;
-    Int128 llend;
-    void *vaddr;
-    int ret;
-
-    if (vfio_listener_skipped_section(section)) {
-        trace_vfio_listener_region_add_skip(
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                int128_get64(int128_sub(section->size, int128_one())));
-        return;
-    }
-
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
-        error_report("%s received unaligned region", __func__);
-        return;
-    }
-
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-    llend = int128_make64(section->offset_within_address_space);
-    llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
-
-    if (int128_ge(int128_make64(iova), llend)) {
-        return;
-    }
-
-    memory_region_ref(section->mr);
-
-    if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
-
-        trace_vfio_listener_region_add_iommu(iova,
-                    int128_get64(int128_sub(llend, int128_one())));
-        /*
-         * FIXME: We should do some checking to see if the
-         * capabilities of the host VFIO IOMMU are adequate to model
-         * the guest IOMMU
-         *
-         * FIXME: For VFIO iommu types which have KVM acceleration to
-         * avoid bouncing all map/unmaps through qemu this way, this
-         * would be the right place to wire that up (tell the KVM
-         * device emulation the VFIO iommu handles to use).
-         */
-        /*
-         * This assumes that the guest IOMMU is empty of
-         * mappings at this point.
-         *
-         * One way of doing this is:
-         * 1. Avoid sharing IOMMUs between emulated devices or different
-         * IOMMU groups.
-         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
-         * there are some mappings in IOMMU.
-         *
-         * VFIO on SPAPR does that. Other IOMMU models may do that different,
-         * they must make sure there are no existing mappings or
-         * loop through existing mappings to map them into VFIO.
-         */
-        giommu = g_malloc0(sizeof(*giommu));
-        giommu->iommu = section->mr;
-        giommu->container = container;
-        giommu->n.notify = vfio_iommu_map_notify;
-        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
-        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
-
-        return;
-    }
-
-    /* Here we assume that memory_region_is_ram(section->mr)==true */
-
-    end = int128_get64(llend);
-    vaddr = memory_region_get_ram_ptr(section->mr) +
-            section->offset_within_region +
-            (iova - section->offset_within_address_space);
-
-    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
-
-    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
-    if (ret) {
-        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                     container, iova, end - iova, vaddr, ret);
-
-        /*
-         * On the initfn path, store the first error in the container so we
-         * can gracefully fail.  Runtime, there's not much we can do other
-         * than throw a hardware error.
-         */
-        if (!container->iommu_data.type1.initialized) {
-            if (!container->iommu_data.type1.error) {
-                container->iommu_data.type1.error = ret;
-            }
-        } else {
-            hw_error("vfio: DMA mapping failed, unable to continue");
-        }
-    }
-}
-
-static void vfio_listener_region_del(MemoryListener *listener,
-                                     MemoryRegionSection *section)
-{
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
-    hwaddr iova, end;
-    int ret;
-
-    if (vfio_listener_skipped_section(section)) {
-        trace_vfio_listener_region_del_skip(
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                int128_get64(int128_sub(section->size, int128_one())));
-        return;
-    }
-
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
-        error_report("%s received unaligned region", __func__);
-        return;
-    }
-
-    if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
-
-        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
-            if (giommu->iommu == section->mr) {
-                memory_region_unregister_iommu_notifier(&giommu->n);
-                QLIST_REMOVE(giommu, giommu_next);
-                g_free(giommu);
-                break;
-            }
-        }
-
-        /*
-         * FIXME: We assume the one big unmap below is adequate to
-         * remove any individual page mappings in the IOMMU which
-         * might have been copied into VFIO. This works for a page table
-         * based IOMMU where a big unmap flattens a large range of IO-PTEs.
-         * That may not be true for all IOMMU types.
-         */
-    }
-
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-    end = (section->offset_within_address_space + int128_get64(section->size)) &
-          TARGET_PAGE_MASK;
-
-    if (iova >= end) {
-        return;
-    }
-
-    trace_vfio_listener_region_del(iova, end - 1);
-
-    ret = vfio_dma_unmap(container, iova, end - iova);
-    memory_region_unref(section->mr);
-    if (ret) {
-        error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx") = %d (%m)",
-                     container, iova, end - iova, ret);
-    }
-}
-
-static MemoryListener vfio_memory_listener = {
-    .region_add = vfio_listener_region_add,
-    .region_del = vfio_listener_region_del,
-};
-
-static void vfio_listener_release(VFIOContainer *container)
-{
-    memory_listener_unregister(&container->iommu_data.type1.listener);
-}
-
-/*
  * Interrupt setup
  */
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
@@ -2850,46 +2298,6 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
     }
 }
 
-static int vfio_mmap_region(Object *obj, VFIORegion *region,
-                            MemoryRegion *mem, MemoryRegion *submem,
-                            void **map, size_t size, off_t offset,
-                            const char *name)
-{
-    int ret = 0;
-    VFIODevice *vbasedev = region->vbasedev;
-
-    if (VFIO_ALLOW_MMAP && size && region->flags &
-        VFIO_REGION_INFO_FLAG_MMAP) {
-        int prot = 0;
-
-        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
-            prot |= PROT_READ;
-        }
-
-        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
-            prot |= PROT_WRITE;
-        }
-
-        *map = mmap(NULL, size, prot, MAP_SHARED,
-                    vbasedev->fd, region->fd_offset + offset);
-        if (*map == MAP_FAILED) {
-            *map = NULL;
-            ret = -errno;
-            goto empty_region;
-        }
-
-        memory_region_init_ram_ptr(submem, obj, name, size, *map);
-    } else {
-empty_region:
-        /* Create a zero sized sub-region to make cleanup easy. */
-        memory_region_init(submem, obj, name, 0);
-    }
-
-    memory_region_add_subregion(mem, offset, submem);
-
-    return ret;
-}
-
 static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
@@ -3530,345 +2938,6 @@ static VFIODeviceOps vfio_pci_ops = {
     .vfio_populate_device = vfio_populate_device,
 };
 
-static void vfio_reset_handler(void *opaque)
-{
-    VFIOGroup *group;
-    VFIODevice *vbasedev;
-
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        QLIST_FOREACH(vbasedev, &group->device_list, next) {
-            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
-        }
-    }
-
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        QLIST_FOREACH(vbasedev, &group->device_list, next) {
-            if (vbasedev->needs_reset) {
-                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
-            }
-        }
-    }
-}
-
-static void vfio_kvm_device_add_group(VFIOGroup *group)
-{
-#ifdef CONFIG_KVM
-    struct kvm_device_attr attr = {
-        .group = KVM_DEV_VFIO_GROUP,
-        .attr = KVM_DEV_VFIO_GROUP_ADD,
-        .addr = (uint64_t)(unsigned long)&group->fd,
-    };
-
-    if (!kvm_enabled()) {
-        return;
-    }
-
-    if (vfio_kvm_device_fd < 0) {
-        struct kvm_create_device cd = {
-            .type = KVM_DEV_TYPE_VFIO,
-        };
-
-        if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
-            error_report("KVM_CREATE_DEVICE: %m\n");
-            return;
-        }
-
-        vfio_kvm_device_fd = cd.fd;
-    }
-
-    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-        error_report("Failed to add group %d to KVM VFIO device: %m",
-                     group->groupid);
-    }
-#endif
-}
-
-static void vfio_kvm_device_del_group(VFIOGroup *group)
-{
-#ifdef CONFIG_KVM
-    struct kvm_device_attr attr = {
-        .group = KVM_DEV_VFIO_GROUP,
-        .attr = KVM_DEV_VFIO_GROUP_DEL,
-        .addr = (uint64_t)(unsigned long)&group->fd,
-    };
-
-    if (vfio_kvm_device_fd < 0) {
-        return;
-    }
-
-    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-        error_report("Failed to remove group %d from KVM VFIO device: %m",
-                     group->groupid);
-    }
-#endif
-}
-
-static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
-{
-    VFIOAddressSpace *space;
-
-    QLIST_FOREACH(space, &vfio_address_spaces, list) {
-        if (space->as == as) {
-            return space;
-        }
-    }
-
-    /* No suitable VFIOAddressSpace, create a new one */
-    space = g_malloc0(sizeof(*space));
-    space->as = as;
-    QLIST_INIT(&space->containers);
-
-    QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
-
-    return space;
-}
-
-static void vfio_put_address_space(VFIOAddressSpace *space)
-{
-    if (QLIST_EMPTY(&space->containers)) {
-        QLIST_REMOVE(space, list);
-        g_free(space);
-    }
-}
-
-static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
-{
-    VFIOContainer *container;
-    int ret, fd;
-    VFIOAddressSpace *space;
-
-    space = vfio_get_address_space(as);
-
-    QLIST_FOREACH(container, &space->containers, next) {
-        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
-            group->container = container;
-            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-            return 0;
-        }
-    }
-
-    fd = qemu_open("/dev/vfio/vfio", O_RDWR);
-    if (fd < 0) {
-        error_report("vfio: failed to open /dev/vfio/vfio: %m");
-        ret = -errno;
-        goto put_space_exit;
-    }
-
-    ret = ioctl(fd, VFIO_GET_API_VERSION);
-    if (ret != VFIO_API_VERSION) {
-        error_report("vfio: supported vfio version: %d, "
-                     "reported version: %d", VFIO_API_VERSION, ret);
-        ret = -EINVAL;
-        goto close_fd_exit;
-    }
-
-    container = g_malloc0(sizeof(*container));
-    container->space = space;
-    container->fd = fd;
-
-    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
-        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
-        if (ret) {
-            error_report("vfio: failed to set group container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
-        if (ret) {
-            error_report("vfio: failed to set iommu for container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 &address_space_memory);
-
-        if (container->iommu_data.type1.error) {
-            ret = container->iommu_data.type1.error;
-            error_report("vfio: memory listener initialization failed for container");
-            goto listener_release_exit;
-        }
-
-        container->iommu_data.type1.initialized = true;
-
-    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
-        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
-        if (ret) {
-            error_report("vfio: failed to set group container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
-        if (ret) {
-            error_report("vfio: failed to set iommu for container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        /*
-         * The host kernel code implementing VFIO_IOMMU_DISABLE is called
-         * when container fd is closed so we do not call it explicitly
-         * in this file.
-         */
-        ret = ioctl(fd, VFIO_IOMMU_ENABLE);
-        if (ret) {
-            error_report("vfio: failed to enable container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
-    } else {
-        error_report("vfio: No available IOMMU models");
-        ret = -EINVAL;
-        goto free_container_exit;
-    }
-
-    QLIST_INIT(&container->group_list);
-    QLIST_INSERT_HEAD(&space->containers, container, next);
-
-    group->container = container;
-    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-
-    return 0;
-
-listener_release_exit:
-    vfio_listener_release(container);
-
-free_container_exit:
-    g_free(container);
-
-close_fd_exit:
-    close(fd);
-
-put_space_exit:
-    vfio_put_address_space(space);
-
-    return ret;
-}
-
-static void vfio_disconnect_container(VFIOGroup *group)
-{
-    VFIOContainer *container = group->container;
-
-    if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
-        error_report("vfio: error disconnecting group %d from container",
-                     group->groupid);
-    }
-
-    QLIST_REMOVE(group, container_next);
-    group->container = NULL;
-
-    if (QLIST_EMPTY(&container->group_list)) {
-        VFIOAddressSpace *space = container->space;
-
-        if (container->iommu_data.release) {
-            container->iommu_data.release(container);
-        }
-        QLIST_REMOVE(container, next);
-        trace_vfio_disconnect_container(container->fd);
-        close(container->fd);
-        g_free(container);
-
-        vfio_put_address_space(space);
-    }
-}
-
-static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
-{
-    VFIOGroup *group;
-    char path[32];
-    struct vfio_group_status status = { .argsz = sizeof(status) };
-
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        if (group->groupid == groupid) {
-            /* Found it.  Now is it already in the right context? */
-            if (group->container->space->as == as) {
-                return group;
-            } else {
-                error_report("vfio: group %d used in multiple address spaces",
-                             group->groupid);
-                return NULL;
-            }
-        }
-    }
-
-    group = g_malloc0(sizeof(*group));
-
-    snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
-    group->fd = qemu_open(path, O_RDWR);
-    if (group->fd < 0) {
-        error_report("vfio: error opening %s: %m", path);
-        goto free_group_exit;
-    }
-
-    if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
-        error_report("vfio: error getting group status: %m");
-        goto close_fd_exit;
-    }
-
-    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
-        error_report("vfio: error, group %d is not viable, please ensure "
-                     "all devices within the iommu_group are bound to their "
-                     "vfio bus driver.", groupid);
-        goto close_fd_exit;
-    }
-
-    group->groupid = groupid;
-    QLIST_INIT(&group->device_list);
-
-    if (vfio_connect_container(group, as)) {
-        error_report("vfio: failed to setup container for group %d", groupid);
-        goto close_fd_exit;
-    }
-
-    if (QLIST_EMPTY(&vfio_group_list)) {
-        qemu_register_reset(vfio_reset_handler, NULL);
-    }
-
-    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
-
-    vfio_kvm_device_add_group(group);
-
-    return group;
-
-close_fd_exit:
-    close(group->fd);
-
-free_group_exit:
-    g_free(group);
-
-    return NULL;
-}
-
-static void vfio_put_group(VFIOGroup *group)
-{
-    if (!QLIST_EMPTY(&group->device_list)) {
-        return;
-    }
-
-    vfio_kvm_device_del_group(group);
-    vfio_disconnect_container(group);
-    QLIST_REMOVE(group, next);
-    trace_vfio_put_group(group->fd);
-    close(group->fd);
-    g_free(group);
-
-    if (QLIST_EMPTY(&vfio_group_list)) {
-        qemu_unregister_reset(vfio_reset_handler, NULL);
-    }
-}
-
 static int vfio_populate_device(VFIODevice *vbasedev)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
@@ -3993,57 +3062,6 @@ error:
     return ret;
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name,
-                           VFIODevice *vbasedev)
-{
-    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
-    int ret;
-
-    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
-    if (ret < 0) {
-        error_report("vfio: error getting device %s from group %d: %m",
-                     name, group->groupid);
-        error_printf("Verify all devices in group %d are bound to vfio-<bus> "
-                     "or pci-stub and not already in use\n", group->groupid);
-        return ret;
-    }
-
-    vbasedev->fd = ret;
-    vbasedev->group = group;
-    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
-
-    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
-    if (ret) {
-        error_report("vfio: error getting device info: %m");
-        goto error;
-    }
-
-    vbasedev->num_irqs = dev_info.num_irqs;
-    vbasedev->num_regions = dev_info.num_regions;
-    vbasedev->flags = dev_info.flags;
-
-    trace_vfio_get_device(name, dev_info.flags,
-                          dev_info.num_regions, dev_info.num_irqs);
-
-    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
-
-    ret = vbasedev->ops->vfio_populate_device(vbasedev);
-
-error:
-    if (ret) {
-        vfio_put_base_device(vbasedev);
-    }
-    return ret;
-}
-
-void vfio_put_base_device(VFIODevice *vbasedev)
-{
-    QLIST_REMOVE(vbasedev, next);
-    vbasedev->group = NULL;
-    trace_vfio_put_base_device(vbasedev->fd);
-    close(vbasedev->fd);
-}
-
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
     g_free(vdev->vbasedev.name);
@@ -4417,47 +3435,3 @@ static void register_vfio_pci_dev_type(void)
 }
 
 type_init(register_vfio_pci_dev_type)
-
-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
-                                   int req, void *param)
-{
-    VFIOGroup *group;
-    VFIOContainer *container;
-    int ret = -1;
-
-    group = vfio_get_group(groupid, as);
-    if (!group) {
-        error_report("vfio: group %d not registered", groupid);
-        return ret;
-    }
-
-    container = group->container;
-    if (group->container) {
-        ret = ioctl(container->fd, req, param);
-        if (ret < 0) {
-            error_report("vfio: failed to ioctl container: ret=%d, %s",
-                         ret, strerror(errno));
-        }
-    }
-
-    vfio_put_group(group);
-
-    return ret;
-}
-
-int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
-                         int req, void *param)
-{
-    /* We allow only certain ioctls to the container */
-    switch (req) {
-    case VFIO_CHECK_EXTENSION:
-    case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
-        break;
-    default:
-        /* Return an error on unknown requests */
-        error_report("vfio: unsupported ioctl %X", req);
-        return -1;
-    }
-
-    return vfio_container_do_ioctl(as, groupid, req, param);
-}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
new file mode 100644
index 0000000..83c7876
--- /dev/null
+++ b/include/hw/vfio/vfio-common.h
@@ -0,0 +1,152 @@
+/*
+ * common header for vfio based device assignment support
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ *  Adapted for KVM by Qumranet.
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ */
+#ifndef HW_VFIO_VFIO_COMMON_H
+#define HW_VFIO_VFIO_COMMON_H
+
+#include "qemu-common.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "qemu/queue.h"
+#include "qemu/notify.h"
+
+/*#define DEBUG_VFIO*/
+#ifdef DEBUG_VFIO
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+/* Extra debugging, trap acceleration paths for more logging */
+#define VFIO_ALLOW_MMAP 1
+#define VFIO_ALLOW_KVM_INTX 1
+#define VFIO_ALLOW_KVM_MSI 1
+#define VFIO_ALLOW_KVM_MSIX 1
+
+enum {
+    VFIO_DEVICE_TYPE_PCI = 0,
+    VFIO_DEVICE_TYPE_PLATFORM = 1,
+};
+
+typedef struct VFIORegion {
+    struct VFIODevice *vbasedev;
+    off_t fd_offset; /* offset of region within device fd */
+    MemoryRegion mem; /* slow, read/write access */
+    MemoryRegion mmap_mem; /* direct mapped access */
+    void *mmap;
+    size_t size;
+    uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
+    uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOAddressSpace {
+    AddressSpace *as;
+    QLIST_HEAD(, VFIOContainer) containers;
+    QLIST_ENTRY(VFIOAddressSpace) list;
+} VFIOAddressSpace;
+
+struct VFIOGroup;
+
+typedef struct VFIOType1 {
+    MemoryListener listener;
+    int error;
+    bool initialized;
+} VFIOType1;
+
+typedef struct VFIOContainer {
+    VFIOAddressSpace *space;
+    int fd; /* /dev/vfio/vfio, empowered by the attached groups */
+    struct {
+        /* enable abstraction to support various iommu backends */
+        union {
+            VFIOType1 type1;
+        };
+        void (*release)(struct VFIOContainer *);
+    } iommu_data;
+    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+    QLIST_HEAD(, VFIOGroup) group_list;
+    QLIST_ENTRY(VFIOContainer) next;
+} VFIOContainer;
+
+typedef struct VFIOGuestIOMMU {
+    VFIOContainer *container;
+    MemoryRegion *iommu;
+    Notifier n;
+    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
+} VFIOGuestIOMMU;
+
+typedef struct VFIODeviceOps VFIODeviceOps;
+
+typedef struct VFIODevice {
+    QLIST_ENTRY(VFIODevice) next;
+    struct VFIOGroup *group;
+    char *name;
+    int fd;
+    int type;
+    bool reset_works;
+    bool needs_reset;
+    VFIODeviceOps *ops;
+    unsigned int num_irqs;
+    unsigned int num_regions;
+    unsigned int flags;
+} VFIODevice;
+
+struct VFIODeviceOps {
+    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
+    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+    void (*vfio_eoi)(VFIODevice *vdev);
+    int (*vfio_populate_device)(VFIODevice *vdev);
+};
+
+typedef struct VFIOGroup {
+    int fd;
+    int groupid;
+    VFIOContainer *container;
+    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_ENTRY(VFIOGroup) next;
+    QLIST_ENTRY(VFIOGroup) container_next;
+} VFIOGroup;
+
+void vfio_put_base_device(VFIODevice *vbasedev);
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
+void vfio_unmask_irqindex(VFIODevice *vbasedev, int index);
+void vfio_mask_irqindex(VFIODevice *vbasedev, int index);
+void vfio_region_write(void *opaque, hwaddr addr,
+                       uint64_t data, unsigned size);
+uint64_t vfio_region_read(void *opaque,
+                          hwaddr addr, unsigned size);
+void vfio_listener_release(VFIOContainer *container);
+int vfio_mmap_region(Object *vdev, VFIORegion *region,
+                     MemoryRegion *mem, MemoryRegion *submem,
+                     void **map, size_t size, off_t offset,
+                     const char *name);
+void vfio_reset_handler(void *opaque);
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as);
+void vfio_put_group(VFIOGroup *group);
+int vfio_get_device(VFIOGroup *group, const char *name,
+                    VFIODevice *vbasedev);
+
+extern const MemoryRegionOps vfio_region_ops;
+extern const MemoryListener vfio_memory_listener;
+extern QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list;
+extern QLIST_HEAD(vfio_as_head, VFIOAddressSpace) vfio_address_spaces;
+
+#endif /* !HW_VFIO_VFIO_COMMON_H */
diff --git a/trace-events b/trace-events
index 151d5bd..c67ebc5 100644
--- a/trace-events
+++ b/trace-events
@@ -1363,6 +1363,7 @@ vfio_pci_reset(const char *name) " (%s)"
 vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET"
 vfio_pci_reset_pm(const char *name) "%s PCI PM Reset"
 
+# hw/vfio/vfio-common.c
 vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
 vfio_region_read(char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
 vfio_iommu_map_notify(uint64_t iova_start, uint64_t iova_end) "iommu map @ %"PRIx64" - %"PRIx64
-- 
1.8.3.2

* [Qemu-devel] [PATCH v6 09/16] hw/vfio/platform: add vfio-platform support
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (7 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 10/16] hw/vfio: calxeda xgmac device Eric Auger
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, Kim Phillips, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

Minimal VFIO platform implementation supporting
- register space user mapping,
- IRQ assignment based on eventfds handled on the QEMU side.

irqfd kernel acceleration comes in a subsequent patch.
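
For reviewers, the user-side eventfd handling can be pictured with a
small standalone model. This is only a sketch under simplifying
assumptions, not the code in this patch: all names (irq_model,
irq_model_trigger, irq_model_eoi, MAX_IRQS) are hypothetical, and
locking, eventfds and the mmap timer are left out. It illustrates the
rule the implementation relies on: at most one IRQ is active at a time,
and IRQs that fire in the meantime wait in a FIFO until the guest EOI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the IRQ states used by the patch. */
enum irq_state { IRQ_INACTIVE = 0, IRQ_PENDING = 1, IRQ_ACTIVE = 2 };

#define MAX_IRQS 8

struct irq_model {
    enum irq_state state[MAX_IRQS];
    int pending[MAX_IRQS];  /* ring buffer of pending IRQ indexes */
    size_t head;            /* position of the oldest pending IRQ */
    size_t count;           /* number of pending IRQs */
};

static void irq_model_init(struct irq_model *m)
{
    memset(m, 0, sizeof(*m));
}

/* Return the index of the active IRQ, or -1 if none is active. */
static int irq_model_active(const struct irq_model *m)
{
    for (int i = 0; i < MAX_IRQS; i++) {
        if (m->state[i] == IRQ_ACTIVE) {
            return i;
        }
    }
    return -1;
}

/*
 * The eventfd of @irq fired. Inject it if nothing is in flight,
 * otherwise queue it. Returns 1 if injected, 0 if queued or ignored.
 * Simplification: an IRQ that is already pending or active is ignored.
 */
static int irq_model_trigger(struct irq_model *m, int irq)
{
    assert(irq >= 0 && irq < MAX_IRQS);

    if (m->state[irq] != IRQ_INACTIVE) {
        return 0;
    }
    if (irq_model_active(m) >= 0 || m->count > 0) {
        m->state[irq] = IRQ_PENDING;
        m->pending[(m->head + m->count) % MAX_IRQS] = irq;
        m->count++;
        return 0;
    }
    m->state[irq] = IRQ_ACTIVE;
    return 1;
}

/*
 * Guest EOI: retire the active IRQ and promote the oldest pending one.
 * Returns the newly active IRQ, or -1 if the queue was empty.
 */
static int irq_model_eoi(struct irq_model *m)
{
    int active = irq_model_active(m);
    int next;

    if (active >= 0) {
        m->state[active] = IRQ_INACTIVE;
    }
    if (m->count == 0) {
        return -1;
    }
    next = m->pending[m->head];
    m->head = (m->head + 1) % MAX_IRQS;
    m->count--;
    m->state[next] = IRQ_ACTIVE;
    return next;
}
```

In this model, triggering IRQ 3 and then IRQ 5 injects 3 and queues 5;
the first EOI retires 3 and promotes 5, the second EOI leaves every IRQ
inactive.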

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v5 -> v6:
- vfio_device property renamed into host property
- correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
  and remove PCI related comment
- remove declaration of vfio_setup_irqfd and irqfd_allowed
  property. Both belong to the next patch (irqfd)
- remove declaration of vfio_intp_interrupt in vfio-platform.h
- functions that can be static are now declared static
- remove declarations of vfio_region_ops, vfio_memory_listener,
  group_list, vfio_address_spaces. All are moved to vfio-common.h
- remove vfio_put_device declaration and definition
- print_regions removed. code moved into vfio_populate_regions
- replace DPRINTF by trace events
- new helper routine to set the trigger eventfd
- dissociate intp init from the injection enablement:
  vfio_enable_intp renamed into vfio_init_intp and new function
  named vfio_start_eventfd_injection
- injection start moved to vfio_start_irq_injection (no longer
  in vfio_populate_interrupt)
- new start_irq_fn field in VFIOPlatformDevice corresponding to
  the function that will be used for starting injection
- user handled eventfd:
  x add mutex to protect IRQ state & list manipulation,
  x correct misleading comment in vfio_intp_interrupt.
  x Fix bugs thanks to fake interrupt modality
- VFIOPlatformDeviceClass becomes abstract
- add error_setg in vfio_platform_realize

v4 -> v5:
- vfio-platform.h included first
- cleanup error handling in *populate*, vfio_get_device,
  vfio_enable_intp
- vfio_put_device not called anymore
- add some includes to follow vfio policy

v3 -> v4:
[Eric Auger]
- merge of "vfio: Add initial IRQ support in platform device"
  to get a fully functional patch, although performance is limited.
- removal of unrealize function since I currently understand
  it is only used with device hot-plug feature.

v2 -> v3:
[Eric Auger]
- further factorization between PCI and platform (VFIORegion,
  VFIODevice). same level of functionality.

<= v2:
[Kim Phillips]
- Initial Creation of the device supporting register space mapping
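
As a reading aid for the trigger eventfd helper, here is a minimal
sketch of how the VFIO_DEVICE_SET_IRQS argument is laid out.
build_trigger_irq_set is a hypothetical name, not a function from this
series; the struct and flags come from <linux/vfio.h>. The sketch only
builds the buffer. A caller would pass it to
ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set) and must free it only
once nothing reads the buffer contents any more.

```c
#include <assert.h>
#include <linux/vfio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: allocate a vfio_irq_set that asks the VFIO driver
 * to signal eventfd @fd whenever IRQ @index fires. The eventfd is carried
 * in the variable-length data[] area that follows the fixed header.
 * Returns NULL on allocation failure; the caller owns the buffer.
 */
static struct vfio_irq_set *build_trigger_irq_set(uint32_t index, int32_t fd)
{
    size_t argsz = sizeof(struct vfio_irq_set) + sizeof(fd);
    struct vfio_irq_set *irq_set;

    assert(fd >= 0); /* must be a valid eventfd descriptor */

    irq_set = calloc(1, argsz);
    if (!irq_set) {
        return NULL;
    }

    irq_set->argsz = (uint32_t)argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = index;
    irq_set->start = 0; /* first sub-index within this IRQ index */
    irq_set->count = 1; /* exactly one eventfd follows */
    memcpy(irq_set->data, &fd, sizeof(fd));

    return irq_set;
}
```
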
---
 hw/vfio/Makefile.objs           |   1 +
 hw/vfio/platform.c              | 599 ++++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-platform.h |  79 ++++++
 trace-events                    |  12 +
 4 files changed, 691 insertions(+)
 create mode 100644 hw/vfio/platform.c
 create mode 100644 include/hw/vfio/vfio-platform.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index e31f30e..c5c76fe 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,5 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
+obj-$(CONFIG_SOFTMMU) += platform.o
 endif
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
new file mode 100644
index 0000000..9987b25
--- /dev/null
+++ b/hw/vfio/platform.c
@@ -0,0 +1,599 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Kim Phillips <kim.phillips@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI device assignment support:
+ *  Copyright Red Hat, Inc. 2012
+ */
+
+#include <linux/vfio.h>
+#include <sys/ioctl.h>
+
+#include "hw/vfio/vfio-platform.h"
+#include "qemu/error-report.h"
+#include "qemu/range.h"
+#include "sysemu/sysemu.h"
+#include "exec/memory.h"
+#include "qemu/queue.h"
+#include "hw/sysbus.h"
+#include "trace.h"
+
+static void vfio_intp_interrupt(VFIOINTp *intp);
+typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
+static int vfio_set_trigger_eventfd(VFIOINTp *intp,
+                                    eventfd_user_side_handler_t handler);
+
+/*
+ * Functions only used when eventfds are handled on the user side,
+ * i.e. without irqfd
+ */
+
+/**
+ * vfio_platform_eoi - IRQ completion routine
+ * @vbasedev: the VFIO device
+ *
+ * De-asserts the active virtual IRQ and unmasks the physical IRQ
+ * (masked by the VFIO driver). Handles pending IRQs, if any.
+ * The eoi function is called on the first access to any MMIO region
+ * after an IRQ was triggered. It is assumed this access corresponds
+ * to the IRQ status register reset. With such a mechanism, only a
+ * single IRQ can be handled at a time since there is no way to know
+ * which IRQ was completed by the guest (we would need additional
+ * details about the IRQ status register mask).
+ */
+static void vfio_platform_eoi(VFIODevice *vbasedev)
+{
+    VFIOINTp *intp;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+    qemu_mutex_lock(&vdev->intp_mutex);
+    QLIST_FOREACH(intp, &vdev->intp_list, next) {
+        if (intp->state == VFIO_IRQ_ACTIVE) {
+            trace_vfio_platform_eoi(intp->pin,
+                                event_notifier_get_fd(&intp->interrupt));
+            intp->state = VFIO_IRQ_INACTIVE;
+
+            /* deassert the virtual IRQ and unmask physical one */
+            qemu_set_irq(intp->qemuirq, 0);
+            vfio_unmask_irqindex(vbasedev, intp->pin);
+
+            /* a single IRQ can be active at a time */
+            break;
+        }
+    }
+    /* in case there are pending IRQs, handle them one at a time */
+    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
+        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
+        trace_vfio_platform_eoi_handle_pending(intp->pin);
+        qemu_mutex_unlock(&vdev->intp_mutex);
+        vfio_intp_interrupt(intp);
+        qemu_mutex_lock(&vdev->intp_mutex);
+        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
+        qemu_mutex_unlock(&vdev->intp_mutex);
+    } else {
+        qemu_mutex_unlock(&vdev->intp_mutex);
+    }
+}
+
+/**
+ * vfio_mmap_set_enabled - enable/disable the fast path mode
+ * @vdev: the VFIO platform device
+ * @enabled: the target mmap state
+ *
+ * true ~ fast path = MMIO region is mmapped (no KVM trap)
+ * false ~ slow path = MMIO region is trapped and region callbacks
+ * are called; the slow path makes it possible to trap the guest
+ * reset of the IRQ status register
+ */
+
+static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
+{
+    VFIORegion *region;
+    int i;
+
+    trace_vfio_platform_mmap_set_enabled(enabled);
+
+    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
+        region = vdev->regions[i];
+
+        /* register space is unmapped to trap EOI */
+        memory_region_set_enabled(&region->mmap_mem, enabled);
+    }
+}
+
+/**
+ * vfio_intp_mmap_enable - timer function, restores the fast path
+ * if there is no more active IRQ
+ * @opaque: actually points to the VFIO platform device
+ *
+ * Called on mmap timer timeout, this function checks whether an
+ * IRQ is still active and, if not, restores the fast path.
+ * By construction a single eventfd is handled at a time.
+ * If the IRQ is still active, the timer is restarted.
+ */
+static void vfio_intp_mmap_enable(void *opaque)
+{
+    VFIOINTp *tmp;
+    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
+
+    qemu_mutex_lock(&vdev->intp_mutex);
+    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
+        if (tmp->state == VFIO_IRQ_ACTIVE) {
+            trace_vfio_platform_intp_mmap_enable(tmp->pin);
+            /* re-program the timer to check active status later */
+            timer_mod(vdev->mmap_timer,
+                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                          vdev->mmap_timeout);
+            qemu_mutex_unlock(&vdev->intp_mutex);
+            return;
+        }
+    }
+    vfio_mmap_set_enabled(vdev, true);
+    qemu_mutex_unlock(&vdev->intp_mutex);
+}
+
+/**
+ * vfio_intp_interrupt - The user-side eventfd handler
+ * @intp: the IRQ struct pointer
+ *
+ * the function can be entered
+ * - in event handler context: this IRQ is inactive
+ *   in that case, the vIRQ is injected into the guest if there
+ *   is no other active or pending IRQ.
+ * - in IOhandler context: this IRQ is pending.
+ *   there is no ACTIVE IRQ
+ */
+static void vfio_intp_interrupt(VFIOINTp *intp)
+{
+    int ret;
+    VFIOINTp *tmp;
+    VFIOPlatformDevice *vdev = intp->vdev;
+    bool delay_handling = false;
+
+    qemu_mutex_lock(&vdev->intp_mutex);
+    if (intp->state == VFIO_IRQ_INACTIVE) {
+        QLIST_FOREACH(tmp, &vdev->intp_list, next) {
+            if (tmp->state == VFIO_IRQ_ACTIVE ||
+                tmp->state == VFIO_IRQ_PENDING) {
+                delay_handling = true;
+                break;
+            }
+        }
+    }
+    if (delay_handling) {
+        /*
+         * the new IRQ gets a pending status and is pushed in
+         * the pending queue
+         */
+        intp->state = VFIO_IRQ_PENDING;
+        trace_vfio_intp_interrupt_set_pending(intp->pin);
+        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
+                             intp, pqnext);
+        ret = event_notifier_test_and_clear(&intp->interrupt);
+        qemu_mutex_unlock(&vdev->intp_mutex);
+        return;
+    }
+
+    /* no active IRQ, the new IRQ can be forwarded to the guest */
+    trace_vfio_platform_intp_interrupt(intp->pin,
+                              event_notifier_get_fd(&intp->interrupt));
+
+    if (intp->state == VFIO_IRQ_INACTIVE) {
+        ret = event_notifier_test_and_clear(&intp->interrupt);
+        if (!ret) {
+            error_report("Error when clearing fd=%d (ret = %d)",
+                         event_notifier_get_fd(&intp->interrupt), ret);
+        }
+    } /* else this is a pending IRQ that moves to ACTIVE state */
+
+    intp->state = VFIO_IRQ_ACTIVE;
+
+    /* sets slow path */
+    vfio_mmap_set_enabled(vdev, false);
+
+    /* trigger the virtual IRQ */
+    qemu_set_irq(intp->qemuirq, 1);
+
+    /* schedule the mmap timer which will restore mmap path after EOI */
+    if (vdev->mmap_timeout) {
+        timer_mod(vdev->mmap_timer,
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                      vdev->mmap_timeout);
+    }
+    qemu_mutex_unlock(&vdev->intp_mutex);
+}
+
+/**
+ * vfio_start_eventfd_injection - starts the virtual IRQ injection using
+ * user-side handled eventfds
+ * @intp: the IRQ struct pointer
+ */
+
+static int vfio_start_eventfd_injection(VFIOINTp *intp)
+{
+    int ret;
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+
+    vfio_mask_irqindex(vbasedev, intp->pin);
+
+    ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
+    if (ret) {
+        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
+        vfio_unmask_irqindex(vbasedev, intp->pin);
+        return ret;
+    }
+    vfio_unmask_irqindex(vbasedev, intp->pin);
+    return 0;
+}
+
+/*
+ * Functions used whatever the injection method
+ */
+
+/**
+ * vfio_set_trigger_eventfd - set VFIO eventfd handling
+ * i.e. program the VFIO driver to associate a given IRQ index
+ * with a fd handler
+ *
+ * @intp: IRQ struct pointer
+ * @handler: handler to be called on eventfd trigger
+ */
+static int vfio_set_trigger_eventfd(VFIOINTp *intp,
+                                    eventfd_user_side_handler_t handler)
+{
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+    struct vfio_irq_set *irq_set;
+    int argsz, ret;
+    int32_t *pfd;
+
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+    irq_set->index = intp->pin;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    pfd = (int32_t *)&irq_set->data;
+    *pfd = event_notifier_get_fd(&intp->interrupt);
+    qemu_set_fd_handler(*pfd, (IOHandler *)handler, NULL, intp);
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    if (ret < 0) {
+        error_report("vfio: Failed to set trigger eventfd: %m");
+        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
+    }
+    g_free(irq_set); /* freed last: pfd points into irq_set->data */
+    return ret;
+}
+
+/* not implemented yet */
+static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
+{
+    return false;
+}
+
+/* not implemented yet */
+static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
+{
+    return 0;
+}
+
+/**
+ * vfio_init_intp - allocate, initialize the IRQ struct pointer
+ * and add it into the list of IRQ
+ * @vbasedev: the VFIO device
+ * @index: VFIO device IRQ index
+ */
+static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
+{
+    int ret;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
+    VFIOINTp *intp;
+
+    /* allocate and populate a new VFIOINTp structure put in a queue list */
+    intp = g_malloc0(sizeof(*intp));
+    intp->vdev = vdev;
+    intp->pin = index;
+    intp->state = VFIO_IRQ_INACTIVE;
+    sysbus_init_irq(sbdev, &intp->qemuirq);
+
+    /* Get an eventfd for trigger */
+    ret = event_notifier_init(&intp->interrupt, 0);
+    if (ret) {
+        g_free(intp);
+        error_report("vfio: Error: trigger event_notifier_init failed");
+        return NULL;
+    }
+
+    /* store the new intp in qlist */
+    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
+    return intp;
+}
+
+/**
+ * vfio_populate_device - initialize MMIO region and IRQ
+ * @vbasedev: the VFIO device
+ *
+ * query the VFIO device for exposed MMIO regions and IRQ and
+ * populate the associated fields in the device struct
+ */
+static int vfio_populate_device(VFIODevice *vbasedev)
+{
+    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+    VFIOINTp *intp;
+    int i, ret = 0;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
+        reg_info.index = i;
+        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+        if (ret) {
+            error_report("vfio: Error getting region %d info: %m", i);
+            goto error;
+        }
+        vdev->regions[i]->flags = reg_info.flags;
+        vdev->regions[i]->size = reg_info.size;
+        vdev->regions[i]->fd_offset = reg_info.offset;
+        vdev->regions[i]->nr = i;
+        vdev->regions[i]->vbasedev = vbasedev;
+
+        trace_vfio_platform_populate_regions(vdev->regions[i]->nr,
+                            (unsigned long)vdev->regions[i]->flags,
+                            (unsigned long)vdev->regions[i]->size,
+                            vdev->regions[i]->vbasedev->fd,
+                            (unsigned long)vdev->regions[i]->fd_offset);
+    }
+
+    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                                    vfio_intp_mmap_enable, vdev);
+
+    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        irq.index = i;
+
+        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+        if (ret) {
+            error_printf("vfio: error getting device %s irq info",
+                         vbasedev->name);
+            return ret;
+        } else {
+            trace_vfio_platform_populate_interrupts(irq.index,
+                                                    irq.count,
+                                                    irq.flags);
+            intp = vfio_init_intp(vbasedev, irq.index);
+            if (!intp) {
+                error_report("vfio: Error setting up IRQ %d", i);
+                return -1; /* ret is 0 here, so it cannot signal failure */
+            }
+        }
+    }
+    return 0;
+error:
+    return ret;
+}
+
+/*
+ * vfio_start_irq_injection - associates a virtual irq to a
+ * VFIO IRQ index and start the injection of this IRQ
+ * @s: SysBus Device
+ * @index: VFIO IRQ index
+ * @virq: the virtual IRQ number, aka gsi
+ *
+ * this function is called when the device tree is built
+ */
+void vfio_start_irq_injection(SysBusDevice *s, int index, int virq)
+{
+    VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
+    VFIOINTp *intp;
+
+    QLIST_FOREACH(intp, &vdev->intp_list, next) {
+        if (intp->pin == index) {
+            intp->virtualID = virq;
+            vdev->start_irq_fn(intp);
+        }
+    }
+}
+
+/* specialized functions for VFIO Platform devices */
+static VFIODeviceOps vfio_platform_ops = {
+    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
+    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
+    .vfio_eoi = vfio_platform_eoi,
+    .vfio_populate_device = vfio_populate_device,
+};
+
+/**
+ * vfio_base_device_init - implements some of the VFIO mechanics
+ * @vbasedev: the VFIO device
+ *
+ * retrieves the group the device belongs to and gets the device fd
+ * returns 0 on success, a non-zero error code otherwise
+ * precondition: the device name must be initialized
+ */
+static int vfio_base_device_init(VFIODevice *vbasedev)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev_iter;
+    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
+    ssize_t len;
+    struct stat st;
+    int groupid;
+    int ret;
+
+    /* name must be set prior to the call */
+    if (!vbasedev->name) {
+        return -EINVAL;
+    }
+
+    /* Check that the host device exists */
+    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
+             vbasedev->name);
+
+    if (stat(path, &st) < 0) {
+        error_report("vfio: error: no such host device: %s", path);
+        return -errno;
+    }
+
+    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
+    len = readlink(path, iommu_group_path, sizeof(iommu_group_path));
+    if (len <= 0 || len >= sizeof(iommu_group_path)) {
+        error_report("vfio: error no iommu_group for device");
+        return len < 0 ? -errno : -ENAMETOOLONG;
+    }
+
+    iommu_group_path[len] = 0;
+    group_name = basename(iommu_group_path);
+
+    if (sscanf(group_name, "%d", &groupid) != 1) {
+        error_report("vfio: error reading %s: %m", path);
+        return -errno;
+    }
+
+    trace_vfio_platform_base_device_init(vbasedev->name, groupid);
+
+    group = vfio_get_group(groupid, &address_space_memory);
+    if (!group) {
+        error_report("vfio: failed to get group %d", groupid);
+        return -ENOENT;
+    }
+
+    snprintf(path, sizeof(path), "%s", vbasedev->name);
+
+    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
+            error_report("vfio: error: device %s is already attached", path);
+            vfio_put_group(group);
+            return -EBUSY;
+        }
+    }
+    ret = vfio_get_device(group, path, vbasedev);
+    if (ret) {
+        error_report("vfio: failed to get device %s", path);
+        vfio_put_group(group);
+    }
+    return ret;
+}
+
+/**
+ * vfio_map_region - initialize the 2 mr (mmapped on ops) for a
+ * given index
+ * @vdev: the VFIO platform device
+ * @nr: the index of the region
+ *
+ * init the top memory region and the mmapped memory region beneath.
+ * VFIOPlatformDevice is used since VFIODevice is not a QOM Object
+ * and cannot be passed to memory region functions
+ */
+static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
+{
+    VFIORegion *region = vdev->regions[nr];
+    unsigned size = region->size;
+    char name[64];
+
+    if (!size) {
+        return;
+    }
+
+    snprintf(name, sizeof(name), "VFIO %s region %d",
+             vdev->vbasedev.name, nr);
+
+    /* A "slow" read/write mapping underlies all regions */
+    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
+                          region, name, size);
+
+    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
+
+    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
+                         &region->mmap_mem, &region->mmap, size, 0, name)) {
+        error_report("%s unsupported. Performance may be slow", name);
+    }
+}
+
+/**
+ * vfio_platform_realize  - the device realize function
+ * @dev: device state pointer
+ * @errp: error
+ *
+ * initialize the device, its memory regions and IRQ structures
+ * IRQ are started separately
+ */
+static void vfio_platform_realize(DeviceState *dev, Error **errp)
+{
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    int i, ret;
+
+    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
+    vbasedev->ops = &vfio_platform_ops;
+    vdev->start_irq_fn = vfio_start_eventfd_injection;
+
+    trace_vfio_platform_realize(vbasedev->name, vdev->compat);
+
+    ret = vfio_base_device_init(vbasedev);
+    if (ret) {
+        error_setg(errp, "vfio: vfio_base_device_init failed for %s",
+                   vbasedev->name);
+        return;
+    }
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        vfio_map_region(vdev, i);
+        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
+    }
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+    .name = TYPE_VFIO_PLATFORM,
+    .unmigratable = 1,
+};
+
+static Property vfio_platform_dev_properties[] = {
+    DEFINE_PROP_STRING("host", VFIOPlatformDevice, vbasedev.name),
+    DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
+    DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
+                       mmap_timeout, 1100),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vfio_platform_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = vfio_platform_realize;
+    dc->props = vfio_platform_dev_properties;
+    dc->vmsd = &vfio_platform_vmstate;
+    dc->desc = "VFIO-based platform device assignment";
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo vfio_platform_dev_info = {
+    .name = TYPE_VFIO_PLATFORM,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(VFIOPlatformDevice),
+    .class_init = vfio_platform_class_init,
+    .class_size = sizeof(VFIOPlatformDeviceClass),
+    .abstract   = true,
+};
+
+static void register_vfio_platform_dev_type(void)
+{
+    type_register_static(&vfio_platform_dev_info);
+}
+
+type_init(register_vfio_platform_dev_type)
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
new file mode 100644
index 0000000..c7e10cc
--- /dev/null
+++ b/include/hw/vfio/vfio-platform.h
@@ -0,0 +1,79 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Kim Phillips <kim.phillips@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI device assignment support:
+ *  Copyright Red Hat, Inc. 2012
+ */
+
+#ifndef HW_VFIO_VFIO_PLATFORM_H
+#define HW_VFIO_VFIO_PLATFORM_H
+
+#include "hw/sysbus.h"
+#include "hw/vfio/vfio-common.h"
+#include "qemu/event_notifier.h"
+#include "qemu/queue.h"
+#include "hw/irq.h"
+
+#define TYPE_VFIO_PLATFORM "vfio-platform"
+
+enum {
+    VFIO_IRQ_INACTIVE = 0,
+    VFIO_IRQ_PENDING = 1,
+    VFIO_IRQ_ACTIVE = 2,
+    /* VFIO_IRQ_ACTIVE_AND_PENDING cannot happen with VFIO */
+};
+
+typedef struct VFIOINTp {
+    QLIST_ENTRY(VFIOINTp) next; /* entry for IRQ list */
+    QSIMPLEQ_ENTRY(VFIOINTp) pqnext; /* entry for pending IRQ queue */
+    EventNotifier interrupt; /* eventfd triggered on interrupt */
+    EventNotifier unmask; /* eventfd for unmask on QEMU bypass */
+    qemu_irq qemuirq;
+    struct VFIOPlatformDevice *vdev; /* back pointer to device */
+    int state; /* inactive, pending, active */
+    bool kvm_accel; /* set when QEMU bypass through KVM enabled */
+    uint8_t pin; /* index */
+    uint8_t virtualID; /* virtual IRQ */
+} VFIOINTp;
+
+typedef int (*start_irq_fn_t)(VFIOINTp *intp);
+
+typedef struct VFIOPlatformDevice {
+    SysBusDevice sbdev;
+    VFIODevice vbasedev; /* not a QOM object */
+    VFIORegion **regions;
+    QLIST_HEAD(, VFIOINTp) intp_list; /* list of IRQ */
+    /* queue of pending IRQ */
+    QSIMPLEQ_HEAD(pending_intp_queue, VFIOINTp) pending_intp_queue;
+    char *compat; /* compatibility string */
+    uint32_t mmap_timeout; /* delay to re-enable mmaps after interrupt */
+    QEMUTimer *mmap_timer; /* enable mmaps after periods w/o interrupts */
+    start_irq_fn_t start_irq_fn;
+    QemuMutex  intp_mutex;
+} VFIOPlatformDevice;
+
+
+typedef struct VFIOPlatformDeviceClass {
+    /*< private >*/
+    SysBusDeviceClass parent_class;
+    /*< public >*/
+} VFIOPlatformDeviceClass;
+
+#define VFIO_PLATFORM_DEVICE(obj) \
+     OBJECT_CHECK(VFIOPlatformDevice, (obj), TYPE_VFIO_PLATFORM)
+#define VFIO_PLATFORM_DEVICE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(VFIOPlatformDeviceClass, (klass), TYPE_VFIO_PLATFORM)
+#define VFIO_PLATFORM_DEVICE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(VFIOPlatformDeviceClass, (obj), TYPE_VFIO_PLATFORM)
+
+void vfio_start_irq_injection(SysBusDevice *dev, int index, int virq);
+
+#endif /*HW_VFIO_VFIO_PLATFORM_H*/
diff --git a/trace-events b/trace-events
index c67ebc5..b0411e9 100644
--- a/trace-events
+++ b/trace-events
@@ -1377,6 +1377,18 @@ vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
 vfio_put_base_device(int fd) "close vdev->fd=%d"
 
+# hw/vfio/platform.c
+vfio_platform_eoi(int pin, int fd) "EOI IRQ pin %d (fd=%d)"
+vfio_platform_mmap_set_enabled(bool enabled) "fast path = %d"
+vfio_platform_intp_mmap_enable(int pin) "IRQ #%d still active, stay in slow path"
+vfio_platform_intp_interrupt(int pin, int fd) "Handle IRQ #%d (fd = %d)"
+vfio_platform_populate_interrupts(int pin, int count, int flags) "- IRQ index %d: count %d, flags=0x%x"
+vfio_platform_populate_regions(int region_index, unsigned long flag, unsigned long size, int fd, unsigned long offset) "- region %d flags = 0x%lx, size = 0x%lx, fd= %d, offset = 0x%lx"
+vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
+vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"
+vfio_intp_interrupt_set_pending(int index) "irq %d is set PENDING"
+vfio_platform_eoi_handle_pending(int index) "handle PENDING IRQ %d"
+
 #hw/acpi/memory_hotplug.c
 mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32
 mhp_acpi_read_addr_lo(uint32_t slot, uint32_t addr) "slot[0x%"PRIx32"] addr lo: 0x%"PRIx32
-- 
1.8.3.2


* [Qemu-devel] [PATCH v6 10/16] hw/vfio: calxeda xgmac device
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (8 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 09/16] hw/vfio/platform: add vfio-platform support Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 11/16] hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic instantiation Eric Auger
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

The platform device class has become abstract. The device can be
instantiated on the command line using an option such as:

-device vfio-calxeda-xgmac,host="fff51000.ethernet"

The compat string is hardcoded in the code unless the user overrides it.
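
The default-or-override rule for the compat string can be sketched in
isolation as follows. This is only an illustration under stated
assumptions: struct demo_dev and demo_resolve_compat() are hypothetical
names that do not exist in QEMU, and plain malloc() stands in for
g_strdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the device state; only compat matters here. */
struct demo_dev {
    char *compat; /* NULL unless the user passed compat=... */
};

/*
 * Install the built-in compat string only when the user did not
 * provide one. Returns the string that ends up in use.
 */
static const char *demo_resolve_compat(struct demo_dev *dev,
                                       const char *default_compat)
{
    assert(dev != NULL && default_compat != NULL);

    if (dev->compat == NULL) {
        size_t len = strlen(default_compat) + 1;

        dev->compat = malloc(len);
        if (dev->compat != NULL) {
            memcpy(dev->compat, default_compat, len);
        }
    } /* else keep the user-provided compat string */

    return dev->compat;
}
```

A device created without a compat property ends up with the built-in
string, while a user-provided string is left untouched.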

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v5 -> v6
- reintroduced following Alex Graf's advice
- fix a bug related to compat override

v4 -> v5:
removed since device tree was moved to hw/arm/dyn_sysbus_devtree.c

v4: creation for device tree specialization
---
 hw/vfio/Makefile.objs                |  1 +
 hw/vfio/calxeda_xgmac.c              | 57 ++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-calxeda-xgmac.h | 41 ++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)
 create mode 100644 hw/vfio/calxeda_xgmac.c
 create mode 100644 include/hw/vfio/vfio-calxeda-xgmac.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index c5c76fe..913ab14 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -2,4 +2,5 @@ ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 obj-$(CONFIG_SOFTMMU) += platform.o
+obj-$(CONFIG_SOFTMMU) += calxeda_xgmac.o
 endif
diff --git a/hw/vfio/calxeda_xgmac.c b/hw/vfio/calxeda_xgmac.c
new file mode 100644
index 0000000..5e655ae
--- /dev/null
+++ b/hw/vfio/calxeda_xgmac.c
@@ -0,0 +1,57 @@
+/*
+ * calxeda xgmac example VFIO device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Eric Auger <eric.auger@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/vfio/vfio-calxeda-xgmac.h"
+
+static void calxeda_xgmac_realize(DeviceState *dev, Error **errp)
+{
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+    VFIOCalxedaXgmacDeviceClass *k = VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(dev);
+    const char compat[] = "calxeda,hb-xgmac";
+
+    if (vdev->compat == NULL) {
+        vdev->compat = g_strdup(compat);
+    } /* else use user-provided compat string */
+
+    k->parent_realize(dev, errp);
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+    .name = TYPE_VFIO_CALXEDA_XGMAC,
+    .unmigratable = 1,
+};
+
+static void vfio_calxeda_xgmac_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VFIOCalxedaXgmacDeviceClass *vcxc =
+        VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass);
+    vcxc->parent_realize = dc->realize;
+    dc->realize = calxeda_xgmac_realize;
+    dc->desc = "VFIO Calxeda XGMAC";
+}
+
+static const TypeInfo vfio_calxeda_xgmac_dev_info = {
+    .name = TYPE_VFIO_CALXEDA_XGMAC,
+    .parent = TYPE_VFIO_PLATFORM,
+    .instance_size = sizeof(VFIOCalxedaXgmacDevice),
+    .class_init = vfio_calxeda_xgmac_class_init,
+    .class_size = sizeof(VFIOCalxedaXgmacDeviceClass),
+};
+
+static void register_calxeda_xgmac_dev_type(void)
+{
+    type_register_static(&vfio_calxeda_xgmac_dev_info);
+}
+
+type_init(register_calxeda_xgmac_dev_type)
diff --git a/include/hw/vfio/vfio-calxeda-xgmac.h b/include/hw/vfio/vfio-calxeda-xgmac.h
new file mode 100644
index 0000000..1529cf5
--- /dev/null
+++ b/include/hw/vfio/vfio-calxeda-xgmac.h
@@ -0,0 +1,41 @@
+/*
+ * VFIO calxeda xgmac device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Eric Auger <eric.auger@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef HW_VFIO_VFIO_CALXEDA_XGMAC_H
+#define HW_VFIO_VFIO_CALXEDA_XGMAC_H
+
+#include "hw/vfio/vfio-platform.h"
+
+#define TYPE_VFIO_CALXEDA_XGMAC "vfio-calxeda-xgmac"
+
+typedef struct VFIOCalxedaXgmacDevice {
+    VFIOPlatformDevice vdev;
+} VFIOCalxedaXgmacDevice;
+
+typedef struct VFIOCalxedaXgmacDeviceClass {
+    /*< private >*/
+    VFIOPlatformDeviceClass parent_class;
+    /*< public >*/
+    DeviceRealize parent_realize;
+} VFIOCalxedaXgmacDeviceClass;
+
+#define VFIO_CALXEDA_XGMAC_DEVICE(obj) \
+     OBJECT_CHECK(VFIOCalxedaXgmacDevice, (obj), TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(VFIOCalxedaXgmacDeviceClass, (klass), \
+                        TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(VFIOCalxedaXgmacDeviceClass, (obj), \
+                      TYPE_VFIO_CALXEDA_XGMAC)
+
+#endif
-- 
1.8.3.2


* [Qemu-devel] [PATCH v6 11/16] hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic instantiation
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (9 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 10/16] hw/vfio: calxeda xgmac device Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-10 13:12   ` Alexander Graf
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 12/16] vfio/platform: add fake injection modality Eric Auger
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

vfio-calxeda-xgmac can now be instantiated using the -device option.
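
The compat string rewriting performed when the device tree node is
created (replace "/" by "," and ";" by NUL) can be sketched on its own
as follows. The name demo_format_compat() is hypothetical and this is
not the code from the patch: the sketch walks over a length measured up
front rather than using strchr(), because a strchr()-based loop stops
at the first NUL it has just written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Rewrite a user-supplied compat string in place: '/' becomes ',' and
 * ';' becomes NUL, so that entries end up NUL-separated as a device
 * tree "compatible" property expects. Returns the property length in
 * bytes, including the final NUL.
 */
static size_t demo_format_compat(char *compat)
{
    size_t len, i;

    assert(compat != NULL);

    len = strlen(compat); /* measure before embedded NULs are written */
    for (i = 0; i < len; i++) {
        if (compat[i] == '/') {
            compat[i] = ',';
        } else if (compat[i] == ';') {
            compat[i] = '\0';
        }
    }
    return len + 1;
}
```

The returned length, not strlen() of the result, is what a caller would
pass along with the buffer, since strlen() only sees the first entry
once separators have become NUL.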

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v2 -> v3:
- correct bug of reg_attr[2*i] in vfio_fdt_add_device_node
- fix a bug related to compat_str_len computed on original compat
  instead of corrected compat
- wrap_vfio_fdt_add_node takes a node creation function: this function
  needs to be specialized for each VFIO device. wrap function must be
  called in sysbus_device_create_devtree
---
 hw/arm/dyn_sysbus_devtree.c | 141 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)

diff --git a/hw/arm/dyn_sysbus_devtree.c b/hw/arm/dyn_sysbus_devtree.c
index 61e5b5f..3ef9430 100644
--- a/hw/arm/dyn_sysbus_devtree.c
+++ b/hw/arm/dyn_sysbus_devtree.c
@@ -20,6 +20,141 @@
 #include "hw/arm/dyn_sysbus_devtree.h"
 #include "qemu/error-report.h"
 #include "sysemu/device_tree.h"
+#include "hw/vfio/vfio-platform.h"
+#include "hw/vfio/vfio-calxeda-xgmac.h"
+
+typedef void (*vfio_fdt_add_device_node_t)(SysBusDevice *sbdev, void *opaque);
+
+static char *format_compat(char * compat)
+{
+    char *str_ptr, *corrected_compat;
+    /*
+     * process compatibility property string passed by end-user
+     * replaces / by , and ; by NUL character
+     */
+    corrected_compat = g_strdup(compat);
+
+    str_ptr = corrected_compat;
+    while ((str_ptr = strchr(str_ptr, '/')) != NULL) {
+        *str_ptr = ',';
+    }
+
+    /* substitute ";" with the NUL char */
+    str_ptr = corrected_compat;
+    while ((str_ptr = strchr(str_ptr, ';')) != NULL) {
+        *str_ptr = '\0';
+    }
+
+    /*
+     * corrected compat includes a "\0" before or at the same location
+     * as compat's one
+     */
+    return corrected_compat;
+}
+
+static void wrap_vfio_fdt_add_node(SysBusDevice *sbdev, void *opaque,
+                                   vfio_fdt_add_device_node_t add_node_fn)
+{
+    PlatformDevtreeData *data = opaque;
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    gchar irq_number_prop[8];
+    Object *obj = OBJECT(sbdev);
+    char *corrected_compat;
+    uint64_t irq_number;
+    int corrected_compat_str_len, i;
+
+    corrected_compat = format_compat(vdev->compat);
+    corrected_compat_str_len = strlen(corrected_compat) + 1;
+    /* we copy the corrected_compat string + its "\0" */
+    snprintf(vdev->compat, corrected_compat_str_len, "%s", corrected_compat);
+    g_free(corrected_compat);
+
+    add_node_fn(sbdev, opaque);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
+        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
+                                                 + data->irq_start;
+        /*
+         * for setting irqfd up we must provide the virtual IRQ number
+         * which is the sum of irq_start and actual platform bus irq
+         * index. At realize point we do not have this info.
+         */
+        vfio_start_irq_injection(sbdev, i, irq_number);
+    }
+}
+
+static void vfio_basic_fdt_add_device_node(SysBusDevice *sbdev,
+                                                    void *opaque)
+{
+    PlatformDevtreeData *data = opaque;
+    void *fdt = data->fdt;
+    const char *parent_node = data->node;
+    int compat_str_len;
+    char *nodename;
+    int i, ret;
+    uint32_t *irq_attr;
+    uint64_t *reg_attr;
+    uint64_t mmio_base;
+    uint64_t irq_number;
+    gchar mmio_base_prop[8];
+    gchar irq_number_prop[8];
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    Object *obj = OBJECT(sbdev);
+
+    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+                               vbasedev->name,
+                               mmio_base);
+
+    qemu_fdt_add_subnode(fdt, nodename);
+
+    compat_str_len = strlen(vdev->compat) + 1;
+    qemu_fdt_setprop(fdt, nodename, "compatible",
+                          vdev->compat, compat_str_len);
+
+    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
+        mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
+        reg_attr[4*i] = 1;
+        reg_attr[4*i+1] = mmio_base;
+        reg_attr[4*i+2] = 1;
+        reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
+    }
+
+    ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
+                     vbasedev->num_regions*2, reg_attr);
+    if (ret < 0) {
+        error_report("could not set reg property of node %s", nodename);
+    }
+
+    irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
+        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
+                                                 + data->irq_start;
+        irq_attr[3*i] = cpu_to_be32(0);
+        irq_attr[3*i+1] = cpu_to_be32(irq_number);
+        irq_attr[3*i+2] = cpu_to_be32(0x4);
+    }
+
+    ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+                     irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
+    if (ret < 0) {
+        error_report("could not set interrupts property of node %s",
+                     nodename);
+    }
+
+    g_free(nodename);
+    g_free(irq_attr);
+    g_free(reg_attr);
+}
 
 /**
  * arm_sysbus_device_create_devtree - create the node of devices
@@ -41,6 +176,12 @@ static int arm_sysbus_device_create_devtree(Object *obj, void *opaque)
                                     arm_sysbus_device_create_devtree, data);
     }
 
+    if (object_dynamic_cast(obj, TYPE_VFIO_CALXEDA_XGMAC)) {
+        wrap_vfio_fdt_add_node(sbdev, data,
+                               vfio_basic_fdt_add_device_node);
+        matched = true;
+    }
+
     if (!matched) {
         error_report("Device %s is not supported by this machine yet.",
                      qdev_fw_name(DEVICE(dev)));
-- 
1.8.3.2


* [Qemu-devel] [PATCH v6 12/16] vfio/platform: add fake injection modality
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (10 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 11/16] hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic instantiation Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 13/16] hw/vfio/platform: Add irqfd support Eric Auger
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

This code is aimed at testing multiple IRQ injection with
user-side handled eventfds. The principle is that a timer periodically
triggers an IRQ at VFIO driver level. This IRQ then follows the
regular path: VFIO driver -> eventfd trigger -> user-side eventfd handler.
The IRQ is not injected into the guest. The IRQ is completed
on another timer timeout to emulate EOI on write/read access.

For instance, the following options
 x-fake-irq[0]=1,x-fake-period[0]=10,x-fake-duration[0]=50,
x-fake-irq[1]=2,x-fake-period[1]=20,x-fake-duration[1]=100
set VFIO platform IRQs indexed #1 and #2 as fake IRQs.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

this modality was used to test calxeda xgmac assignment with
main IRQ generated by the HW and IRQ #1 and #2 as fake IRQs
---
 hw/vfio/platform.c              | 131 +++++++++++++++++++++++++++++++++++++++-
 include/hw/vfio/vfio-platform.h |  13 ++++
 trace-events                    |   3 +
 3 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 9987b25..93aa94a 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -25,6 +25,8 @@
 #include "hw/sysbus.h"
 #include "trace.h"
 
+#define MAX_FAKE_INTP 5
+
 static void vfio_intp_interrupt(VFIOINTp *intp);
 typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
 static int vfio_set_trigger_eventfd(VFIOINTp *intp,
@@ -141,6 +143,27 @@ static void vfio_intp_mmap_enable(void *opaque)
 }
 
 /**
+ * vfio_fake_intp_index - returns the fake IRQ index
+ *
+ * @intp the interrupt struct pointer
+ * if the IRQ is not fake, returns < 0
+ * if it is fake returns the index of the fake IRQ
+ * ie the index i for which x-fake-irq[i]=intp->pin
+ */
+static int vfio_fake_intp_index(VFIOINTp *intp)
+{
+    VFIOPlatformDevice *vdev = intp->vdev;
+    int i;
+
+    for (i = 0; i < MAX_FAKE_INTP; i++) {
+        if (intp->pin == vdev->fake_intp_index[i]) {
+            return i;
+        }
+    }
+    return -1;
+}
+
+/**
  * vfio_intp_interrupt - The user-side eventfd handler
  * @opaque: opaque pointer which in practice is the VFIOINTp*
  *
@@ -199,8 +222,18 @@ static void vfio_intp_interrupt(VFIOINTp *intp)
     /* sets slow path */
     vfio_mmap_set_enabled(vdev, false);
 
-    /* trigger the virtual IRQ */
-    qemu_set_irq(intp->qemuirq, 1);
+    if (intp->fake_intp_index < 0) {
+        /* trigger the virtual IRQ */
+        qemu_set_irq(intp->qemuirq, 1);
+    } else {
+        /*
+         * the vIRQ is not triggered but we emulate a handling
+         * duration
+         */
+        timer_mod(intp->fake_eoi_timer,
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                      intp->fake_intp_duration);
+    }
 
     /* schedule the mmap timer which will restore mmap path after EOI*/
     if (vdev->mmap_timeout) {
@@ -231,9 +264,64 @@ static int vfio_start_eventfd_injection(VFIOINTp *intp)
         return ret;
     }
     vfio_unmask_irqindex(vbasedev, intp->pin);
+
+    /* in case of fake irq, starts its injection */
+    if (intp->fake_intp_index >= 0) {
+        timer_mod(intp->fake_intp_timer,
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                  intp->fake_intp_period);
+    }
     return 0;
 }
 
+/**
+ * vfio_fake_intp_eoi - fake interrupt completion routine
+ * @opaque: actually is an IRQ struct pointer
+ *
+ * called on timer handler context
+ */
+static void vfio_fake_intp_eoi(void *opaque)
+{
+    VFIOINTp *intp = (VFIOINTp *)opaque;
+    trace_vfio_fake_intp_eoi(intp->pin);
+    vfio_platform_eoi(&intp->vdev->vbasedev);
+}
+
+/**
+ * vfio_fake_intp_injection - fake interrupt injection routine
+ * @opaque: actually is an IRQ struct pointer
+ *
+ * called on timer context
+ * use the VFIO loopback mode, ie. triggers the eventfd
+ * associated to the intp->pin although no physical IRQ hit.
+ */
+static void vfio_fake_intp_injection(void *opaque)
+{
+    VFIOINTp *intp = (VFIOINTp *)opaque;
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+    struct vfio_irq_set *irq_set;
+    int argsz, ret;
+    int32_t *pfd;
+
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+    irq_set->index = intp->pin;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    g_free(irq_set);
+    if (ret < 0) {
+        error_report("vfio: Failed to trigger fake IRQ: %m");
+    } else {
+        trace_vfio_fake_intp_injection(intp->pin);
+        timer_mod(intp->fake_intp_timer,
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                      intp->fake_intp_period);
+    }
+}
+
 /*
  * Functions used whatever the injection method
  */
@@ -304,6 +392,23 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
     intp->vdev = vdev;
     intp->pin = index;
     intp->state = VFIO_IRQ_INACTIVE;
+    intp->fake_intp_index = vfio_fake_intp_index(intp);
+
+    if (intp->fake_intp_index >= 0) {
+        intp->fake_intp_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                                         vfio_fake_intp_injection,
+                                         intp);
+        intp->fake_eoi_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                                         vfio_fake_intp_eoi,
+                                         intp);
+        intp->fake_intp_period  =
+            vdev->fake_intp_period[intp->fake_intp_index];
+        intp->fake_intp_duration  =
+            vdev->fake_intp_duration[intp->fake_intp_index];
+        trace_vfio_init_intp_fake(intp->fake_intp_index,
+                                  intp->fake_intp_period,
+                                  intp->fake_intp_duration);
+    }
     sysbus_init_irq(sbdev, &intp->qemuirq);
 
     /* Get an eventfd for trigger */
@@ -524,6 +629,20 @@ static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
     }
 }
 
+static void vfio_platform_initfn(Object *obj)
+{
+    int i;
+
+    qdev_prop_set_uint32(DEVICE(obj), "len-x-fake-irq", MAX_FAKE_INTP);
+    qdev_prop_set_uint32(DEVICE(obj), "len-x-fake-period", MAX_FAKE_INTP);
+    qdev_prop_set_uint32(DEVICE(obj), "len-x-fake-duration", MAX_FAKE_INTP);
+
+    for (i = 0; i < MAX_FAKE_INTP; i++) {
+        char *propname = g_strdup_printf("x-fake-irq[%d]", i);
+        qdev_prop_set_uint32(DEVICE(obj), propname, -1);
+    }
+}
+
 /**
  * vfio_platform_realize  - the device realize function
  * @dev: device state pointer
@@ -566,6 +685,13 @@ static const VMStateDescription vfio_platform_vmstate = {
 static Property vfio_platform_dev_properties[] = {
     DEFINE_PROP_STRING("host", VFIOPlatformDevice, vbasedev.name),
     DEFINE_PROP_STRING("compat", VFIOPlatformDevice, compat),
+    DEFINE_PROP_ARRAY("x-fake-irq", VFIOPlatformDevice, len_x_fake_irq,
+                      fake_intp_index, qdev_prop_uint32, uint32_t),
+    DEFINE_PROP_ARRAY("x-fake-period", VFIOPlatformDevice, len_x_fake_period,
+                      fake_intp_period, qdev_prop_uint32, uint32_t),
+    DEFINE_PROP_ARRAY("x-fake-duration", VFIOPlatformDevice,
+                      len_x_fake_duration, fake_intp_duration,
+                      qdev_prop_uint32, uint32_t),
     DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
                        mmap_timeout, 1100),
     DEFINE_PROP_END_OF_LIST(),
@@ -587,6 +713,7 @@ static const TypeInfo vfio_platform_dev_info = {
     .parent = TYPE_SYS_BUS_DEVICE,
     .instance_size = sizeof(VFIOPlatformDevice),
     .class_init = vfio_platform_class_init,
+    .instance_init = vfio_platform_initfn,
     .class_size = sizeof(VFIOPlatformDeviceClass),
     .abstract   = true,
 };
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index c7e10cc..95ece9d 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -42,6 +42,12 @@ typedef struct VFIOINTp {
     bool kvm_accel; /* set when QEMU bypass through KVM enabled */
     uint8_t pin; /* index */
     uint8_t virtualID; /* virtual IRQ */
+    /* fake irq injection test modality */
+    int fake_intp_index;
+    QEMUTimer *fake_intp_timer; /* fake IRQ injection timer */
+    QEMUTimer *fake_eoi_timer; /* timer to handle fake IRQ completion */
+    uint32_t fake_intp_period; /* delay between fake IRQ injections */
+    uint32_t fake_intp_duration; /* duration of the IRQ */
 } VFIOINTp;
 
 typedef int (*start_irq_fn_t)(VFIOINTp *intp);
@@ -58,6 +64,13 @@ typedef struct VFIOPlatformDevice {
     QEMUTimer *mmap_timer; /* enable mmaps after periods w/o interrupts */
     start_irq_fn_t start_irq_fn;
     QemuMutex  intp_mutex;
+    /* fake irq injection test modality */
+    int32_t *fake_intp_index; /* array of fake IRQ indexes */
+    uint32_t *fake_intp_period; /* delay between fake IRQ injections */
+    uint32_t *fake_intp_duration; /* duration of the vIRQ handling*/
+    uint32_t len_x_fake_irq;
+    uint32_t len_x_fake_period;
+    uint32_t len_x_fake_duration;
 } VFIOPlatformDevice;
 
 
diff --git a/trace-events b/trace-events
index b0411e9..61f3cba 100644
--- a/trace-events
+++ b/trace-events
@@ -1387,7 +1387,10 @@ vfio_platform_populate_regions(int region_index, unsigned long flag, unsigned lo
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
 vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"
 vfio_intp_interrupt_set_pending(int index) "irq %d is set PENDING"
+vfio_fake_intp_injection(int index) "fake irq %d injected"
 vfio_platform_eoi_handle_pending(int index) "handle PENDING IRQ %d"
+vfio_fake_intp_eoi(int index) "eoi fake IRQ %d"
+vfio_init_intp_fake(int index, int period, int duration) "fake irq index = %d, duration = %d, period=%d"
 
 #hw/acpi/memory_hotplug.c
 mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32
-- 
1.8.3.2


* [Qemu-devel] [PATCH v6 13/16] hw/vfio/platform: Add irqfd support
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (11 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 12/16] vfio/platform: add fake injection modality Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 14/16] linux-headers: Update KVM headers from linux-next tag ToBeFilled Eric Auger
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

This patch aims at optimizing IRQ handling using the irqfd framework.

Instead of handling the eventfds on user-side they are handled on
kernel side using
- the KVM irqfd framework,
- the VFIO driver virqfd framework.

The virtual IRQ completion is trapped at the interrupt controller.
This removes the need for fast/slow path swap.

Overall this brings significant performance improvements.

It depends on host kernel KVM irqfd support.
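
The choice between the two injection methods can be sketched as a plain
function pointer selection. All demo_* names are hypothetical, and the
three booleans stand in for the irqfd capability, the resamplefd
capability and the x-irqfd debug option.

```c
#include <assert.h>
#include <stdbool.h>

typedef int (*demo_start_irq_fn)(int pin);

/* Stub: user-level eventfd handling (slow path through QEMU). */
static int demo_start_eventfd(int pin)
{
    (void)pin;
    return 0;
}

/* Stub: kernel-side handling through irqfd and resamplefd. */
static int demo_start_irqfd(int pin)
{
    (void)pin;
    return 1;
}

/*
 * Pick irqfd injection only when both host capabilities are present
 * and the user did not disable it; otherwise fall back to user-level
 * eventfd handling.
 */
static demo_start_irq_fn demo_pick_start_fn(bool irqfds, bool resamplefds,
                                            bool irqfd_allowed)
{
    if (irqfds && resamplefds && irqfd_allowed) {
        return demo_start_irqfd;
    }
    return demo_start_eventfd;
}
```

Any missing capability, or x-irqfd set to off, selects the eventfd
fallback.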

Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v5 -> v6
- rely on kvm_irqfds_enabled() and kvm_resamplefds_enabled()
- guard KVM code with #ifdef CONFIG_KVM

v3 -> v4:
[Alvise Rigo]
Use of VFIO Platform driver v6 unmask/virqfd feature and removal
of resamplefd handler. Physical IRQ unmasking is now done in
VFIO driver.

v3:
[Eric Auger]
initial support with resamplefd handled on QEMU side since the
unmask was not supported on VFIO platform driver v5.
---
 hw/vfio/platform.c              | 96 +++++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-platform.h |  1 +
 trace-events                    |  2 +
 3 files changed, 99 insertions(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 93aa94a..a59a842 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -24,6 +24,7 @@
 #include "qemu/queue.h"
 #include "hw/sysbus.h"
 #include "trace.h"
+#include "sysemu/kvm.h"
 
 #define MAX_FAKE_INTP 5
 
@@ -323,6 +324,83 @@ static void vfio_fake_intp_injection(void *opaque)
 }
 
 /*
+ * Functions used for irqfd
+ */
+
+#ifdef CONFIG_KVM
+
+/**
+ * vfio_set_resample_eventfd - sets the resamplefd for an IRQ
+ * @intp: the IRQ struct pointer
+ * programs the VFIO driver to unmask this IRQ when the
+ * intp->unmask eventfd is triggered
+ */
+static int vfio_set_resample_eventfd(VFIOINTp *intp)
+{
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+    struct vfio_irq_set *irq_set;
+    int argsz, ret;
+    int32_t *pfd;
+
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
+    irq_set->index = intp->pin;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    pfd = (int32_t *)&irq_set->data;
+    *pfd = event_notifier_get_fd(&intp->unmask);
+    qemu_set_fd_handler(*pfd, NULL, NULL, intp);
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    if (ret < 0) {
+        error_report("vfio: Failed to set resample eventfd: %m");
+        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
+    }
+    g_free(irq_set);
+    return ret;
+}
+
+/**
+ * vfio_start_irqfd_injection - starts irqfd injection for an IRQ
+ * programs VFIO driver with both the trigger and resamplefd
+ * programs KVM with the gsi, trigger & resample eventfds
+ */
+static int vfio_start_irqfd_injection(VFIOINTp *intp)
+{
+    struct kvm_irqfd irqfd = {
+        .fd = event_notifier_get_fd(&intp->interrupt),
+        .resamplefd = event_notifier_get_fd(&intp->unmask),
+        .gsi = intp->virtualID,
+        .flags = KVM_IRQFD_FLAG_RESAMPLE,
+    };
+
+    if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
+        error_report("vfio: Error: Failed to assign the irqfd: %m");
+        goto fail_irqfd;
+    }
+    if (vfio_set_trigger_eventfd(intp, NULL) < 0) {
+        goto fail_vfio;
+    }
+    if (vfio_set_resample_eventfd(intp) < 0) {
+        goto fail_vfio;
+    }
+
+    intp->kvm_accel = true;
+    trace_vfio_platform_start_irqfd_injection(intp->pin, intp->virtualID,
+                                     irqfd.fd, irqfd.resamplefd);
+    return 0;
+
+fail_vfio:
+    irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
+    kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
+fail_irqfd:
+    return -1;
+}
+
+#endif
+
+/*
  * Functions used whatever the injection method
  */
 
@@ -418,6 +496,13 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
         error_report("vfio: Error: trigger event_notifier_init failed ");
         return NULL;
     }
+    /* Get an eventfd for resample/unmask */
+    ret = event_notifier_init(&intp->unmask, 0);
+    if (ret) {
+        g_free(intp);
+        error_report("vfio: Error: resample event_notifier_init failed");
+        return NULL;
+    }
 
     /* store the new intp in qlist */
     QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
@@ -660,7 +745,17 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
 
     vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
     vbasedev->ops = &vfio_platform_ops;
+
+#ifdef CONFIG_KVM
+    if (kvm_irqfds_enabled() && kvm_resamplefds_enabled() &&
+        vdev->irqfd_allowed) {
+        vdev->start_irq_fn = vfio_start_irqfd_injection;
+    } else {
+        vdev->start_irq_fn = vfio_start_eventfd_injection;
+    }
+#else
     vdev->start_irq_fn = vfio_start_eventfd_injection;
+#endif
 
     trace_vfio_platform_realize(vbasedev->name, vdev->compat);
 
@@ -694,6 +789,7 @@ static Property vfio_platform_dev_properties[] = {
                       qdev_prop_uint32, uint32_t),
     DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
                        mmap_timeout, 1100),
+    DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index 95ece9d..e896c86 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -71,6 +71,7 @@ typedef struct VFIOPlatformDevice {
     uint32_t len_x_fake_irq;
     uint32_t len_x_fake_period;
     uint32_t len_x_fake_duration;
+    bool irqfd_allowed; /* debug option to force irqfd on/off */
 } VFIOPlatformDevice;
 
 
diff --git a/trace-events b/trace-events
index 61f3cba..1b81b66 100644
--- a/trace-events
+++ b/trace-events
@@ -1391,6 +1391,8 @@ vfio_fake_intp_injection(int index) "fake irq %d injected"
 vfio_platform_eoi_handle_pending(int index) "handle PENDING IRQ %d"
 vfio_fake_intp_eoi(int index) "eoi fake IRQ %d"
 vfio_init_intp_fake(int index, int period, int duration) "fake irq index = %d, duration = %d, period=%d"
+vfio_platform_start_irqfd_injection(int index, int gsi, int fd, int resamplefd) "IRQ index=%d, gsi =%d, fd = %d, resamplefd = %d"
+vfio_start_eventfd_injection(int index, int fd) "IRQ index=%d, fd = %d"
 
 #hw/acpi/memory_hotplug.c
 mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread
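
For readers less familiar with the eventfd primitive that both the trigger
and the resample notifiers in the patch above are built on, the following is
a minimal standalone sketch of its counter semantics. It relies only on the
Linux eventfd(2) call; the helper names efd_signal and efd_drain are
illustrative assumptions and are not part of QEMU or of this series.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Hypothetical helper: add n to the eventfd counter (0 on success). */
static int efd_signal(int fd, uint64_t n)
{
    return write(fd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/*
 * Hypothetical helper: read and reset the counter. Returns the
 * accumulated count, or 0 if nothing was pending. The fd is expected
 * to have been created with EFD_NONBLOCK so that an empty counter
 * makes read() fail instead of blocking.
 */
static uint64_t efd_drain(int fd)
{
    uint64_t count = 0;

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

Signals that arrive before the consumer reads are accumulated into a single
count. With KVM_IRQFD_FLAG_RESAMPLE, the first eventfd carries the trigger
towards KVM and the second one is signalled by KVM when the guest completes
the interrupt.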

* [Qemu-devel] [PATCH v6 14/16] linux-headers: Update KVM headers from linux-next tag ToBeFilled
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (12 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 13/16] hw/vfio/platform: Add irqfd support Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 15/16] VFIO: COMMON: vfio_kvm_device_fd moved in the common header Eric Auger
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

Sync the KVM-related Linux headers from the linux-next tree using
scripts/update-linux-headers.sh.

This integrates the updated KVM-VFIO API for forwarded IRQs.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 linux-headers/linux/kvm.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index f5d2c38..42128d5 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -940,6 +940,12 @@ struct kvm_device_attr {
 	__u64	addr;		/* userspace address of attr data */
 };
 
+struct kvm_arch_forwarded_irq {
+        __u32 fd; /* file descriptor of the VFIO device */
+        __u32 index; /* VFIO device IRQ index */
+        __u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
 #define KVM_DEV_TYPE_FSL_MPIC_20	1
 #define KVM_DEV_TYPE_FSL_MPIC_42	2
 #define KVM_DEV_TYPE_XICS		3
@@ -947,6 +953,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP			1
 #define   KVM_DEV_VFIO_GROUP_ADD			1
 #define   KVM_DEV_VFIO_GROUP_DEL			2
+#define  KVM_DEV_VFIO_DEVICE			2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ			1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ			2
 #define KVM_DEV_TYPE_ARM_VGIC_V2	5
 #define KVM_DEV_TYPE_FLIC		6
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v6 15/16] VFIO: COMMON: vfio_kvm_device_fd moved in the common header
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (13 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 14/16] linux-headers: Update KVM headers from linux-next tag ToBeFilled Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 16/16] VFIO: PLATFORM: add forwarded irq support Eric Auger
  2014-09-11 22:14 ` [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Alex Williamson
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

The KVM-VFIO device file descriptor is now also used by the platform
code to set up forwarded IRQs, so declare it in the common header.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/common.c              | 3 ++-
 include/hw/vfio/vfio-common.h | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 252c0b8..466b0e8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,9 +44,10 @@ struct vfio_as_head vfio_address_spaces =
  * initialized, this file descriptor is only released on QEMU exit and
  * we'll re-use it should another vfio device be attached before then.
  */
-static int vfio_kvm_device_fd = -1;
+int vfio_kvm_device_fd = -1;
 #endif
 
+
 /*
  * Common VFIO interrupt disable
  */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 83c7876..0ae0153 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -41,6 +41,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
+#ifdef CONFIG_KVM
+extern int vfio_kvm_device_fd;
+#endif
+
+
 enum {
     VFIO_DEVICE_TYPE_PCI = 0,
     VFIO_DEVICE_TYPE_PLATFORM = 1,
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH v6 16/16] VFIO: PLATFORM: add forwarded irq support
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (14 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 15/16] VFIO: COMMON: vfio_kvm_device_fd moved in the common header Eric Auger
@ 2014-09-09  7:31 ` Eric Auger
  2014-09-11 22:14 ` [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Alex Williamson
  16 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-09  7:31 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, a.rigo, kim.phillips,
	marc.zyngier, manish.jaggi, joel.schopp, agraf, peter.maydell,
	pbonzini, afaerber
  Cc: patches, eric.auger, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

Test whether the forwarded IRQ modality is available and, if so,
forward the device IRQs. Forwarding is controlled through the KVM-VFIO
device. With this modality, injection is still handled through irqfds,
but the end of interrupt is no longer trapped: as soon as the guest
completes its virtual IRQ, the corresponding physical IRQ is completed
and the same physical IRQ can fire again.

A new x-forward property allows forwarding to be forced off even when
the kernel supports it.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/platform.c              | 52 +++++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-platform.h |  2 ++
 trace-events                    |  1 +
 3 files changed, 55 insertions(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a59a842..2d07d2f 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -324,6 +324,52 @@ static void vfio_fake_intp_injection(void *opaque)
 }
 
 /*
+ * Functions used with forwarding capability
+ */
+
+#ifdef CONFIG_KVM
+
+static bool has_kvm_vfio_forward_capability(void)
+{
+    struct kvm_device_attr attr = {
+         .group = KVM_DEV_VFIO_DEVICE,
+         .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ};
+
+    if (ioctl(vfio_kvm_device_fd, KVM_HAS_DEVICE_ATTR, &attr) == 0) {
+        return true;
+    } else {
+        return false;
+    }
+}
+
+static int vfio_set_forwarding(VFIOINTp *intp)
+{
+    int ret;
+    struct kvm_device_attr attr = {
+         .group = KVM_DEV_VFIO_DEVICE,
+         .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ};
+
+    intp->fwd_irq = g_malloc0(sizeof(*intp->fwd_irq));
+    intp->fwd_irq->fd = intp->vdev->vbasedev.fd;
+    intp->fwd_irq->index = intp->pin;
+    intp->fwd_irq->gsi = intp->virtualID;
+    attr.addr = (uint64_t)(unsigned long)intp->fwd_irq;
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        ret = -errno; /* save errno before it can be clobbered */
+        error_report("Failed to forward IRQ %d through KVM VFIO device",
+                     intp->pin);
+        g_free(intp->fwd_irq);
+        intp->fwd_irq = NULL;
+        return ret;
+    }
+    trace_vfio_start_fwd_injection(intp->pin);
+    return 0;
+}
+
+#endif
+
+/*
  * Functions used for irqfd
  */
 
@@ -375,6 +421,11 @@ static int vfio_start_irqfd_injection(VFIOINTp *intp)
         .flags = KVM_IRQFD_FLAG_RESAMPLE,
     };
 
+    if (has_kvm_vfio_forward_capability() &&
+                 intp->vdev->forward_allowed) {
+        vfio_set_forwarding(intp);
+    }
+
     if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
         error_report("vfio: Error: Failed to assign the irqfd: %m");
         goto fail_irqfd;
@@ -790,6 +841,7 @@ static Property vfio_platform_dev_properties[] = {
     DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
                        mmap_timeout, 1100),
     DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
+    DEFINE_PROP_BOOL("x-forward", VFIOPlatformDevice, forward_allowed, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index e896c86..6c46295 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -48,6 +48,7 @@ typedef struct VFIOINTp {
     QEMUTimer *fake_eoi_timer; /* timer to handle fake IRQ completion */
     uint32_t fake_intp_period; /* delay between fake IRQ injections */
     uint32_t fake_intp_duration; /* duration of the IRQ */
+    struct kvm_arch_forwarded_irq *fwd_irq;
 } VFIOINTp;
 
 typedef int (*start_irq_fn_t)(VFIOINTp *intp);
@@ -72,6 +73,7 @@ typedef struct VFIOPlatformDevice {
     uint32_t len_x_fake_period;
     uint32_t len_x_fake_duration;
     bool irqfd_allowed; /* debug option to force irqfd on/off */
+    bool forward_allowed; /* debug option to force forwarding on/off */
 } VFIOPlatformDevice;
 
 
diff --git a/trace-events b/trace-events
index 1b81b66..29e03d2 100644
--- a/trace-events
+++ b/trace-events
@@ -1378,6 +1378,7 @@ vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions,
 vfio_put_base_device(int fd) "close vdev->fd=%d"
 
 # hw/vfio/platform.c
+vfio_start_fwd_injection(int pin) "forwarding set for IRQ pin %d"
 vfio_platform_eoi(int pin, int fd) "EOI IRQ pin %d (fd=%d)"
 vfio_platform_mmap_set_enabled(bool enabled) "fast path = %d"
 vfio_platform_intp_mmap_enable(int pin) "IRQ #%d still active, stay in slow path"
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread
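
Taken together with the irqfd patch, the series ends up with three injection
modes. The decision logic can be sketched as a pure function, shown below
under stated assumptions: the enum, the struct and the function name are
hypothetical and only mirror the conditions visible in the patches (the
series itself stores a start_irq_fn pointer and attempts forwarding from
inside vfio_start_irqfd_injection).

```c
#include <stdbool.h>

/* Hypothetical names, used only for illustration. */
enum inject_mode {
    INJECT_EVENTFD,   /* user-level eventfd handling in QEMU */
    INJECT_IRQFD,     /* KVM irqfd with a resample eventfd */
    INJECT_FORWARDED, /* irqfd injection plus forwarded completion */
};

struct inject_caps {
    bool kvm_irqfds;       /* result of kvm_irqfds_enabled() */
    bool kvm_resamplefds;  /* result of kvm_resamplefds_enabled() */
    bool kvm_vfio_forward; /* KVM-VFIO device reports FORWARD_IRQ */
    bool irqfd_allowed;    /* x-irqfd property */
    bool forward_allowed;  /* x-forward property */
};

static enum inject_mode select_inject_mode(const struct inject_caps *c)
{
    /* Without irqfd and resamplefd support, QEMU handles the eventfd. */
    if (!(c->kvm_irqfds && c->kvm_resamplefds && c->irqfd_allowed)) {
        return INJECT_EVENTFD;
    }
    /* Forwarding is only considered on top of the irqfd path. */
    if (c->kvm_vfio_forward && c->forward_allowed) {
        return INJECT_FORWARDED;
    }
    return INJECT_IRQFD;
}
```

Note that x-forward has no effect when x-irqfd is off, since forwarding is
only attempted on the irqfd path. The sketch also does not model the case
where the forwarding ioctl itself fails; in the patch the irqfd setup
continues regardless of that result.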

* Re: [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module Eric Auger
@ 2014-09-10 13:09   ` Alexander Graf
  2014-09-11 12:11     ` Eric Auger
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Graf @ 2014-09-10 13:09 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, a.rigo,
	kim.phillips, marc.zyngier, manish.jaggi, joel.schopp,
	peter.maydell, pbonzini, afaerber
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm



On 09.09.14 09:31, Eric Auger wrote:
> A new common module is created. It implements all functions
> that have no device specificity (PCI, Platform).
> 
> This patch only consists in move (no functional changes)
> 
> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> v5 -> v6:
> - follow all evolutions of original PCI code from v5 to V6
> - move declaration of vfio_region_ops, vfio_memory_listener,
>   vfio_group_list, vfio_address_spaces into vfio-common.h
> 
> v4 -> v5:
> - integrate "sPAPR/IOMMU: Fix TCE entry permission"
> - VFIOdevice .name dealloc removed from vfio_put_base_device
> - add some includes according to vfio inclusion policy
> 
> v3 -> v4:
> [Eric Auger]
> move done after all PCI modifications to anticipate for
> VFIO Platform needs. Purpose is to alleviate the whole
> review process.
> 
> <= v3
> First split done by Kim Phillips
> ---
>  hw/vfio/Makefile.objs         |    1 +
>  hw/vfio/common.c              |  958 ++++++++++++++++++++++++++++++++++++++
>  hw/vfio/pci.c                 | 1028 +----------------------------------------
>  include/hw/vfio/vfio-common.h |  152 ++++++
>  trace-events                  |    1 +
>  5 files changed, 1113 insertions(+), 1027 deletions(-)
>  create mode 100644 hw/vfio/common.c
>  create mode 100644 include/hw/vfio/vfio-common.h
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index 31c7dab..e31f30e 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,3 +1,4 @@
>  ifeq ($(CONFIG_LINUX), y)
> +obj-$(CONFIG_SOFTMMU) += common.o
>  obj-$(CONFIG_PCI) += pci.o
>  endif
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> new file mode 100644
> index 0000000..252c0b8
> --- /dev/null
> +++ b/hw/vfio/common.c
> @@ -0,0 +1,958 @@
> +/*
> + * generic functions used by VFIO devices
> + *
> + * Copyright Red Hat, Inc. 2012
> + *
> + * Authors:
> + *  Alex Williamson <alex.williamson@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Based on qemu-kvm device-assignment:
> + *  Adapted for KVM by Qumranet.
> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
> + */
> +
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <linux/vfio.h>
> +
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/vfio.h"
> +#include "exec/address-spaces.h"
> +#include "exec/memory.h"
> +#include "hw/hw.h"
> +#include "qemu/error-report.h"
> +#include "sysemu/kvm.h"
> +#include "trace.h"
> +
> +struct vfio_group_head vfio_group_list =
> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
> +struct vfio_as_head vfio_address_spaces =
> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
> +
> +#ifdef CONFIG_KVM
> +/*
> + * We have a single VFIO pseudo device per KVM VM.  Once created it lives
> + * for the life of the VM.  Closing the file descriptor only drops our
> + * reference to it and the device's reference to kvm.  Therefore once
> + * initialized, this file descriptor is only released on QEMU exit and
> + * we'll re-use it should another vfio device be attached before then.
> + */
> +static int vfio_kvm_device_fd = -1;
> +#endif
> +
> +/*
> + * Common VFIO interrupt disable
> + */
> +void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
> +{
> +    struct vfio_irq_set irq_set = {
> +        .argsz = sizeof(irq_set),
> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
> +        .index = index,
> +        .start = 0,
> +        .count = 0,
> +    };
> +
> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +}
> +
> +void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
> +{
> +    struct vfio_irq_set irq_set = {
> +        .argsz = sizeof(irq_set),
> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
> +        .index = index,
> +        .start = 0,
> +        .count = 1,
> +    };
> +
> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +}
> +
> +void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
> +{
> +    struct vfio_irq_set irq_set = {
> +        .argsz = sizeof(irq_set),
> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
> +        .index = index,
> +        .start = 0,
> +        .count = 1,
> +    };
> +
> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +}
> +
> +/*
> + * IO Port/MMIO - Beware of the endians, VFIO is always little endian
> + */
> +void vfio_region_write(void *opaque, hwaddr addr,
> +                       uint64_t data, unsigned size)
> +{
> +    VFIORegion *region = opaque;
> +    VFIODevice *vbasedev = region->vbasedev;
> +    union {
> +        uint8_t byte;
> +        uint16_t word;
> +        uint32_t dword;
> +        uint64_t qword;
> +    } buf;
> +
> +    switch (size) {
> +    case 1:
> +        buf.byte = data;
> +        break;
> +    case 2:
> +        buf.word = data;
> +        break;
> +    case 4:
> +        buf.dword = data;

Please be aware that this code is affected by Alexey's patch set that
fixes endianness for slow path MMIO access and ROM regions.


Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread
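
As background for the endianness remark above, the quoted comment states
that VFIO is always little endian. A host-independent way to produce a
little-endian buffer for a given access size is sketched below. This is a
generic illustration, not Alexey's fix and not the code from this series;
pack_le is a hypothetical name.

```c
#include <stdint.h>

/*
 * Store the low `size` bytes of `data` into `out` in little-endian
 * order, regardless of host endianness. Returns 0 on success, -1 if
 * the access size is not 1, 2, 4 or 8.
 */
static int pack_le(uint64_t data, unsigned size, uint8_t *out)
{
    unsigned i;

    if (size != 1 && size != 2 && size != 4 && size != 8) {
        return -1;
    }
    for (i = 0; i < size; i++) {
        /* Byte i holds bits [8*i, 8*i + 7] of the value. */
        out[i] = (uint8_t)(data >> (8 * i));
    }
    return 0;
}
```

Shifting instead of writing through a union member keeps the result
identical on big-endian and little-endian hosts, which is the property the
slow path needs when the buffer is handed to the VFIO region.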

* Re: [Qemu-devel] [PATCH v6 11/16] hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic instantiation
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 11/16] hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic instantiation Eric Auger
@ 2014-09-10 13:12   ` Alexander Graf
  2014-09-11 14:20     ` Eric Auger
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Graf @ 2014-09-10 13:12 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, a.rigo,
	kim.phillips, marc.zyngier, manish.jaggi, joel.schopp,
	peter.maydell, pbonzini, afaerber
  Cc: patches, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm



On 09.09.14 09:31, Eric Auger wrote:
> vfio-calxeda-xgmac now can be instantiated using the -device option
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v2 -> v3:
> - correct bug of reg_attr[2*i] in vfio_fdt_add_device_node
> - fix a bug related to compat_str_len computed on original compat
>   instead of corrected compat
> - wrap_vfio_fdt_add_node take a node creation function: this function
>   needs to be specialized for each VFIO device. wrap function must be
>   called in sysbus_device_create_devtree
> ---
>  hw/arm/dyn_sysbus_devtree.c | 141 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 141 insertions(+)
> 
> diff --git a/hw/arm/dyn_sysbus_devtree.c b/hw/arm/dyn_sysbus_devtree.c
> index 61e5b5f..3ef9430 100644
> --- a/hw/arm/dyn_sysbus_devtree.c
> +++ b/hw/arm/dyn_sysbus_devtree.c
> @@ -20,6 +20,141 @@
>  #include "hw/arm/dyn_sysbus_devtree.h"
>  #include "qemu/error-report.h"
>  #include "sysemu/device_tree.h"
> +#include "hw/vfio/vfio-platform.h"
> +#include "hw/vfio/vfio-calxeda-xgmac.h"
> +
> +typedef void (*vfio_fdt_add_device_node_t)(SysBusDevice *sbdev, void *opaque);
> +
> +static char *format_compat(char * compat)
> +{
> +    char *str_ptr, *corrected_compat;
> +    /*
> +     * process compatibility property string passed by end-user
> +     * replaces / by , and ; by NUL character
> +     */
> +    corrected_compat = g_strdup(compat);
> +
> +    str_ptr = corrected_compat;
> +    while ((str_ptr = strchr(str_ptr, '/')) != NULL) {
> +        *str_ptr = ',';
> +    }
> +
> +    /* substitute ";" with the NUL char */
> +    str_ptr = corrected_compat;
> +    while ((str_ptr = strchr(str_ptr, ';')) != NULL) {
> +        *str_ptr = '\0';
> +    }
> +
> +    /*
> +     * corrected compat includes a "\0" before or at the same location
> +     * as compat's one
> +     */
> +    return corrected_compat;
> +}
> +
> +static void wrap_vfio_fdt_add_node(SysBusDevice *sbdev, void *opaque,
> +                                   vfio_fdt_add_device_node_t add_node_fn)
> +{
> +    PlatformDevtreeData *data = opaque;
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    gchar irq_number_prop[8];
> +    Object *obj = OBJECT(sbdev);
> +    char *corrected_compat;
> +    uint64_t irq_number;
> +    int corrected_compat_str_len, i;
> +
> +    corrected_compat = format_compat(vdev->compat);
> +    corrected_compat_str_len = strlen(corrected_compat) + 1;
> +    /* we copy the corrected_compat string + its "\0" */
> +    snprintf(vdev->compat, corrected_compat_str_len, "%s", corrected_compat);
> +    g_free(corrected_compat);
> +
> +    add_node_fn(sbdev, opaque);
> +
> +    for (i = 0; i < vbasedev->num_irqs; i++) {
> +        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
> +        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
> +                                                 + data->irq_start;
> +        /*
> +         * for setting irqfd up we must provide the virtual IRQ number
> +         * which is the sum of irq_start and actual platform bus irq
> +         * index. At realize point we do not have this info.
> +         */
> +        vfio_start_irq_injection(sbdev, i, irq_number);

Does this really have anything to do with fdt? Also, don't we have
notifiers that call IRQ holders when an IRQ gets connected? That would
probably be the cleaner approach here.


Alex

^ permalink raw reply	[flat|nested] 32+ messages in thread
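
The quoted format_compat helper turns a user-supplied string into a device
tree "compatible" value, which is a list of NUL-separated strings. A
standalone sketch of that conversion follows. It is an illustration under
assumptions rather than the code from the patch: compat_to_stringlist is a
hypothetical name, and it works in place instead of duplicating the string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Rewrite `s` in place: '/' becomes ',' and ';' becomes '\0'.
 * Returns the total number of bytes in the resulting string list,
 * including the final terminator.
 */
static size_t compat_to_stringlist(char *s)
{
    /* Measure first: strlen() would stop at the first inserted NUL. */
    size_t len = strlen(s);
    size_t i;

    for (i = 0; i < len; i++) {
        if (s[i] == '/') {
            s[i] = ',';
        } else if (s[i] == ';') {
            s[i] = '\0';
        }
    }
    return len + 1;
}
```

Walking the buffer by index handles every ';' separator, whereas a strchr
loop that restarts from a position it has just overwritten with NUL stops
after the first one. The returned length is the size to pass along when the
list is written as a property.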

* Re: [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module
  2014-09-10 13:09   ` Alexander Graf
@ 2014-09-11 12:11     ` Eric Auger
  2014-09-11 12:13       ` Alexander Graf
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Auger @ 2014-09-11 12:11 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel, a.rigo,
	kim.phillips, marc.zyngier, manish.jaggi, joel.schopp,
	peter.maydell, pbonzini, afaerber
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

On 09/10/2014 03:09 PM, Alexander Graf wrote:
> 
> 
> On 09.09.14 09:31, Eric Auger wrote:
>> A new common module is created. It implements all functions
>> that have no device specificity (PCI, Platform).
>>
>> This patch only consists in move (no functional changes)
>>
>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v5 -> v6:
>> - follow all evolutions of original PCI code from v5 to V6
>> - move declaration of vfio_region_ops, vfio_memory_listener,
>>   vfio_group_list, vfio_address_spaces into vfio-common.h
>>
>> v4 -> v5:
>> - integrate "sPAPR/IOMMU: Fix TCE entry permission"
>> - VFIOdevice .name dealloc removed from vfio_put_base_device
>> - add some includes according to vfio inclusion policy
>>
>> v3 -> v4:
>> [Eric Auger]
>> move done after all PCI modifications to anticipate for
>> VFIO Platform needs. Purpose is to alleviate the whole
>> review process.
>>
>> <= v3
>> First split done by Kim Phillips
>> ---
>>  hw/vfio/Makefile.objs         |    1 +
>>  hw/vfio/common.c              |  958 ++++++++++++++++++++++++++++++++++++++
>>  hw/vfio/pci.c                 | 1028 +----------------------------------------
>>  include/hw/vfio/vfio-common.h |  152 ++++++
>>  trace-events                  |    1 +
>>  5 files changed, 1113 insertions(+), 1027 deletions(-)
>>  create mode 100644 hw/vfio/common.c
>>  create mode 100644 include/hw/vfio/vfio-common.h
>>
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index 31c7dab..e31f30e 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,3 +1,4 @@
>>  ifeq ($(CONFIG_LINUX), y)
>> +obj-$(CONFIG_SOFTMMU) += common.o
>>  obj-$(CONFIG_PCI) += pci.o
>>  endif
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> new file mode 100644
>> index 0000000..252c0b8
>> --- /dev/null
>> +++ b/hw/vfio/common.c
>> @@ -0,0 +1,958 @@
>> +/*
>> + * generic functions used by VFIO devices
>> + *
>> + * Copyright Red Hat, Inc. 2012
>> + *
>> + * Authors:
>> + *  Alex Williamson <alex.williamson@redhat.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Based on qemu-kvm device-assignment:
>> + *  Adapted for KVM by Qumranet.
>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>> + */
>> +
>> +#include <sys/ioctl.h>
>> +#include <sys/mman.h>
>> +#include <linux/vfio.h>
>> +
>> +#include "hw/vfio/vfio-common.h"
>> +#include "hw/vfio/vfio.h"
>> +#include "exec/address-spaces.h"
>> +#include "exec/memory.h"
>> +#include "hw/hw.h"
>> +#include "qemu/error-report.h"
>> +#include "sysemu/kvm.h"
>> +#include "trace.h"
>> +
>> +struct vfio_group_head vfio_group_list =
>> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
>> +struct vfio_as_head vfio_address_spaces =
>> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
>> +
>> +#ifdef CONFIG_KVM
>> +/*
>> + * We have a single VFIO pseudo device per KVM VM.  Once created it lives
>> + * for the life of the VM.  Closing the file descriptor only drops our
>> + * reference to it and the device's reference to kvm.  Therefore once
>> + * initialized, this file descriptor is only released on QEMU exit and
>> + * we'll re-use it should another vfio device be attached before then.
>> + */
>> +static int vfio_kvm_device_fd = -1;
>> +#endif
>> +
>> +/*
>> + * Common VFIO interrupt disable
>> + */
>> +void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
>> +{
>> +    struct vfio_irq_set irq_set = {
>> +        .argsz = sizeof(irq_set),
>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
>> +        .index = index,
>> +        .start = 0,
>> +        .count = 0,
>> +    };
>> +
>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> +}
>> +
>> +void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
>> +{
>> +    struct vfio_irq_set irq_set = {
>> +        .argsz = sizeof(irq_set),
>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
>> +        .index = index,
>> +        .start = 0,
>> +        .count = 1,
>> +    };
>> +
>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> +}
>> +
>> +void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
>> +{
>> +    struct vfio_irq_set irq_set = {
>> +        .argsz = sizeof(irq_set),
>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
>> +        .index = index,
>> +        .start = 0,
>> +        .count = 1,
>> +    };
>> +
>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> +}
>> +
>> +/*
>> + * IO Port/MMIO - Beware of the endians, VFIO is always little endian
>> + */
>> +void vfio_region_write(void *opaque, hwaddr addr,
>> +                       uint64_t data, unsigned size)
>> +{
>> +    VFIORegion *region = opaque;
>> +    VFIODevice *vbasedev = region->vbasedev;
>> +    union {
>> +        uint8_t byte;
>> +        uint16_t word;
>> +        uint32_t dword;
>> +        uint64_t qword;
>> +    } buf;
>> +
>> +    switch (size) {
>> +    case 1:
>> +        buf.byte = data;
>> +        break;
>> +    case 2:
>> +        buf.word = data;
>> +        break;
>> +    case 4:
>> +        buf.dword = data;
> 
> Please be aware that this code is affected by Alexey's patch set that
> fixes endianness for slow path MMIO access and ROM regions.

Hi Alex,

Do you mean that the vfio_region_write/read implementation will differ
depending on whether we are on PCI or platform, or simply that I need to
keep in mind that this code will have to be updated for Alexey's patch
set ([PATCH 0/2] vfio: Another try to fix ROM BAR endianness)?

Thanks

Eric

> 
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module
  2014-09-11 12:11     ` Eric Auger
@ 2014-09-11 12:13       ` Alexander Graf
  2014-09-11 14:21         ` Eric Auger
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Graf @ 2014-09-11 12:13 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, a.rigo,
	kim.phillips, marc.zyngier, manish.jaggi, joel.schopp,
	peter.maydell, pbonzini, afaerber
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm



On 11.09.14 14:11, Eric Auger wrote:
> On 09/10/2014 03:09 PM, Alexander Graf wrote:
>>
>>
>> On 09.09.14 09:31, Eric Auger wrote:
>>> A new common module is created. It implements all functions
>>> that have no device specificity (PCI, Platform).
>>>
>>> This patch only consists in move (no functional changes)
>>>
>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>> v5 -> v6:
>>> - follow all evolutions of original PCI code from v5 to V6
>>> - move declaration of vfio_region_ops, vfio_memory_listener,
>>>   vfio_group_list, vfio_address_spaces into vfio-common.h
>>>
>>> v4 -> v5:
>>> - integrate "sPAPR/IOMMU: Fix TCE entry permission"
>>> - VFIOdevice .name dealloc removed from vfio_put_base_device
>>> - add some includes according to vfio inclusion policy
>>>
>>> v3 -> v4:
>>> [Eric Auger]
>>> move done after all PCI modifications to anticipate for
>>> VFIO Platform needs. Purpose is to alleviate the whole
>>> review process.
>>>
>>> <= v3
>>> First split done by Kim Phillips
>>> ---
>>>  hw/vfio/Makefile.objs         |    1 +
>>>  hw/vfio/common.c              |  958 ++++++++++++++++++++++++++++++++++++++
>>>  hw/vfio/pci.c                 | 1028 +----------------------------------------
>>>  include/hw/vfio/vfio-common.h |  152 ++++++
>>>  trace-events                  |    1 +
>>>  5 files changed, 1113 insertions(+), 1027 deletions(-)
>>>  create mode 100644 hw/vfio/common.c
>>>  create mode 100644 include/hw/vfio/vfio-common.h
>>>
>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>>> index 31c7dab..e31f30e 100644
>>> --- a/hw/vfio/Makefile.objs
>>> +++ b/hw/vfio/Makefile.objs
>>> @@ -1,3 +1,4 @@
>>>  ifeq ($(CONFIG_LINUX), y)
>>> +obj-$(CONFIG_SOFTMMU) += common.o
>>>  obj-$(CONFIG_PCI) += pci.o
>>>  endif
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> new file mode 100644
>>> index 0000000..252c0b8
>>> --- /dev/null
>>> +++ b/hw/vfio/common.c
>>> @@ -0,0 +1,958 @@
>>> +/*
>>> + * generic functions used by VFIO devices
>>> + *
>>> + * Copyright Red Hat, Inc. 2012
>>> + *
>>> + * Authors:
>>> + *  Alex Williamson <alex.williamson@redhat.com>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>> + * the COPYING file in the top-level directory.
>>> + *
>>> + * Based on qemu-kvm device-assignment:
>>> + *  Adapted for KVM by Qumranet.
>>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
>>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
>>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
>>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
>>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>>> + */
>>> +
>>> +#include <sys/ioctl.h>
>>> +#include <sys/mman.h>
>>> +#include <linux/vfio.h>
>>> +
>>> +#include "hw/vfio/vfio-common.h"
>>> +#include "hw/vfio/vfio.h"
>>> +#include "exec/address-spaces.h"
>>> +#include "exec/memory.h"
>>> +#include "hw/hw.h"
>>> +#include "qemu/error-report.h"
>>> +#include "sysemu/kvm.h"
>>> +#include "trace.h"
>>> +
>>> +struct vfio_group_head vfio_group_list =
>>> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
>>> +struct vfio_as_head vfio_address_spaces =
>>> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
>>> +
>>> +#ifdef CONFIG_KVM
>>> +/*
>>> + * We have a single VFIO pseudo device per KVM VM.  Once created it lives
>>> + * for the life of the VM.  Closing the file descriptor only drops our
>>> + * reference to it and the device's reference to kvm.  Therefore once
>>> + * initialized, this file descriptor is only released on QEMU exit and
>>> + * we'll re-use it should another vfio device be attached before then.
>>> + */
>>> +static int vfio_kvm_device_fd = -1;
>>> +#endif
>>> +
>>> +/*
>>> + * Common VFIO interrupt disable
>>> + */
>>> +void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
>>> +{
>>> +    struct vfio_irq_set irq_set = {
>>> +        .argsz = sizeof(irq_set),
>>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
>>> +        .index = index,
>>> +        .start = 0,
>>> +        .count = 0,
>>> +    };
>>> +
>>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>> +}
>>> +
>>> +void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
>>> +{
>>> +    struct vfio_irq_set irq_set = {
>>> +        .argsz = sizeof(irq_set),
>>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
>>> +        .index = index,
>>> +        .start = 0,
>>> +        .count = 1,
>>> +    };
>>> +
>>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>> +}
>>> +
>>> +void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
>>> +{
>>> +    struct vfio_irq_set irq_set = {
>>> +        .argsz = sizeof(irq_set),
>>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
>>> +        .index = index,
>>> +        .start = 0,
>>> +        .count = 1,
>>> +    };
>>> +
>>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>> +}
>>> +
>>> +/*
>>> + * IO Port/MMIO - Beware of the endians, VFIO is always little endian
>>> + */
>>> +void vfio_region_write(void *opaque, hwaddr addr,
>>> +                       uint64_t data, unsigned size)
>>> +{
>>> +    VFIORegion *region = opaque;
>>> +    VFIODevice *vbasedev = region->vbasedev;
>>> +    union {
>>> +        uint8_t byte;
>>> +        uint16_t word;
>>> +        uint32_t dword;
>>> +        uint64_t qword;
>>> +    } buf;
>>> +
>>> +    switch (size) {
>>> +    case 1:
>>> +        buf.byte = data;
>>> +        break;
>>> +    case 2:
>>> +        buf.word = data;
>>> +        break;
>>> +    case 4:
>>> +        buf.dword = data;
>>
>> Please beware that this code is affected by Alexey's patch set that
>> fixes endianness for slow path MMIO access and ROM regions.
> 
> Hi Alex,
> 
> do you mean vfio_region_write/read implementation will be different
> depending on whether we are on PCI or platform; or simply I need to pay
> attention to the fact this code will need an upgrade with Alexey's patch
> ( [PATCH 0/2] vfio: Another try to fix ROM BAR endianness).

You will simply need an update when Alexey's patches are in. I don't see
why vfio-platform should be any different from vfio-pci here.


Alex
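
For readers following the endianness point, here is a small standalone
sketch. It is neither the QEMU code nor Alexey's fix: store_le is a
hypothetical helper that serializes a 1, 2, 4 or 8 byte value into a
buffer in little-endian order regardless of host byte order, which is
the layout the comment in the quoted patch says VFIO expects.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): write the low `size` bytes of
 * `data` into `buf` in little-endian order, independent of host endianness.
 * Returns 0 on success, -1 if `size` is not 1, 2, 4 or 8.
 */
static int store_le(uint8_t *buf, uint64_t data, unsigned size)
{
    unsigned i;

    switch (size) {
    case 1:
    case 2:
    case 4:
    case 8:
        break;
    default:
        return -1;
    }

    for (i = 0; i < size; i++) {
        /* least significant byte goes first */
        buf[i] = (uint8_t)(data >> (8 * i));
    }
    return 0;
}
```

Shifting instead of casting through a union keeps the result the same
on big-endian and little-endian hosts, so the same helper would serve
vfio-pci and vfio-platform alike.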

* Re: [Qemu-devel] [PATCH v6 11/16] hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic instantiation
  2014-09-10 13:12   ` Alexander Graf
@ 2014-09-11 14:20     ` Eric Auger
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-11 14:20 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel, a.rigo,
	kim.phillips, marc.zyngier, manish.jaggi, joel.schopp,
	peter.maydell, pbonzini, afaerber
  Cc: patches, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

On 09/10/2014 03:12 PM, Alexander Graf wrote:
> 
> 
> On 09.09.14 09:31, Eric Auger wrote:
>> vfio-calxeda-xgmac now can be instantiated using the -device option
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v2 -> v3:
>> - correct bug of reg_attr[2*i] in vfio_fdt_add_device_node
>> - fix a bug related to compat_str_len computed on original compat
>>   instead of corrected compat
>> - wrap_vfio_fdt_add_node take a node creation function: this function
>>   needs to be specialized for each VFIO device. wrap function must be
>>   called in sysbus_device_create_devtree
>> ---
>>  hw/arm/dyn_sysbus_devtree.c | 141 ++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 141 insertions(+)
>>
>> diff --git a/hw/arm/dyn_sysbus_devtree.c b/hw/arm/dyn_sysbus_devtree.c
>> index 61e5b5f..3ef9430 100644
>> --- a/hw/arm/dyn_sysbus_devtree.c
>> +++ b/hw/arm/dyn_sysbus_devtree.c
>> @@ -20,6 +20,141 @@
>>  #include "hw/arm/dyn_sysbus_devtree.h"
>>  #include "qemu/error-report.h"
>>  #include "sysemu/device_tree.h"
>> +#include "hw/vfio/vfio-platform.h"
>> +#include "hw/vfio/vfio-calxeda-xgmac.h"
>> +
>> +typedef void (*vfio_fdt_add_device_node_t)(SysBusDevice *sbdev, void *opaque);
>> +
>> +static char *format_compat(char * compat)
>> +{
>> +    char *str_ptr, *corrected_compat;
>> +    /*
>> +     * process compatibility property string passed by end-user
>> +     * replaces / by , and ; by NUL character
>> +     */
>> +    corrected_compat = g_strdup(compat);
>> +
>> +    str_ptr = corrected_compat;
>> +    while ((str_ptr = strchr(str_ptr, '/')) != NULL) {
>> +        *str_ptr = ',';
>> +    }
>> +
>> +    /* substitute ";" with the NUL char */
>> +    str_ptr = corrected_compat;
>> +    while ((str_ptr = strchr(str_ptr, ';')) != NULL) {
>> +        *str_ptr = '\0';
>> +    }
>> +
>> +    /*
>> +     * corrected compat includes a "\0" before or at the same location
>> +     * as compat's one
>> +     */
>> +    return corrected_compat;
>> +}
>> +
>> +static void wrap_vfio_fdt_add_node(SysBusDevice *sbdev, void *opaque,
>> +                                   vfio_fdt_add_device_node_t add_node_fn)
>> +{
>> +    PlatformDevtreeData *data = opaque;
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    gchar irq_number_prop[8];
>> +    Object *obj = OBJECT(sbdev);
>> +    char *corrected_compat;
>> +    uint64_t irq_number;
>> +    int corrected_compat_str_len, i;
>> +
>> +    corrected_compat = format_compat(vdev->compat);
>> +    corrected_compat_str_len = strlen(corrected_compat) + 1;
>> +    /* we copy the corrected_compat string + its "\0" */
>> +    snprintf(vdev->compat, corrected_compat_str_len, "%s", corrected_compat);
>> +    g_free(corrected_compat);
>> +
>> +    add_node_fn(sbdev, opaque);
>> +
>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>> +        snprintf(irq_number_prop, sizeof(irq_number_prop), "irq[%d]", i);
>> +        irq_number = object_property_get_int(obj, irq_number_prop, NULL)
>> +                                                 + data->irq_start;
>> +        /*
>> +         * for setting irqfd up we must provide the virtual IRQ number
>> +         * which is the sum of irq_start and actual platform bus irq
>> +         * index. At realize point we do not have this info.
>> +         */
>> +        vfio_start_irq_injection(sbdev, i, irq_number);
> 
> Does this really have anything to do with fdt?

No it doesn't, I acknowledge ;-)

> Also, don't we have
> notifiers that call IRQ holders when an IRQ gets connected?

Do we? I was not able to identify such a mechanism. The notifier would be
triggered in qemu_allocate_irq, right?

> That would
> probably be the cleaner approach here.

If it is, and if the functionality does not exist yet, I can add it, sure.

Besides, I do not understand how VFIO-PCI handles the problem of late IRQ
binding. If someone can share some knowledge on this, it would be much
appreciated.

Best Regards

Eric
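
As an illustration of the IRQ loop in the quoted patch, here is a
minimal standalone sketch. The helper names irq_prop_name and
virtual_irq are hypothetical and not part of QEMU. It formats the
"irq[N]" property name with a truncation check and computes the
virtual IRQ number as the sum of irq_start and the platform bus IRQ
index, as described in the patch comment.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical helper: format the "irq[N]" property name into `buf`.
 * Returns 0 on success, -1 if the name did not fit (snprintf truncated it).
 */
static int irq_prop_name(char *buf, size_t len, int index)
{
    int n = snprintf(buf, len, "irq[%d]", index);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Virtual IRQ number = first IRQ of the platform bus + index on that bus. */
static uint64_t virtual_irq(uint64_t irq_start, uint64_t bus_irq_index)
{
    return irq_start + bus_irq_index;
}
```

With an 8-byte buffer like the one in the patch, "irq[99]" still fits
but "irq[100]" does not, so checking the snprintf return value makes
that limit explicit.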

> 
> 
> Alex
> 
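
The compat rewriting described in the quoted patch ('/' replaced by
',' and ';' replaced by NUL) can be sketched as a standalone function.
This is an illustration only: rewrite_compat is a hypothetical name,
and it uses malloc rather than g_strdup so that it builds without
GLib. It walks the string by index because, as far as I can tell, a
strchr() loop that restarts at the NUL it has just written stops after
the first ';'. It also returns the total length, since strlen() on the
result only sees the first entry.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone variant of the compat rewriting: returns a newly
 * allocated copy of `compat` in which every '/' becomes ',' and every ';'
 * becomes a NUL separator. The total byte length (including the final NUL)
 * is stored in *out_len. Returns NULL on allocation failure.
 * The caller frees the result.
 */
static char *rewrite_compat(const char *compat, size_t *out_len)
{
    size_t len = strlen(compat) + 1;
    char *out = malloc(len);
    size_t i;

    if (!out) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        char c = compat[i];

        if (c == '/') {
            c = ',';
        } else if (c == ';') {
            c = '\0';
        }
        out[i] = c;
    }
    *out_len = len;
    return out;
}
```

The returned buffer has the NUL-separated layout of a device tree
"compatible" string list, and out_len is the size to pass along with
it.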

* Re: [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module
  2014-09-11 12:13       ` Alexander Graf
@ 2014-09-11 14:21         ` Eric Auger
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-11 14:21 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel, a.rigo,
	kim.phillips, marc.zyngier, manish.jaggi, joel.schopp,
	peter.maydell, pbonzini, afaerber
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder, Bharat.Bhushan,
	alex.williamson, a.motakis, kvmarm

On 09/11/2014 02:13 PM, Alexander Graf wrote:
> 
> 
> On 11.09.14 14:11, Eric Auger wrote:
>> On 09/10/2014 03:09 PM, Alexander Graf wrote:
>>>
>>>
>>> On 09.09.14 09:31, Eric Auger wrote:
>>>> A new common module is created. It implements all functions
>>>> that have no device specificity (PCI, Platform).
>>>>
>>>> This patch only consists in move (no functional changes)
>>>>
>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>
>>>> ---
>>>> v5 -> v6:
>>>> - follow all evolutions of original PCI code from v5 to V6
>>>> - move declaration of vfio_region_ops, vfio_memory_listener,
>>>>   vfio_group_list, vfio_address_spaces into vfio-common.h
>>>>
>>>> v4 -> v5:
>>>> - integrate "sPAPR/IOMMU: Fix TCE entry permission"
>>>> - VFIOdevice .name dealloc removed from vfio_put_base_device
>>>> - add some includes according to vfio inclusion policy
>>>>
>>>> v3 -> v4:
>>>> [Eric Auger]
>>>> move done after all PCI modifications to anticipate for
>>>> VFIO Platform needs. Purpose is to alleviate the whole
>>>> review process.
>>>>
>>>> <= v3
>>>> First split done by Kim Phillips
>>>> ---
>>>>  hw/vfio/Makefile.objs         |    1 +
>>>>  hw/vfio/common.c              |  958 ++++++++++++++++++++++++++++++++++++++
>>>>  hw/vfio/pci.c                 | 1028 +----------------------------------------
>>>>  include/hw/vfio/vfio-common.h |  152 ++++++
>>>>  trace-events                  |    1 +
>>>>  5 files changed, 1113 insertions(+), 1027 deletions(-)
>>>>  create mode 100644 hw/vfio/common.c
>>>>  create mode 100644 include/hw/vfio/vfio-common.h
>>>>
>>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>>>> index 31c7dab..e31f30e 100644
>>>> --- a/hw/vfio/Makefile.objs
>>>> +++ b/hw/vfio/Makefile.objs
>>>> @@ -1,3 +1,4 @@
>>>>  ifeq ($(CONFIG_LINUX), y)
>>>> +obj-$(CONFIG_SOFTMMU) += common.o
>>>>  obj-$(CONFIG_PCI) += pci.o
>>>>  endif
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> new file mode 100644
>>>> index 0000000..252c0b8
>>>> --- /dev/null
>>>> +++ b/hw/vfio/common.c
>>>> @@ -0,0 +1,958 @@
>>>> +/*
>>>> + * generic functions used by VFIO devices
>>>> + *
>>>> + * Copyright Red Hat, Inc. 2012
>>>> + *
>>>> + * Authors:
>>>> + *  Alex Williamson <alex.williamson@redhat.com>
>>>> + *
>>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>>> + * the COPYING file in the top-level directory.
>>>> + *
>>>> + * Based on qemu-kvm device-assignment:
>>>> + *  Adapted for KVM by Qumranet.
>>>> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
>>>> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
>>>> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
>>>> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
>>>> + *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
>>>> + */
>>>> +
>>>> +#include <sys/ioctl.h>
>>>> +#include <sys/mman.h>
>>>> +#include <linux/vfio.h>
>>>> +
>>>> +#include "hw/vfio/vfio-common.h"
>>>> +#include "hw/vfio/vfio.h"
>>>> +#include "exec/address-spaces.h"
>>>> +#include "exec/memory.h"
>>>> +#include "hw/hw.h"
>>>> +#include "qemu/error-report.h"
>>>> +#include "sysemu/kvm.h"
>>>> +#include "trace.h"
>>>> +
>>>> +struct vfio_group_head vfio_group_list =
>>>> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
>>>> +struct vfio_as_head vfio_address_spaces =
>>>> +    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
>>>> +
>>>> +#ifdef CONFIG_KVM
>>>> +/*
>>>> + * We have a single VFIO pseudo device per KVM VM.  Once created it lives
>>>> + * for the life of the VM.  Closing the file descriptor only drops our
>>>> + * reference to it and the device's reference to kvm.  Therefore once
>>>> + * initialized, this file descriptor is only released on QEMU exit and
>>>> + * we'll re-use it should another vfio device be attached before then.
>>>> + */
>>>> +static int vfio_kvm_device_fd = -1;
>>>> +#endif
>>>> +
>>>> +/*
>>>> + * Common VFIO interrupt disable
>>>> + */
>>>> +void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
>>>> +{
>>>> +    struct vfio_irq_set irq_set = {
>>>> +        .argsz = sizeof(irq_set),
>>>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
>>>> +        .index = index,
>>>> +        .start = 0,
>>>> +        .count = 0,
>>>> +    };
>>>> +
>>>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>>> +}
>>>> +
>>>> +void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
>>>> +{
>>>> +    struct vfio_irq_set irq_set = {
>>>> +        .argsz = sizeof(irq_set),
>>>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
>>>> +        .index = index,
>>>> +        .start = 0,
>>>> +        .count = 1,
>>>> +    };
>>>> +
>>>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>>> +}
>>>> +
>>>> +void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
>>>> +{
>>>> +    struct vfio_irq_set irq_set = {
>>>> +        .argsz = sizeof(irq_set),
>>>> +        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
>>>> +        .index = index,
>>>> +        .start = 0,
>>>> +        .count = 1,
>>>> +    };
>>>> +
>>>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>>> +}
>>>> +
>>>> +/*
>>>> + * IO Port/MMIO - Beware of the endians, VFIO is always little endian
>>>> + */
>>>> +void vfio_region_write(void *opaque, hwaddr addr,
>>>> +                       uint64_t data, unsigned size)
>>>> +{
>>>> +    VFIORegion *region = opaque;
>>>> +    VFIODevice *vbasedev = region->vbasedev;
>>>> +    union {
>>>> +        uint8_t byte;
>>>> +        uint16_t word;
>>>> +        uint32_t dword;
>>>> +        uint64_t qword;
>>>> +    } buf;
>>>> +
>>>> +    switch (size) {
>>>> +    case 1:
>>>> +        buf.byte = data;
>>>> +        break;
>>>> +    case 2:
>>>> +        buf.word = data;
>>>> +        break;
>>>> +    case 4:
>>>> +        buf.dword = data;
>>>
>>> Please beware that this code is affected by Alexey's patch set that
>>> fixes endianness for slow path MMIO access and ROM regions.
>>
>> Hi Alex,
>>
>> do you mean vfio_region_write/read implementation will be different
>> depending on whether we are on PCI or platform; or simply I need to pay
>> attention to the fact this code will need an upgrade with Alexey's patch
>> ( [PATCH 0/2] vfio: Another try to fix ROM BAR endianness).
> 
> You will simply need an update when Alexey's patches are in. I don't see
> why vfio-platform should be any different from vfio-pci here.

OK thanks for the confirmation

Eric
> 
> 
> Alex
> 

* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
                   ` (15 preceding siblings ...)
  2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 16/16] VFIO: PLATFORM: add forwarded irq support Eric Auger
@ 2014-09-11 22:14 ` Alex Williamson
  2014-09-11 22:23   ` Christoffer Dall
  16 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2014-09-11 22:14 UTC (permalink / raw)
  To: Eric Auger
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell,
	marc.zyngier, manish.jaggi, patches, will.deacon, qemu-devel,
	a.rigo, Bharat.Bhushan, agraf, kvmarm, a.motakis, stuart.yoder,
	pbonzini, afaerber, christoffer.dall

On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
> This RFC series aims at enabling KVM platform device passthrough.
> It implements a VFIO platform device, derived from VFIO PCI device.
> 
> The VFIO platform device uses the host VFIO platform driver which must
> be bound to the assigned device prior to the QEMU system start.
> 
> - the guest can directly access the device register space
> - assigned device IRQs are transparently routed to the guest by
>   QEMU/KVM (3 methods currently are supported: user-level eventfd
>   handling, irqfd, forwarded IRQs)
> - iommu is transparently programmed to prevent the device from
>   accessing physical pages outside of the guest address space
> 
> This patch series is made of the following patch files:
> 
> 1-7) Modifications to PCI code to prepare for VFIO platform device
> 8) split of PCI specific code and generic code (move)
> 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
>       support (MMIO direct access and IRQ assignment).
> 12) fake injection test modality (to test multiple IRQ)
> 13) addition of irqfd/virqfd support
> 14-16) forwarded IRQ
> 
> Dependency List:
> 
> QEMU dependencies:
> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
>     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
> [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
> [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
>     Eric Auger
>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
> [4] [RFC] vfio: migration to trace points, Eric Auger
>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
> 
> Kernel Dependencies:
> [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
>     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
>     https://lkml.org/lkml/2014/9/1/141
> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
>     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
>     https://lkml.org/lkml/2014/9/1/344
> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
>     Marc Zyngier
>     http://lwn.net/Articles/603514/
> 
> kernel pieces can be found at:
> http://git.linaro.org/people/eric.auger/linux.git
> (branch 3.17rc3_irqfd_forward_integ_v2)
> QEMU pieces can be found at:
> http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
> 
> The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
> is assigned to KVM host while the second one is assigned to the guest.
> Reworked PCI device is not tested.
> 
> Wiki for Calxeda Midway setup:
> https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
> 
> History:
> 
> v5->v6:
> - rebase on 2.1rc5 PCI code
> - forwarded IRQ first integration

Why?  Are there acceleration paths that you're concerned cannot be
implemented or we do not already have a proof of concept for?  The base
kernel patch series you depend on is 3 months old yet this series
continues to grow and add new dependencies.  Please let's prioritize
getting something upstream instead of adding more blockers to prevent
that.  Thanks,

Alex

> - vfio_device property renamed into host property
> - split IRQ setup in different functions that match the 3 supported
>   injection techniques (user handled eventfd, irqfd, forwarded IRQ):
>   removes dynamic switch between injection methods
> - introduce fake interrupts as a test modality:
>   x makes it possible to test multiple IRQ user-side handling.
>   x this is a test feature only: it allows triggering an fd as if the
>     real physical IRQ hit. No virtual IRQ is injected into the guest
>     but handling is simulated so that the state machine can be tested
> - user handled eventfd:
>   x add mutex to protect IRQ state & list manipulation,
>   x correct misleading comment in vfio_intp_interrupt.
>   x Fix bugs using fake interrupt modality
> - irqfd no more advertised in this patchset (handled in [3])
> - VFIOPlatformDeviceClass becomes abstract and Calxeda xgmac device
>   and class is re-introduced (as per v4)
> - all DPRINTF removed in platform and replaced by trace-points
> - corrects compilation with configure --disable-kvm
> - simplifies the split for vfio_get_device and introduce a unique
>   specialized function named vfio_populate_device
> - group_list renamed into vfio_group_list
> - hw/arm/dyn_sysbus_devtree.c currently only support vfio-calxeda-xgmac
>   instantiation. Needs to be specialized for other VFIO devices
> - fix 2 bugs in dyn_sysbus_devtree(reg_attr index and compat)
> 
> v4->v5:
> - rebase on v2.1.0 PCI code
> - take into account Alex Williamson comments on PCI code rework
>   - trace updates in vfio_region_write/read
>   - remove fd from VFIORegion
>   - get/put cleanup
> - bug fix: proper initialization of the bar region's vbasedev field
> - misc cleanups in platform device
> - device tree node generation removed from device and handled in
>   hw/arm/dyn_sysbus_devtree.c
> - remove "hw/vfio: add an example calxeda_xgmac": with removal of
>   device tree node generation we do not have so many things to
>   implement in that derived device yet. May be re-introduced later
>   on if needed typically for reset/migration.
> - no GSI routing table anymore
> 
> v3->v4 changes (Eric Auger, Alvise Rigo)
> - rebase on last VFIO PCI code (v2.1.0-rc0)
> - full git history rework to ease PCI code change review
> - mv include files in hw/vfio
> - DPRINTF reformatting temporarily moved out
> - support of VFIO virq (removal of resamplefd handler on user-side)
> - integration with sysbus dynamic instantiation framework
> - removal of unrealize and cleanup routines until it is better
>   understood what is really needed
> - Support of VFIO for Amba devices should be handled in an inherited
>   device to specialize the device tree generation (clock handle currently
>   missing in framework however)
> - "Always use eventfd as notifying mechanism" temporarily moved out
> - static instantiation is not mainstream (although it remains possible)
>   note if static instantiation is used, irqfd must be setup in machine file
>   when virtual IRQ is known
> - create the GSI routing table on qemu side
> 
> v2->v3 changes (Alvise Rigo, Eric Auger):
> - Following Alex W's recommendations, further efforts to factorize the
>   code between PCI and platform: introduction of VFIODevice and
>   VFIORegion as base classes
> - unique reset handler for platform and PCI
> - cleanup following Kim's comments
> - multiple IRQ support mechanics should be in place although not
>   tested
> - Better handling of MMIO multiple regions
> - New features and fixes by Alvise (multiple compat string, exec
>   flag, force eventfd usage, amba device tree support)
> - irqfd support
> 
> v1->v2 changes (Kim Phillips, Eric Auger):
> - IRQ initial support (legacy mode where eventfds are handled on
>   user side)
> - hacked dynamic instantiation
> 
> v1 (Kim Phillips):
> - initial split between PCI and platform
> - MMIO support only
> - static instantiation
> 
> Best Regards
> 
> Eric
> 
> 
> Eric Auger (15):
>   hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
>   hw/vfio/pci: introduce VFIODevice
>   hw/vfio/pci: Introduce VFIORegion
>   hw/vfio/pci: split vfio_get_device
>   hw/vfio/pci: rename group_list into vfio_group_list
>   hw/vfio/pci: use name field in format strings
>   hw/vfio: create common module
>   hw/vfio/platform: add vfio-platform support
>   hw/vfio: calxeda xgmac device
>   hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic
>     instantiation
>   vfio/platform: add fake injection modality
>   hw/vfio/platform: Add irqfd support
>   linux-headers: Update KVM headers from linux-next tag ToBeFilled
>   VFIO: COMMON: vfio_kvm_device_fd moved in the common header
>   VFIO: PLATFORM: add forwarded irq support
> 
> Kim Phillips (1):
>   vfio: move hw/misc/vfio.c to hw/vfio/pci.c     Move vfio.h into
>     include/hw/vfio
> 
>  LICENSE                              |    2 +-
>  MAINTAINERS                          |    2 +-
>  hw/Makefile.objs                     |    1 +
>  hw/arm/dyn_sysbus_devtree.c          |  141 +++
>  hw/misc/Makefile.objs                |    1 -
>  hw/ppc/spapr_pci_vfio.c              |    2 +-
>  hw/vfio/Makefile.objs                |    6 +
>  hw/vfio/calxeda_xgmac.c              |   57 ++
>  hw/vfio/common.c                     |  959 +++++++++++++++++++
>  hw/{misc/vfio.c => vfio/pci.c}       | 1670 +++++++---------------------------
>  hw/vfio/platform.c                   |  874 ++++++++++++++++++
>  include/hw/vfio/vfio-calxeda-xgmac.h |   41 +
>  include/hw/vfio/vfio-common.h        |  157 ++++
>  include/hw/vfio/vfio-platform.h      |   95 ++
>  include/hw/{misc => vfio}/vfio.h     |    0
>  linux-headers/linux/kvm.h            |    9 +
>  trace-events                         |  136 +--
>  17 files changed, 2739 insertions(+), 1414 deletions(-)
>  create mode 100644 hw/vfio/Makefile.objs
>  create mode 100644 hw/vfio/calxeda_xgmac.c
>  create mode 100644 hw/vfio/common.c
>  rename hw/{misc/vfio.c => vfio/pci.c} (65%)
>  create mode 100644 hw/vfio/platform.c
>  create mode 100644 include/hw/vfio/vfio-calxeda-xgmac.h
>  create mode 100644 include/hw/vfio/vfio-common.h
>  create mode 100644 include/hw/vfio/vfio-platform.h
>  rename include/hw/{misc => vfio}/vfio.h (100%)
> 

* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-11 22:14 ` [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Alex Williamson
@ 2014-09-11 22:23   ` Christoffer Dall
  2014-09-11 22:51     ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Christoffer Dall @ 2014-09-11 22:23 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell, Eric Auger,
	marc.zyngier, manish.jaggi, patches, will.deacon, a.rigo,
	qemu-devel, Bharat.Bhushan, agraf, kvmarm, a.motakis,
	stuart.yoder, pbonzini, afaerber

On Thu, Sep 11, 2014 at 04:14:09PM -0600, Alex Williamson wrote:
> On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
> > This RFC series aims at enabling KVM platform device passthrough.
> > It implements a VFIO platform device, derived from VFIO PCI device.
> > 
> > The VFIO platform device uses the host VFIO platform driver which must
> > be bound to the assigned device prior to the QEMU system start.
> > 
> > - the guest can directly access the device register space
> > - assigned device IRQs are transparently routed to the guest by
> >   QEMU/KVM (3 methods currently are supported: user-level eventfd
> >   handling, irqfd, forwarded IRQs)
> > - iommu is transparently programmed to prevent the device from
> >   accessing physical pages outside of the guest address space
> > 
> > This patch series is made of the following patch files:
> > 
> > 1-7) Modifications to PCI code to prepare for VFIO platform device
> > 8) split of PCI specific code and generic code (move)
> > 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
> >       support (MMIO direct access and IRQ assignment).
> > 12) fake injection test modality (to test multiple IRQ)
> > 13) addition of irqfd/virqfd support
> > 14-16) forwarded IRQ
> > 
> > Dependency List:
> > 
> > QEMU dependencies:
> > [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
> >     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
> > [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
> > [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
> >     Eric Auger
> >     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
> > [4] [RFC] vfio: migration to trace points, Eric Auger
> >     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
> > 
> > Kernel Dependencies:
> > [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
> >     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> > [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
> >     https://lkml.org/lkml/2014/9/1/141
> > [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
> >     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
> > [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
> >     https://lkml.org/lkml/2014/9/1/344
> > [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
> >     Marc Zyngier
> >     http://lwn.net/Articles/603514/
> > 
> > kernel pieces can be found at:
> > http://git.linaro.org/people/eric.auger/linux.git
> > (branch 3.17rc3_irqfd_forward_integ_v2)
> > QEMU pieces can be found at:
> > http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
> > 
> > The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
> > is assigned to KVM host while the second one is assigned to the guest.
> > Reworked PCI device is not tested.
> > 
> > Wiki for Calxeda Midway setup:
> > https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
> > 
> > History:
> > 
> > v5->v6:
> > - rebase on 2.1rc5 PCI code
> > - forwarded IRQ first integration
> 
> Why?  Are there acceleration paths that you're concerned cannot be
> implemented or we do not already have a proof of concept for?  The base
> kernel patch series you depend on is 3 months old yet this series
> continues to grow and add new dependencies.  Please let's prioritize
> getting something upstream instead of adding more blockers to prevent
> that.  Thanks,
> 
I'm not exactly sure what this changelog line was referring to
(depending on Marc's forwarding IRQ patches?), but just want to add that
there are a number of dependencies for the GIC that need to go in as
well (should happen within a few weeks), but I think it's unlikely that
the IRQ forwarding stuff goes in for v3.18 at this point.

It may make sense as you suggest to keep that part out of this patch set
and something merged sooner as opposed to later, but I'm too jet-lagged
to completely understand if that's going to be a horrible mess.

Thanks,
-Christoffer

* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-11 22:23   ` Christoffer Dall
@ 2014-09-11 22:51     ` Alex Williamson
  2014-09-11 23:05       ` Christoffer Dall
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2014-09-11 22:51 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell, Eric Auger,
	marc.zyngier, manish.jaggi, patches, will.deacon, a.rigo,
	qemu-devel, Bharat.Bhushan, agraf, kvmarm, a.motakis,
	stuart.yoder, pbonzini, afaerber

On Thu, 2014-09-11 at 15:23 -0700, Christoffer Dall wrote:
> On Thu, Sep 11, 2014 at 04:14:09PM -0600, Alex Williamson wrote:
> > On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
> > > This RFC series aims at enabling KVM platform device passthrough.
> > > It implements a VFIO platform device, derived from VFIO PCI device.
> > > 
> > > The VFIO platform device uses the host VFIO platform driver which must
> > > be bound to the assigned device prior to the QEMU system start.
> > > 
> > > - the guest can directly access the device register space
> > > - assigned device IRQs are transparently routed to the guest by
> > >   QEMU/KVM (3 methods currently are supported: user-level eventfd
> > >   handling, irqfd, forwarded IRQs)
> > > - iommu is transparently programmed to prevent the device from
> > >   accessing physical pages outside of the guest address space
> > > 
> > > This patch series is made of the following patch files:
> > > 
> > > 1-7) Modifications to PCI code to prepare for VFIO platform device
> > > 8) split of PCI specific code and generic code (move)
> > > 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
> > >       support (MMIO direct access and IRQ assignment).
> > > 12) fake injection test modality (to test multiple IRQ)
> > > 13) addition of irqfd/virqfd support
> > > 14-16) forwarded IRQ
> > > 
> > > Dependency List:
> > > 
> > > QEMU dependencies:
> > > [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
> > >     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
> > > [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
> > > [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
> > >     Eric Auger
> > >     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
> > > [4] [RFC] vfio: migration to trace points, Eric Auger
> > >     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
> > > 
> > > Kernel Dependencies:
> > > [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
> > >     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> > > [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
> > >     https://lkml.org/lkml/2014/9/1/141
> > > [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
> > >     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
> > > [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
> > >     https://lkml.org/lkml/2014/9/1/344
> > > [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
> > >     Marc Zyngier
> > >     http://lwn.net/Articles/603514/
> > > 
> > > kernel pieces can be found at:
> > > http://git.linaro.org/people/eric.auger/linux.git
> > > (branch 3.17rc3_irqfd_forward_integ_v2)
> > > QEMU pieces can be found at:
> > > http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
> > > 
> > > The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
> > > is assigned to KVM host while the second one is assigned to the guest.
> > > Reworked PCI device is not tested.
> > > 
> > > Wiki for Calxeda Midway setup:
> > > https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
> > > 
> > > History:
> > > 
> > > v5->v6:
> > > - rebase on 2.1rc5 PCI code
> > > - forwarded IRQ first integration
> > 
> > Why?  Are there acceleration paths that you're concerned cannot be
> > implemented or we do not already have a proof of concept for?  The base
> > kernel patch series you depend on is 3 months old yet this series
> > continues to grow and add new dependencies.  Please let's prioritize
> > getting something upstream instead of adding more blockers to prevent
> > that.  Thanks,
> > 
> I'm not exactly sure what this changelog line was referring to
> (depending on Marc's forwarding IRQ patches?), but just want to add that
> there are a number of dependencies for the GIC that need to go in as
> well (should happen within a few weeks), but I think it's unlikely that
> the IRQ forwarding stuff goes in for v3.18 at this point.
> 
> It may make sense as you suggest to keep that part out of this patch set
> and get something merged sooner as opposed to later, but I'm too jet-lagged
> to completely understand if that's going to be a horrible mess.

The point is that we're on v6 of a patch series and its first non-RFC
posting and we're rolling in a first pass at a QEMU implementation that
depends on a contested kernel RFC, which depends on another stagnant
kernel RFC.  I'm fine with working on it in parallel, but give me some
light at the end of the tunnel as a reviewer and maintainer that this
code isn't going to live indefinitely on the mailing list.  Do we really
need those GIC patches to be able to have non-KVM accelerated VFIO
platform device assignment?  We certainly don't need IRQ forwarding.
Thanks,

Alex


* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-11 22:51     ` Alex Williamson
@ 2014-09-11 23:05       ` Christoffer Dall
  2014-09-15 22:01         ` Eric Auger
  0 siblings, 1 reply; 32+ messages in thread
From: Christoffer Dall @ 2014-09-11 23:05 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell, Eric Auger,
	marc.zyngier, manish.jaggi, patches, will.deacon, a.rigo,
	qemu-devel, Bharat.Bhushan, agraf, kvmarm, a.motakis,
	stuart.yoder, pbonzini, afaerber

On Thu, Sep 11, 2014 at 04:51:14PM -0600, Alex Williamson wrote:
> On Thu, 2014-09-11 at 15:23 -0700, Christoffer Dall wrote:
> > On Thu, Sep 11, 2014 at 04:14:09PM -0600, Alex Williamson wrote:
> > > On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
> > > > This RFC series aims at enabling KVM platform device passthrough.
> > > > It implements a VFIO platform device, derived from VFIO PCI device.
> > > > 
> > > > The VFIO platform device uses the host VFIO platform driver which must
> > > > be bound to the assigned device prior to the QEMU system start.
> > > > 
> > > > - the guest can directly access the device register space
> > > > - assigned device IRQs are transparently routed to the guest by
> > > >   QEMU/KVM (3 methods currently are supported: user-level eventfd
> > > >   handling, irqfd, forwarded IRQs)
> > > > - iommu is transparently programmed to prevent the device from
> > > >   accessing physical pages outside of the guest address space
> > > > 
> > > > This patch series is made of the following patch files:
> > > > 
> > > > 1-7) Modifications to PCI code to prepare for VFIO platform device
> > > > 8) split of PCI specific code and generic code (move)
> > > > 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
> > > >       support (MMIO direct access and IRQ assignment).
> > > > 12) fake injection test modality (to test multiple IRQ)
> > > > 13) addition of irqfd/virqfd support
> > > > 14-16) forwarded IRQ
> > > > 
> > > > Dependency List:
> > > > 
> > > > QEMU dependencies:
> > > > [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
> > > >     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
> > > > [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
> > > > [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
> > > >     Eric Auger
> > > >     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
> > > > [4] [RFC] vfio: migration to trace points, Eric Auger
> > > >     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
> > > > 
> > > > Kernel Dependencies:
> > > > [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
> > > >     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> > > > [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
> > > >     https://lkml.org/lkml/2014/9/1/141
> > > > [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
> > > >     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
> > > > [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
> > > >     https://lkml.org/lkml/2014/9/1/344
> > > > [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
> > > >     Marc Zyngier
> > > >     http://lwn.net/Articles/603514/
> > > > 
> > > > kernel pieces can be found at:
> > > > http://git.linaro.org/people/eric.auger/linux.git
> > > > (branch 3.17rc3_irqfd_forward_integ_v2)
> > > > QEMU pieces can be found at:
> > > > http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
> > > > 
> > > > The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
> > > > is assigned to KVM host while the second one is assigned to the guest.
> > > > Reworked PCI device is not tested.
> > > > 
> > > > Wiki for Calxeda Midway setup:
> > > > https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
> > > > 
> > > > History:
> > > > 
> > > > v5->v6:
> > > > - rebase on 2.1rc5 PCI code
> > > > - forwarded IRQ first integration
> > > 
> > > Why?  Are there acceleration paths that you're concerned cannot be
> > > implemented or we do not already have a proof of concept for?  The base
> > > kernel patch series you depend on is 3 months old yet this series
> > > continues to grow and add new dependencies.  Please let's prioritize
> > > getting something upstream instead of adding more blockers to prevent
> > > that.  Thanks,
> > > 
> > I'm not exactly sure what this changelog line was referring to
> > (depending on Marc's forwarding IRQ patches?), but just want to add that
> > there are a number of dependencies for the GIC that need to go in as
> > well (should happen within a few weeks), but I think it's unlikely that
> > the IRQ forwarding stuff goes in for v3.18 at this point.
> > 
> > It may make sense as you suggest to keep that part out of this patch set
> > and get something merged sooner as opposed to later, but I'm too jet-lagged
> > to completely understand if that's going to be a horrible mess.
> 
> The point is that we're on v6 of a patch series and its first non-RFC
> posting and we're rolling in a first pass at a QEMU implementation that
> depends on a contested kernel RFC, which depends on another stagnant
> kernel RFC.  I'm fine with working on it in parallel, but give me some
> light at the end of the tunnel as a reviewer and maintainer that this
> code isn't going to live indefinitely on the mailing list.  Do we really
> need those GIC patches to be able to have non-KVM accelerated VFIO
> platform device assignment?  We certainly don't need IRQ forwarding.
> Thanks,
> 
You need the vgic cleanup and fixes series to do platform device
assignment on ARM, yes.

I would also like to see us moving faster on the VFIO platform patch
set, but we're not driving this effort so not sure what we (Linaro) can
do here.

The irqfd patch itself doesn't require IRQ forwarding and Eric was
accurately sending that as a separate patch, which I expect will be in
an upstreamable state soon.

The QEMU patch set should then probably be split, so an initial version
of the patch set without irq forwarding can go in.

The whole KVM-VFIO patch set is only about IRQ forwarding and I think
Eric prioritized this work in parallel because it makes the whole thing
useful performance-wise.

But, I agree with your point, this has been floating around for a long
time, so we should try to get some fixed points.  I'm mostly worried
about the vfio platform kernel patch set at this point though...

-Christoffer


* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-11 23:05       ` Christoffer Dall
@ 2014-09-15 22:01         ` Eric Auger
  2014-09-16 20:51           ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Auger @ 2014-09-15 22:01 UTC (permalink / raw)
  To: Christoffer Dall, Alex Williamson
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell,
	marc.zyngier, manish.jaggi, patches, will.deacon, qemu-devel,
	a.rigo, Bharat.Bhushan, agraf, kvmarm, a.motakis, stuart.yoder,
	pbonzini, afaerber

On 09/12/2014 01:05 AM, Christoffer Dall wrote:
> On Thu, Sep 11, 2014 at 04:51:14PM -0600, Alex Williamson wrote:
>> On Thu, 2014-09-11 at 15:23 -0700, Christoffer Dall wrote:
>>> On Thu, Sep 11, 2014 at 04:14:09PM -0600, Alex Williamson wrote:
>>>> On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
>>>>> This RFC series aims at enabling KVM platform device passthrough.
>>>>> It implements a VFIO platform device, derived from VFIO PCI device.
>>>>>
>>>>> The VFIO platform device uses the host VFIO platform driver which must
>>>>> be bound to the assigned device prior to the QEMU system start.
>>>>>
>>>>> - the guest can directly access the device register space
>>>>> - assigned device IRQs are transparently routed to the guest by
>>>>>   QEMU/KVM (3 methods currently are supported: user-level eventfd
>>>>>   handling, irqfd, forwarded IRQs)
>>>>> - iommu is transparently programmed to prevent the device from
>>>>>   accessing physical pages outside of the guest address space
>>>>>
>>>>> This patch series is made of the following patch files:
>>>>>
>>>>> 1-7) Modifications to PCI code to prepare for VFIO platform device
>>>>> 8) split of PCI specific code and generic code (move)
>>>>> 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
>>>>>       support (MMIO direct access and IRQ assignment).
>>>>> 12) fake injection test modality (to test multiple IRQ)
>>>>> 13) addition of irqfd/virqfd support
>>>>> 14-16) forwarded IRQ
>>>>>
>>>>> Dependency List:
>>>>>
>>>>> QEMU dependencies:
>>>>> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
>>>>>     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
>>>>> [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
>>>>> [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
>>>>>     Eric Auger
>>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
>>>>> [4] [RFC] vfio: migration to trace points, Eric Auger
>>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
>>>>>
>>>>> Kernel Dependencies:
>>>>> [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
>>>>>     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
>>>>> [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
>>>>>     https://lkml.org/lkml/2014/9/1/141
>>>>> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
>>>>>     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
>>>>> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
>>>>>     https://lkml.org/lkml/2014/9/1/344
>>>>> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
>>>>>     Marc Zyngier
>>>>>     http://lwn.net/Articles/603514/
>>>>>
>>>>> kernel pieces can be found at:
>>>>> http://git.linaro.org/people/eric.auger/linux.git
>>>>> (branch 3.17rc3_irqfd_forward_integ_v2)
>>>>> QEMU pieces can be found at:
>>>>> http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
>>>>>
>>>>> The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
>>>>> is assigned to KVM host while the second one is assigned to the guest.
>>>>> Reworked PCI device is not tested.
>>>>>
>>>>> Wiki for Calxeda Midway setup:
>>>>> https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
>>>>>
>>>>> History:
>>>>>
>>>>> v5->v6:
>>>>> - rebase on 2.1rc5 PCI code
>>>>> - forwarded IRQ first integration
>>>>
>>>> Why?  Are there acceleration paths that you're concerned cannot be
>>>> implemented or we do not already have a proof of concept for?  The base
>>>> kernel patch series you depend on is 3 months old yet this series
>>>> continues to grow and add new dependencies.  Please let's prioritize
>>>> getting something upstream instead of adding more blockers to prevent
>>>> that.  Thanks,
>>>>
>>> I'm not exactly sure what this changelog line was referring to
>>> (depending on Marc's forwarding IRQ patches?), but just want to add that
>>> there are a number of dependencies for the GIC that need to go in as
>>> well (should happen within a few weeks), but I think it's unlikely that
>>> the IRQ forwarding stuff goes in for v3.18 at this point.
>>>
>>> It may make sense as you suggest to keep that part out of this patch set
>>> and get something merged sooner as opposed to later, but I'm too jet-lagged
>>> to completely understand if that's going to be a horrible mess.
>>
>> The point is that we're on v6 of a patch series and its first non-RFC
>> posting and we're rolling in a first pass at a QEMU implementation that
>> depends on a contested kernel RFC, which depends on another stagnant
>> kernel RFC.  I'm fine with working on it in parallel, but give me some
>> light at the end of the tunnel as a reviewer and maintainer that this
>> code isn't going to live indefinitely on the mailing list.  Do we really
>> need those GIC patches to be able to have non-KVM accelerated VFIO
>> platform device assignment?  We certainly don't need IRQ forwarding.
>> Thanks,

Hi Alex,

Sorry for the delay, I was travelling.

I understand your impatience. I personally would be happy if we could
upstream this patch series in several steps. Let me know if that
makes sense.

Step I: integrate patches 1-11. This gives a non-KVM-accelerated VFIO QEMU
device. Patch 12 could be part of it too, but since it is a test feature it
might be dropped. Just let me know what you think.

depends on:
QEMU:
[1] [PATCH v2 0/9] Dynamic sysbus device allocation support, A. Graf
http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
[2] [RFC v3] machvirt dynamic sysbus device instantiation, E. Auger
[4] [RFC] vfio: migration to trace points, E. Auger
http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
KERNEL:
[5] [RFC Patch v6 0/20] VFIO support for platform devices, A. Motakis
https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html

Step II: integrate patch 13: KVM-accelerated QEMU VFIO device featuring
irqfd/virqfd

depends on
[7] arm/arm64: KVM: Various VGIC cleanups and improvements, C. Dall
[6] [PATCH v3] ARM: KVM: add irqfd support, E. Auger
https://lkml.org/lkml/2014/9/1/141

Step III: integrate patches 14-16: KVM-accelerated QEMU VFIO device
featuring forwarded IRQs, which depends on:
[8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
https://lkml.org/lkml/2014/9/1/344
[9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
Marc Zyngier, http://lwn.net/Articles/603514/

To me these 3 steps are quite independent from each other.
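
As a rough illustration only, the staging above could be written down as a
small table mapping each step to its patches and to the numbered references
listed under it. The names STEPS and ready_steps are hypothetical and exist
only for this sketch; it also ignores any ordering between the steps
themselves and only checks the listed dependencies.

```python
# Hypothetical model of the three-step plan; illustrative only.
# Patch numbers and dependency numbers ([1]..[9]) are taken from the lists above.
STEPS = {
    "I": {"patches": list(range(1, 12)), "deps": {1, 2, 4, 5}},
    "II": {"patches": [13], "deps": {6, 7}},
    "III": {"patches": [14, 15, 16], "deps": {8, 9}},
}


def ready_steps(merged):
    """Return the steps whose listed dependencies are all in `merged`."""
    merged = set(merged)
    return [name for name, step in STEPS.items() if step["deps"] <= merged]


# Example: once [1], [2], [4] and [5] are upstream, only Step I is unblocked.
print(ready_steps({1, 2, 4, 5}))
```

Under these assumptions, merging the VGIC cleanups [7] and irqfd support [6]
would additionally unblock Step II, without waiting for the forwarding RFCs.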

With respect to performance, I think we have something reasonable now
with irqfd and forwarded IRQs, so I do not expect any new features to be
added soon.

From now on, I do not plan to add any new patch to this series, only to
correct/modify it according to comments and weaknesses.

I hope this clarifies the plans. Please let me know.

Best Regards

Eric



>>
> You need the vgic cleanup and fixes series to do platform device
> assignment on ARM, yes.
> 
> I would also like to see us moving faster on the VFIO platform patch
> set, but we're not driving this effort so not sure what we (Linaro) can
> do here.
> 
> The irqfd patch itself doesn't require IRQ forwarding and Eric was
> accurately sending that as a separate patch, which I expect will be in
> an upstreamable state soon.
> 
> The QEMU patch set should then probably be split, so an initial version
> of the patch set without irq forwarding can go in.
> 
> The whole KVM-VFIO patch set is only about IRQ forwarding and I think
> Eric prioritized this work in parallel because it makes the whole thing
> useful performance-wise.
> 
> But, I agree with your point, this has been floating around for a long
> time, so we should try to get some fixed points.  I'm mostly worried
> about the vfio platform kernel patch set at this point though...
> 
> -Christoffer
> 


* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-15 22:01         ` Eric Auger
@ 2014-09-16 20:51           ` Alex Williamson
  2014-09-16 21:19             ` Eric Auger
  2014-09-16 21:23             ` Alex Williamson
  0 siblings, 2 replies; 32+ messages in thread
From: Alex Williamson @ 2014-09-16 20:51 UTC (permalink / raw)
  To: Eric Auger
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell,
	marc.zyngier, manish.jaggi, patches, will.deacon, qemu-devel,
	a.rigo, Bharat.Bhushan, agraf, kvmarm, a.motakis, stuart.yoder,
	pbonzini, afaerber, Christoffer Dall

On Tue, 2014-09-16 at 00:01 +0200, Eric Auger wrote:
> On 09/12/2014 01:05 AM, Christoffer Dall wrote:
> > On Thu, Sep 11, 2014 at 04:51:14PM -0600, Alex Williamson wrote:
> >> On Thu, 2014-09-11 at 15:23 -0700, Christoffer Dall wrote:
> >>> On Thu, Sep 11, 2014 at 04:14:09PM -0600, Alex Williamson wrote:
> >>>> On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
> >>>>> This RFC series aims at enabling KVM platform device passthrough.
> >>>>> It implements a VFIO platform device, derived from VFIO PCI device.
> >>>>>
> >>>>> The VFIO platform device uses the host VFIO platform driver which must
> >>>>> be bound to the assigned device prior to the QEMU system start.
> >>>>>
> >>>>> - the guest can directly access the device register space
> >>>>> - assigned device IRQs are transparently routed to the guest by
> >>>>>   QEMU/KVM (3 methods currently are supported: user-level eventfd
> >>>>>   handling, irqfd, forwarded IRQs)
> >>>>> - iommu is transparently programmed to prevent the device from
> >>>>>   accessing physical pages outside of the guest address space
> >>>>>
> >>>>> This patch series is made of the following patch files:
> >>>>>
> >>>>> 1-7) Modifications to PCI code to prepare for VFIO platform device
> >>>>> 8) split of PCI specific code and generic code (move)
> >>>>> 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
> >>>>>       support (MMIO direct access and IRQ assignment).
> >>>>> 12) fake injection test modality (to test multiple IRQ)
> >>>>> 13) addition of irqfd/virqfd support
> >>>>> 14-16) forwarded IRQ
> >>>>>
> >>>>> Dependency List:
> >>>>>
> >>>>> QEMU dependencies:
> >>>>> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
> >>>>>     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
> >>>>> [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
> >>>>> [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
> >>>>>     Eric Auger
> >>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
> >>>>> [4] [RFC] vfio: migration to trace points, Eric Auger
> >>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
> >>>>>
> >>>>> Kernel Dependencies:
> >>>>> [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
> >>>>>     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> >>>>> [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
> >>>>>     https://lkml.org/lkml/2014/9/1/141
> >>>>> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
> >>>>>     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
> >>>>> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
> >>>>>     https://lkml.org/lkml/2014/9/1/344
> >>>>> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
> >>>>>     Marc Zyngier
> >>>>>     http://lwn.net/Articles/603514/
> >>>>>
> >>>>> kernel pieces can be found at:
> >>>>> http://git.linaro.org/people/eric.auger/linux.git
> >>>>> (branch 3.17rc3_irqfd_forward_integ_v2)
> >>>>> QEMU pieces can be found at:
> >>>>> http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
> >>>>>
> >>>>> The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
> >>>>> is assigned to KVM host while the second one is assigned to the guest.
> >>>>> Reworked PCI device is not tested.
> >>>>>
> >>>>> Wiki for Calxeda Midway setup:
> >>>>> https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
> >>>>>
> >>>>> History:
> >>>>>
> >>>>> v5->v6:
> >>>>> - rebase on 2.1rc5 PCI code
> >>>>> - forwarded IRQ first integration
> >>>>
> >>>> Why?  Are there acceleration paths that you're concerned cannot be
> >>>> implemented or we do not already have a proof of concept for?  The base
> >>>> kernel patch series you depend on is 3 months old yet this series
> >>>> continues to grow and add new dependencies.  Please let's prioritize
> >>>> getting something upstream instead of adding more blockers to prevent
> >>>> that.  Thanks,
> >>>>
> >>> I'm not exactly sure what this changelog line was referring to
> >>> (depending on Marc's forwarding IRQ patches?), but just want to add that
> >>> there are a number of dependencies for the GIC that need to go in as
> >>> well (should happen within a few weeks), but I think it's unlikely that
> >>> the IRQ forwarding stuff goes in for v3.18 at this point.
> >>>
> >>> It may make sense as you suggest to keep that part out of this patch set
> >>> and get something merged sooner as opposed to later, but I'm too jet-lagged
> >>> to completely understand if that's going to be a horrible mess.
> >>
> >> The point is that we're on v6 of a patch series and its first non-RFC
> >> posting and we're rolling in a first pass at a QEMU implementation that
> >> depends on a contested kernel RFC, which depends on another stagnant
> >> kernel RFC.  I'm fine with working on it in parallel, but give me some
> >> light at the end of the tunnel as a reviewer and maintainer that this
> >> code isn't going to live indefinitely on the mailing list.  Do we really
> >> need those GIC patches to be able to have non-KVM accelerated VFIO
> >> platform device assignment?  We certainly don't need IRQ forwarding.
> >> Thanks,
> 
> Hi Alex,
> 
> Sorry for the delay, I was travelling.
> 
> I understand your impatience. I personally would be happy if we could
> upstream this patch series in several steps. Let me know if that
> makes sense.
> 
> Step I: integrate patches 1-11. This gives a non-KVM-accelerated VFIO QEMU
> device. Patch 12 could be part of it too, but since it is a test feature it
> might be dropped. Just let me know what you think.

I'd probably drop 12.  Is that really something that's useful in
upstream code?  It's a good use of the vfio loopback interrupt and good
testing, but do you really want to maintain it in the code?  Is it
sufficient that it's been posted to the mailing list so you can find and
re-apply it if you want to do similar testing again?

> depends on:
> QEMU:
> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, A. Graf
> http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
> [2] [RFC v3] machvirt dynamic sysbus device instantiation, E. Auger
> [4] [RFC] vfio: migration to trace points, E. Auger
> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
> KERNEL:
> [5] [RFC Patch v6 0/20] VFIO support for platform devices, A. Motakis
> https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html

Ok, so let's start whittling down these dependencies.  Trace points
shouldn't be any kind of blocker, you'll just need to teach me how to
use them and post a non-RFC patch ;)  At this point I don't even
remember the comments for the v6 VFIO kernel support for platform
devices.  I hope we're close enough that the next version can be sent as
non-RFC.  It might be a good idea to pick a target kernel version and
start working towards it.  v3.18 is probably not a realistic goal at
this point.  I don't know about the rest, but at least the remaining
series is non-RFC and the other is only a single patch.

> Step II: integrate patch 13: KVM-accelerated QEMU VFIO device featuring
> irqfd/virqfd
> 
> depends on
> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, C. Dall
> [6] [PATCH v3] ARM: KVM: add irqfd support, E. Auger
> https://lkml.org/lkml/2014/9/1/141
> 
> Step III: integrate patches 14-16: KVM-accelerated QEMU VFIO device
> featuring forwarded IRQs, which depends on:
> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
> https://lkml.org/lkml/2014/9/1/344
> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
> Marc Zyngier, http://lwn.net/Articles/603514/
> 
> To me these 3 steps are quite independent from each other.

Yep, I agree.  Let's not get bogged down in letting these additional
features interfere with progress on the base support.

> With respect to performance, I think we have something reasonable now
> with irqfd and forwarded IRQs, so I do not expect any new features to be
> added soon.
> 
> From now on, I do not plan to add any new patch to this series, only to
> correct/modify it according to comments and weaknesses.
> 
> I hope this clarifies the plans. Please let me know.

Thanks, it does.  We have several players in the VFIO platform space and
I want to make sure we're aligned on a goal of getting code upstream,
not just posting it to the list.  Thanks for the breakdown and your work
towards getting those dependencies resolved.

Alex


* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-16 20:51           ` Alex Williamson
@ 2014-09-16 21:19             ` Eric Auger
  2014-09-16 21:23             ` Alex Williamson
  1 sibling, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-16 21:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell,
	marc.zyngier, manish.jaggi, patches, will.deacon, qemu-devel,
	a.rigo, Bharat.Bhushan, agraf, kvmarm, a.motakis, stuart.yoder,
	pbonzini, afaerber, Christoffer Dall

On 09/16/2014 10:51 PM, Alex Williamson wrote:
> On Tue, 2014-09-16 at 00:01 +0200, Eric Auger wrote:
>> On 09/12/2014 01:05 AM, Christoffer Dall wrote:
>>> On Thu, Sep 11, 2014 at 04:51:14PM -0600, Alex Williamson wrote:
>>>> On Thu, 2014-09-11 at 15:23 -0700, Christoffer Dall wrote:
>>>>> On Thu, Sep 11, 2014 at 04:14:09PM -0600, Alex Williamson wrote:
>>>>>> On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
>>>>>>> This RFC series aims at enabling KVM platform device passthrough.
>>>>>>> It implements a VFIO platform device, derived from VFIO PCI device.
>>>>>>>
>>>>>>> The VFIO platform device uses the host VFIO platform driver which must
>>>>>>> be bound to the assigned device prior to the QEMU system start.
>>>>>>>
>>>>>>> - the guest can directly access the device register space
>>>>>>> - assigned device IRQs are transparently routed to the guest by
>>>>>>>   QEMU/KVM (3 methods currently are supported: user-level eventfd
>>>>>>>   handling, irqfd, forwarded IRQs)
>>>>>>> - iommu is transparently programmed to prevent the device from
>>>>>>>   accessing physical pages outside of the guest address space
>>>>>>>
>>>>>>> This patch series is made of the following patch files:
>>>>>>>
>>>>>>> 1-7) Modifications to PCI code to prepare for VFIO platform device
>>>>>>> 8) split of PCI specific code and generic code (move)
>>>>>>> 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
>>>>>>>       support (MMIO direct access and IRQ assignment).
>>>>>>> 12) fake injection test modality (to test multiple IRQ)
>>>>>>> 13) addition of irqfd/virqfd support
>>>>>>> 14-16) forwarded IRQ
>>>>>>>
>>>>>>> Dependency List:
>>>>>>>
>>>>>>> QEMU dependencies:
>>>>>>> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
>>>>>>>     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
>>>>>>> [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
>>>>>>> [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
>>>>>>>     Eric Auger
>>>>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
>>>>>>> [4] [RFC] vfio: migration to trace points, Eric Auger
>>>>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
>>>>>>>
>>>>>>> Kernel Dependencies:
>>>>>>> [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
>>>>>>>     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
>>>>>>> [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
>>>>>>>     https://lkml.org/lkml/2014/9/1/141
>>>>>>> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
>>>>>>>     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
>>>>>>> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
>>>>>>>     https://lkml.org/lkml/2014/9/1/344
>>>>>>> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
>>>>>>>     Marc Zyngier
>>>>>>>     http://lwn.net/Articles/603514/
>>>>>>>
>>>>>>> kernel pieces can be found at:
>>>>>>> http://git.linaro.org/people/eric.auger/linux.git
>>>>>>> (branch 3.17rc3_irqfd_forward_integ_v2)
>>>>>>> QEMU pieces can be found at:
>>>>>>> http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
>>>>>>>
>>>>>>> The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
>>>>>>> is assigned to KVM host while the second one is assigned to the guest.
>>>>>>> Reworked PCI device is not tested.
>>>>>>>
>>>>>>> Wiki for Calxeda Midway setup:
>>>>>>> https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
>>>>>>>
>>>>>>> History:
>>>>>>>
>>>>>>> v5->v6:
>>>>>>> - rebase on 2.1rc5 PCI code
>>>>>>> - forwarded IRQ first integration
>>>>>>
>>>>>> Why?  Are there acceleration paths that you're concerned cannot be
>>>>>> implemented or we do not already have a proof of concept for?  The base
>>>>>> kernel patch series you depend on is 3 months old yet this series
>>>>>> continues to grow and add new dependencies.  Please let's prioritize
>>>>>> getting something upstream instead of adding more blockers to prevent
>>>>>> that.  Thanks,
>>>>>>
>>>>> I'm not exactly sure what this changelog line was referring to
>>>>> (depending on Marc's forwarding IRQ patches?), but just want to add that
>>>>> there are a number of dependencies for the GIC that need to go in as
>>>>> well (should happen within a few weeks), but I think it's unlikely that
>>>>> the IRQ forwarding stuff goes in for v3.18 at this point.
>>>>>
>>>>> It may make sense as you suggest to keep that part out of this patch set
>>>>> and something merged sooner as opposed to later, but I'm too jet-lagged
>>>>> to completely understand if that's going to be a horrible mess.
>>>>
>>>> The point is that we're on v6 of a patch series and its first non-RFC
>>>> posting and we're rolling in a first pass at a QEMU implementation that
>>>> depends on a contested kernel RFC, which depends on another stagnant
>>>> kernel RFC.  I'm fine with working on it in parallel, but give me some
>>>> light at the end of the tunnel as a reviewer and maintainer that this
>>>> code isn't going to live indefinitely on the mailing list.  Do we really
>>>> need those GIC patches to be able to have non-KVM accelerated VFIO
>>>> platform device assignment?  We certainly don't need IRQ forwarding.
>>>> Thanks,
>>
>> Hi Alex,
>>
>> Sorry for the delay, I was travelling.
>>
>> I understand your impatience. I personally would be happy if we could
>> envision upstreaming this patch in several steps. Let me know if it
>> makes sense.
>>
>> STEP I:  integrate 1 - 11: leads to have a non-KVM accelerated VFIO QEMU
>> device. 12 can be part of it too but since it is a test feature this one
>> might be dropped. just let me know what you think.
> 
> I'd probably drop 12.  Is that really something that's useful in
> upstream code?  It's a good use of the vfio loopback interrupt and good
> testing, but do you really want to maintain it in the code?  Is it
> sufficient that it's been posted to the mailing list so you can find and
> re-apply it if you want to do similar testing again?
Hi Alex,

Yes, I agree with you about dropping it.

> 
>> depends on:
>> QEMU:
>> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, A. Graf
>> http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
>> [2] [RFC v3] machvirt dynamic sysbus device instantiation, E. Auger
>> [4] [RFC] vfio: migration to trace points, E. Auger
>> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
>> KERNEL:
>> [5] [RFC Patch v6 0/20] VFIO support for platform devices, A. Motakis
>> https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> 
> Ok, so let's start whittling down these dependencies.  Trace points
> shouldn't be any kind of blocker, you'll just need to teach me how to
> use them and post a non-RFC patch ;)

Yes, sure, I will write some instructions. I need to investigate the
parser issues related to parentheses (either fix them myself or ask for
Stefan's help).

> At this point I don't even
> remember the comments for the v6 VFIO kernel support for platform
> devices.  I hope we're close enough that the next version can be sent as
> non-RFC.  It might be a good idea to pick a target kernel version and
> start working towards it.  v3.18 is probably not a realistic goal at
> this point.  I don't know about the rest, but at least the remaining
> series is non-RFC and the other is only a single patch.

On my side I will iterate rapidly on both
[2] [RFC v3] machvirt dynamic sysbus device instantiation and
[6] [PATCH v3] ARM: KVM: add irqfd support

> 
>> Step II: integrate 13: kvm-accelerated QEMU VFIO device featuring
>> irqfd/virqfd
>>
>> depends on
>> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, C. Dall
>> [6] [PATCH v3] ARM: KVM: add irqfd support, E. Auger
>> https://lkml.org/lkml/2014/9/1/141
>>
>> Step III: integrate > 13:  kvm-accelerated QEMU VFIO device featuring
>> forwarded IRQs:
>> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
>> https://lkml.org/lkml/2014/9/1/344
>> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
>> Marc Zyngier, http://lwn.net/Articles/603514/
>>
>> To me these 3 steps are quite independent from each other.
> 
> Yep, I agree.  Let's not get bogged down in letting these additional
> features interfere with progress on the base support.
> 
>> with respect to performance I think we have something reasonable now
>> with irqfd and forwarded IRQ so I do not expect any new features added
>> soon.
>>
>> from now on, I do not plan to add any new patch file to this series but
>> just correct/modify according to comments & weaknesses.
>>
>> I hope it clarifies plans. Please let me know.
> 
> Thanks, it does.  We have several players in the VFIO platform space and
> I want to make sure we're aligned on a goal of getting code upstream,
> not just posting it to the list.  Thanks for the breakdown and your work
> towards getting those dependencies resolved.
Thanks

Eric
> 
> Alex
> 

* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-16 20:51           ` Alex Williamson
  2014-09-16 21:19             ` Eric Auger
@ 2014-09-16 21:23             ` Alex Williamson
  2014-09-19  0:29               ` Eric Auger
  1 sibling, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2014-09-16 21:23 UTC (permalink / raw)
  To: Eric Auger
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell,
	marc.zyngier, manish.jaggi, patches, will.deacon, qemu-devel,
	a.rigo, Bharat.Bhushan, agraf, Christoffer Dall, pbonzini,
	stuart.yoder, a.motakis, kvmarm, afaerber

On Tue, 2014-09-16 at 14:51 -0600, Alex Williamson wrote:
> On Tue, 2014-09-16 at 00:01 +0200, Eric Auger wrote:
> > On 09/12/2014 01:05 AM, Christoffer Dall wrote:
> > > On Thu, Sep 11, 2014 at 04:51:14PM -0600, Alex Williamson wrote:
> > >> On Thu, 2014-09-11 at 15:23 -0700, Christoffer Dall wrote:
> > >>> On Thu, Sep 11, 2014 at 04:14:09PM -0600, Alex Williamson wrote:
> > >>>> On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
> > >>>>> This RFC series aims at enabling KVM platform device passthrough.
> > >>>>> It implements a VFIO platform device, derived from VFIO PCI device.
> > >>>>>
> > >>>>> The VFIO platform device uses the host VFIO platform driver which must
> > >>>>> be bound to the assigned device prior to the QEMU system start.
> > >>>>>
> > >>>>> - the guest can directly access the device register space
> > >>>>> - assigned device IRQs are transparently routed to the guest by
> > >>>>>   QEMU/KVM (3 methods currently are supported: user-level eventfd
> > >>>>>   handling, irqfd, forwarded IRQs)
> > >>>>> - iommu is transparently programmed to prevent the device from
> > >>>>>   accessing physical pages outside of the guest address space
> > >>>>>
> > >>>>> This patch series is made of the following patch files:
> > >>>>>
> > >>>>> 1-7) Modifications to PCI code to prepare for VFIO platform device
> > >>>>> 8) split of PCI specific code and generic code (move)
> > >>>>> 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
> > >>>>>       support (MMIO direct access and IRQ assignment).
> > >>>>> 12) fake injection test modality (to test multiple IRQ)
> > >>>>> 13) addition of irqfd/virqfd support
> > >>>>> 14-16) forwarded IRQ
> > >>>>>
> > >>>>> Dependency List:
> > >>>>>
> > >>>>> QEMU dependencies:
> > >>>>> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
> > >>>>>     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
> > >>>>> [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
> > >>>>> [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
> > >>>>>     Eric Auger
> > >>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
> > >>>>> [4] [RFC] vfio: migration to trace points, Eric Auger
> > >>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
> > >>>>>
> > >>>>> Kernel Dependencies:
> > >>>>> [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
> > >>>>>     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> > >>>>> [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
> > >>>>>     https://lkml.org/lkml/2014/9/1/141
> > >>>>> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
> > >>>>>     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
> > >>>>> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
> > >>>>>     https://lkml.org/lkml/2014/9/1/344
> > >>>>> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
> > >>>>>     Marc Zyngier
> > >>>>>     http://lwn.net/Articles/603514/
> > >>>>>
> > >>>>> kernel pieces can be found at:
> > >>>>> http://git.linaro.org/people/eric.auger/linux.git
> > >>>>> (branch 3.17rc3_irqfd_forward_integ_v2)
> > >>>>> QEMU pieces can be found at:
> > >>>>> http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
> > >>>>>
> > >>>>> The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
> > >>>>> is assigned to KVM host while the second one is assigned to the guest.
> > >>>>> Reworked PCI device is not tested.
> > >>>>>
> > >>>>> Wiki for Calxeda Midway setup:
> > >>>>> https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
> > >>>>>
> > >>>>> History:
> > >>>>>
> > >>>>> v5->v6:
> > >>>>> - rebase on 2.1rc5 PCI code
> > >>>>> - forwarded IRQ first integration
> > >>>>
> > >>>> Why?  Are there acceleration paths that you're concerned cannot be
> > >>>> implemented or we do not already have a proof of concept for?  The base
> > >>>> kernel patch series you depend on is 3 months old yet this series
> > >>>> continues to grow and add new dependencies.  Please let's prioritize
> > >>>> getting something upstream instead of adding more blockers to prevent
> > >>>> that.  Thanks,
> > >>>>
> > >>> I'm not exactly sure what this changelog line was referring to
> > >>> (depending on Marc's forwarding IRQ patches?), but just want to add that
> > >>> there are a number of dependencies for the GIC that need to go in as
> > >>> well (should happen within a few weeks), but I think it's unlikely that
> > >>> the IRQ forwarding stuff goes in for v3.18 at this point.
> > >>>
> > >>> It may make sense as you suggest to keep that part out of this patch set
> > >>> and something merged sooner as opposed to later, but I'm too jet-lagged
> > >>> to completely understand if that's going to be a horrible mess.
> > >>
> > >> The point is that we're on v6 of a patch series and its first non-RFC
> > >> posting and we're rolling in a first pass at a QEMU implementation that
> > >> depends on a contested kernel RFC, which depends on another stagnant
> > >> kernel RFC.  I'm fine with working on it in parallel, but give me some
> > >> light at the end of the tunnel as a reviewer and maintainer that this
> > >> code isn't going to live indefinitely on the mailing list.  Do we really
> > >> need those GIC patches to be able to have non-KVM accelerated VFIO
> > >> platform device assignment?  We certainly don't need IRQ forwarding.
> > >> Thanks,
> > 
> > Hi Alex,
> > 
> > Sorry for the delay, I was travelling.
> > 
> > I understand your impatience. I personally would be happy if we could
> > envision upstreaming this patch in several steps. Let me know if it
> > makes sense.
> > 
> > STEP I:  integrate 1 - 11: leads to have a non-KVM accelerated VFIO QEMU
> > device. 12 can be part of it too but since it is a test feature this one
> > might be dropped. just let me know what you think.
> 
> I'd probably drop 12.  Is that really something that's useful in
> upstream code?  It's a good use of the vfio loopback interrupt and good
> testing, but do you really want to maintain it in the code?  Is it
> sufficient that it's been posted to the mailing list so you can find and
> re-apply it if you want to do similar testing again?
> 
> > depends on:
> > QEMU:
> > [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, A. Graf
> > http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
> > [2] [RFC v3] machvirt dynamic sysbus device instantiation, E. Auger
> > [4] [RFC] vfio: migration to trace points, E. Auger
> > http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
> > KERNEL:
> > [5] [RFC Patch v6 0/20] VFIO support for platform devices, A. Motakis
> > https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> 
> Ok, so let's start whittling down these dependencies.  Trace points
> shouldn't be any kind of blocker, you'll just need to teach me how to
> use them and post a non-RFC patch ;)  At this point I don't even
> remember the comments for the v6 VFIO kernel support for platform
> devices.  I hope we're close enough that the next version can be sent as
> non-RFC.  It might be a good idea to pick a target kernel version and
> start working towards it.  v3.18 is probably not a realistic goal at
> this point.  I don't know about the rest, but at least the remaining
> series is non-RFC and the other is only a single patch.
> 
> > Step II: integrate 13: kvm-accelerated QEMU VFIO device featuring
> > irqfd/virqfd
> > 
> > depends on
> > [7] arm/arm64: KVM: Various VGIC cleanups and improvements, C. Dall
> > [6] [PATCH v3] ARM: KVM: add irqfd support, E. Auger
> > https://lkml.org/lkml/2014/9/1/141
> > 
> > Step III: integrate > 13:  kvm-accelerated QEMU VFIO device featuring
> > forwarded IRQs:
> > [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
> > https://lkml.org/lkml/2014/9/1/344
> > [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
> > Marc Zyngier, http://lwn.net/Articles/603514/
> > 
> > To me these 3 steps are quite independent from each other.
> 
> Yep, I agree.  Let's not get bogged down in letting these additional
> features interfere with progress on the base support.
> 
> > with respect to performance I think we have something reasonable now
> > with irqfd and forwarded IRQ so I do not expect any new features added
> > soon.
> > 
> > from now on, I do not plan to add any new patch file to this series but
> > just correct/modify according to comments & weaknesses.
> > 
> > I hope it clarifies plans. Please let me know.
> 
> Thanks, it does.  We have several players in the VFIO platform space and
> I want to make sure we're aligned on a goal of getting code upstream,
> not just posting it to the list.  Thanks for the breakdown and your work
> towards getting those dependencies resolved.

Actually, should Step I from your perspective be patches 1-8 of this
series?  If we remove VFIO_DEVICE_TYPE_PLATFORM from patch 3 and the
resulting instances of it, the rest is simply moving and splitting PCI
support in preparation for, but independent of platform support.  That
can be done entirely in parallel to the platform kernel support and
leaves a lot less here to review when that comes around.  Thanks,

Alex

* Re: [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough
  2014-09-16 21:23             ` Alex Williamson
@ 2014-09-19  0:29               ` Eric Auger
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Auger @ 2014-09-19  0:29 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joel.schopp, kim.phillips, eric.auger, peter.maydell,
	marc.zyngier, manish.jaggi, patches, will.deacon, qemu-devel,
	a.rigo, Bharat.Bhushan, agraf, Christoffer Dall, pbonzini,
	stuart.yoder, a.motakis, kvmarm, afaerber

On 09/16/2014 11:23 PM, Alex Williamson wrote:
> On Tue, 2014-09-16 at 14:51 -0600, Alex Williamson wrote:
>> On Tue, 2014-09-16 at 00:01 +0200, Eric Auger wrote:
>>> On 09/12/2014 01:05 AM, Christoffer Dall wrote:
>>>> On Thu, Sep 11, 2014 at 04:51:14PM -0600, Alex Williamson wrote:
>>>>> On Thu, 2014-09-11 at 15:23 -0700, Christoffer Dall wrote:
>>>>>> On Thu, Sep 11, 2014 at 04:14:09PM -0600, Alex Williamson wrote:
>>>>>>> On Tue, 2014-09-09 at 08:31 +0100, Eric Auger wrote:
>>>>>>>> This RFC series aims at enabling KVM platform device passthrough.
>>>>>>>> It implements a VFIO platform device, derived from VFIO PCI device.
>>>>>>>>
>>>>>>>> The VFIO platform device uses the host VFIO platform driver which must
>>>>>>>> be bound to the assigned device prior to the QEMU system start.
>>>>>>>>
>>>>>>>> - the guest can directly access the device register space
>>>>>>>> - assigned device IRQs are transparently routed to the guest by
>>>>>>>>   QEMU/KVM (3 methods currently are supported: user-level eventfd
>>>>>>>>   handling, irqfd, forwarded IRQs)
>>>>>>>> - iommu is transparently programmed to prevent the device from
>>>>>>>>   accessing physical pages outside of the guest address space
>>>>>>>>
>>>>>>>> This patch series is made of the following patch files:
>>>>>>>>
>>>>>>>> 1-7) Modifications to PCI code to prepare for VFIO platform device
>>>>>>>> 8) split of PCI specific code and generic code (move)
>>>>>>>> 9-11) creation of the VFIO calxeda xgmac platform device, without irqfd
>>>>>>>>       support (MMIO direct access and IRQ assignment).
>>>>>>>> 12) fake injection test modality (to test multiple IRQ)
>>>>>>>> 13) addition of irqfd/virqfd support
>>>>>>>> 14-16) forwarded IRQ
>>>>>>>>
>>>>>>>> Dependency List:
>>>>>>>>
>>>>>>>> QEMU dependencies:
>>>>>>>> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, Alex Graf
>>>>>>>>     http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
>>>>>>>> [2] [RFC v3] machvirt dynamic sysbus device instantiation, Eric Auger
>>>>>>>> [3] [PATCH v2 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
>>>>>>>>     Eric Auger
>>>>>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
>>>>>>>> [4] [RFC] vfio: migration to trace points, Eric Auger
>>>>>>>>     http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
>>>>>>>>
>>>>>>>> Kernel Dependencies:
>>>>>>>> [5] [RFC Patch v6 0/20] VFIO support for platform devices, Antonios Motakis
>>>>>>>>     https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
>>>>>>>> [6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
>>>>>>>>     https://lkml.org/lkml/2014/9/1/141
>>>>>>>> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, Christoffer Dall
>>>>>>>>     http://comments.gmane.org/gmane.linux.ports.arm.kernel/340430
>>>>>>>> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
>>>>>>>>     https://lkml.org/lkml/2014/9/1/344
>>>>>>>> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
>>>>>>>>     Marc Zyngier
>>>>>>>>     http://lwn.net/Articles/603514/
>>>>>>>>
>>>>>>>> kernel pieces can be found at:
>>>>>>>> http://git.linaro.org/people/eric.auger/linux.git
>>>>>>>> (branch 3.17rc3_irqfd_forward_integ_v2)
>>>>>>>> QEMU pieces can be found at:
>>>>>>>> http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v6)
>>>>>>>>
>>>>>>>> The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
>>>>>>>> is assigned to KVM host while the second one is assigned to the guest.
>>>>>>>> Reworked PCI device is not tested.
>>>>>>>>
>>>>>>>> Wiki for Calxeda Midway setup:
>>>>>>>> https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway
>>>>>>>>
>>>>>>>> History:
>>>>>>>>
>>>>>>>> v5->v6:
>>>>>>>> - rebase on 2.1rc5 PCI code
>>>>>>>> - forwarded IRQ first integration
>>>>>>>
>>>>>>> Why?  Are there acceleration paths that you're concerned cannot be
>>>>>>> implemented or we do not already have a proof of concept for?  The base
>>>>>>> kernel patch series you depend on is 3 months old yet this series
>>>>>>> continues to grow and add new dependencies.  Please let's prioritize
>>>>>>> getting something upstream instead of adding more blockers to prevent
>>>>>>> that.  Thanks,
>>>>>>>
>>>>>> I'm not exactly sure what this changelog line was referring to
>>>>>> (depending on Marc's forwarding IRQ patches?), but just want to add that
>>>>>> there are a number of dependencies for the GIC that need to go in as
>>>>>> well (should happen within a few weeks), but I think it's unlikely that
>>>>>> the IRQ forwarding stuff goes in for v3.18 at this point.
>>>>>>
>>>>>> It may make sense as you suggest to keep that part out of this patch set
>>>>>> and something merged sooner as opposed to later, but I'm too jet-lagged
>>>>>> to completely understand if that's going to be a horrible mess.
>>>>>
>>>>> The point is that we're on v6 of a patch series and its first non-RFC
>>>>> posting and we're rolling in a first pass at a QEMU implementation that
>>>>> depends on a contested kernel RFC, which depends on another stagnant
>>>>> kernel RFC.  I'm fine with working on it in parallel, but give me some
>>>>> light at the end of the tunnel as a reviewer and maintainer that this
>>>>> code isn't going to live indefinitely on the mailing list.  Do we really
>>>>> need those GIC patches to be able to have non-KVM accelerated VFIO
>>>>> platform device assignment?  We certainly don't need IRQ forwarding.
>>>>> Thanks,
>>>
>>> Hi Alex,
>>>
>>> Sorry for the delay, I was travelling.
>>>
>>> I understand your impatience. I personally would be happy if we could
>>> envision upstreaming this patch in several steps. Let me know if it
>>> makes sense.
>>>
>>> STEP I:  integrate 1 - 11: leads to have a non-KVM accelerated VFIO QEMU
>>> device. 12 can be part of it too but since it is a test feature this one
>>> might be dropped. just let me know what you think.
>>
>> I'd probably drop 12.  Is that really something that's useful in
>> upstream code?  It's a good use of the vfio loopback interrupt and good
>> testing, but do you really want to maintain it in the code?  Is it
>> sufficient that it's been posted to the mailing list so you can find and
>> re-apply it if you want to do similar testing again?
>>
>>> depends on:
>>> QEMU:
>>> [1] [PATCH v2 0/9] Dynamic sysbus device allocation support, A. Graf
>>> http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
>>> [2] [RFC v3] machvirt dynamic sysbus device instantiation, E. Auger
>>> [4] [RFC] vfio: migration to trace points, E. Auger
>>> http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00569.html
>>> KERNEL:
>>> [5] [RFC Patch v6 0/20] VFIO support for platform devices, A. Motakis
>>> https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
>>
>> Ok, so let's start whittling down these dependencies.  Trace points
>> shouldn't be any kind of blocker, you'll just need to teach me how to
>> use them and post a non-RFC patch ;)  At this point I don't even
>> remember the comments for the v6 VFIO kernel support for platform
>> devices.  I hope we're close enough that the next version can be sent as
>> non-RFC.  It might be a good idea to pick a target kernel version and
>> start working towards it.  v3.18 is probably not a realistic goal at
>> this point.  I don't know about the rest, but at least the remaining
>> series is non-RFC and the other is only a single patch.
>>
>>> Step II: integrate 13: kvm-accelerated QEMU VFIO device featuring
>>> irqfd/virqfd
>>>
>>> depends on
>>> [7] arm/arm64: KVM: Various VGIC cleanups and improvements, C. Dall
>>> [6] [PATCH v3] ARM: KVM: add irqfd support, E. Auger
>>> https://lkml.org/lkml/2014/9/1/141
>>>
>>> Step III: integrate > 13:  kvm-accelerated QEMU VFIO device featuring
>>> forwarded IRQs:
>>> [8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
>>> https://lkml.org/lkml/2014/9/1/344
>>> [9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
>>> Marc Zyngier, http://lwn.net/Articles/603514/
>>>
>>> To me these 3 steps are quite independent from each other.
>>
>> Yep, I agree.  Let's not get bogged down in letting these additional
>> features interfere with progress on the base support.
>>
>>> with respect to performance I think we have something reasonable now
>>> with irqfd and forwarded IRQ so I do not expect any new features added
>>> soon.
>>>
>>> from now on, I do not plan to add any new patch file to this series but
>>> just correct/modify according to comments & weaknesses.
>>>
>>> I hope it clarifies plans. Please let me know.
>>
>> Thanks, it does.  We have several players in the VFIO platform space and
>> I want to make sure we're aligned on a goal of getting code upstream,
>> not just posting it to the list.  Thanks for the breakdown and your work
>> towards getting those dependencies resolved.
> 
> Actually, should Step I from your perspective be patches 1-8 of this
> series?  If we remove VFIO_DEVICE_TYPE_PLATFORM from patch 3 and the
> resulting instances of it, the rest is simply moving and splitting PCI
> support in preparation for, but independent of platform support.  That
> can be done entirely in parallel to the platform kernel support and
> leaves a lot less here to review when that comes around.  Thanks,

Hi Alex,

Yes, sure, we can add another step that prepares the PCI code before
introducing the VFIO platform device. I thought you would prefer to
have a "client" of those changes.

Best Regards

Eric
> 
> Alex
> 
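
The staged plan discussed above (PCI preparation first, then Steps I to
III, each gated on its own dependencies) can be sketched as a small
lookup. This is only an illustrative model, not part of the series: the
step names, the `STEPS` table and both helper functions are hypothetical,
the numbers are the bracketed references from the cover letter, and the
empty dependency set for the PCI preparation step is an assumption drawn
from Alex's remark that it is independent of platform support.

```python
# Hypothetical model of the staged upstreaming plan from this thread.
# Keys are step names; values are the bracketed dependency numbers
# listed in the cover letter and in Eric's breakdown.
STEPS = {
    # Assumption: the PCI move/split needs none of the listed series.
    "PCI prep (patches 1-8)": set(),
    "Step I (patches 9-11)": {1, 2, 4, 5},
    "Step II (patch 13)": {6, 7},
    "Step III (patches 14-16)": {8, 9},
}


def ready_steps(merged):
    """Return the steps whose listed dependencies are all merged."""
    merged = set(merged)
    return [name for name, deps in STEPS.items() if deps <= merged]


def blockers(step, merged):
    """Return the sorted dependency numbers still missing for a step."""
    return sorted(STEPS[step] - set(merged))


if __name__ == "__main__":
    # With nothing merged, only the PCI preparation step is unblocked.
    print(ready_steps([]))
    # With [1] and [4] merged, Step I still waits on [2] and [5].
    print(blockers("Step I (patches 9-11)", [1, 4]))
```

The model deliberately ignores ordering between the steps themselves; it
only tracks the external series each step was said to depend on.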

end of thread, other threads:[~2014-09-19  0:30 UTC | newest]

Thread overview: 32+ messages
2014-09-09  7:31 [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 01/16] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 02/16] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 03/16] hw/vfio/pci: introduce VFIODevice Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 04/16] hw/vfio/pci: Introduce VFIORegion Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 05/16] hw/vfio/pci: split vfio_get_device Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 06/16] hw/vfio/pci: rename group_list into vfio_group_list Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 07/16] hw/vfio/pci: use name field in format strings Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 08/16] hw/vfio: create common module Eric Auger
2014-09-10 13:09   ` Alexander Graf
2014-09-11 12:11     ` Eric Auger
2014-09-11 12:13       ` Alexander Graf
2014-09-11 14:21         ` Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 09/16] hw/vfio/platform: add vfio-platform support Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 10/16] hw/vfio: calxeda xgmac device Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 11/16] hw/arm/dyn_sysbus_devtree: enable vfio-calxeda-xgmac dynamic instantiation Eric Auger
2014-09-10 13:12   ` Alexander Graf
2014-09-11 14:20     ` Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 12/16] vfio/platform: add fake injection modality Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 13/16] hw/vfio/platform: Add irqfd support Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 14/16] linux-headers: Update KVM headers from linux-next tag ToBeFilled Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 15/16] VFIO: COMMON: vfio_kvm_device_fd moved in the common header Eric Auger
2014-09-09  7:31 ` [Qemu-devel] [PATCH v6 16/16] VFIO: PLATFORM: add forwarded irq support Eric Auger
2014-09-11 22:14 ` [Qemu-devel] [PATCH v6 00/16] KVM platform device passthrough Alex Williamson
2014-09-11 22:23   ` Christoffer Dall
2014-09-11 22:51     ` Alex Williamson
2014-09-11 23:05       ` Christoffer Dall
2014-09-15 22:01         ` Eric Auger
2014-09-16 20:51           ` Alex Williamson
2014-09-16 21:19             ` Eric Auger
2014-09-16 21:23             ` Alex Williamson
2014-09-19  0:29               ` Eric Auger
