* [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough
@ 2014-10-31 14:05 Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 01/16] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
                   ` (15 more replies)
  0 siblings, 16 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

This RFC series aims at enabling KVM platform device passthrough.
It implements a VFIO platform device, derived from VFIO PCI device.

The VFIO platform device uses the host VFIO platform driver which must
be bound to the assigned device prior to the QEMU system start.

- the guest can directly access the device register space
- assigned device IRQs are transparently routed to the guest by
  QEMU/KVM (3 methods are currently supported: user-level eventfd
  handling, irqfd, forwarded IRQs)
- iommu is transparently programmed to prevent the device from
  accessing physical pages outside of the guest address space
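
As an illustration of the first IRQ method above (user-level eventfd
handling), here is a minimal sketch of the eventfd signal/drain pattern
only. It is not the code from this series: the irq_eventfd_* helper names
are hypothetical, and the "signal" helper merely stands in for the host
VFIO driver, which would signal the eventfd when the physical IRQ fires.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create the eventfd standing in for one device IRQ.
 * Non-blocking, so that draining an idle fd returns immediately. */
int irq_eventfd_create(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Hypothetical helper: emulate the signalling side (in a real setup the
 * host VFIO driver does this). Returns 0 on success, -1 on error. */
int irq_eventfd_signal(int fd)
{
    return eventfd_write(fd, 1);
}

/* Drain the eventfd counter. Returns the number of signals accumulated
 * since the last drain, or 0 if nothing was pending (or on error). */
uint64_t irq_eventfd_test_and_clear(int fd)
{
    eventfd_t count = 0;

    if (eventfd_read(fd, &count) < 0) {
        return 0;
    }
    return count;
}
```

A user-side handler would call irq_eventfd_test_and_clear() when the fd
becomes readable and inject the virtual IRQ only if it returns non-zero.
Because the counter accumulates, several signals arriving before a drain
are reported as a single read, which is one reason the user-side state
machine has to track pending interrupts itself.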

This patch series is made of the following patch file groups:

1-8) PCI modifications to prepare for platform device introduction
9-12) VFIO platform device without irqfd support
13) VFIO platform device with irqfd support
14-16) VFIO platform device with IRQ forwarding support

Each group is independent and should be separately upstreamable.

Dependency List:

QEMU dependencies:
[1] [PATCH v3 0/7] Dynamic sysbus device allocation support, Alex Graf
    http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04860.html
[2] [PATCH v4] machvirt dynamic sysbus device instantiation, Eric Auger
[3] [PATCH v3 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
    Eric Auger
    http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
[4] [PATCH v2] vfio: migration to trace points, Eric Auger
    https://patchwork.ozlabs.org/patch/394785/

Kernel Dependencies:
[5] [PATCH v9 00/19] VFIO support for platform and AMBA devices on ARM
    http://comments.gmane.org/gmane.linux.kernel.iommu/7096
[6] [PATCH v3] ARM: KVM: add irqfd support, Eric Auger
    https://lkml.org/lkml/2014/9/1/141
[8] [RFC v2 0/9] KVM-VFIO IRQ forward control, Eric Auger
    https://lkml.org/lkml/2014/9/1/344
[9] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
    Marc Zyngier
    http://lwn.net/Articles/603514/

- kernel pieces can be found at:
  http://git.linaro.org/people/eric.auger/linux.git (branch 3.17rc7-v8)
- QEMU pieces can be found at:
  http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v7)

The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
is assigned to KVM host while the second one is assigned to the guest.
The reworked PCI device code is not tested.

Wiki for Calxeda Midway setup:
https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway

History:
v6->v7:
- fake injection test modality removed
- VFIO_DEVICE_TYPE_PLATFORM only introduced with VFIO platform
- new helper functions to start VFIO IRQ on machine init done notifier
  (introduced in hw/vfio/platform: add vfio-platform support and notifier
  registration invoked in hw/arm/virt: add support for VFIO devices).
  vfio_start_irq_injection is replaced by vfio_register_irq_starter.

v5->v6:
- rebase on 2.1rc5 PCI code
- forwarded IRQ first integration
- vfio_device property renamed into host property
- split IRQ setup in different functions that match the 3 supported
  injection techniques (user handled eventfd, irqfd, forwarded IRQ):
  removes dynamic switch between injection methods
- introduce fake interrupts as a test modality:
  x makes it possible to test multiple IRQ user-side handling.
  x this is a test feature only: it enables triggering an fd as if the
    real physical IRQ hit. No virtual IRQ is injected into the guest
    but handling is simulated so that the state machine can be tested
- user handled eventfd:
  x add mutex to protect IRQ state & list manipulation,
  x correct misleading comment in vfio_intp_interrupt.
  x Fix bugs using fake interrupt modality
- irqfd no longer advertised in this patchset (handled in [3])
- VFIOPlatformDeviceClass becomes abstract and Calxeda xgmac device
  and class is re-introduced (as per v4)
- all DPRINTF removed in platform and replaced by trace-points
- corrects compilation with configure --disable-kvm
- simplifies the split for vfio_get_device and introduce a unique
  specialized function named vfio_populate_device
- group_list renamed into vfio_group_list
- hw/arm/dyn_sysbus_devtree.c currently only supports vfio-calxeda-xgmac
  instantiation. Needs to be specialized for other VFIO devices
- fix 2 bugs in dyn_sysbus_devtree(reg_attr index and compat)

v4->v5:
- rebase on v2.1.0 PCI code
- take into account Alex Williamson's comments on PCI code rework
  - trace updates in vfio_region_write/read
  - remove fd from VFIORegion
  - get/put cleanup
- bug fix: bar region's vbasedev field is now duly initialized
- misc cleanups in platform device
- device tree node generation removed from device and handled in
  hw/arm/dyn_sysbus_devtree.c
- remove "hw/vfio: add an example calxeda_xgmac": with removal of
  device tree node generation we do not have so many things to
  implement in that derived device yet. May be re-introduced later
  on if needed typically for reset/migration.
- no GSI routing table anymore

v3->v4 changes (Eric Auger, Alvise Rigo)
- rebase on last VFIO PCI code (v2.1.0-rc0)
- full git history rework to ease PCI code change review
- mv include files in hw/vfio
- DPRINTF reformatting temporarily moved out
- support of VFIO virq (removal of resamplefd handler on user-side)
- integration with sysbus dynamic instantiation framework
- removal of unrealize and cleanup routines until it is better
  understood what is really needed
- Support of VFIO for Amba devices should be handled in an inherited
  device to specialize the device tree generation (clock handle currently
  missing in framework however)
- "Always use eventfd as notifying mechanism" temporarily moved out
- static instantiation is not mainstream (although it remains possible)
  note if static instantiation is used, irqfd must be setup in machine file
  when virtual IRQ is known
- create the GSI routing table on qemu side

v2->v3 changes (Alvise Rigo, Eric Auger):
- Following Alex W's recommendations, further efforts to factorize the
  code between PCI and platform: introduction of VFIODevice and
  VFIORegion as base classes
- unique reset handler for platform and PCI
- cleanup following Kim's comments
- multiple IRQ support mechanics should be in place although not
  tested
- Better handling of MMIO multiple regions
- New features and fixes by Alvise (multiple compat string, exec
  flag, force eventfd usage, amba device tree support)
- irqfd support

v1->v2 changes (Kim Phillips, Eric Auger):
- IRQ initial support (legacy mode where eventfds are handled on
  user side)
- hacked dynamic instantiation

v1 (Kim Phillips):
- initial split between PCI and platform
- MMIO support only
- static instantiation

Best Regards

Eric

Eric Auger (15):
  hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
  hw/vfio/pci: introduce VFIODevice
  hw/vfio/pci: Introduce VFIORegion
  hw/vfio/pci: split vfio_get_device
  hw/vfio/pci: rename group_list into vfio_group_list
  hw/vfio/pci: use name field in format strings
  hw/vfio: create common module
  hw/vfio/platform: add vfio-platform support
  hw/vfio: calxeda xgmac device
  hw/arm/virt: add support for VFIO devices
  hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation
  hw/vfio/platform: Add irqfd support
  linux-headers: Update KVM headers from linux-next tag ToBeFilled
  hw/vfio/common: vfio_kvm_device_fd moved in the common header
  hw/vfio/platform: add forwarded irq support

Kim Phillips (1):
  vfio: move hw/misc/vfio.c to hw/vfio/pci.c     Move vfio.h into
    include/hw/vfio

 LICENSE                              |    2 +-
 MAINTAINERS                          |    2 +-
 hw/Makefile.objs                     |    1 +
 hw/arm/sysbus-fdt.c                  |   88 ++
 hw/arm/virt.c                        |    9 +
 hw/misc/Makefile.objs                |    1 -
 hw/ppc/spapr_pci_vfio.c              |    2 +-
 hw/vfio/Makefile.objs                |    6 +
 hw/vfio/calxeda_xgmac.c              |   54 ++
 hw/vfio/common.c                     |  959 +++++++++++++++++++
 hw/{misc/vfio.c => vfio/pci.c}       | 1671 +++++++---------------------------
 hw/vfio/platform.c                   |  820 +++++++++++++++++
 include/hw/vfio/vfio-calxeda-xgmac.h |   41 +
 include/hw/vfio/vfio-common.h        |  157 ++++
 include/hw/vfio/vfio-platform.h      |   90 ++
 include/hw/{misc => vfio}/vfio.h     |    0
 linux-headers/linux/kvm.h            |    9 +
 trace-events                         |  137 +--
 18 files changed, 2636 insertions(+), 1413 deletions(-)
 create mode 100644 hw/vfio/Makefile.objs
 create mode 100644 hw/vfio/calxeda_xgmac.c
 create mode 100644 hw/vfio/common.c
 rename hw/{misc/vfio.c => vfio/pci.c} (65%)
 create mode 100644 hw/vfio/platform.c
 create mode 100644 include/hw/vfio/vfio-calxeda-xgmac.h
 create mode 100644 include/hw/vfio/vfio-common.h
 create mode 100644 include/hw/vfio/vfio-platform.h
 rename include/hw/{misc => vfio}/vfio.h (100%)

-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 01/16] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 02/16] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, eric.auger, will.deacon,
	stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

From: Kim Phillips <kim.phillips@linaro.org>

This is done in preparation for the addition of VFIO platform
device support.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
---
 LICENSE                          | 2 +-
 MAINTAINERS                      | 2 +-
 hw/Makefile.objs                 | 1 +
 hw/misc/Makefile.objs            | 1 -
 hw/ppc/spapr_pci_vfio.c          | 2 +-
 hw/vfio/Makefile.objs            | 3 +++
 hw/{misc/vfio.c => vfio/pci.c}   | 2 +-
 include/hw/{misc => vfio}/vfio.h | 0
 8 files changed, 8 insertions(+), 5 deletions(-)
 create mode 100644 hw/vfio/Makefile.objs
 rename hw/{misc/vfio.c => vfio/pci.c} (99%)
 rename include/hw/{misc => vfio}/vfio.h (100%)

diff --git a/LICENSE b/LICENSE
index da70e94..0e0b4b9 100644
--- a/LICENSE
+++ b/LICENSE
@@ -11,7 +11,7 @@ option) any later version.
 
 As of July 2013, contributions under version 2 of the GNU General Public
 License (and no later version) are only accepted for the following files
-or directories: bsd-user/, linux-user/, hw/misc/vfio.c, hw/xen/xen_pt*.
+or directories: bsd-user/, linux-user/, hw/vfio/, hw/xen/xen_pt*.
 
 3) The Tiny Code Generator (TCG) is released under the BSD license
    (see license headers in files).
diff --git a/MAINTAINERS b/MAINTAINERS
index 94366ef..3f2db91 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -656,7 +656,7 @@ F: hw/usb/dev-serial.c
 VFIO
 M: Alex Williamson <alex.williamson@redhat.com>
 S: Supported
-F: hw/misc/vfio.c
+F: hw/vfio/*
 
 vhost
 M: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 52a1464..73afa41 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,6 +26,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += ssi/
 devices-dirs-$(CONFIG_SOFTMMU) += timer/
 devices-dirs-$(CONFIG_TPM) += tpm/
 devices-dirs-$(CONFIG_SOFTMMU) += usb/
+devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 979e532..e47fea8 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -21,7 +21,6 @@ common-obj-$(CONFIG_MACIO) += macio/
 
 ifeq ($(CONFIG_PCI), y)
 obj-$(CONFIG_KVM) += ivshmem.o
-obj-$(CONFIG_LINUX) += vfio.o
 endif
 
 obj-$(CONFIG_REALVIEW) += arm_sysctl.o
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index d3bddf2..144912b 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -20,7 +20,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/pci-host/spapr.h"
 #include "linux/vfio.h"
-#include "hw/misc/vfio.h"
+#include "hw/vfio/vfio.h"
 
 static Property spapr_phb_vfio_properties[] = {
     DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
new file mode 100644
index 0000000..31c7dab
--- /dev/null
+++ b/hw/vfio/Makefile.objs
@@ -0,0 +1,3 @@
+ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_PCI) += pci.o
+endif
diff --git a/hw/misc/vfio.c b/hw/vfio/pci.c
similarity index 99%
rename from hw/misc/vfio.c
rename to hw/vfio/pci.c
index cdf4922..8514b9e 100644
--- a/hw/misc/vfio.c
+++ b/hw/vfio/pci.c
@@ -39,8 +39,8 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
-#include "hw/misc/vfio.h"
 #include "trace.h"
+#include "hw/vfio/vfio.h"
 
 /* Extra debugging, trap acceleration paths for more logging */
 #define VFIO_ALLOW_MMAP 1
diff --git a/include/hw/misc/vfio.h b/include/hw/vfio/vfio.h
similarity index 100%
rename from include/hw/misc/vfio.h
rename to include/hw/vfio/vfio.h
-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 02/16] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 01/16] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice Eric Auger
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

This prepares for the introduction of VFIOPlatformDevice

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 210 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 106 insertions(+), 104 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8514b9e..93181bf 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -48,11 +48,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
-struct VFIODevice;
+struct VFIOPCIDevice;
 
 typedef struct VFIOQuirk {
     MemoryRegion mem;
-    struct VFIODevice *vdev;
+    struct VFIOPCIDevice *vdev;
     QLIST_ENTRY(VFIOQuirk) next;
     struct {
         uint32_t base_offset:TARGET_PAGE_BITS;
@@ -123,7 +123,7 @@ typedef struct VFIOMSIVector {
      */
     EventNotifier interrupt;
     EventNotifier kvm_interrupt;
-    struct VFIODevice *vdev; /* back pointer to device */
+    struct VFIOPCIDevice *vdev; /* back pointer to device */
     int virq;
     bool use;
 } VFIOMSIVector;
@@ -185,7 +185,7 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
-typedef struct VFIODevice {
+typedef struct VFIOPCIDevice {
     PCIDevice pdev;
     int fd;
     VFIOINTx intx;
@@ -203,7 +203,7 @@ typedef struct VFIODevice {
     VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
     VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
     PCIHostDeviceAddress host;
-    QLIST_ENTRY(VFIODevice) next;
+    QLIST_ENTRY(VFIOPCIDevice) next;
     struct VFIOGroup *group;
     EventNotifier err_notifier;
     uint32_t features;
@@ -218,13 +218,13 @@ typedef struct VFIODevice {
     bool has_pm_reset;
     bool needs_reset;
     bool rom_read_failed;
-} VFIODevice;
+} VFIOPCIDevice;
 
 typedef struct VFIOGroup {
     int fd;
     int groupid;
     VFIOContainer *container;
-    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_HEAD(, VFIOPCIDevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
     QLIST_ENTRY(VFIOGroup) container_next;
 } VFIOGroup;
@@ -268,16 +268,16 @@ static QLIST_HEAD(, VFIOGroup)
 static int vfio_kvm_device_fd = -1;
 #endif
 
-static void vfio_disable_interrupts(VFIODevice *vdev);
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled);
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 
 /*
  * Common VFIO interrupt disable
  */
-static void vfio_disable_irqindex(VFIODevice *vdev, int index)
+static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -293,7 +293,7 @@ static void vfio_disable_irqindex(VFIODevice *vdev, int index)
 /*
  * INTx
  */
-static void vfio_unmask_intx(VFIODevice *vdev)
+static void vfio_unmask_intx(VFIOPCIDevice *vdev)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -307,7 +307,7 @@ static void vfio_unmask_intx(VFIODevice *vdev)
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIODevice *vdev)
+static void vfio_mask_intx(VFIOPCIDevice *vdev)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -338,7 +338,7 @@ static void vfio_mask_intx(VFIODevice *vdev)
  */
 static void vfio_intx_mmap_enable(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (vdev->intx.pending) {
         timer_mod(vdev->intx.mmap_timer,
@@ -351,7 +351,7 @@ static void vfio_intx_mmap_enable(void *opaque)
 
 static void vfio_intx_interrupt(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (!event_notifier_test_and_clear(&vdev->intx.interrupt)) {
         return;
@@ -370,7 +370,7 @@ static void vfio_intx_interrupt(void *opaque)
     }
 }
 
-static void vfio_eoi(VFIODevice *vdev)
+static void vfio_eoi(VFIOPCIDevice *vdev)
 {
     if (!vdev->intx.pending) {
         return;
@@ -384,7 +384,7 @@ static void vfio_eoi(VFIODevice *vdev)
     vfio_unmask_intx(vdev);
 }
 
-static void vfio_enable_intx_kvm(VFIODevice *vdev)
+static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 {
 #ifdef CONFIG_KVM
     struct kvm_irqfd irqfd = {
@@ -462,7 +462,7 @@ fail:
 #endif
 }
 
-static void vfio_disable_intx_kvm(VFIODevice *vdev)
+static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
 {
 #ifdef CONFIG_KVM
     struct kvm_irqfd irqfd = {
@@ -506,7 +506,7 @@ static void vfio_disable_intx_kvm(VFIODevice *vdev)
 
 static void vfio_update_irq(PCIDevice *pdev)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     PCIINTxRoute route;
 
     if (vdev->interrupt != VFIO_INT_INTx) {
@@ -537,7 +537,7 @@ static void vfio_update_irq(PCIDevice *pdev)
     vfio_eoi(vdev);
 }
 
-static int vfio_enable_intx(VFIODevice *vdev)
+static int vfio_enable_intx(VFIOPCIDevice *vdev)
 {
     uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
     int ret, argsz;
@@ -602,7 +602,7 @@ static int vfio_enable_intx(VFIODevice *vdev)
     return 0;
 }
 
-static void vfio_disable_intx(VFIODevice *vdev)
+static void vfio_disable_intx(VFIOPCIDevice *vdev)
 {
     int fd;
 
@@ -629,7 +629,7 @@ static void vfio_disable_intx(VFIODevice *vdev)
 static void vfio_msi_interrupt(void *opaque)
 {
     VFIOMSIVector *vector = opaque;
-    VFIODevice *vdev = vector->vdev;
+    VFIOPCIDevice *vdev = vector->vdev;
     int nr = vector - vdev->msi_vectors;
 
     if (!event_notifier_test_and_clear(&vector->interrupt)) {
@@ -661,7 +661,7 @@ static void vfio_msi_interrupt(void *opaque)
     }
 }
 
-static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
+static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 {
     struct vfio_irq_set *irq_set;
     int ret = 0, i, argsz;
@@ -752,7 +752,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg)
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOMSIVector *vector;
     int ret;
 
@@ -841,7 +841,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
 
 static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
     trace_vfio_msix_vector_release(vdev->host.domain, vdev->host.bus,
@@ -880,7 +880,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
     }
 }
 
-static void vfio_enable_msix(VFIODevice *vdev)
+static void vfio_enable_msix(VFIOPCIDevice *vdev)
 {
     vfio_disable_interrupts(vdev);
 
@@ -913,7 +913,7 @@ static void vfio_enable_msix(VFIODevice *vdev)
                            vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_enable_msi(VFIODevice *vdev)
+static void vfio_enable_msi(VFIOPCIDevice *vdev)
 {
     int ret, i;
 
@@ -991,7 +991,7 @@ retry:
                           vdev->nr_vectors);
 }
 
-static void vfio_disable_msi_common(VFIODevice *vdev)
+static void vfio_disable_msi_common(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1015,7 +1015,7 @@ static void vfio_disable_msi_common(VFIODevice *vdev)
     vfio_enable_intx(vdev);
 }
 
-static void vfio_disable_msix(VFIODevice *vdev)
+static void vfio_disable_msix(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1042,7 +1042,7 @@ static void vfio_disable_msix(VFIODevice *vdev)
                             vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_disable_msi(VFIODevice *vdev)
+static void vfio_disable_msi(VFIOPCIDevice *vdev)
 {
     vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
     vfio_disable_msi_common(vdev);
@@ -1051,7 +1051,7 @@ static void vfio_disable_msi(VFIODevice *vdev)
                            vdev->host.slot, vdev->host.function);
 }
 
-static void vfio_update_msi(VFIODevice *vdev)
+static void vfio_update_msi(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1104,7 +1104,7 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
 
 #ifdef DEBUG_VFIO
     {
-        VFIODevice *vdev = container_of(bar, VFIODevice, bars[bar->nr]);
+        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
 
         trace_vfio_bar_write(vdev->host.domain, vdev->host.bus,
                              vdev->host.slot, vdev->host.function,
@@ -1120,7 +1120,7 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
      * which access will service the interrupt, so we're potentially
      * getting quite a few host interrupts per guest interrupt.
      */
-    vfio_eoi(container_of(bar, VFIODevice, bars[bar->nr]));
+    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
 }
 
 static uint64_t vfio_bar_read(void *opaque,
@@ -1158,7 +1158,7 @@ static uint64_t vfio_bar_read(void *opaque,
 
 #ifdef DEBUG_VFIO
     {
-        VFIODevice *vdev = container_of(bar, VFIODevice, bars[bar->nr]);
+        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
 
         trace_vfio_bar_read(vdev->host.domain, vdev->host.bus,
                             vdev->host.slot, vdev->host.function,
@@ -1167,7 +1167,7 @@ static uint64_t vfio_bar_read(void *opaque,
 #endif
 
     /* Same as write above */
-    vfio_eoi(container_of(bar, VFIODevice, bars[bar->nr]));
+    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
 
     return data;
 }
@@ -1178,7 +1178,7 @@ static const MemoryRegionOps vfio_bar_ops = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_pci_load_rom(VFIODevice *vdev)
+static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
     struct vfio_region_info reg_info = {
         .argsz = sizeof(reg_info),
@@ -1236,7 +1236,7 @@ static void vfio_pci_load_rom(VFIODevice *vdev)
 
 static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1286,7 +1286,7 @@ static const MemoryRegionOps vfio_rom_ops = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static bool vfio_blacklist_opt_rom(VFIODevice *vdev)
+static bool vfio_blacklist_opt_rom(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t vendor_id, device_id;
@@ -1306,7 +1306,7 @@ static bool vfio_blacklist_opt_rom(VFIODevice *vdev)
     return false;
 }
 
-static void vfio_pci_size_rom(VFIODevice *vdev)
+static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 {
     uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
     off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
@@ -1484,7 +1484,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
                                                hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     uint64_t data;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.read_flags) &&
@@ -1520,7 +1520,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
                                             uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     if (ranges_overlap(addr, size,
                        quirk->data.address_offset, quirk->data.address_size)) {
@@ -1578,7 +1578,7 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
                                         hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
     hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
     uint64_t data;
@@ -1611,7 +1611,7 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
                                      uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
     hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
 
@@ -1659,7 +1659,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
                                         hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     uint64_t data = vfio_pci_read_config(&vdev->pdev,
                                          PCI_BASE_ADDRESS_0 + (4 * 4) + 1,
                                          size);
@@ -1673,7 +1673,7 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
+static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1715,7 +1715,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIODevice *vdev)
  * that only read-only access is provided, but we drop writes when the window
  * is enabled to config space nonetheless.
  */
-static void vfio_probe_ati_bar4_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1778,7 +1778,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
                                                hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     switch (addr) {
     case 4: /* address */
@@ -1824,7 +1824,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
                                             uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
 
     switch (addr) {
     case 4: /* address */
@@ -1873,7 +1873,7 @@ static const MemoryRegionOps vfio_rtl8168_window_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_probe_rtl8168_bar2_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1902,7 +1902,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIODevice *vdev, int nr)
 /*
  * Trap the BAR2 MMIO window to config space as well.
  */
-static void vfio_probe_ati_bar2_4000_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -1971,7 +1971,7 @@ static uint64_t vfio_nvidia_3d0_quirk_read(void *opaque,
                                            hwaddr addr, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
     uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
                                   addr + quirk->data.base_offset, size);
@@ -1990,7 +1990,7 @@ static void vfio_nvidia_3d0_quirk_write(void *opaque, hwaddr addr,
                                         uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
 
     switch (quirk->data.flags) {
@@ -2037,7 +2037,7 @@ static const MemoryRegionOps vfio_nvidia_3d0_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_vga_probe_nvidia_3d0_quirk(VFIODevice *vdev)
+static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2130,7 +2130,7 @@ static const MemoryRegionOps vfio_nvidia_bar5_window_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void vfio_probe_nvidia_bar5_window_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2166,7 +2166,7 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
                                           uint64_t data, unsigned size)
 {
     VFIOQuirk *quirk = opaque;
-    VFIODevice *vdev = quirk->vdev;
+    VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
     hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
 
@@ -2199,7 +2199,7 @@ static const MemoryRegionOps vfio_nvidia_88000_quirk = {
  *
  * Here's offset 0x88000...
  */
-static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2238,7 +2238,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIODevice *vdev, int nr)
 /*
  * And here's the same for BAR0 offset 0x1800...
  */
-static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice *vdev, int nr)
+static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 {
     PCIDevice *pdev = &vdev->pdev;
     VFIOQuirk *quirk;
@@ -2283,13 +2283,13 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIODevice *vdev, int nr)
 /*
  * Common quirk probe entry points.
  */
-static void vfio_vga_quirk_setup(VFIODevice *vdev)
+static void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
 {
     vfio_vga_probe_ati_3c3_quirk(vdev);
     vfio_vga_probe_nvidia_3d0_quirk(vdev);
 }
 
-static void vfio_vga_quirk_teardown(VFIODevice *vdev)
+static void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -2304,7 +2304,7 @@ static void vfio_vga_quirk_teardown(VFIODevice *vdev)
     }
 }
 
-static void vfio_bar_quirk_setup(VFIODevice *vdev, int nr)
+static void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
 {
     vfio_probe_ati_bar4_window_quirk(vdev, nr);
     vfio_probe_ati_bar2_4000_quirk(vdev, nr);
@@ -2314,7 +2314,7 @@ static void vfio_bar_quirk_setup(VFIODevice *vdev, int nr)
     vfio_probe_rtl8168_bar2_window_quirk(vdev, nr);
 }
 
-static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
+static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
@@ -2332,7 +2332,7 @@ static void vfio_bar_quirk_teardown(VFIODevice *vdev, int nr)
  */
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
 
     memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
@@ -2367,7 +2367,7 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     uint32_t val_le = cpu_to_le32(val);
 
     trace_vfio_pci_write_config(vdev->host.domain, vdev->host.bus,
@@ -2722,7 +2722,7 @@ static void vfio_listener_release(VFIOContainer *container)
 /*
  * Interrupt setup
  */
-static void vfio_disable_interrupts(VFIODevice *vdev)
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
 {
     switch (vdev->interrupt) {
     case VFIO_INT_INTx:
@@ -2737,7 +2737,7 @@ static void vfio_disable_interrupts(VFIODevice *vdev)
     }
 }
 
-static int vfio_setup_msi(VFIODevice *vdev, int pos)
+static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
 {
     uint16_t ctrl;
     bool msi_64bit, msi_maskbit;
@@ -2777,7 +2777,7 @@ static int vfio_setup_msi(VFIODevice *vdev, int pos)
  * need to first look for where the MSI-X table lives.  So we
  * unfortunately split MSI-X setup across two functions.
  */
-static int vfio_early_setup_msix(VFIODevice *vdev)
+static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
 {
     uint8_t pos;
     uint16_t ctrl;
@@ -2823,7 +2823,7 @@ static int vfio_early_setup_msix(VFIODevice *vdev)
     return 0;
 }
 
-static int vfio_setup_msix(VFIODevice *vdev, int pos)
+static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
 {
     int ret;
 
@@ -2843,7 +2843,7 @@ static int vfio_setup_msix(VFIODevice *vdev, int pos)
     return 0;
 }
 
-static void vfio_teardown_msi(VFIODevice *vdev)
+static void vfio_teardown_msi(VFIOPCIDevice *vdev)
 {
     msi_uninit(&vdev->pdev);
 
@@ -2856,7 +2856,7 @@ static void vfio_teardown_msi(VFIODevice *vdev)
 /*
  * Resource setup
  */
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled)
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
 {
     int i;
 
@@ -2874,7 +2874,7 @@ static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled)
     }
 }
 
-static void vfio_unmap_bar(VFIODevice *vdev, int nr)
+static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
@@ -2893,7 +2893,7 @@ static void vfio_unmap_bar(VFIODevice *vdev, int nr)
     }
 }
 
-static int vfio_mmap_bar(VFIODevice *vdev, VFIOBAR *bar,
+static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
                          MemoryRegion *mem, MemoryRegion *submem,
                          void **map, size_t size, off_t offset,
                          const char *name)
@@ -2931,7 +2931,7 @@ empty_region:
     return ret;
 }
 
-static void vfio_map_bar(VFIODevice *vdev, int nr)
+static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
     unsigned size = bar->size;
@@ -3000,7 +3000,7 @@ static void vfio_map_bar(VFIODevice *vdev, int nr)
     vfio_bar_quirk_setup(vdev, nr);
 }
 
-static void vfio_map_bars(VFIODevice *vdev)
+static void vfio_map_bars(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -3032,7 +3032,7 @@ static void vfio_map_bars(VFIODevice *vdev)
     }
 }
 
-static void vfio_unmap_bars(VFIODevice *vdev)
+static void vfio_unmap_bars(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -3068,7 +3068,7 @@ static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask)
     pci_set_word(buf, (pci_get_word(buf) & ~mask) | val);
 }
 
-static void vfio_add_emulated_word(VFIODevice *vdev, int pos,
+static void vfio_add_emulated_word(VFIOPCIDevice *vdev, int pos,
                                    uint16_t val, uint16_t mask)
 {
     vfio_set_word_bits(vdev->pdev.config + pos, val, mask);
@@ -3081,7 +3081,7 @@ static void vfio_set_long_bits(uint8_t *buf, uint32_t val, uint32_t mask)
     pci_set_long(buf, (pci_get_long(buf) & ~mask) | val);
 }
 
-static void vfio_add_emulated_long(VFIODevice *vdev, int pos,
+static void vfio_add_emulated_long(VFIOPCIDevice *vdev, int pos,
                                    uint32_t val, uint32_t mask)
 {
     vfio_set_long_bits(vdev->pdev.config + pos, val, mask);
@@ -3089,7 +3089,7 @@ static void vfio_add_emulated_long(VFIODevice *vdev, int pos,
     vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask);
 }
 
-static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size)
+static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size)
 {
     uint16_t flags;
     uint8_t type;
@@ -3181,7 +3181,7 @@ static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size)
     return pos;
 }
 
-static void vfio_check_pcie_flr(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_pcie_flr(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint32_t cap = pci_get_long(vdev->pdev.config + pos + PCI_EXP_DEVCAP);
 
@@ -3192,7 +3192,7 @@ static void vfio_check_pcie_flr(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static void vfio_check_pm_reset(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_pm_reset(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint16_t csr = pci_get_word(vdev->pdev.config + pos + PCI_PM_CTRL);
 
@@ -3203,7 +3203,7 @@ static void vfio_check_pm_reset(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static void vfio_check_af_flr(VFIODevice *vdev, uint8_t pos)
+static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
 {
     uint8_t cap = pci_get_byte(vdev->pdev.config + pos + PCI_AF_CAP);
 
@@ -3214,7 +3214,7 @@ static void vfio_check_af_flr(VFIODevice *vdev, uint8_t pos)
     }
 }
 
-static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos)
+static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint8_t cap_id, next, size;
@@ -3289,7 +3289,7 @@ static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos)
     return 0;
 }
 
-static int vfio_add_capabilities(VFIODevice *vdev)
+static int vfio_add_capabilities(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
 
@@ -3301,7 +3301,7 @@ static int vfio_add_capabilities(VFIODevice *vdev)
     return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
 }
 
-static void vfio_pci_pre_reset(VFIODevice *vdev)
+static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t cmd;
@@ -3338,7 +3338,7 @@ static void vfio_pci_pre_reset(VFIODevice *vdev)
     vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
 }
 
-static void vfio_pci_post_reset(VFIODevice *vdev)
+static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
 {
     vfio_enable_intx(vdev);
 }
@@ -3350,7 +3350,7 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
             host1->slot == host2->slot && host1->function == host2->function);
 }
 
-static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
+static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 {
     VFIOGroup *group;
     struct vfio_pci_hot_reset_info *info;
@@ -3401,7 +3401,7 @@ static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
     /* Verify that we have all the groups required */
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
-        VFIODevice *tmp;
+        VFIOPCIDevice *tmp;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3495,7 +3495,7 @@ out:
     /* Re-enable INTx on affected devices */
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
-        VFIODevice *tmp;
+        VFIOPCIDevice *tmp;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3545,12 +3545,12 @@ out_single:
  * _one() will only do a hot reset for the one in-use devices case, calling
  * _multi() will do nothing if a _one() would have been sufficient.
  */
-static int vfio_pci_hot_reset_one(VFIODevice *vdev)
+static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
 {
     return vfio_pci_hot_reset(vdev, true);
 }
 
-static int vfio_pci_hot_reset_multi(VFIODevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
 {
     return vfio_pci_hot_reset(vdev, false);
 }
@@ -3558,7 +3558,7 @@ static int vfio_pci_hot_reset_multi(VFIODevice *vdev)
 static void vfio_pci_reset_handler(void *opaque)
 {
     VFIOGroup *group;
-    VFIODevice *vdev;
+    VFIOPCIDevice *vdev;
 
     QLIST_FOREACH(group, &group_list, next) {
         QLIST_FOREACH(vdev, &group->device_list, next) {
@@ -3896,7 +3896,8 @@ static void vfio_put_group(VFIOGroup *group)
     }
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev)
+static int vfio_get_device(VFIOGroup *group, const char *name,
+                           VFIOPCIDevice *vdev)
 {
     struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
     struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
@@ -4049,7 +4050,7 @@ error:
     return ret;
 }
 
-static void vfio_put_device(VFIODevice *vdev)
+static void vfio_put_device(VFIOPCIDevice *vdev)
 {
     QLIST_REMOVE(vdev, next);
     vdev->group = NULL;
@@ -4063,7 +4064,7 @@ static void vfio_put_device(VFIODevice *vdev)
 
 static void vfio_err_notifier_handler(void *opaque)
 {
-    VFIODevice *vdev = opaque;
+    VFIOPCIDevice *vdev = opaque;
 
     if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
         return;
@@ -4092,7 +4093,7 @@ static void vfio_err_notifier_handler(void *opaque)
  * and continue after disabling error recovery support for the
  * device.
  */
-static void vfio_register_err_notifier(VFIODevice *vdev)
+static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
 {
     int ret;
     int argsz;
@@ -4133,7 +4134,7 @@ static void vfio_register_err_notifier(VFIODevice *vdev)
     g_free(irq_set);
 }
 
-static void vfio_unregister_err_notifier(VFIODevice *vdev)
+static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
 {
     int argsz;
     struct vfio_irq_set *irq_set;
@@ -4168,7 +4169,7 @@ static void vfio_unregister_err_notifier(VFIODevice *vdev)
 
 static int vfio_initfn(PCIDevice *pdev)
 {
-    VFIODevice *pvdev, *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOGroup *group;
     char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
     ssize_t len;
@@ -4321,7 +4322,7 @@ out_put:
 
 static void vfio_exitfn(PCIDevice *pdev)
 {
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOGroup *group = vdev->group;
 
     vfio_unregister_err_notifier(vdev);
@@ -4341,7 +4342,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 static void vfio_pci_reset(DeviceState *dev)
 {
     PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 
     trace_vfio_pci_reset(vdev->host.domain, vdev->host.bus,
                          vdev->host.slot, vdev->host.function);
@@ -4375,7 +4376,7 @@ post_reset:
 static void vfio_instance_init(Object *obj)
 {
     PCIDevice *pci_dev = PCI_DEVICE(obj);
-    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, PCI_DEVICE(obj));
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, PCI_DEVICE(obj));
 
     device_add_bootindex_property(obj, &vdev->bootindex,
                                   "bootindex", NULL,
@@ -4383,15 +4384,16 @@ static void vfio_instance_init(Object *obj)
 }
 
 static Property vfio_pci_dev_properties[] = {
-    DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIODevice, host),
-    DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIODevice,
+    DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
+    DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
                        intx.mmap_timeout, 1100),
-    DEFINE_PROP_BIT("x-vga", VFIODevice, features,
+    DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
                     VFIO_FEATURE_ENABLE_VGA_BIT, false),
+    DEFINE_PROP_INT32("bootindex", VFIOPCIDevice, bootindex, -1),
     /*
      * TODO - support passed fds... is this necessary?
-     * DEFINE_PROP_STRING("vfiofd", VFIODevice, vfiofd_name),
-     * DEFINE_PROP_STRING("vfiogroupfd, VFIODevice, vfiogroupfd_name),
+     * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
+     * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
      */
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -4421,7 +4423,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 static const TypeInfo vfio_pci_dev_info = {
     .name = "vfio-pci",
     .parent = TYPE_PCI_DEVICE,
-    .instance_size = sizeof(VFIODevice),
+    .instance_size = sizeof(VFIOPCIDevice),
     .class_init = vfio_pci_dev_class_init,
     .instance_init = vfio_instance_init,
 };
-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 01/16] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 02/16] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-11-05 17:35   ` Alex Williamson
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 04/16] hw/vfio/pci: Introduce VFIORegion Eric Auger
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

Introduce the VFIODevice struct that is going to be shared by
VFIOPCIDevice and VFIOPlatformDevice.

Additional fields will be added in later patches, to keep this one
easy to review.

The group's device_list becomes a list of VFIODevice.

This requires reworking the reset handler, which becomes generic and
calls VFIODevice ops that are specialized in each parent object.
Functions that iterate over this list must also take into account that
the devices may be something other than VFIOPCIDevice. The type field
is used to discriminate them.
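
For readers unfamiliar with the pattern, here is a minimal standalone
sketch of the embedding and type check described above. It is not the
QEMU code: the Demo* names, the plain singly linked list (instead of
QLIST) and the trimmed fields are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the QEMU types; fields are trimmed. */
enum {
    DEMO_TYPE_PCI = 0,
    DEMO_TYPE_PLATFORM = 1,
};

typedef struct DemoDevice {
    struct DemoDevice *next;  /* plain linked list instead of QLIST */
    int type;                 /* discriminates the enclosing object */
    int needs_reset;
} DemoDevice;

typedef struct DemoPCIDevice {
    int host_slot;            /* PCI-only state */
    DemoDevice vbasedev;      /* embedded base device */
} DemoPCIDevice;

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Downcast; only valid when the base is really embedded in a PCI device. */
static DemoPCIDevice *demo_to_pci(DemoDevice *vbasedev)
{
    assert(vbasedev->type == DEMO_TYPE_PCI);
    return demo_container_of(vbasedev, DemoPCIDevice, vbasedev);
}

/* Walk a mixed list, skip non-PCI entries, count PCI devices in a slot. */
static int demo_count_pci_in_slot(DemoDevice *head, int slot)
{
    int count = 0;

    for (DemoDevice *it = head; it; it = it->next) {
        if (it->type != DEMO_TYPE_PCI) {
            continue;
        }
        if (demo_to_pci(it)->host_slot == slot) {
            count++;
        }
    }
    return count;
}
```

The type check has to come before the downcast: applying
demo_container_of() to a base device embedded in some other structure
would produce a pointer to unrelated memory.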

Take the opportunity to change the prototypes of vfio_unmask_intx,
vfio_mask_intx and vfio_disable_irqindex, which now apply to VFIODevice.
The first two are renamed to vfio_unmask_irqindex and vfio_mask_irqindex.
The index is passed as a parameter in anticipation of their use for
platform IRQs.
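
As a rough illustration of why the index becomes a parameter, the
request passed to VFIO_DEVICE_SET_IRQS differs between the PCI INTx
case and any other IRQ only by its index field. The sketch below
assumes a Linux host with <linux/vfio.h> available;
demo_fill_irq_request() is a hypothetical helper, not a function from
this series, and the ioctl itself is left out since it needs a real
VFIO device fd.

```c
#include <assert.h>
#include <string.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: fill a mask or unmask request, carrying no data,
 * for the first interrupt of the given IRQ index.
 */
static void demo_fill_irq_request(struct vfio_irq_set *irq_set,
                                  int index, int unmask)
{
    assert(irq_set != NULL);

    memset(irq_set, 0, sizeof(*irq_set));
    irq_set->argsz = sizeof(*irq_set);
    irq_set->flags = VFIO_IRQ_SET_DATA_NONE |
                     (unmask ? VFIO_IRQ_SET_ACTION_UNMASK
                             : VFIO_IRQ_SET_ACTION_MASK);
    irq_set->index = index;   /* the only bus-specific value */
    irq_set->start = 0;
    irq_set->count = 1;
}
```

A PCI caller would pass VFIO_PCI_INTX_IRQ_INDEX here, while a platform
device would pass whichever index its own IRQ is exposed under.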

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4->v5:
- fix style issues
- in vfio_initfn, rework allocation of vdev->vbasedev.name and
  replace snprintf by g_strdup_printf
---
 hw/vfio/pci.c | 241 +++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 147 insertions(+), 94 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 93181bf..0531744 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -48,6 +48,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
+enum {
+    VFIO_DEVICE_TYPE_PCI = 0,
+    VFIO_DEVICE_TYPE_PLATFORM = 1,
+};
+
 struct VFIOPCIDevice;
 
 typedef struct VFIOQuirk {
@@ -185,9 +190,27 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
+typedef struct VFIODeviceOps VFIODeviceOps;
+
+typedef struct VFIODevice {
+    QLIST_ENTRY(VFIODevice) next;
+    struct VFIOGroup *group;
+    char *name;
+    int fd;
+    int type;
+    bool reset_works;
+    bool needs_reset;
+    VFIODeviceOps *ops;
+} VFIODevice;
+
+struct VFIODeviceOps {
+    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
+    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+};
+
 typedef struct VFIOPCIDevice {
     PCIDevice pdev;
-    int fd;
+    VFIODevice vbasedev;
     VFIOINTx intx;
     unsigned int config_size;
     uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
@@ -203,20 +226,16 @@ typedef struct VFIOPCIDevice {
     VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
     VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
     PCIHostDeviceAddress host;
-    QLIST_ENTRY(VFIOPCIDevice) next;
-    struct VFIOGroup *group;
     EventNotifier err_notifier;
     uint32_t features;
 #define VFIO_FEATURE_ENABLE_VGA_BIT 0
 #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
     int32_t bootindex;
     uint8_t pm_cap;
-    bool reset_works;
     bool has_vga;
     bool pci_aer;
     bool has_flr;
     bool has_pm_reset;
-    bool needs_reset;
     bool rom_read_failed;
 } VFIOPCIDevice;
 
@@ -224,7 +243,7 @@ typedef struct VFIOGroup {
     int fd;
     int groupid;
     VFIOContainer *container;
-    QLIST_HEAD(, VFIOPCIDevice) device_list;
+    QLIST_HEAD(, VFIODevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
     QLIST_ENTRY(VFIOGroup) container_next;
 } VFIOGroup;
@@ -277,7 +296,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 /*
  * Common VFIO interrupt disable
  */
-static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
+static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
@@ -287,37 +306,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
         .count = 0,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
 /*
  * INTx
  */
-static void vfio_unmask_intx(VFIOPCIDevice *vdev)
+static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
         .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
-        .index = VFIO_PCI_INTX_IRQ_INDEX,
+        .index = index,
         .start = 0,
         .count = 1,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIOPCIDevice *vdev)
+static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
 {
     struct vfio_irq_set irq_set = {
         .argsz = sizeof(irq_set),
         .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
-        .index = VFIO_PCI_INTX_IRQ_INDEX,
+        .index = index,
         .start = 0,
         .count = 1,
     };
 
-    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 #endif
 
@@ -381,7 +400,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -404,7 +423,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
     /* Get to a known interrupt state */
     qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
-    vfio_mask_intx(vdev);
+    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
 
@@ -434,7 +453,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
     *pfd = irqfd.resamplefd;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     g_free(irq_set);
     if (ret) {
         error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
@@ -442,7 +461,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
     }
 
     /* Let'em rip */
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
     vdev->intx.kvm_accel = true;
 
@@ -458,7 +477,7 @@ fail_irqfd:
     event_notifier_cleanup(&vdev->intx.unmask);
 fail:
     qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 #endif
 }
 
@@ -479,7 +498,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
      * Get to a known state, hardware masked, QEMU ready to accept new
      * interrupts, QEMU IRQ de-asserted.
      */
-    vfio_mask_intx(vdev);
+    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
 
@@ -497,7 +516,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
     vdev->intx.kvm_accel = false;
 
     /* If we've missed an event, let it re-fire through QEMU */
-    vfio_unmask_intx(vdev);
+    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
     trace_vfio_disable_intx_kvm(vdev->host.domain, vdev->host.bus,
                                 vdev->host.slot, vdev->host.function);
@@ -583,7 +602,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
     *pfd = event_notifier_get_fd(&vdev->intx.interrupt);
     qemu_set_fd_handler(*pfd, vfio_intx_interrupt, NULL, vdev);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     g_free(irq_set);
     if (ret) {
         error_report("vfio: Error: Failed to setup INTx fd: %m");
@@ -608,7 +627,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
 
     timer_del(vdev->intx.mmap_timer);
     vfio_disable_intx_kvm(vdev);
-    vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
     vfio_mmap_set_enabled(vdev, true);
@@ -698,7 +717,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
         fds[i] = fd;
     }
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
     g_free(irq_set);
 
@@ -795,7 +814,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
      * increase them as needed.
      */
     if (vdev->nr_vectors < nr + 1) {
-        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
+        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
         vdev->nr_vectors = nr + 1;
         ret = vfio_enable_vectors(vdev, true);
         if (ret) {
@@ -823,7 +842,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             *pfd = event_notifier_get_fd(&vector->interrupt);
         }
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
         g_free(irq_set);
         if (ret) {
             error_report("vfio: failed to modify vector, %d", ret);
@@ -874,7 +893,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 
         *pfd = event_notifier_get_fd(&vector->interrupt);
 
-        ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+        ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
         g_free(irq_set);
     }
@@ -1033,7 +1052,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
     }
 
     if (vdev->nr_vectors) {
-        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
+        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
     }
 
     vfio_disable_msi_common(vdev);
@@ -1044,7 +1063,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
 
 static void vfio_disable_msi(VFIOPCIDevice *vdev)
 {
-    vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
+    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
     vfio_disable_msi_common(vdev);
 
     trace_vfio_disable_msi(vdev->host.domain, vdev->host.bus,
@@ -1188,7 +1207,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     off_t off = 0;
     size_t bytes;
 
-    if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
+    if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
         error_report("vfio: Error getting ROM info: %m");
         return;
     }
@@ -1218,7 +1237,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     memset(vdev->rom, 0xff, size);
 
     while (size) {
-        bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off);
+        bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
+                      size, vdev->rom_offset + off);
         if (bytes == 0) {
             break;
         } else if (bytes > 0) {
@@ -1312,6 +1332,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
     off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
     DeviceState *dev = DEVICE(vdev);
     char name[32];
+    int fd = vdev->vbasedev.fd;
 
     if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
         /* Since pci handles romfile, just print a message and return */
@@ -1330,10 +1351,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
      * Use the same size ROM BAR as the physical device.  The contents
      * will get filled in later when the guest tries to read it.
      */
-    if (pread(vdev->fd, &orig, 4, offset) != 4 ||
-        pwrite(vdev->fd, &size, 4, offset) != 4 ||
-        pread(vdev->fd, &size, 4, offset) != 4 ||
-        pwrite(vdev->fd, &orig, 4, offset) != 4) {
+    if (pread(fd, &orig, 4, offset) != 4 ||
+        pwrite(fd, &size, 4, offset) != 4 ||
+        pread(fd, &size, 4, offset) != 4 ||
+        pwrite(fd, &orig, 4, offset) != 4) {
         error_report("%s(%04x:%02x:%02x.%x) failed: %m",
                      __func__, vdev->host.domain, vdev->host.bus,
                      vdev->host.slot, vdev->host.function);
@@ -2345,7 +2366,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
     if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
         ssize_t ret;
 
-        ret = pread(vdev->fd, &phys_val, len, vdev->config_offset + addr);
+        ret = pread(vdev->vbasedev.fd, &phys_val, len,
+                    vdev->config_offset + addr);
         if (ret != len) {
             error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
                          __func__, vdev->host.domain, vdev->host.bus,
@@ -2375,7 +2397,8 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                 addr, val, len);
 
     /* Write everything to VFIO, let it filter out what we can't write */
-    if (pwrite(vdev->fd, &val_le, len, vdev->config_offset + addr) != len) {
+    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
+                != len) {
         error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
                      __func__, vdev->host.domain, vdev->host.bus,
                      vdev->host.slot, vdev->host.function, addr, val, len);
@@ -2743,7 +2766,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
     bool msi_64bit, msi_maskbit;
     int ret, entries;
 
-    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
+    if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
               vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
         return -errno;
     }
@@ -2782,23 +2805,24 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
     uint8_t pos;
     uint16_t ctrl;
     uint32_t table, pba;
+    int fd = vdev->vbasedev.fd;
 
     pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
     if (!pos) {
         return 0;
     }
 
-    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
+    if (pread(fd, &ctrl, sizeof(ctrl),
               vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
         return -errno;
     }
 
-    if (pread(vdev->fd, &table, sizeof(table),
+    if (pread(fd, &table, sizeof(table),
               vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
         return -errno;
     }
 
-    if (pread(vdev->fd, &pba, sizeof(pba),
+    if (pread(fd, &pba, sizeof(pba),
               vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
         return -errno;
     }
@@ -2950,7 +2974,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
              vdev->host.function, nr);
 
     /* Determine what type of BAR this is for registration */
-    ret = pread(vdev->fd, &pci_bar, sizeof(pci_bar),
+    ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
                 vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
     if (ret != sizeof(pci_bar)) {
         error_report("vfio: Failed to read BAR %d (%m)", nr);
@@ -3365,12 +3389,12 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
                              single ? "one" : "multi");
 
     vfio_pci_pre_reset(vdev);
-    vdev->needs_reset = false;
+    vdev->vbasedev.needs_reset = false;
 
     info = g_malloc0(sizeof(*info));
     info->argsz = sizeof(*info);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret && errno != ENOSPC) {
         ret = -errno;
         if (!vdev->has_pm_reset) {
@@ -3386,7 +3410,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     info->argsz = sizeof(*info) + (count * sizeof(*devices));
     devices = &info->devices[0];
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret) {
         ret = -errno;
         error_report("vfio: hot reset info failed: %m");
@@ -3402,6 +3426,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
         VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3433,7 +3458,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
         }
 
         /* Prep dependent devices for reset and clear our marker. */
-        QLIST_FOREACH(tmp, &group->device_list, next) {
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
             if (vfio_pci_host_match(&host, &tmp->host)) {
                 if (single) {
                     error_report("vfio: found another in-use device "
@@ -3443,7 +3472,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
                     goto out_single;
                 }
                 vfio_pci_pre_reset(tmp);
-                tmp->needs_reset = false;
+                tmp->vbasedev.needs_reset = false;
                 multi = true;
                 break;
             }
@@ -3482,7 +3511,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     }
 
     /* Bus reset! */
-    ret = ioctl(vdev->fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
     g_free(reset);
 
     trace_vfio_pci_hot_reset_result(vdev->host.domain,
@@ -3496,6 +3525,7 @@ out:
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
         VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
 
         host.domain = devices[i].segment;
         host.bus = devices[i].bus;
@@ -3516,7 +3546,11 @@ out:
             break;
         }
 
-        QLIST_FOREACH(tmp, &group->device_list, next) {
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
             if (vfio_pci_host_match(&host, &tmp->host)) {
                 vfio_pci_post_reset(tmp);
                 break;
@@ -3550,28 +3584,41 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
     return vfio_pci_hot_reset(vdev, true);
 }
 
-static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
 {
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     return vfio_pci_hot_reset(vdev, false);
 }
 
-static void vfio_pci_reset_handler(void *opaque)
+static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
+        vbasedev->needs_reset = true;
+    }
+    return vbasedev->needs_reset;
+}
+
+static VFIODeviceOps vfio_pci_ops = {
+    .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
+    .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
+};
+
+static void vfio_reset_handler(void *opaque)
 {
     VFIOGroup *group;
-    VFIOPCIDevice *vdev;
+    VFIODevice *vbasedev;
 
     QLIST_FOREACH(group, &group_list, next) {
-        QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
-                vdev->needs_reset = true;
-            }
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
         }
     }
 
     QLIST_FOREACH(group, &group_list, next) {
-        QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (vdev->needs_reset) {
-                vfio_pci_hot_reset_multi(vdev);
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->needs_reset) {
+                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
             }
         }
     }
@@ -3860,7 +3907,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
     }
 
     if (QLIST_EMPTY(&group_list)) {
-        qemu_register_reset(vfio_pci_reset_handler, NULL);
+        qemu_register_reset(vfio_reset_handler, NULL);
     }
 
     QLIST_INSERT_HEAD(&group_list, group, next);
@@ -3892,7 +3939,7 @@ static void vfio_put_group(VFIOGroup *group)
     g_free(group);
 
     if (QLIST_EMPTY(&group_list)) {
-        qemu_unregister_reset(vfio_pci_reset_handler, NULL);
+        qemu_unregister_reset(vfio_reset_handler, NULL);
     }
 }
 
@@ -3913,12 +3960,12 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         return ret;
     }
 
-    vdev->fd = ret;
-    vdev->group = group;
-    QLIST_INSERT_HEAD(&group->device_list, vdev, next);
+    vdev->vbasedev.fd = ret;
+    vdev->vbasedev.group = group;
+    QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
 
     /* Sanity check device */
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
     if (ret) {
         error_report("vfio: error getting device info: %m");
         goto error;
@@ -3932,7 +3979,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         goto error;
     }
 
-    vdev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+    vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
 
     if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
         error_report("vfio: unexpected number of io regions %u",
@@ -3948,7 +3995,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
         reg_info.index = i;
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
         if (ret) {
             error_report("vfio: Error getting region %d info: %m", i);
             goto error;
@@ -3962,14 +4009,14 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         vdev->bars[i].flags = reg_info.flags;
         vdev->bars[i].size = reg_info.size;
         vdev->bars[i].fd_offset = reg_info.offset;
-        vdev->bars[i].fd = vdev->fd;
+        vdev->bars[i].fd = vdev->vbasedev.fd;
         vdev->bars[i].nr = i;
         QLIST_INIT(&vdev->bars[i].quirks);
     }
 
     reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
     if (ret) {
         error_report("vfio: Error getting config info: %m");
         goto error;
@@ -3992,7 +4039,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
             .index = VFIO_PCI_VGA_REGION_INDEX,
          };
 
-        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
         if (ret) {
             error_report(
                 "vfio: Device does not support requested feature x-vga");
@@ -4009,7 +4056,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         }
 
         vdev->vga.fd_offset = vga_info.offset;
-        vdev->vga.fd = vdev->fd;
+        vdev->vga.fd = vdev->vbasedev.fd;
 
         vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
         vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
@@ -4027,7 +4074,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     }
     irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
     if (ret) {
         /* This can fail for an old kernel or legacy PCI dev */
         trace_vfio_get_device_get_irq_info_failure();
@@ -4043,19 +4090,20 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
 
 error:
     if (ret) {
-        QLIST_REMOVE(vdev, next);
-        vdev->group = NULL;
-        close(vdev->fd);
+        QLIST_REMOVE(&vdev->vbasedev, next);
+        vdev->vbasedev.group = NULL;
+        close(vdev->vbasedev.fd);
     }
     return ret;
 }
 
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
-    QLIST_REMOVE(vdev, next);
-    vdev->group = NULL;
-    trace_vfio_put_device(vdev->fd);
-    close(vdev->fd);
+    QLIST_REMOVE(&vdev->vbasedev, next);
+    vdev->vbasedev.group = NULL;
+    trace_vfio_put_device(vdev->vbasedev.fd);
+    close(vdev->vbasedev.fd);
+    g_free(vdev->vbasedev.name);
     if (vdev->msix) {
         g_free(vdev->msix);
         vdev->msix = NULL;
@@ -4124,7 +4172,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
     *pfd = event_notifier_get_fd(&vdev->err_notifier);
     qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     if (ret) {
         error_report("vfio: Failed to set up error notification");
         qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
@@ -4157,7 +4205,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
     pfd = (int32_t *)&irq_set->data;
     *pfd = -1;
 
-    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
     if (ret) {
         error_report("vfio: Failed to de-assign error fd: %m");
     }
@@ -4169,7 +4217,8 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
 
 static int vfio_initfn(PCIDevice *pdev)
 {
-    VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    VFIODevice *vbasedev_iter;
     VFIOGroup *group;
     char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
     ssize_t len;
@@ -4187,6 +4236,13 @@ static int vfio_initfn(PCIDevice *pdev)
         return -errno;
     }
 
+    vdev->vbasedev.ops = &vfio_pci_ops;
+
+    vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
+    vdev->vbasedev.name = g_strdup_printf("%04x:%02x:%02x.%01x",
+            vdev->host.domain, vdev->host.bus, vdev->host.slot,
+            vdev->host.function);
+
     strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
 
     len = readlink(path, iommu_group_path, sizeof(path));
@@ -4216,12 +4272,8 @@ static int vfio_initfn(PCIDevice *pdev)
             vdev->host.domain, vdev->host.bus, vdev->host.slot,
             vdev->host.function);
 
-    QLIST_FOREACH(pvdev, &group->device_list, next) {
-        if (pvdev->host.domain == vdev->host.domain &&
-            pvdev->host.bus == vdev->host.bus &&
-            pvdev->host.slot == vdev->host.slot &&
-            pvdev->host.function == vdev->host.function) {
-
+    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+        if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
             error_report("vfio: error: device %s is already attached", path);
             vfio_put_group(group);
             return -EBUSY;
@@ -4236,7 +4288,7 @@ static int vfio_initfn(PCIDevice *pdev)
     }
 
     /* Get a copy of config space */
-    ret = pread(vdev->fd, vdev->pdev.config,
+    ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
                 MIN(pci_config_size(&vdev->pdev), vdev->config_size),
                 vdev->config_offset);
     if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
@@ -4323,7 +4375,7 @@ out_put:
 static void vfio_exitfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
-    VFIOGroup *group = vdev->group;
+    VFIOGroup *group = vdev->vbasedev.group;
 
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
@@ -4349,8 +4401,9 @@ static void vfio_pci_reset(DeviceState *dev)
 
     vfio_pci_pre_reset(vdev);
 
-    if (vdev->reset_works && (vdev->has_flr || !vdev->has_pm_reset) &&
-        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
+    if (vdev->vbasedev.reset_works &&
+        (vdev->has_flr || !vdev->has_pm_reset) &&
+        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
         trace_vfio_pci_reset_flr(vdev->host.domain, vdev->host.bus,
                                   vdev->host.slot, vdev->host.function);
         goto post_reset;
@@ -4362,8 +4415,8 @@ static void vfio_pci_reset(DeviceState *dev)
     }
 
     /* If nothing else works and the device supports PM reset, use it */
-    if (vdev->reset_works && vdev->has_pm_reset &&
-        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
+    if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
+        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
         trace_vfio_pci_reset_pm(vdev->host.domain, vdev->host.bus,
                                 vdev->host.slot, vdev->host.function);
         goto post_reset;
-- 
1.8.3.2
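
The reset-handler change in the diff above (per-device ops replacing the
PCI-only loop) can be sketched in isolation. This is only a minimal
illustration under stated assumptions: every Rst*/rst_* name is hypothetical,
a plain singly linked list stands in for QLIST, and the "hot reset" just
counts calls. The two passes mirror vfio_reset_handler(); since
vfio_pci_hot_reset() clears needs_reset on dependent devices, flagging all
devices before resetting any presumably avoids resetting a device twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the pattern; none of these names exist in QEMU. */
#define rst_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct RstDevice RstDevice;

typedef struct RstDeviceOps {
    bool (*compute_needs_reset)(RstDevice *vbasedev);
    int (*hot_reset_multi)(RstDevice *vbasedev);
} RstDeviceOps;

struct RstDevice {
    const RstDeviceOps *ops;
    bool reset_works;
    bool needs_reset;
    RstDevice *next;            /* plain singly linked list, not QLIST */
};

typedef struct RstPCIDevice {
    RstDevice vbasedev;         /* embedded base, recovered via container_of */
    bool has_flr;
    bool has_pm_reset;
    int hot_resets;             /* counts calls, for the demo only */
} RstPCIDevice;

static bool rst_pci_compute_needs_reset(RstDevice *vbasedev)
{
    RstPCIDevice *vdev = rst_container_of(vbasedev, RstPCIDevice, vbasedev);

    /* Same condition as vfio_pci_compute_needs_reset() in the diff. */
    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
        vbasedev->needs_reset = true;
    }
    return vbasedev->needs_reset;
}

static int rst_pci_hot_reset_multi(RstDevice *vbasedev)
{
    RstPCIDevice *vdev = rst_container_of(vbasedev, RstPCIDevice, vbasedev);

    /* Stand-in for the real bus reset: clear the marker and count. */
    vbasedev->needs_reset = false;
    vdev->hot_resets++;
    return 0;
}

static const RstDeviceOps rst_pci_ops = {
    .compute_needs_reset = rst_pci_compute_needs_reset,
    .hot_reset_multi = rst_pci_hot_reset_multi,
};

/* Bus-agnostic handler: pass 1 flags devices, pass 2 resets flagged ones. */
static void rst_reset_handler(RstDevice *head)
{
    RstDevice *d;

    for (d = head; d; d = d->next) {
        d->ops->compute_needs_reset(d);
    }
    for (d = head; d; d = d->next) {
        if (d->needs_reset) {
            d->ops->hot_reset_multi(d);
        }
    }
}

/* Returns 0 when only the device without a working reset was hot reset. */
static int rst_reset_demo(void)
{
    RstPCIDevice a = {
        .vbasedev = { .ops = &rst_pci_ops, .reset_works = false },
    };
    RstPCIDevice b = {
        .vbasedev = { .ops = &rst_pci_ops, .reset_works = true },
        .has_flr = true,
    };

    a.vbasedev.next = &b.vbasedev;
    rst_reset_handler(&a.vbasedev);
    return (a.hot_resets == 1 && b.hot_resets == 0) ? 0 : -1;
}
```

The handler never names the PCI type; a second device type would only need
its own ops table, which is the point of moving these callbacks into
VFIODeviceOps.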

* [Qemu-devel] [PATCH v7 04/16] hw/vfio/pci: Introduce VFIORegion
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (2 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 05/16] hw/vfio/pci: split vfio_get_device Eric Auger
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

VFIORegion is going to be shared by VFIOPCIDevice and
VFIOPlatformDevice. VFIOBAR now embeds it.

vfio_eoi becomes an op of VFIODevice, specialized by the parent device.
This makes it possible to transform vfio_bar_write/read into generic
vfio_region_write/read, which will be used by VFIOPlatformDevice too.

vfio_mmap_bar becomes vfio_mmap_region.
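
As a rough standalone illustration of the layering described above (not the
QEMU code itself), the sketch below embeds a generic region in a PCI-specific
BAR and routes the EOI through an op on the base device. All Sketch*/sketch_*
names are hypothetical, and the "EOI" only clears a flag and bumps a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QEMU types; not the real definitions. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct SketchDevice SketchDevice;

typedef struct SketchDeviceOps {
    void (*eoi)(SketchDevice *vbasedev);    /* specialized per device type */
} SketchDeviceOps;

struct SketchDevice {
    const char *name;
    const SketchDeviceOps *ops;
};

/* Generic region: knows only its base device, not the PCI wrapper. */
typedef struct SketchRegion {
    SketchDevice *vbasedev;
    unsigned nr;
} SketchRegion;

/* The PCI-specific BAR embeds the generic region and adds PCI-only state. */
typedef struct SketchBAR {
    SketchRegion region;
    bool ioport;
} SketchBAR;

typedef struct SketchPCIDevice {
    SketchDevice vbasedev;
    SketchBAR bars[2];
    bool intx_pending;
    int eoi_count;              /* demo-only bookkeeping */
} SketchPCIDevice;

static void sketch_pci_eoi(SketchDevice *vbasedev)
{
    SketchPCIDevice *vdev =
        sketch_container_of(vbasedev, SketchPCIDevice, vbasedev);

    if (!vdev->intx_pending) {
        return;                 /* nothing to acknowledge */
    }
    vdev->intx_pending = false;
    vdev->eoi_count++;
}

static const SketchDeviceOps sketch_pci_ops = { .eoi = sketch_pci_eoi };

/* Generic accessor: takes a region and never mentions the PCI type. */
static void sketch_region_access(void *opaque)
{
    SketchRegion *region = opaque;
    SketchDevice *vbasedev = region->vbasedev;

    /* ... the pread()/pwrite() on the device fd would go here ... */
    vbasedev->ops->eoi(vbasedev);
}

/* Returns the number of EOIs actually performed across two accesses. */
static int sketch_region_demo(void)
{
    SketchPCIDevice pci = {
        .vbasedev = { .name = "0000:00:01.0", .ops = &sketch_pci_ops },
        .intx_pending = true,
    };
    unsigned i;

    for (i = 0; i < 2; i++) {
        pci.bars[i].region.vbasedev = &pci.vbasedev;
        pci.bars[i].region.nr = i;
    }

    sketch_region_access(&pci.bars[0].region);  /* services the pending INTx */
    sketch_region_access(&pci.bars[1].region);  /* no-op: nothing pending */
    return pci.eoi_count;
}
```

Because the accessor only sees the region and its base device, a platform
device could reuse it unchanged by supplying its own eoi op.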

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v4->v5:
- remove fd field from VFIORegion
- change error_report format string in vfio_region_write/read
- remove #ifdef DEBUG_VFIO in the same function
- correct missing initialization of bar region's vbasedev field
- change Object * parameter name of vfio_mmap_region and remove
  useless OBJECT()

 hw/vfio/pci.c | 193 ++++++++++++++++++++++++++++++----------------------------
 trace-events  |   4 +-
 2 files changed, 103 insertions(+), 94 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 0531744..186dfd0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -78,15 +78,19 @@ typedef struct VFIOQuirk {
     } data;
 } VFIOQuirk;
 
-typedef struct VFIOBAR {
-    off_t fd_offset; /* offset of BAR within device fd */
-    int fd; /* device fd, allows us to pass VFIOBAR as opaque data */
+typedef struct VFIORegion {
+    struct VFIODevice *vbasedev;
+    off_t fd_offset; /* offset of region within device fd */
     MemoryRegion mem; /* slow, read/write access */
     MemoryRegion mmap_mem; /* direct mapped access */
     void *mmap;
     size_t size;
     uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
-    uint8_t nr; /* cache the BAR number for debug */
+    uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOBAR {
+    VFIORegion region;
     bool ioport;
     bool mem64;
     QLIST_HEAD(, VFIOQuirk) quirks;
@@ -206,6 +210,7 @@ typedef struct VFIODevice {
 struct VFIODeviceOps {
     bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+    void (*vfio_eoi)(VFIODevice *vdev);
 };
 
 typedef struct VFIOPCIDevice {
@@ -389,8 +394,10 @@ static void vfio_intx_interrupt(void *opaque)
     }
 }
 
-static void vfio_eoi(VFIOPCIDevice *vdev)
+static void vfio_eoi(VFIODevice *vbasedev)
 {
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
     if (!vdev->intx.pending) {
         return;
     }
@@ -400,7 +407,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
-    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
+    vfio_unmask_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -553,7 +560,7 @@ static void vfio_update_irq(PCIDevice *pdev)
     vfio_enable_intx_kvm(vdev);
 
     /* Re-enable the interrupt in cased we missed an EOI */
-    vfio_eoi(vdev);
+    vfio_eoi(&vdev->vbasedev);
 }
 
 static int vfio_enable_intx(VFIOPCIDevice *vdev)
@@ -1090,10 +1097,11 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
 /*
  * IO Port/MMIO - Beware of the endians, VFIO is always little endian
  */
-static void vfio_bar_write(void *opaque, hwaddr addr,
-                           uint64_t data, unsigned size)
+static void vfio_region_write(void *opaque, hwaddr addr,
+                              uint64_t data, unsigned size)
 {
-    VFIOBAR *bar = opaque;
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1116,20 +1124,14 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
         break;
     }
 
-    if (pwrite(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
-        error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
-                     __func__, addr, data, size);
+    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+                     ",%d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, data, size);
     }
 
-#ifdef DEBUG_VFIO
-    {
-        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
-        trace_vfio_bar_write(vdev->host.domain, vdev->host.bus,
-                             vdev->host.slot, vdev->host.function,
-                             region->nr, addr, data, size);
-    }
-#endif
+    trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
 
     /*
      * A read or write to a BAR always signals an INTx EOI.  This will
@@ -1139,13 +1141,14 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
      * which access will service the interrupt, so we're potentially
      * getting quite a few host interrupts per guest interrupt.
      */
-    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+    vbasedev->ops->vfio_eoi(vbasedev);
 }
 
-static uint64_t vfio_bar_read(void *opaque,
-                              hwaddr addr, unsigned size)
+static uint64_t vfio_region_read(void *opaque,
+                                 hwaddr addr, unsigned size)
 {
-    VFIOBAR *bar = opaque;
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
     union {
         uint8_t byte;
         uint16_t word;
@@ -1154,9 +1157,10 @@ static uint64_t vfio_bar_read(void *opaque,
     } buf;
     uint64_t data = 0;
 
-    if (pread(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
-        error_report("%s(,0x%"HWADDR_PRIx", %d) failed: %m",
-                     __func__, addr, size);
+    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, size);
         return (uint64_t)-1;
     }
 
@@ -1175,25 +1179,17 @@ static uint64_t vfio_bar_read(void *opaque,
         break;
     }
 
-#ifdef DEBUG_VFIO
-    {
-        VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
-        trace_vfio_bar_read(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function,
-                            region->nr, addr, size, data);
-    }
-#endif
+    trace_vfio_region_read(vbasedev->name, region->nr, addr, size, data);
 
     /* Same as write above */
-    vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+    vbasedev->ops->vfio_eoi(vbasedev);
 
     return data;
 }
 
-static const MemoryRegionOps vfio_bar_ops = {
-    .read = vfio_bar_read,
-    .write = vfio_bar_write,
+static const MemoryRegionOps vfio_region_ops = {
+    .read = vfio_region_read,
+    .write = vfio_region_write,
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
@@ -1530,8 +1526,8 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
                                              quirk->data.bar,
                                              addr, size, data);
     } else {
-        data = vfio_bar_read(&vdev->bars[quirk->data.bar],
-                             addr + quirk->data.base_offset, size);
+        data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                                addr + quirk->data.base_offset, size);
     }
 
     return data;
@@ -1585,7 +1581,7 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
         return;
     }
 
-    vfio_bar_write(&vdev->bars[quirk->data.bar],
+    vfio_region_write(&vdev->bars[quirk->data.bar].region,
                    addr + quirk->data.base_offset, data, size);
 }
 
@@ -1622,7 +1618,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
                                       quirk->data.bar,
                                       addr + base, size, data);
     } else {
-        data = vfio_bar_read(&vdev->bars[quirk->data.bar], addr + base, size);
+        data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                                addr + base, size);
     }
 
     return data;
@@ -1654,7 +1651,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
                                        quirk->data.bar,
                                        addr + base, data, size);
     } else {
-        vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+        vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                          addr + base, data, size);
     }
 }
 
@@ -1707,7 +1705,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
      * As long as the BAR is >= 256 bytes it will be aligned such that the
      * lower byte is always zero.  Filter out anything else, if it exists.
      */
-    if (!vdev->bars[4].ioport || vdev->bars[4].size < 256) {
+    if (!vdev->bars[4].ioport || vdev->bars[4].region.size < 256) {
         return;
     }
 
@@ -1759,7 +1757,7 @@ static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev),
                           &vfio_generic_window_quirk, quirk,
                           "vfio-ati-bar4-window-quirk", 8);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.base_offset, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1838,7 +1836,8 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
                         vdev->host.domain, vdev->host.bus,
                         vdev->host.slot, vdev->host.function);
 
-    return vfio_bar_read(&vdev->bars[quirk->data.bar], addr + 0x70, size);
+    return vfio_region_read(&vdev->bars[quirk->data.bar].region,
+                            addr + 0x70, size);
 }
 
 static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
@@ -1880,7 +1879,8 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
             vdev->host.domain, vdev->host.bus,
             vdev->host.slot, vdev->host.function);
 
-    vfio_bar_write(&vdev->bars[quirk->data.bar], addr + 0x70, data, size);
+    vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                      addr + 0x70, data, size);
 }
 
 static const MemoryRegionOps vfio_rtl8168_window_quirk = {
@@ -1910,7 +1910,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_rtl8168_window_quirk,
                           quirk, "vfio-rtl8168-window-quirk", 8);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                                         0x70, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1944,7 +1944,7 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-ati-bar2-4000-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2064,7 +2064,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
     VFIOQuirk *quirk;
 
     if (pci_get_word(pdev->config + PCI_VENDOR_ID) != PCI_VENDOR_ID_NVIDIA ||
-        !vdev->bars[1].size) {
+        !vdev->bars[1].region.size) {
         return;
     }
 
@@ -2173,7 +2173,8 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev),
                           &vfio_nvidia_bar5_window_quirk, quirk,
                           "vfio-nvidia-bar5-window-quirk", 16);
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem, 0, &quirk->mem, 1);
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
+                                        0, &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
@@ -2201,7 +2202,8 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
      */
     if ((pdev->cap_present & QEMU_PCI_CAP_MSI) &&
         vfio_range_contained(addr, size, pdev->msi_cap, PCI_MSI_FLAGS)) {
-        vfio_bar_write(&vdev->bars[quirk->data.bar], addr + base, data, size);
+        vfio_region_write(&vdev->bars[quirk->data.bar].region,
+                          addr + base, data, size);
     }
 }
 
@@ -2244,7 +2246,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
                           quirk, "vfio-nvidia-bar0-88000-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2271,7 +2273,8 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 
     /* Log the chipset ID */
     trace_vfio_probe_nvidia_bar0_1800_quirk_id(
-            (unsigned int)(vfio_bar_read(&vdev->bars[0], 0, 4) >> 20) & 0xff);
+            (unsigned int)(vfio_region_read(&vdev->bars[0].region, 0, 4) >> 20)
+            & 0xff);
 
     quirk = g_malloc0(sizeof(*quirk));
     quirk->vdev = vdev;
@@ -2283,7 +2286,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-nvidia-bar0-1800-quirk",
                           TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
-    memory_region_add_subregion_overlap(&vdev->bars[nr].mem,
+    memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                           quirk->data.address_match & TARGET_PAGE_MASK,
                           &quirk->mem, 1);
 
@@ -2341,7 +2344,7 @@ static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
 
     while (!QLIST_EMPTY(&bar->quirks)) {
         VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
-        memory_region_del_subregion(&bar->mem, &quirk->mem);
+        memory_region_del_subregion(&bar->region.mem, &quirk->mem);
         object_unparent(OBJECT(&quirk->mem));
         QLIST_REMOVE(quirk, next);
         g_free(quirk);
@@ -2852,9 +2855,9 @@ static int vfio_setup_msix(VFIOPCIDevice *vdev, int pos)
     int ret;
 
     ret = msix_init(&vdev->pdev, vdev->msix->entries,
-                    &vdev->bars[vdev->msix->table_bar].mem,
+                    &vdev->bars[vdev->msix->table_bar].region.mem,
                     vdev->msix->table_bar, vdev->msix->table_offset,
-                    &vdev->bars[vdev->msix->pba_bar].mem,
+                    &vdev->bars[vdev->msix->pba_bar].region.mem,
                     vdev->msix->pba_bar, vdev->msix->pba_offset, pos);
     if (ret < 0) {
         if (ret == -ENOTSUP) {
@@ -2872,8 +2875,9 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
     msi_uninit(&vdev->pdev);
 
     if (vdev->msix) {
-        msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
-                    &vdev->bars[vdev->msix->pba_bar].mem);
+        msix_uninit(&vdev->pdev,
+                    &vdev->bars[vdev->msix->table_bar].region.mem,
+                    &vdev->bars[vdev->msix->pba_bar].region.mem);
     }
 }
 
@@ -2887,11 +2891,11 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
     for (i = 0; i < PCI_ROM_SLOT; i++) {
         VFIOBAR *bar = &vdev->bars[i];
 
-        if (!bar->size) {
+        if (!bar->region.size) {
             continue;
         }
 
-        memory_region_set_enabled(&bar->mmap_mem, enabled);
+        memory_region_set_enabled(&bar->region.mmap_mem, enabled);
         if (vdev->msix && vdev->msix->table_bar == i) {
             memory_region_set_enabled(&vdev->msix->mmap_mem, enabled);
         }
@@ -2902,52 +2906,54 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
 
-    if (!bar->size) {
+    if (!bar->region.size) {
         return;
     }
 
     vfio_bar_quirk_teardown(vdev, nr);
 
-    memory_region_del_subregion(&bar->mem, &bar->mmap_mem);
-    munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
+    memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
+    munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
 
     if (vdev->msix && vdev->msix->table_bar == nr) {
-        memory_region_del_subregion(&bar->mem, &vdev->msix->mmap_mem);
+        memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
         munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
     }
 }
 
-static int vfio_mmap_bar(VFIOPCIDevice *vdev, VFIOBAR *bar,
-                         MemoryRegion *mem, MemoryRegion *submem,
-                         void **map, size_t size, off_t offset,
-                         const char *name)
+static int vfio_mmap_region(Object *obj, VFIORegion *region,
+                            MemoryRegion *mem, MemoryRegion *submem,
+                            void **map, size_t size, off_t offset,
+                            const char *name)
 {
     int ret = 0;
+    VFIODevice *vbasedev = region->vbasedev;
 
-    if (VFIO_ALLOW_MMAP && size && bar->flags & VFIO_REGION_INFO_FLAG_MMAP) {
+    if (VFIO_ALLOW_MMAP && size && region->flags &
+        VFIO_REGION_INFO_FLAG_MMAP) {
         int prot = 0;
 
-        if (bar->flags & VFIO_REGION_INFO_FLAG_READ) {
+        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
             prot |= PROT_READ;
         }
 
-        if (bar->flags & VFIO_REGION_INFO_FLAG_WRITE) {
+        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
             prot |= PROT_WRITE;
         }
 
         *map = mmap(NULL, size, prot, MAP_SHARED,
-                    bar->fd, bar->fd_offset + offset);
+                    vbasedev->fd, region->fd_offset + offset);
         if (*map == MAP_FAILED) {
             *map = NULL;
             ret = -errno;
             goto empty_region;
         }
 
-        memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map);
+        memory_region_init_ram_ptr(submem, obj, name, size, *map);
     } else {
 empty_region:
         /* Create a zero sized sub-region to make cleanup easy. */
-        memory_region_init(submem, OBJECT(vdev), name, 0);
+        memory_region_init(submem, obj, name, 0);
     }
 
     memory_region_add_subregion(mem, offset, submem);
@@ -2958,7 +2964,7 @@ empty_region:
 static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
-    unsigned size = bar->size;
+    unsigned size = bar->region.size;
     char name[64];
     uint32_t pci_bar;
     uint8_t type;
@@ -2988,9 +2994,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
                                     ~PCI_BASE_ADDRESS_MEM_MASK);
 
     /* A "slow" read/write mapping underlies all BARs */
-    memory_region_init_io(&bar->mem, OBJECT(vdev), &vfio_bar_ops,
+    memory_region_init_io(&bar->region.mem, OBJECT(vdev), &vfio_region_ops,
                           bar, name, size);
-    pci_register_bar(&vdev->pdev, nr, type, &bar->mem);
+    pci_register_bar(&vdev->pdev, nr, type, &bar->region.mem);
 
     /*
      * We can't mmap areas overlapping the MSIX vector table, so we
@@ -3001,8 +3007,9 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
     }
 
     strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
-    if (vfio_mmap_bar(vdev, bar, &bar->mem,
-                      &bar->mmap_mem, &bar->mmap, size, 0, name)) {
+    if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
+                      &bar->region.mmap_mem, &bar->region.mmap,
+                      size, 0, name)) {
         error_report("%s unsupported. Performance may be slow", name);
     }
 
@@ -3012,10 +3019,11 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
         start = HOST_PAGE_ALIGN(vdev->msix->table_offset +
                                 (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE));
 
-        size = start < bar->size ? bar->size - start : 0;
+        size = start < bar->region.size ? bar->region.size - start : 0;
         strncat(name, " msix-hi", sizeof(name) - strlen(name) - 1);
         /* VFIOMSIXInfo contains another MemoryRegion for this mapping */
-        if (vfio_mmap_bar(vdev, bar, &bar->mem, &vdev->msix->mmap_mem,
+        if (vfio_mmap_region(OBJECT(vdev), &bar->region, &bar->region.mem,
+                          &vdev->msix->mmap_mem,
                           &vdev->msix->mmap, size, start, name)) {
             error_report("%s unsupported. Performance may be slow", name);
         }
@@ -3602,6 +3610,7 @@ static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
 static VFIODeviceOps vfio_pci_ops = {
     .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
+    .vfio_eoi = vfio_eoi,
 };
 
 static void vfio_reset_handler(void *opaque)
@@ -4006,11 +4015,11 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
                                      (unsigned long)reg_info.offset,
                                      (unsigned long)reg_info.flags);
 
-        vdev->bars[i].flags = reg_info.flags;
-        vdev->bars[i].size = reg_info.size;
-        vdev->bars[i].fd_offset = reg_info.offset;
-        vdev->bars[i].fd = vdev->vbasedev.fd;
-        vdev->bars[i].nr = i;
+        vdev->bars[i].region.vbasedev = &vdev->vbasedev;
+        vdev->bars[i].region.flags = reg_info.flags;
+        vdev->bars[i].region.size = reg_info.size;
+        vdev->bars[i].region.fd_offset = reg_info.offset;
+        vdev->bars[i].region.nr = i;
         QLIST_INIT(&vdev->bars[i].quirks);
     }
 
diff --git a/trace-events b/trace-events
index edf4be8..b598931 100644
--- a/trace-events
+++ b/trace-events
@@ -1410,8 +1410,8 @@ vfio_pci_reset(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
 vfio_pci_reset_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET"
 vfio_pci_reset_pm(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x PCI PM Reset"
 
-vfio_bar_write(int domain, int bus, int slot, int fn, int index, uint64_t addr, uint64_t data, unsigned size) " (%04x:%02x:%02x.%x:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
-vfio_bar_read(int domain, int bus, int slot, int fn, int index, uint64_t addr, unsigned size, uint64_t data) " (%04x:%02x:%02x.%x:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
+vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
+vfio_region_read(const char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
 vfio_iommu_map_notify(uint64_t iova_start, uint64_t iova_end) "iommu map @ %"PRIx64" - %"PRIx64
 vfio_listener_region_add_skip(uint64_t start, uint64_t end) "SKIPPING region_add %"PRIx64" - %"PRIx64
 vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add [iommu] %"PRIx64" - %"PRIx64
-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 05/16] hw/vfio/pci: split vfio_get_device
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (3 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 04/16] hw/vfio/pci: Introduce VFIORegion Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 06/16] hw/vfio/pci: rename group_list into vfio_group_list Eric Auger
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

vfio_get_device now takes a VFIODevice as its argument. The function is
split into two parts: vfio_get_device, which is generic, and
vfio_populate_device, which is bus-specific.

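Three new fields (num_irqs, num_regions and flags) are introduced in
VFIODevice to store the contents of dev_info.

vfio_put_base_device is created. It undoes the generic part of
vfio_get_device: it removes the device from its group's device list,
clears the group pointer and closes the device fd.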
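
The split described above can be sketched in isolation as follows. This is
a minimal illustration, not the code from this patch: the types are
stripped-down stand-ins for the QEMU ones, the sketch_* functions and
SKETCH_FLAG_PCI are hypothetical names, and the ioctl calls are replaced by
plain parameters so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the QEMU types touched by this patch. */
typedef struct VFIODevice VFIODevice;

typedef struct VFIODeviceOps {
    /* Bus-specific hook, called by the generic code. */
    int (*vfio_populate_device)(VFIODevice *vbasedev);
} VFIODeviceOps;

struct VFIODevice {
    int fd;
    unsigned int num_irqs;      /* the three fields that cache dev_info */
    unsigned int num_regions;
    unsigned int flags;
    VFIODeviceOps *ops;
};

typedef struct VFIOPCIDevice {
    int config_size;
    VFIODevice vbasedev;        /* generic part embedded in the PCI device */
} VFIOPCIDevice;

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical placeholder for the kernel's VFIO_DEVICE_FLAGS_PCI. */
#define SKETCH_FLAG_PCI 0x2u

/* Bus-specific part: recovers the PCI device from the base device. */
static int sketch_pci_populate_device(VFIODevice *vbasedev)
{
    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);

    if (!(vbasedev->flags & SKETCH_FLAG_PCI)) {
        return -1;              /* not a PCI device */
    }
    vdev->config_size = 256;    /* placeholder for the region queries */
    return 0;
}

/* Generic teardown; the real function also unlinks from the group. */
static void sketch_put_base_device(VFIODevice *vbasedev)
{
    vbasedev->fd = -1;          /* stands in for close(vbasedev->fd) */
}

/* Generic part: cache the device info, then defer to the bus hook. */
static int sketch_get_device(VFIODevice *vbasedev, int fd, unsigned int flags,
                             unsigned int num_regions, unsigned int num_irqs)
{
    int ret;

    assert(vbasedev->ops && vbasedev->ops->vfio_populate_device);

    vbasedev->fd = fd;
    vbasedev->flags = flags;
    vbasedev->num_regions = num_regions;
    vbasedev->num_irqs = num_irqs;

    ret = vbasedev->ops->vfio_populate_device(vbasedev);
    if (ret) {
        sketch_put_base_device(vbasedev);
    }
    return ret;
}
```

The point of the arrangement is that the generic function only ever sees a
VFIODevice; anything that needs the enclosing bus-specific structure goes
through the ops table and container_of.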

---

v5->v6:
- simplify the split of vfio_get_device:
  vfio_check_device, vfio_populate_regions and vfio_populate_interrupts
  are now gathered into a single specialization function named
  vfio_populate_device

v4->v5:
- clean up error handling and get/put operations in
  vfio_check_device, vfio_populate_regions, vfio_populate_interrupts and
  vfio_get_device.
  - correct misuse of errno
  - vfio_populate_regions always returns 0
  - VFIODevice .name deallocation done in vfio_put_device instead of
    vfio_put_base_device
  - vfio_put_base_device done at vfio_get_device level.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 130 +++++++++++++++++++++++++++++++++++-----------------------
 trace-events  |  10 ++---
 2 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 186dfd0..0ee6f7f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -205,12 +205,16 @@ typedef struct VFIODevice {
     bool reset_works;
     bool needs_reset;
     VFIODeviceOps *ops;
+    unsigned int num_irqs;
+    unsigned int num_regions;
+    unsigned int flags;
 } VFIODevice;
 
 struct VFIODeviceOps {
     bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
     void (*vfio_eoi)(VFIODevice *vdev);
+    int (*vfio_populate_device)(VFIODevice *vdev);
 };
 
 typedef struct VFIOPCIDevice {
@@ -297,6 +301,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_put_base_device(VFIODevice *vbasedev);
+static int vfio_populate_device(VFIODevice *vbasedev);
 
 /*
  * Common VFIO interrupt disable
@@ -3611,6 +3617,7 @@ static VFIODeviceOps vfio_pci_ops = {
     .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
     .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
     .vfio_eoi = vfio_eoi,
+    .vfio_populate_device = vfio_populate_device,
 };
 
 static void vfio_reset_handler(void *opaque)
@@ -3952,70 +3959,45 @@ static void vfio_put_group(VFIOGroup *group)
     }
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name,
-                           VFIOPCIDevice *vdev)
+static int vfio_populate_device(VFIODevice *vbasedev)
 {
-    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
     struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
-    int ret, i;
-
-    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
-    if (ret < 0) {
-        error_report("vfio: error getting device %s from group %d: %m",
-                     name, group->groupid);
-        error_printf("Verify all devices in group %d are bound to vfio-pci "
-                     "or pci-stub and not already in use\n", group->groupid);
-        return ret;
-    }
-
-    vdev->vbasedev.fd = ret;
-    vdev->vbasedev.group = group;
-    QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
+    int i, ret = -1;
 
     /* Sanity check device */
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
-    if (ret) {
-        error_report("vfio: error getting device info: %m");
-        goto error;
-    }
-
-    trace_vfio_get_device_irq(name, dev_info.flags,
-                              dev_info.num_regions, dev_info.num_irqs);
-
-    if (!(dev_info.flags & VFIO_DEVICE_FLAGS_PCI)) {
+    if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
         error_report("vfio: Um, this isn't a PCI device");
         goto error;
     }
 
-    vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
-
-    if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
+    if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
         error_report("vfio: unexpected number of io regions %u",
-                     dev_info.num_regions);
+                     vbasedev->num_regions);
         goto error;
     }
 
-    if (dev_info.num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
-        error_report("vfio: unexpected number of irqs %u", dev_info.num_irqs);
+    if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
+        error_report("vfio: unexpected number of irqs %u", vbasedev->num_irqs);
         goto error;
     }
 
     for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
         reg_info.index = i;
 
-        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
         if (ret) {
             error_report("vfio: Error getting region %d info: %m", i);
             goto error;
         }
 
-        trace_vfio_get_device_region(name, i,
-                                     (unsigned long)reg_info.size,
-                                     (unsigned long)reg_info.offset,
-                                     (unsigned long)reg_info.flags);
+        trace_vfio_populate_device_region(vbasedev->name, i,
+                                          (unsigned long)reg_info.size,
+                                          (unsigned long)reg_info.offset,
+                                          (unsigned long)reg_info.flags);
 
-        vdev->bars[i].region.vbasedev = &vdev->vbasedev;
+        vdev->bars[i].region.vbasedev = vbasedev;
         vdev->bars[i].region.flags = reg_info.flags;
         vdev->bars[i].region.size = reg_info.size;
         vdev->bars[i].region.fd_offset = reg_info.offset;
@@ -4031,9 +4013,10 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
         goto error;
     }
 
-    trace_vfio_get_device_config(name, (unsigned long)reg_info.size,
-                                 (unsigned long)reg_info.offset,
-                                 (unsigned long)reg_info.flags);
+    trace_vfio_populate_device_config(vdev->vbasedev.name,
+                                      (unsigned long)reg_info.size,
+                                      (unsigned long)reg_info.offset,
+                                      (unsigned long)reg_info.flags);
 
     vdev->config_size = reg_info.size;
     if (vdev->config_size == PCI_CONFIG_SPACE_SIZE) {
@@ -4042,7 +4025,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     vdev->config_offset = reg_info.offset;
 
     if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
-        dev_info.num_regions > VFIO_PCI_VGA_REGION_INDEX) {
+        vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
         struct vfio_region_info vga_info = {
             .argsz = sizeof(vga_info),
             .index = VFIO_PCI_VGA_REGION_INDEX,
@@ -4086,7 +4069,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
     if (ret) {
         /* This can fail for an old kernel or legacy PCI dev */
-        trace_vfio_get_device_get_irq_info_failure();
+        trace_vfio_populate_device_get_irq_info_failure();
         ret = 0;
     } else if (irq_info.count == 1) {
         vdev->pci_aer = true;
@@ -4098,25 +4081,68 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
     }
 
 error:
+    return ret;
+}
+
+static int vfio_get_device(VFIOGroup *group, const char *name,
+                           VFIODevice *vbasedev)
+{
+    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    int ret;
+
+    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    if (ret < 0) {
+        error_report("vfio: error getting device %s from group %d: %m",
+                     name, group->groupid);
+        error_printf("Verify all devices in group %d are bound to vfio-<bus> "
+                     "or pci-stub and not already in use\n", group->groupid);
+        return ret;
+    }
+
+    vbasedev->fd = ret;
+    vbasedev->group = group;
+    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+    if (ret) {
+        error_report("vfio: error getting device info: %m");
+        goto error;
+    }
+
+    vbasedev->num_irqs = dev_info.num_irqs;
+    vbasedev->num_regions = dev_info.num_regions;
+    vbasedev->flags = dev_info.flags;
+
+    trace_vfio_get_device(name, dev_info.flags,
+                          dev_info.num_regions, dev_info.num_irqs);
+
+    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+    ret = vbasedev->ops->vfio_populate_device(vbasedev);
+
+error:
     if (ret) {
-        QLIST_REMOVE(&vdev->vbasedev, next);
-        vdev->vbasedev.group = NULL;
-        close(vdev->vbasedev.fd);
+        vfio_put_base_device(vbasedev);
     }
     return ret;
 }
 
+void vfio_put_base_device(VFIODevice *vbasedev)
+{
+    QLIST_REMOVE(vbasedev, next);
+    vbasedev->group = NULL;
+    trace_vfio_put_base_device(vbasedev->fd);
+    close(vbasedev->fd);
+}
+
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
-    QLIST_REMOVE(&vdev->vbasedev, next);
-    vdev->vbasedev.group = NULL;
-    trace_vfio_put_device(vdev->vbasedev.fd);
-    close(vdev->vbasedev.fd);
     g_free(vdev->vbasedev.name);
     if (vdev->msix) {
         g_free(vdev->msix);
         vdev->msix = NULL;
     }
+    vfio_put_base_device(&vdev->vbasedev);
 }
 
 static void vfio_err_notifier_handler(void *opaque)
@@ -4289,7 +4315,7 @@ static int vfio_initfn(PCIDevice *pdev)
         }
     }
 
-    ret = vfio_get_device(group, path, vdev);
+    ret = vfio_get_device(group, path, &vdev->vbasedev);
     if (ret) {
         error_report("vfio: failed to get device %s", path);
         vfio_put_group(group);
diff --git a/trace-events b/trace-events
index b598931..0634227 100644
--- a/trace-events
+++ b/trace-events
@@ -1401,10 +1401,10 @@ vfio_pci_hot_reset(int domain, int bus, int slot, int fn, const char *type) " (%
 vfio_pci_hot_reset_has_dep_devices(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x: hot reset dependent devices:"
 vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int group_id) "\t%04x:%02x:%02x.%x group %d"
 vfio_pci_hot_reset_result(int domain, int bus, int slot, int fn, const char *result) "%04x:%02x:%02x.%x hot reset: %s"
-vfio_get_device_region(const char *region_name, int index, unsigned long size, unsigned long offset, unsigned long flags) "Device %s region %d:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
-vfio_get_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
-vfio_get_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
-vfio_get_device_irq(const char *name, unsigned flags, unsigned num_regions, unsigned num_irqs) "Device %s flags: %u, regions: %u, irgs: %u"
+vfio_populate_device_region(const char *region_name, int index, unsigned long size, unsigned long offset, unsigned long flags) "Device %s region %d:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
+vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
+vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
+vfio_get_device(const char *name, unsigned flags, unsigned num_regions, unsigned num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
 vfio_initfn(int domain, int bus, int slot, int fn, int group_id) " (%04x:%02x:%02x.%x) group %d"
 vfio_pci_reset(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
 vfio_pci_reset_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET"
@@ -1420,7 +1420,7 @@ vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del %"PRIx64" - %"PRIx64
 vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
-vfio_put_device(int fd) "close vdev->fd=%d"
+vfio_put_base_device(int fd) "close vdev->fd=%d"
 
 #hw/acpi/memory_hotplug.c
 mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32
-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 06/16] hw/vfio/pci: rename group_list into vfio_group_list
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (4 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 05/16] hw/vfio/pci: split vfio_get_device Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 07/16] hw/vfio/pci: use name field in format strings Eric Auger
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

Rename group_list to vfio_group_list so that it better fits the rest of
the vfio namespace.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/pci.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 0ee6f7f..2216bd4 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -283,7 +283,7 @@ static const VFIORomBlacklistEntry romblacklist[] = {
 #define MSIX_CAP_LENGTH 12
 
 static QLIST_HEAD(, VFIOGroup)
-    group_list = QLIST_HEAD_INITIALIZER(group_list);
+    vfio_group_list = QLIST_HEAD_INITIALIZER(vfio_group_list);
 
 #ifdef CONFIG_KVM
 /*
@@ -3454,7 +3454,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
             continue;
         }
 
-        QLIST_FOREACH(group, &group_list, next) {
+        QLIST_FOREACH(group, &vfio_group_list, next) {
             if (group->groupid == devices[i].group_id) {
                 break;
             }
@@ -3501,7 +3501,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 
     /* Determine how many group fds need to be passed */
     count = 0;
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         for (i = 0; i < info->count; i++) {
             if (group->groupid == devices[i].group_id) {
                 count++;
@@ -3515,7 +3515,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     fds = &reset->group_fds[0];
 
     /* Fill in group fds */
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         for (i = 0; i < info->count; i++) {
             if (group->groupid == devices[i].group_id) {
                 fds[reset->count++] = group->fd;
@@ -3550,7 +3550,7 @@ out:
             continue;
         }
 
-        QLIST_FOREACH(group, &group_list, next) {
+        QLIST_FOREACH(group, &vfio_group_list, next) {
             if (group->groupid == devices[i].group_id) {
                 break;
             }
@@ -3625,13 +3625,13 @@ static void vfio_reset_handler(void *opaque)
     VFIOGroup *group;
     VFIODevice *vbasedev;
 
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         QLIST_FOREACH(vbasedev, &group->device_list, next) {
             vbasedev->ops->vfio_compute_needs_reset(vbasedev);
         }
     }
 
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         QLIST_FOREACH(vbasedev, &group->device_list, next) {
             if (vbasedev->needs_reset) {
                 vbasedev->ops->vfio_hot_reset_multi(vbasedev);
@@ -3880,7 +3880,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
     char path[32];
     struct vfio_group_status status = { .argsz = sizeof(status) };
 
-    QLIST_FOREACH(group, &group_list, next) {
+    QLIST_FOREACH(group, &vfio_group_list, next) {
         if (group->groupid == groupid) {
             /* Found it.  Now is it already in the right context? */
             if (group->container->space->as == as) {
@@ -3922,11 +3922,11 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
         goto close_fd_exit;
     }
 
-    if (QLIST_EMPTY(&group_list)) {
+    if (QLIST_EMPTY(&vfio_group_list)) {
         qemu_register_reset(vfio_reset_handler, NULL);
     }
 
-    QLIST_INSERT_HEAD(&group_list, group, next);
+    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
     vfio_kvm_device_add_group(group);
 
@@ -3954,7 +3954,7 @@ static void vfio_put_group(VFIOGroup *group)
     close(group->fd);
     g_free(group);
 
-    if (QLIST_EMPTY(&group_list)) {
+    if (QLIST_EMPTY(&vfio_group_list)) {
         qemu_unregister_reset(vfio_reset_handler, NULL);
     }
 }
-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 07/16] hw/vfio/pci: use name field in format strings
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (5 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 06/16] hw/vfio/pci: rename group_list into vfio_group_list Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 08/16] hw/vfio: create common module Eric Auger
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

Signed-off-by: Eric Auger <eric.auger@linaro.org>

Conflicts:
	trace-events
---
 hw/vfio/pci.c | 213 ++++++++++++++++------------------------------------------
 trace-events  | 109 ++++++++++++++++--------------
 2 files changed, 116 insertions(+), 206 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2216bd4..6584425 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -387,9 +387,7 @@ static void vfio_intx_interrupt(void *opaque)
         return;
     }
 
-    trace_vfio_intx_interrupt(vdev->host.domain, vdev->host.bus,
-                              vdev->host.slot, vdev->host.function,
-                              'A' + vdev->intx.pin);
+    trace_vfio_intx_interrupt(vdev->vbasedev.name, 'A' + vdev->intx.pin);
 
     vdev->intx.pending = true;
     pci_irq_assert(&vdev->pdev);
@@ -408,8 +406,7 @@ static void vfio_eoi(VFIODevice *vbasedev)
         return;
     }
 
-    trace_vfio_eoi(vdev->host.domain, vdev->host.bus,
-                   vdev->host.slot, vdev->host.function);
+    trace_vfio_eoi(vbasedev->name);
 
     vdev->intx.pending = false;
     pci_irq_deassert(&vdev->pdev);
@@ -478,8 +475,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
     vdev->intx.kvm_accel = true;
 
-    trace_vfio_enable_intx_kvm(vdev->host.domain, vdev->host.bus,
-                               vdev->host.slot, vdev->host.function);
+    trace_vfio_enable_intx_kvm(vdev->vbasedev.name);
 
     return;
 
@@ -531,8 +527,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
     /* If we've missed an event, let it re-fire through QEMU */
     vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
-    trace_vfio_disable_intx_kvm(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function);
+    trace_vfio_disable_intx_kvm(vdev->vbasedev.name);
 #endif
 }
 
@@ -551,8 +546,7 @@ static void vfio_update_irq(PCIDevice *pdev)
         return; /* Nothing changed */
     }
 
-    trace_vfio_update_irq(vdev->host.domain, vdev->host.bus,
-                          vdev->host.slot, vdev->host.function,
+    trace_vfio_update_irq(vdev->vbasedev.name,
                           vdev->intx.route.irq, route.irq);
 
     vfio_disable_intx_kvm(vdev);
@@ -628,8 +622,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
 
     vdev->interrupt = VFIO_INT_INTx;
 
-    trace_vfio_enable_intx(vdev->host.domain, vdev->host.bus,
-                           vdev->host.slot, vdev->host.function);
+    trace_vfio_enable_intx(vdev->vbasedev.name);
 
     return 0;
 }
@@ -651,8 +644,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
 
     vdev->interrupt = VFIO_INT_NONE;
 
-    trace_vfio_disable_intx(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function);
+    trace_vfio_disable_intx(vdev->vbasedev.name);
 }
 
 /*
@@ -679,9 +671,7 @@ static void vfio_msi_interrupt(void *opaque)
         abort();
     }
 
-    trace_vfio_msi_interrupt(vdev->host.domain, vdev->host.bus,
-                             vdev->host.slot, vdev->host.function,
-                             nr, msg.address, msg.data);
+    trace_vfio_msi_interrupt(vbasedev->name, nr, msg.address, msg.data);
 #endif
 
     if (vdev->interrupt == VFIO_INT_MSIX) {
@@ -788,9 +778,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
     VFIOMSIVector *vector;
     int ret;
 
-    trace_vfio_msix_vector_do_use(vdev->host.domain, vdev->host.bus,
-                                  vdev->host.slot, vdev->host.function,
-                                  nr);
+    trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);
 
     vector = &vdev->msi_vectors[nr];
 
@@ -876,9 +864,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
-    trace_vfio_msix_vector_release(vdev->host.domain, vdev->host.bus,
-                                   vdev->host.slot, vdev->host.function,
-                                   nr);
+    trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
 
     /*
      * There are still old guests that mask and unmask vectors on every
@@ -941,8 +927,7 @@ static void vfio_enable_msix(VFIOPCIDevice *vdev)
         error_report("vfio: msix_set_vector_notifiers failed");
     }
 
-    trace_vfio_enable_msix(vdev->host.domain, vdev->host.bus,
-                           vdev->host.slot, vdev->host.function);
+    trace_vfio_enable_msix(vdev->vbasedev.name);
 }
 
 static void vfio_enable_msi(VFIOPCIDevice *vdev)
@@ -1018,9 +1003,7 @@ retry:
         return;
     }
 
-    trace_vfio_enable_msi(vdev->host.domain, vdev->host.bus,
-                          vdev->host.slot, vdev->host.function,
-                          vdev->nr_vectors);
+    trace_vfio_enable_msi(vdev->vbasedev.name, vdev->nr_vectors);
 }
 
 static void vfio_disable_msi_common(VFIOPCIDevice *vdev)
@@ -1070,8 +1053,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
 
     vfio_disable_msi_common(vdev);
 
-    trace_vfio_disable_msix(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function);
+    trace_vfio_disable_msix(vdev->vbasedev.name);
 }
 
 static void vfio_disable_msi(VFIOPCIDevice *vdev)
@@ -1079,8 +1061,7 @@ static void vfio_disable_msi(VFIOPCIDevice *vdev)
     vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
     vfio_disable_msi_common(vdev);
 
-    trace_vfio_disable_msi(vdev->host.domain, vdev->host.bus,
-                           vdev->host.slot, vdev->host.function);
+    trace_vfio_disable_msi(vdev->vbasedev.name);
 }
 
 static void vfio_update_msi(VFIOPCIDevice *vdev)
@@ -1214,9 +1195,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
         return;
     }
 
-    trace_vfio_pci_load_rom(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function,
-                            (unsigned long)reg_info.size,
+    trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info.size,
                             (unsigned long)reg_info.offset,
                             (unsigned long)reg_info.flags);
 
@@ -1226,9 +1205,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     if (!vdev->rom_size) {
         vdev->rom_read_failed = true;
         error_report("vfio-pci: Cannot read device rom at "
-                    "%04x:%02x:%02x.%x",
-                    vdev->host.domain, vdev->host.bus, vdev->host.slot,
-                    vdev->host.function);
+                    "%s", vdev->vbasedev.name);
         error_printf("Device option ROM contents are probably invalid "
                     "(check dmesg).\nSkip option ROM probe with rombar=0, "
                     "or load from file with romfile=\n");
@@ -1290,9 +1267,7 @@ static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
         break;
     }
 
-    trace_vfio_rom_read(vdev->host.domain, vdev->host.bus,
-                        vdev->host.slot, vdev->host.function,
-                        addr, size, data);
+    trace_vfio_rom_read(vdev->vbasedev.name, addr, size, data);
 
     return data;
 }
@@ -1389,9 +1364,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
         }
     }
 
-    trace_vfio_pci_size_rom(vdev->host.domain, vdev->host.bus,
-                            vdev->host.slot, vdev->host.function,
-                            size);
+    trace_vfio_pci_size_rom(vdev->vbasedev.name, size);
 
     snprintf(name, sizeof(name), "vfio[%04x:%02x:%02x.%x].rom",
              vdev->host.domain, vdev->host.bus, vdev->host.slot,
@@ -1525,10 +1498,7 @@ static uint64_t vfio_generic_window_quirk_read(void *opaque,
                                     quirk->data.address_val + offset, size);
 
         trace_vfio_generic_window_quirk_read(memory_region_name(&quirk->mem),
-                                             vdev->host.domain,
-                                             vdev->host.bus,
-                                             vdev->host.slot,
-                                             vdev->host.function,
+                                             vdev->vbasedev.name,
                                              quirk->data.bar,
                                              addr, size, data);
     } else {
@@ -1576,14 +1546,10 @@ static void vfio_generic_window_quirk_write(void *opaque, hwaddr addr,
 
         vfio_pci_write_config(&vdev->pdev,
                               quirk->data.address_val + offset, data, size);
-
         trace_vfio_generic_window_quirk_write(memory_region_name(&quirk->mem),
-                                             vdev->host.domain,
-                                             vdev->host.bus,
-                                             vdev->host.slot,
-                                             vdev->host.function,
-                                             quirk->data.bar,
-                                             addr, data, size);
+                                              vdev->vbasedev.name,
+                                              quirk->data.bar,
+                                              addr, data, size);
         return;
     }
 
@@ -1617,11 +1583,7 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
         data = vfio_pci_read_config(&vdev->pdev, addr - offset, size);
 
         trace_vfio_generic_quirk_read(memory_region_name(&quirk->mem),
-                                      vdev->host.domain,
-                                      vdev->host.bus,
-                                      vdev->host.slot,
-                                      vdev->host.function,
-                                      quirk->data.bar,
+                                      vdev->vbasedev.name, quirk->data.bar,
                                       addr + base, size, data);
     } else {
         data = vfio_region_read(&vdev->bars[quirk->data.bar].region,
@@ -1650,11 +1612,7 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
         vfio_pci_write_config(&vdev->pdev, addr - offset, data, size);
 
         trace_vfio_generic_quirk_write(memory_region_name(&quirk->mem),
-                                       vdev->host.domain,
-                                       vdev->host.bus,
-                                       vdev->host.slot,
-                                       vdev->host.function,
-                                       quirk->data.bar,
+                                       vdev->vbasedev.name, quirk->data.bar,
                                        addr + base, data, size);
     } else {
         vfio_region_write(&vdev->bars[quirk->data.bar].region,
@@ -1726,8 +1684,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
     QLIST_INSERT_HEAD(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks,
                       quirk, next);
 
-    trace_vfio_vga_probe_ati_3c3_quirk(vdev->host.domain, vdev->host.bus,
-                                       vdev->host.slot, vdev->host.function);
+    trace_vfio_vga_probe_ati_3c3_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -1768,10 +1725,7 @@ static void vfio_probe_ati_bar4_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_ati_bar4_window_quirk(vdev->host.domain,
-                                           vdev->host.bus,
-                                           vdev->host.slot,
-                                           vdev->host.function);
+    trace_vfio_probe_ati_bar4_window_quirk(vdev->vbasedev.name);
 }
 
 #define PCI_VENDOR_ID_REALTEK 0x10ec
@@ -1810,8 +1764,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
         if (quirk->data.flags) {
             trace_vfio_rtl8168_window_quirk_read_fake(
                     memory_region_name(&quirk->mem),
-                    vdev->host.domain, vdev->host.bus,
-                    vdev->host.slot, vdev->host.function);
+                    vdev->vbasedev.name);
 
             return quirk->data.address_match ^ 0x10000000U;
         }
@@ -1822,9 +1775,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
 
             trace_vfio_rtl8168_window_quirk_read_table(
                     memory_region_name(&quirk->mem),
-                    vdev->host.domain, vdev->host.bus,
-                    vdev->host.slot, vdev->host.function
-               );
+                    vdev->vbasedev.name);
 
             if (!(vdev->pdev.cap_present & QEMU_PCI_CAP_MSIX)) {
                 return 0;
@@ -1837,10 +1788,8 @@ static uint64_t vfio_rtl8168_window_quirk_read(void *opaque,
         }
     }
 
-    trace_vfio_rtl8168_window_quirk_read_direct(
-                        memory_region_name(&quirk->mem),
-                        vdev->host.domain, vdev->host.bus,
-                        vdev->host.slot, vdev->host.function);
+    trace_vfio_rtl8168_window_quirk_read_direct(memory_region_name(&quirk->mem),
+                                                vdev->vbasedev.name);
 
     return vfio_region_read(&vdev->bars[quirk->data.bar].region,
                             addr + 0x70, size);
@@ -1860,8 +1809,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
 
                 trace_vfio_rtl8168_window_quirk_write_table(
                         memory_region_name(&quirk->mem),
-                        vdev->host.domain, vdev->host.bus,
-                        vdev->host.slot, vdev->host.function);
+                        vdev->vbasedev.name);
 
                 io_mem_write(&vdev->pdev.msix_table_mmio,
                              (hwaddr)(quirk->data.address_match & 0xfff),
@@ -1882,8 +1830,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, hwaddr addr,
 
     trace_vfio_rtl8168_window_quirk_write_direct(
             memory_region_name(&quirk->mem),
-            vdev->host.domain, vdev->host.bus,
-            vdev->host.slot, vdev->host.function);
+            vdev->vbasedev.name);
 
     vfio_region_write(&vdev->bars[quirk->data.bar].region,
                       addr + 0x70, data, size);
@@ -1921,10 +1868,7 @@ static void vfio_probe_rtl8168_bar2_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_rtl8168_bar2_window_quirk(vdev->host.domain,
-                                               vdev->host.bus,
-                                               vdev->host.slot,
-                                               vdev->host.function);
+    trace_vfio_probe_rtl8168_bar2_window_quirk(vdev->vbasedev.name);
 }
 /*
  * Trap the BAR2 MMIO window to config space as well.
@@ -1956,10 +1900,7 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_ati_bar2_4000_quirk(vdev->host.domain,
-                                         vdev->host.bus,
-                                         vdev->host.slot,
-                                         vdev->host.function);
+    trace_vfio_probe_ati_bar2_4000_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -2092,10 +2033,7 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
     QLIST_INSERT_HEAD(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks,
                       quirk, next);
 
-    trace_vfio_vga_probe_nvidia_3d0_quirk(vdev->host.domain,
-                                          vdev->host.bus,
-                                          vdev->host.slot,
-                                          vdev->host.function);
+    trace_vfio_vga_probe_nvidia_3d0_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -2184,10 +2122,7 @@ static void vfio_probe_nvidia_bar5_window_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_nvidia_bar5_window_quirk(vdev->host.domain,
-                                              vdev->host.bus,
-                                              vdev->host.slot,
-                                              vdev->host.function);
+    trace_vfio_probe_nvidia_bar5_window_quirk(vdev->vbasedev.name);
 }
 
 static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
@@ -2258,10 +2193,7 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_nvidia_bar0_88000_quirk(vdev->host.domain,
-                                             vdev->host.bus,
-                                             vdev->host.slot,
-                                             vdev->host.function);
+    trace_vfio_probe_nvidia_bar0_88000_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -2298,10 +2230,7 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 
-    trace_vfio_probe_nvidia_bar0_1800_quirk(vdev->host.domain,
-                                            vdev->host.bus,
-                                            vdev->host.slot,
-                                            vdev->host.function);
+    trace_vfio_probe_nvidia_bar0_1800_quirk(vdev->vbasedev.name);
 }
 
 /*
@@ -2388,9 +2317,7 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 
     val = (emu_val & emu_bits) | (phys_val & ~emu_bits);
 
-    trace_vfio_pci_read_config(vdev->host.domain, vdev->host.bus,
-                               vdev->host.slot, vdev->host.function,
-                               addr, len, val);
+    trace_vfio_pci_read_config(vdev->vbasedev.name, addr, len, val);
 
     return val;
 }
@@ -2401,9 +2328,7 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     uint32_t val_le = cpu_to_le32(val);
 
-    trace_vfio_pci_write_config(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function,
-                                addr, val, len);
+    trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
     /* Write everything to VFIO, let it filter out what we can't write */
     if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
@@ -2540,7 +2465,7 @@ static void vfio_iommu_map_notify(Notifier *n, void *data)
                                  &xlat, &len, iotlb->perm & IOMMU_WO);
     if (!memory_region_is_ram(mr)) {
         error_report("iommu map to non memory area %"HWADDR_PRIx"\n",
-                xlat);
+                     xlat);
         return;
     }
     /*
@@ -2785,8 +2710,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
     msi_maskbit = !!(ctrl & PCI_MSI_FLAGS_MASKBIT);
     entries = 1 << ((ctrl & PCI_MSI_FLAGS_QMASK) >> 1);
 
-    trace_vfio_setup_msi(vdev->host.domain, vdev->host.bus,
-                         vdev->host.slot, vdev->host.function, pos);
+    trace_vfio_setup_msi(vdev->vbasedev.name, pos);
 
     ret = msi_init(&vdev->pdev, pos, entries, msi_64bit, msi_maskbit);
     if (ret < 0) {
@@ -2847,9 +2771,8 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
     vdev->msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
     vdev->msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
 
-    trace_vfio_early_setup_msix(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function,
-                                pos, vdev->msix->table_bar,
+    trace_vfio_early_setup_msix(vdev->vbasedev.name, pos,
+                                vdev->msix->table_bar,
                                 vdev->msix->table_offset,
                                 vdev->msix->entries);
 
@@ -3224,8 +3147,7 @@ static void vfio_check_pcie_flr(VFIOPCIDevice *vdev, uint8_t pos)
     uint32_t cap = pci_get_long(vdev->pdev.config + pos + PCI_EXP_DEVCAP);
 
     if (cap & PCI_EXP_DEVCAP_FLR) {
-        trace_vfio_check_pcie_flr(vdev->host.domain, vdev->host.bus,
-                                  vdev->host.slot, vdev->host.function);
+        trace_vfio_check_pcie_flr(vdev->vbasedev.name);
         vdev->has_flr = true;
     }
 }
@@ -3235,8 +3157,7 @@ static void vfio_check_pm_reset(VFIOPCIDevice *vdev, uint8_t pos)
     uint16_t csr = pci_get_word(vdev->pdev.config + pos + PCI_PM_CTRL);
 
     if (!(csr & PCI_PM_CTRL_NO_SOFT_RESET)) {
-        trace_vfio_check_pm_reset(vdev->host.domain, vdev->host.bus,
-                                  vdev->host.slot, vdev->host.function);
+        trace_vfio_check_pm_reset(vdev->vbasedev.name);
         vdev->has_pm_reset = true;
     }
 }
@@ -3246,8 +3167,7 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
     uint8_t cap = pci_get_byte(vdev->pdev.config + pos + PCI_AF_CAP);
 
     if ((cap & PCI_AF_CAP_TP) && (cap & PCI_AF_CAP_FLR)) {
-        trace_vfio_check_af_flr(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function);
+        trace_vfio_check_af_flr(vdev->vbasedev.name);
         vdev->has_flr = true;
     }
 }
@@ -3398,9 +3318,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     int ret, i, count;
     bool multi = false;
 
-    trace_vfio_pci_hot_reset(vdev->host.domain, vdev->host.bus,
-                             vdev->host.slot, vdev->host.function,
-                             single ? "one" : "multi");
+    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
 
     vfio_pci_pre_reset(vdev);
     vdev->vbasedev.needs_reset = false;
@@ -3431,10 +3349,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
         goto out_single;
     }
 
-    trace_vfio_pci_hot_reset_has_dep_devices(vdev->host.domain,
-                                             vdev->host.bus,
-                                             vdev->host.slot,
-                                             vdev->host.function);
+    trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
 
     /* Verify that we have all the groups required */
     for (i = 0; i < info->count; i++) {
@@ -3462,10 +3377,9 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 
         if (!group) {
             if (!vdev->has_pm_reset) {
-                error_report("vfio: Cannot reset device %04x:%02x:%02x.%x, "
+                error_report("vfio: Cannot reset device %s, "
                              "depends on group %d which is not owned.",
-                             vdev->host.domain, vdev->host.bus, vdev->host.slot,
-                             vdev->host.function, devices[i].group_id);
+                             vdev->vbasedev.name, devices[i].group_id);
             }
             ret = -EPERM;
             goto out;
@@ -3480,8 +3394,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
             if (vfio_pci_host_match(&host, &tmp->host)) {
                 if (single) {
                     error_report("vfio: found another in-use device "
-                            "%04x:%02x:%02x.%x\n", host.domain, host.bus,
-                            host.slot, host.function);
+                            "%s\n", vbasedev_iter->name);
                     ret = -EINVAL;
                     goto out_single;
                 }
@@ -3528,10 +3441,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
     g_free(reset);
 
-    trace_vfio_pci_hot_reset_result(vdev->host.domain,
-                                    vdev->host.bus,
-                                    vdev->host.slot,
-                                    vdev->host.function,
+    trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
                                     ret ? "%m" : "Success");
 
 out:
@@ -4074,10 +3984,9 @@ static int vfio_populate_device(VFIODevice *vbasedev)
     } else if (irq_info.count == 1) {
         vdev->pci_aer = true;
     } else {
-        error_report("vfio: %04x:%02x:%02x.%x "
+        error_report("vfio: %s "
                      "Could not enable error recovery for the device",
-                     vdev->host.domain, vdev->host.bus, vdev->host.slot,
-                     vdev->host.function);
+                     vbasedev->name);
     }
 
 error:
@@ -4294,8 +4203,7 @@ static int vfio_initfn(PCIDevice *pdev)
         return -errno;
     }
 
-    trace_vfio_initfn(vdev->host.domain, vdev->host.bus,
-                      vdev->host.slot, vdev->host.function, groupid);
+    trace_vfio_initfn(vdev->vbasedev.name, groupid);
 
     group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev));
     if (!group) {
@@ -4431,16 +4339,14 @@ static void vfio_pci_reset(DeviceState *dev)
     PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 
-    trace_vfio_pci_reset(vdev->host.domain, vdev->host.bus,
-                         vdev->host.slot, vdev->host.function);
+    trace_vfio_pci_reset(vdev->vbasedev.name);
 
     vfio_pci_pre_reset(vdev);
 
     if (vdev->vbasedev.reset_works &&
         (vdev->has_flr || !vdev->has_pm_reset) &&
         !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
-        trace_vfio_pci_reset_flr(vdev->host.domain, vdev->host.bus,
-                                  vdev->host.slot, vdev->host.function);
+        trace_vfio_pci_reset_flr(vdev->vbasedev.name);
         goto post_reset;
     }
 
@@ -4452,8 +4358,7 @@ static void vfio_pci_reset(DeviceState *dev)
     /* If nothing else works and the device supports PM reset, use it */
     if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
         !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
-        trace_vfio_pci_reset_pm(vdev->host.domain, vdev->host.bus,
-                                vdev->host.slot, vdev->host.function);
+        trace_vfio_pci_reset_pm(vdev->vbasedev.name);
         goto post_reset;
     }
 
diff --git a/trace-events b/trace-events
index 0634227..4d6f241 100644
--- a/trace-events
+++ b/trace-events
@@ -1350,68 +1350,72 @@ pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsi
 pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"
 
 # hw/vfio/vfio-pci.c
-vfio_intx_interrupt(int domain, int bus, int slot, int fn, char line) "(%04x:%02x:%02x.%x) Pin %c"
-vfio_eoi(int domain, int bus, int slot, int fn) "(%04x:%02x:%02x.%x) EOI"
-vfio_enable_intx_kvm(int domain, int bus, int slot, int fn) "(%04x:%02x:%02x.%x) KVM INTx accel enabled"
-vfio_disable_intx_kvm(int domain, int bus, int slot, int fn) "(%04x:%02x:%02x.%x) KVM INTx accel disabled"
-vfio_update_irq(int domain, int bus, int slot, int fn, int new_irq, int target_irq) " (%04x:%02x:%02x.%x) IRQ moved %d -> %d"
-vfio_enable_intx(int domain, int bus, int slot, int fn) "(%04x:%02x:%02x.%x)"
-vfio_disable_intx(int domain, int bus, int slot, int fn) "(%04x:%02x:%02x.%x)"
-vfio_msi_interrupt(int domain, int bus, int slot, int fn, int index, uint64_t addr, int data) "(%04x:%02x:%02x.%x) vector %d 0x%"PRIx64"/0x%x"
-vfio_msix_vector_do_use(int domain, int bus, int slot, int fn, int index) "(%04x:%02x:%02x.%x) vector %d used"
-vfio_msix_vector_release(int domain, int bus, int slot, int fn, int index) "(%04x:%02x:%02x.%x) vector %d released"
-vfio_enable_msix(int domain, int bus, int slot, int fn) "(%04x:%02x:%02x.%x)"
-vfio_enable_msi(int domain, int bus, int slot, int fn, int nr_vectors) "(%04x:%02x:%02x.%x) Enabled %d MSI vectors"
-vfio_disable_msix(int domain, int bus, int slot, int fn) "(%04x:%02x:%02x.%x)"
-vfio_disable_msi(int domain, int bus, int slot, int fn) "(%04x:%02x:%02x.%x)"
-vfio_pci_load_rom(int domain, int bus, int slot, int fn, unsigned long size, unsigned long offset, unsigned long flags) "Device %04x:%02x:%02x.%x ROM:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
-vfio_rom_read(int domain, int bus, int slot, int fn, uint64_t addr, int size, uint64_t data) "(%04x:%02x:%02x.%x, 0x%"PRIx64", 0x%x) = 0x%"PRIx64
-vfio_pci_size_rom(int domain, int bus, int slot, int fn, int size) "%04x:%02x:%02x.%x ROM size 0x%x"
-vfio_vga_write(uint64_t addr, uint64_t data, int size) "(0x%"PRIx64", 0x%"PRIx64", %d)"
-vfio_vga_read(uint64_t addr, int size, uint64_t data) "(0x%"PRIx64", %d) = 0x%"PRIx64
-vfio_generic_window_quirk_read(const char * region_name, int domain, int bus, int slot, int fn, int index, uint64_t addr, int size, uint64_t data) "%s read(%04x:%02x:%02x.%x:BAR%d+0x%"PRIx64", %d) = 0x%"PRIx64
-vfio_generic_window_quirk_write(const char * region_name, int domain, int bus, int slot, int fn, int index, uint64_t addr, uint64_t data, int size) "%s write(%04x:%02x:%02x.%x:BAR%d+0x%"PRIx64", 0x%"PRIx64", %d)"
-vfio_generic_quirk_read(const char * region_name, int domain, int bus, int slot, int fn, int index, uint64_t addr, int size, uint64_t data) "%s read(%04x:%02x:%02x.%x:BAR%d+0x%"PRIx64", %d) = 0x%"PRIx64
-vfio_generic_quirk_write(const char * region_name, int domain, int bus, int slot, int fn, int index, uint64_t addr, uint64_t data, int size) "%s write(%04x:%02x:%02x.%x:BAR%d+0x%"PRIx64", 0x%"PRIx64", %d)"
+vfio_intx_interrupt(const char *name, char line) " (%s) Pin %c"
+vfio_eoi(const char *name) " (%s) EOI"
+vfio_enable_intx_kvm(const char *name) " (%s) KVM INTx accel enabled"
+vfio_disable_intx_kvm(const char *name) " (%s) KVM INTx accel disabled"
+vfio_update_irq(const char *name, int new_irq, int target_irq) " (%s) IRQ moved %d -> %d"
+vfio_enable_intx(const char *name) " (%s)"
+vfio_disable_intx(const char *name) " (%s)"
+vfio_msi_interrupt(const char *name, int index, uint64_t addr, int data) " (%s) vector %d 0x%"PRIx64"/0x%x"
+vfio_msix_vector_do_use(const char *name, int index) " (%s) vector %d used"
+vfio_msix_vector_release(const char *name, int index) " (%s) vector %d released"
+vfio_enable_msix(const char *name) " (%s)"
+vfio_enable_msi(const char *name, int nr_vectors) " (%s) Enabled %d MSI vectors"
+vfio_disable_msix(const char *name) " (%s)"
+vfio_disable_msi(const char *name) " (%s)"
+vfio_pci_load_rom(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s ROM:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
+vfio_rom_read(const char *name, uint64_t addr, int size, uint64_t data) " (%s, 0x%"PRIx64", 0x%x) = 0x%"PRIx64
+vfio_pci_size_rom(const char *name, int size) "%s ROM size 0x%x"
+vfio_vga_write(uint64_t addr, uint64_t data, int size) " (0x%"PRIx64", 0x%"PRIx64", %d)"
+vfio_vga_read(uint64_t addr, int size, uint64_t data) " (0x%"PRIx64", %d) = 0x%"PRIx64
+# remove ) =
+vfio_generic_window_quirk_read(const char * region_name, const char *name, int index, uint64_t addr, int size, uint64_t data) "%s read(%s:BAR%d+0x%"PRIx64", %d = 0x%"PRIx64
+## remove )
+vfio_generic_window_quirk_write(const char * region_name, const char *name, int index, uint64_t addr, uint64_t data, int size) "%s write(%s:BAR%d+0x%"PRIx64", 0x%"PRIx64", %d"
+# remove ) =
+vfio_generic_quirk_read(const char * region_name, const char *name, int index, uint64_t addr, int size, uint64_t data) "%s read(%s:BAR%d+0x%"PRIx64", %d = 0x%"PRIx64
+# remove )
+vfio_generic_quirk_write(const char * region_name, const char *name, int index, uint64_t addr, uint64_t data, int size) "%s write(%s:BAR%d+0x%"PRIx64", 0x%"PRIx64", %d"
 vfio_ati_3c3_quirk_read(uint64_t data) " (0x3c3, 1) = 0x%"PRIx64
-vfio_vga_probe_ati_3c3_quirk(int domain, int bus, int slot, int fn) "Enabled ATI/AMD quirk 0x3c3 BAR4 for device %04x:%02x:%02x.%x"
-vfio_probe_ati_bar4_window_quirk(int domain, int bus, int slot, int fn) "Enabled ATI/AMD BAR4 window quirk for device %04x:%02x:%02x.%x"
-vfio_rtl8168_window_quirk_read_fake(const char *region_name, int domain, int bus, int slot, int fn) "%s fake read(%04x:%02x:%02x.%d)"
-vfio_rtl8168_window_quirk_read_table(const char *region_name, int domain, int bus, int slot, int fn) "%s MSI-X table read(%04x:%02x:%02x.%d)"
-vfio_rtl8168_window_quirk_read_direct(const char *region_name, int domain, int bus, int slot, int fn) "%s direct read(%04x:%02x:%02x.%d)"
-vfio_rtl8168_window_quirk_write_table(const char *region_name, int domain, int bus, int slot, int fn) "%s MSI-X table write(%04x:%02x:%02x.%d)"
-vfio_rtl8168_window_quirk_write_direct(const char *region_name, int domain, int bus, int slot, int fn) "%s direct write(%04x:%02x:%02x.%d)"
-vfio_probe_rtl8168_bar2_window_quirk(int domain, int bus, int slot, int fn) "Enabled RTL8168 BAR2 window quirk for device %04x:%02x:%02x.%x"
-vfio_probe_ati_bar2_4000_quirk(int domain, int bus, int slot, int fn) "Enabled ATI/AMD BAR2 0x4000 quirk for device %04x:%02x:%02x.%x"
+vfio_vga_probe_ati_3c3_quirk(const char *name) "Enabled ATI/AMD quirk 0x3c3 BAR4 for device %s"
+vfio_probe_ati_bar4_window_quirk(const char *name) "Enabled ATI/AMD BAR4 window quirk for device %s"
+#issue with )
+vfio_rtl8168_window_quirk_read_fake(const char *region_name, const char *name) "%s fake read(%s"
+vfio_rtl8168_window_quirk_read_table(const char *region_name, const char *name) "%s MSI-X table read(%s"
+vfio_rtl8168_window_quirk_read_direct(const char *region_name, const char *name) "%s direct read(%s"
+vfio_rtl8168_window_quirk_write_table(const char *region_name, const char *name) "%s MSI-X table write(%s"
+vfio_rtl8168_window_quirk_write_direct(const char *region_name, const char *name) "%s direct write(%s"
+vfio_probe_rtl8168_bar2_window_quirk(const char *name) "Enabled RTL8168 BAR2 window quirk for device %s"
+vfio_probe_ati_bar2_4000_quirk(const char *name) "Enabled ATI/AMD BAR2 0x4000 quirk for device %s"
 vfio_nvidia_3d0_quirk_read(int size, uint64_t data) " (0x3d0, %d) = 0x%"PRIx64
 vfio_nvidia_3d0_quirk_write(uint64_t data, int size) " (0x3d0, 0x%"PRIx64", %d)"
-vfio_vga_probe_nvidia_3d0_quirk(int domain, int bus, int slot, int fn) "Enabled NVIDIA VGA 0x3d0 quirk for device %04x:%02x:%02x.%x"
-vfio_probe_nvidia_bar5_window_quirk(int domain, int bus, int slot, int fn) "Enabled NVIDIA BAR5 window quirk for device %04x:%02x:%02x.%x"
-vfio_probe_nvidia_bar0_88000_quirk(int domain, int bus, int slot, int fn) "Enabled NVIDIA BAR0 0x88000 quirk for device %04x:%02x:%02x.%x"
+vfio_vga_probe_nvidia_3d0_quirk(const char *name) "Enabled NVIDIA VGA 0x3d0 quirk for device %s"
+vfio_probe_nvidia_bar5_window_quirk(const char *name) "Enabled NVIDIA BAR5 window quirk for device %s"
+vfio_probe_nvidia_bar0_88000_quirk(const char *name) "Enabled NVIDIA BAR0 0x88000 quirk for device %s"
 vfio_probe_nvidia_bar0_1800_quirk_id(int id) "Nvidia NV%02x"
-vfio_probe_nvidia_bar0_1800_quirk(int domain, int bus, int slot, int fn) "Enabled NVIDIA BAR0 0x1800 quirk for device %04x:%02x:%02x.%x"
-vfio_pci_read_config(int domain, int bus, int slot, int fn, int addr, int len, int val) " (%04x:%02x:%02x.%x, @0x%x, len=0x%x) %x"
-vfio_pci_write_config(int domain, int bus, int slot, int fn, int addr, int val, int len) " (%04x:%02x:%02x.%x, @0x%x, 0x%x, len=0x%x)"
-vfio_setup_msi(int domain, int bus, int slot, int fn, int pos) "%04x:%02x:%02x.%x PCI MSI CAP @0x%x"
-vfio_early_setup_msix(int domain, int bus, int slot, int fn, int pos, int table_bar, int offset, int entries) "%04x:%02x:%02x.%x PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
-vfio_check_pcie_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x Supports FLR via PCIe cap"
-vfio_check_pm_reset(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x Supports PM reset"
-vfio_check_af_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x Supports FLR via AF cap"
-vfio_pci_hot_reset(int domain, int bus, int slot, int fn, const char *type) " (%04x:%02x:%02x.%x) %s"
-vfio_pci_hot_reset_has_dep_devices(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x: hot reset dependent devices:"
+vfio_probe_nvidia_bar0_1800_quirk(const char *name) "Enabled NVIDIA BAR0 0x1800 quirk for device %s"
+vfio_pci_read_config(const char *name, int addr, int len, int val) " (%s, @0x%x, len=0x%x) %x"
+vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s, @0x%x, 0x%x, len=0x%x)"
+vfio_setup_msi(const char *name, int pos) "%s PCI MSI CAP @0x%x"
+vfio_early_setup_msix(const char *name, int pos, int table_bar, int offset, int entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
+vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
+vfio_check_pm_reset(const char *name) "%s Supports PM reset"
+vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"
+vfio_pci_hot_reset(const char *name, const char *type) " (%s) %s"
+vfio_pci_hot_reset_has_dep_devices(const char *name) "%s: hot reset dependent devices:"
 vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int group_id) "\t%04x:%02x:%02x.%x group %d"
-vfio_pci_hot_reset_result(int domain, int bus, int slot, int fn, const char *result) "%04x:%02x:%02x.%x hot reset: %s"
+vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %s"
 vfio_populate_device_region(const char *region_name, int index, unsigned long size, unsigned long offset, unsigned long flags) "Device %s region %d:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
-vfio_get_device(const char *name, unsigned flags, unsigned num_regions, unsigned num_irqs) "Device %s flags: %u, regions: %u, irgs: %u"
-vfio_initfn(int domain, int bus, int slot, int fn, int group_id) " (%04x:%02x:%02x.%x) group %d"
-vfio_pci_reset(int domain, int bus, int slot, int fn) " (%04x:%02x:%02x.%x)"
-vfio_pci_reset_flr(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x FLR/VFIO_DEVICE_RESET"
-vfio_pci_reset_pm(int domain, int bus, int slot, int fn) "%04x:%02x:%02x.%x PCI PM Reset"
+vfio_initfn(const char *name, int group_id) " (%s) group %d"
+vfio_pci_reset(const char *name) " (%s)"
+vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET"
+vfio_pci_reset_pm(const char *name) "%s PCI PM Reset"
 
 vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
-vfio_region_read(const char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
+vfio_region_read(char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
 vfio_iommu_map_notify(uint64_t iova_start, uint64_t iova_end) "iommu map @ %"PRIx64" - %"PRIx64
 vfio_listener_region_add_skip(uint64_t start, uint64_t end) "SKIPPING region_add %"PRIx64" - %"PRIx64
 vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add [iommu] %"PRIx64" - %"PRIx64
@@ -1420,6 +1424,7 @@ vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING region_del
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del %"PRIx64" - %"PRIx64
 vfio_disconnect_container(int fd) "close container->fd=%d"
 vfio_put_group(int fd) "close group->fd=%d"
+vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
 vfio_put_base_device(int fd) "close vdev->fd=%d"
 
 #hw/acpi/memory_hotplug.c
-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 08/16] hw/vfio: create common module
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (6 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 07/16] hw/vfio/pci: use name field in format strings Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support Eric Auger
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, eric.auger, will.deacon,
	stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

Create a new common module. It implements all functions that are
not specific to a device type (PCI or platform).

This patch only moves code; there are no functional changes.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v6 -> v7:
- integrate Revert "vfio: Make BARs native endian"
- remove VFIO_DEVICE_TYPE_PLATFORM in vfio-common.h,
  will come in next patch

v5 -> v6:
- follow all changes made to the original PCI code from v5 to v6
- move declaration of vfio_region_ops, vfio_memory_listener,
  vfio_group_list, vfio_address_spaces into vfio-common.h

v4 -> v5:
- integrate "sPAPR/IOMMU: Fix TCE entry permission"
- VFIODevice .name deallocation removed from vfio_put_base_device
- add some includes according to vfio inclusion policy

v3 -> v4:
[Eric Auger]
The move is now done after all PCI modifications, to anticipate
VFIO platform needs. The purpose is to ease the whole
review process.

<= v3
First split done by Kim Phillips
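
Note for reviewers (not part of the patch): a minimal standalone sketch of
the byte layout the common region accessors rely on now that the
native-endian BAR change is reverted. vfio_region_write() converts with
cpu_to_le16()/cpu_to_le32() before pwrite(), and vfio_region_read()
converts back after pread(). The demo_* helpers below are hypothetical
names used only for illustration; they mimic those conversions portably
instead of calling the QEMU macros, and like the accessors they accept
only 1, 2 and 4 byte accesses.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: encode 'data' as 'size'
 * little-endian bytes. This is the in-memory layout that the write path
 * hands to pwrite() after the cpu_to_le*() conversion.
 * Returns 0 on success, -1 for an unsupported access size.
 */
static int demo_encode_le(uint8_t *buf, uint64_t data, unsigned size)
{
    unsigned i;

    if (size != 1 && size != 2 && size != 4) {
        return -1;
    }
    for (i = 0; i < size; i++) {
        /* Least significant byte goes first. */
        buf[i] = (uint8_t)(data >> (8 * i));
    }
    return 0;
}

/*
 * Hypothetical inverse, mirroring the read path: rebuild the CPU value
 * from 'size' little-endian bytes returned by the device.
 */
static uint64_t demo_decode_le(const uint8_t *buf, unsigned size)
{
    uint64_t data = 0;
    unsigned i;

    for (i = 0; i < size; i++) {
        data |= (uint64_t)buf[i] << (8 * i);
    }
    return data;
}
```

Encoding 0x1234 with size 2 stores 0x34 then 0x12 on any host, and
decoding those two bytes gives 0x1234 back.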

Conflicts:
	hw/vfio/pci.c
---
 hw/vfio/Makefile.objs         |    1 +
 hw/vfio/common.c              |  958 ++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c                 | 1028 +----------------------------------------
 include/hw/vfio/vfio-common.h |  151 ++++++
 trace-events                  |    1 +
 5 files changed, 1112 insertions(+), 1027 deletions(-)
 create mode 100644 hw/vfio/common.c
 create mode 100644 include/hw/vfio/vfio-common.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 31c7dab..e31f30e 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,3 +1,4 @@
 ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 endif
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
new file mode 100644
index 0000000..fbd9e7f
--- /dev/null
+++ b/hw/vfio/common.c
@@ -0,0 +1,958 @@
+/*
+ * generic functions used by VFIO devices
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ *  Adapted for KVM by Qumranet.
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ */
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/vfio.h>
+
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "hw/hw.h"
+#include "qemu/error-report.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+struct vfio_group_head vfio_group_list =
+    QLIST_HEAD_INITIALIZER(vfio_group_list);
+struct vfio_as_head vfio_address_spaces =
+    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
+
+#ifdef CONFIG_KVM
+/*
+ * We have a single VFIO pseudo device per KVM VM.  Once created it lives
+ * for the life of the VM.  Closing the file descriptor only drops our
+ * reference to it and the device's reference to kvm.  Therefore once
+ * initialized, this file descriptor is only released on QEMU exit and
+ * we'll re-use it should another vfio device be attached before then.
+ */
+static int vfio_kvm_device_fd = -1;
+#endif
+
+/*
+ * Common VFIO interrupt disable
+ */
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
+        .index = index,
+        .start = 0,
+        .count = 0,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
+        .index = index,
+        .start = 0,
+        .count = 1,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+        .index = index,
+        .start = 0,
+        .count = 1,
+    };
+
+    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+/*
+ * IO Port/MMIO - Beware of the endians, VFIO is always little endian
+ */
+void vfio_region_write(void *opaque, hwaddr addr,
+                       uint64_t data, unsigned size)
+{
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
+    union {
+        uint8_t byte;
+        uint16_t word;
+        uint32_t dword;
+        uint64_t qword;
+    } buf;
+
+    switch (size) {
+    case 1:
+        buf.byte = data;
+        break;
+    case 2:
+        buf.word = cpu_to_le16(data);
+        break;
+    case 4:
+        buf.dword = cpu_to_le32(data);
+        break;
+    default:
+        hw_error("vfio: unsupported write size, %d bytes", size);
+        break;
+    }
+
+    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+                     ",%d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, data, size);
+    }
+
+    trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
+
+    /*
+     * A read or write to a BAR always signals an INTx EOI.  This will
+     * do nothing if not pending (including not in INTx mode).  We assume
+     * that a BAR access is in response to an interrupt and that BAR
+     * accesses will service the interrupt.  Unfortunately, we don't know
+     * which access will service the interrupt, so we're potentially
+     * getting quite a few host interrupts per guest interrupt.
+     */
+    vbasedev->ops->vfio_eoi(vbasedev);
+}
+
+uint64_t vfio_region_read(void *opaque,
+                          hwaddr addr, unsigned size)
+{
+    VFIORegion *region = opaque;
+    VFIODevice *vbasedev = region->vbasedev;
+    union {
+        uint8_t byte;
+        uint16_t word;
+        uint32_t dword;
+        uint64_t qword;
+    } buf;
+    uint64_t data = 0;
+
+    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+                     __func__, vbasedev->name, region->nr,
+                     addr, size);
+        return (uint64_t)-1;
+    }
+    switch (size) {
+    case 1:
+        data = buf.byte;
+        break;
+    case 2:
+        data = le16_to_cpu(buf.word);
+        break;
+    case 4:
+        data = le32_to_cpu(buf.dword);
+        break;
+    default:
+        hw_error("vfio: unsupported read size, %d bytes", size);
+        break;
+    }
+
+    trace_vfio_region_read(vbasedev->name, region->nr, addr, size, data);
+
+    /* Same as write above */
+    vbasedev->ops->vfio_eoi(vbasedev);
+
+    return data;
+}
+
+const MemoryRegionOps vfio_region_ops = {
+    .read = vfio_region_read,
+    .write = vfio_region_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+/*
+ * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
+ */
+static int vfio_dma_unmap(VFIOContainer *container,
+                          hwaddr iova, ram_addr_t size)
+{
+    struct vfio_iommu_type1_dma_unmap unmap = {
+        .argsz = sizeof(unmap),
+        .flags = 0,
+        .iova = iova,
+        .size = size,
+    };
+
+    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        error_report("VFIO_UNMAP_DMA: %d\n", -errno);
+        return -errno;
+    }
+
+    return 0;
+}
+
+static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
+                        ram_addr_t size, void *vaddr, bool readonly)
+{
+    struct vfio_iommu_type1_dma_map map = {
+        .argsz = sizeof(map),
+        .flags = VFIO_DMA_MAP_FLAG_READ,
+        .vaddr = (__u64)(uintptr_t)vaddr,
+        .iova = iova,
+        .size = size,
+    };
+
+    if (!readonly) {
+        map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
+    }
+
+    /*
+     * Try the mapping, if it fails with EBUSY, unmap the region and try
+     * again.  This shouldn't be necessary, but we sometimes see it in
+     * the VGA ROM space.
+     */
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
+        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
+         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
+        return 0;
+    }
+
+    error_report("VFIO_MAP_DMA: %d\n", -errno);
+    return -errno;
+}
+
+static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+{
+    return (!memory_region_is_ram(section->mr) &&
+            !memory_region_is_iommu(section->mr)) ||
+           /*
+            * Sizing an enabled 64-bit BAR can cause spurious mappings to
+            * addresses in the upper part of the 64-bit address space.  These
+            * are never accessed by the CPU and beyond the address width of
+            * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
+            */
+           section->offset_within_address_space & (1ULL << 63);
+}
+
+static void vfio_iommu_map_notify(Notifier *n, void *data)
+{
+    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+    VFIOContainer *container = giommu->container;
+    IOMMUTLBEntry *iotlb = data;
+    MemoryRegion *mr;
+    hwaddr xlat;
+    hwaddr len = iotlb->addr_mask + 1;
+    void *vaddr;
+    int ret;
+
+    trace_vfio_iommu_map_notify(iotlb->iova,
+                                iotlb->iova + iotlb->addr_mask);
+
+    /*
+     * The IOMMU TLB entry we have just covers translation through
+     * this IOMMU to its immediate target.  We need to translate
+     * it the rest of the way through to memory.
+     */
+    mr = address_space_translate(&address_space_memory,
+                                 iotlb->translated_addr,
+                                 &xlat, &len, iotlb->perm & IOMMU_WO);
+    if (!memory_region_is_ram(mr)) {
+        error_report("iommu map to non memory area %"HWADDR_PRIx"\n",
+                     xlat);
+        return;
+    }
+    /*
+     * Translation truncates length to the IOMMU page size,
+     * check that it did not truncate too much.
+     */
+    if (len & iotlb->addr_mask) {
+        error_report("iommu has granularity incompatible with target AS\n");
+        return;
+    }
+
+    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        vaddr = memory_region_get_ram_ptr(mr) + xlat;
+        ret = vfio_dma_map(container, iotlb->iova,
+                           iotlb->addr_mask + 1, vaddr,
+                           !(iotlb->perm & IOMMU_WO) || mr->readonly);
+        if (ret) {
+            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         container, iotlb->iova,
+                         iotlb->addr_mask + 1, vaddr, ret);
+        }
+    } else {
+        ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
+        if (ret) {
+            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                         "0x%"HWADDR_PRIx") = %d (%m)",
+                         container, iotlb->iova,
+                         iotlb->addr_mask + 1, ret);
+        }
+    }
+}
+
+static void vfio_listener_region_add(MemoryListener *listener,
+                                     MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            iommu_data.type1.listener);
+    hwaddr iova, end;
+    Int128 llend;
+    void *vaddr;
+    int ret;
+
+    if (vfio_listener_skipped_section(section)) {
+        trace_vfio_listener_region_add_skip(
+                section->offset_within_address_space,
+                section->offset_within_address_space +
+                int128_get64(int128_sub(section->size, int128_one())));
+        return;
+    }
+
+    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+        error_report("%s received unaligned region", __func__);
+        return;
+    }
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    llend = int128_make64(section->offset_within_address_space);
+    llend = int128_add(llend, section->size);
+    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+    if (int128_ge(int128_make64(iova), llend)) {
+        return;
+    }
+
+    memory_region_ref(section->mr);
+
+    if (memory_region_is_iommu(section->mr)) {
+        VFIOGuestIOMMU *giommu;
+
+        trace_vfio_listener_region_add_iommu(iova,
+                    int128_get64(int128_sub(llend, int128_one())));
+        /*
+         * FIXME: We should do some checking to see if the
+         * capabilities of the host VFIO IOMMU are adequate to model
+         * the guest IOMMU
+         *
+         * FIXME: For VFIO iommu types which have KVM acceleration to
+         * avoid bouncing all map/unmaps through qemu this way, this
+         * would be the right place to wire that up (tell the KVM
+         * device emulation the VFIO iommu handles to use).
+         */
+        /*
+         * This assumes that the guest IOMMU is empty of
+         * mappings at this point.
+         *
+         * One way of doing this is:
+         * 1. Avoid sharing IOMMUs between emulated devices or different
+         * IOMMU groups.
+         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
+         * there are some mappings in IOMMU.
+         *
+         * VFIO on SPAPR does that. Other IOMMU models may do that differently,
+         * they must make sure there are no existing mappings or
+         * loop through existing mappings to map them into VFIO.
+         */
+        giommu = g_malloc0(sizeof(*giommu));
+        giommu->iommu = section->mr;
+        giommu->container = container;
+        giommu->n.notify = vfio_iommu_map_notify;
+        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+
+        return;
+    }
+
+    /* Here we assume that memory_region_is_ram(section->mr)==true */
+
+    end = int128_get64(llend);
+    vaddr = memory_region_get_ram_ptr(section->mr) +
+            section->offset_within_region +
+            (iova - section->offset_within_address_space);
+
+    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
+
+    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
+    if (ret) {
+        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                     container, iova, end - iova, vaddr, ret);
+
+        /*
+         * On the initfn path, store the first error in the container so we
+         * can gracefully fail.  Runtime, there's not much we can do other
+         * than throw a hardware error.
+         */
+        if (!container->iommu_data.type1.initialized) {
+            if (!container->iommu_data.type1.error) {
+                container->iommu_data.type1.error = ret;
+            }
+        } else {
+            hw_error("vfio: DMA mapping failed, unable to continue");
+        }
+    }
+}
+
+static void vfio_listener_region_del(MemoryListener *listener,
+                                     MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer,
+                                            iommu_data.type1.listener);
+    hwaddr iova, end;
+    int ret;
+
+    if (vfio_listener_skipped_section(section)) {
+        trace_vfio_listener_region_del_skip(
+                section->offset_within_address_space,
+                section->offset_within_address_space +
+                int128_get64(int128_sub(section->size, int128_one())));
+        return;
+    }
+
+    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
+                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+        error_report("%s received unaligned region", __func__);
+        return;
+    }
+
+    if (memory_region_is_iommu(section->mr)) {
+        VFIOGuestIOMMU *giommu;
+
+        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+            if (giommu->iommu == section->mr) {
+                memory_region_unregister_iommu_notifier(&giommu->n);
+                QLIST_REMOVE(giommu, giommu_next);
+                g_free(giommu);
+                break;
+            }
+        }
+
+        /*
+         * FIXME: We assume the one big unmap below is adequate to
+         * remove any individual page mappings in the IOMMU which
+         * might have been copied into VFIO. This works for a page table
+         * based IOMMU where a big unmap flattens a large range of IO-PTEs.
+         * That may not be true for all IOMMU types.
+         */
+    }
+
+    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    end = (section->offset_within_address_space + int128_get64(section->size)) &
+          TARGET_PAGE_MASK;
+
+    if (iova >= end) {
+        return;
+    }
+
+    trace_vfio_listener_region_del(iova, end - 1);
+
+    ret = vfio_dma_unmap(container, iova, end - iova);
+    memory_region_unref(section->mr);
+    if (ret) {
+        error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                     "0x%"HWADDR_PRIx") = %d (%m)",
+                     container, iova, end - iova, ret);
+    }
+}
+
+const MemoryListener vfio_memory_listener = {
+    .region_add = vfio_listener_region_add,
+    .region_del = vfio_listener_region_del,
+};
+
+void vfio_listener_release(VFIOContainer *container)
+{
+    memory_listener_unregister(&container->iommu_data.type1.listener);
+}
+
+int vfio_mmap_region(Object *obj, VFIORegion *region,
+                     MemoryRegion *mem, MemoryRegion *submem,
+                     void **map, size_t size, off_t offset,
+                     const char *name)
+{
+    int ret = 0;
+    VFIODevice *vbasedev = region->vbasedev;
+
+    if (VFIO_ALLOW_MMAP && size && region->flags &
+        VFIO_REGION_INFO_FLAG_MMAP) {
+        int prot = 0;
+
+        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
+            prot |= PROT_READ;
+        }
+
+        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
+            prot |= PROT_WRITE;
+        }
+
+        *map = mmap(NULL, size, prot, MAP_SHARED,
+                    vbasedev->fd,
+                    region->fd_offset + offset);
+        if (*map == MAP_FAILED) {
+            *map = NULL;
+            ret = -errno;
+            goto empty_region;
+        }
+
+        memory_region_init_ram_ptr(submem, obj, name, size, *map);
+    } else {
+empty_region:
+        /* Create a zero sized sub-region to make cleanup easy. */
+        memory_region_init(submem, obj, name, 0);
+    }
+
+    memory_region_add_subregion(mem, offset, submem);
+
+    return ret;
+}
+
+void vfio_reset_handler(void *opaque)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
+        }
+    }
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->needs_reset) {
+                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
+            }
+        }
+    }
+}
+
+static void vfio_kvm_device_add_group(VFIOGroup *group)
+{
+#ifdef CONFIG_KVM
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_ADD,
+        .addr = (uint64_t)(unsigned long)&group->fd,
+    };
+
+    if (!kvm_enabled()) {
+        return;
+    }
+
+    if (vfio_kvm_device_fd < 0) {
+        struct kvm_create_device cd = {
+            .type = KVM_DEV_TYPE_VFIO,
+        };
+
+        if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
+            error_report("KVM_CREATE_DEVICE: %m\n");
+            return;
+        }
+
+        vfio_kvm_device_fd = cd.fd;
+    }
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        error_report("Failed to add group %d to KVM VFIO device: %m",
+                     group->groupid);
+    }
+#endif
+}
+
+static void vfio_kvm_device_del_group(VFIOGroup *group)
+{
+#ifdef CONFIG_KVM
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_DEL,
+        .addr = (uint64_t)(unsigned long)&group->fd,
+    };
+
+    if (vfio_kvm_device_fd < 0) {
+        return;
+    }
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        error_report("Failed to remove group %d from KVM VFIO device: %m",
+                     group->groupid);
+    }
+#endif
+}
+
+static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
+{
+    VFIOAddressSpace *space;
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        if (space->as == as) {
+            return space;
+        }
+    }
+
+    /* No suitable VFIOAddressSpace, create a new one */
+    space = g_malloc0(sizeof(*space));
+    space->as = as;
+    QLIST_INIT(&space->containers);
+
+    QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
+
+    return space;
+}
+
+static void vfio_put_address_space(VFIOAddressSpace *space)
+{
+    if (QLIST_EMPTY(&space->containers)) {
+        QLIST_REMOVE(space, list);
+        g_free(space);
+    }
+}
+
+static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
+{
+    VFIOContainer *container;
+    int ret, fd;
+    VFIOAddressSpace *space;
+
+    space = vfio_get_address_space(as);
+
+    QLIST_FOREACH(container, &space->containers, next) {
+        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+            group->container = container;
+            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+            return 0;
+        }
+    }
+
+    fd = qemu_open("/dev/vfio/vfio", O_RDWR);
+    if (fd < 0) {
+        error_report("vfio: failed to open /dev/vfio/vfio: %m");
+        ret = -errno;
+        goto put_space_exit;
+    }
+
+    ret = ioctl(fd, VFIO_GET_API_VERSION);
+    if (ret != VFIO_API_VERSION) {
+        error_report("vfio: supported vfio version: %d, "
+                     "reported version: %d", VFIO_API_VERSION, ret);
+        ret = -EINVAL;
+        goto close_fd_exit;
+    }
+
+    container = g_malloc0(sizeof(*container));
+    container->space = space;
+    container->fd = fd;
+    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
+        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
+        if (ret) {
+            error_report("vfio: failed to set group container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
+        if (ret) {
+            error_report("vfio: failed to set iommu for container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        container->iommu_data.type1.listener = vfio_memory_listener;
+        container->iommu_data.release = vfio_listener_release;
+
+        memory_listener_register(&container->iommu_data.type1.listener,
+                                 &address_space_memory);
+
+        if (container->iommu_data.type1.error) {
+            ret = container->iommu_data.type1.error;
+            error_report("vfio: memory listener initialization failed for container");
+            goto listener_release_exit;
+        }
+
+        container->iommu_data.type1.initialized = true;
+
+    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
+        if (ret) {
+            error_report("vfio: failed to set group container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+        if (ret) {
+            error_report("vfio: failed to set iommu for container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        /*
+         * The host kernel code implementing VFIO_IOMMU_DISABLE is called
+         * when container fd is closed so we do not call it explicitly
+         * in this file.
+         */
+        ret = ioctl(fd, VFIO_IOMMU_ENABLE);
+        if (ret) {
+            error_report("vfio: failed to enable container: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+
+        container->iommu_data.type1.listener = vfio_memory_listener;
+        container->iommu_data.release = vfio_listener_release;
+
+        memory_listener_register(&container->iommu_data.type1.listener,
+                                 container->space->as);
+
+    } else {
+        error_report("vfio: No available IOMMU models");
+        ret = -EINVAL;
+        goto free_container_exit;
+    }
+
+    QLIST_INIT(&container->group_list);
+    QLIST_INSERT_HEAD(&space->containers, container, next);
+
+    group->container = container;
+    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+
+    return 0;
+listener_release_exit:
+    vfio_listener_release(container);
+
+free_container_exit:
+    g_free(container);
+
+close_fd_exit:
+    close(fd);
+
+put_space_exit:
+    vfio_put_address_space(space);
+
+    return ret;
+}
+
+static void vfio_disconnect_container(VFIOGroup *group)
+{
+    VFIOContainer *container = group->container;
+
+    if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
+        error_report("vfio: error disconnecting group %d from container",
+                     group->groupid);
+    }
+
+    QLIST_REMOVE(group, container_next);
+    group->container = NULL;
+
+    if (QLIST_EMPTY(&container->group_list)) {
+        VFIOAddressSpace *space = container->space;
+
+        if (container->iommu_data.release) {
+            container->iommu_data.release(container);
+        }
+        QLIST_REMOVE(container, next);
+        trace_vfio_disconnect_container(container->fd);
+        close(container->fd);
+        g_free(container);
+
+        vfio_put_address_space(space);
+    }
+}
+
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
+{
+    VFIOGroup *group;
+    char path[32];
+    struct vfio_group_status status = { .argsz = sizeof(status) };
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        if (group->groupid == groupid) {
+            /* Found it.  Now is it already in the right context? */
+            if (group->container->space->as == as) {
+                return group;
+            } else {
+                error_report("vfio: group %d used in multiple address spaces",
+                             group->groupid);
+                return NULL;
+            }
+        }
+    }
+
+    group = g_malloc0(sizeof(*group));
+
+    snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
+    group->fd = qemu_open(path, O_RDWR);
+    if (group->fd < 0) {
+        error_report("vfio: error opening %s: %m", path);
+        goto free_group_exit;
+    }
+
+    if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
+        error_report("vfio: error getting group status: %m");
+        goto close_fd_exit;
+    }
+
+    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+        error_report("vfio: error, group %d is not viable, please ensure "
+                     "all devices within the iommu_group are bound to their "
+                     "vfio bus driver.", groupid);
+        goto close_fd_exit;
+    }
+
+    group->groupid = groupid;
+    QLIST_INIT(&group->device_list);
+
+    if (vfio_connect_container(group, as)) {
+        error_report("vfio: failed to setup container for group %d", groupid);
+        goto close_fd_exit;
+    }
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        qemu_register_reset(vfio_reset_handler, NULL);
+    }
+
+    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
+
+    vfio_kvm_device_add_group(group);
+
+    return group;
+
+close_fd_exit:
+    close(group->fd);
+
+free_group_exit:
+    g_free(group);
+
+    return NULL;
+}
+
+void vfio_put_group(VFIOGroup *group)
+{
+    if (!QLIST_EMPTY(&group->device_list)) {
+        return;
+    }
+
+    vfio_kvm_device_del_group(group);
+    vfio_disconnect_container(group);
+    QLIST_REMOVE(group, next);
+    trace_vfio_put_group(group->fd);
+    close(group->fd);
+    g_free(group);
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        qemu_unregister_reset(vfio_reset_handler, NULL);
+    }
+}
+
+int vfio_get_device(VFIOGroup *group, const char *name,
+                    VFIODevice *vbasedev)
+{
+    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    int ret;
+
+    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+    if (ret < 0) {
+        error_report("vfio: error getting device %s from group %d: %m",
+                     name, group->groupid);
+        error_printf("Verify all devices in group %d are bound to vfio-<bus> "
+                     "or pci-stub and not already in use\n", group->groupid);
+        return ret;
+    }
+
+    vbasedev->fd = ret;
+    vbasedev->group = group;
+    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
+    if (ret) {
+        error_report("vfio: error getting device info: %m");
+        goto error;
+    }
+
+    vbasedev->num_irqs = dev_info.num_irqs;
+    vbasedev->num_regions = dev_info.num_regions;
+    vbasedev->flags = dev_info.flags;
+
+    trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
+                          dev_info.num_irqs);
+
+    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+
+    ret = vbasedev->ops->vfio_populate_device(vbasedev);
+
+error:
+    if (ret) {
+        vfio_put_base_device(vbasedev);
+    }
+    return ret;
+}
+
+void vfio_put_base_device(VFIODevice *vbasedev)
+{
+    QLIST_REMOVE(vbasedev, next);
+    vbasedev->group = NULL;
+    trace_vfio_put_base_device(vbasedev->fd);
+    close(vbasedev->fd);
+}
+
+static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
+                                   int req, void *param)
+{
+    VFIOGroup *group;
+    VFIOContainer *container;
+    int ret = -1;
+
+    group = vfio_get_group(groupid, as);
+    if (!group) {
+        error_report("vfio: group %d not registered", groupid);
+        return ret;
+    }
+
+    container = group->container;
+    if (group->container) {
+        ret = ioctl(container->fd, req, param);
+        if (ret < 0) {
+            error_report("vfio: failed to ioctl container: ret=%d, %s",
+                         ret, strerror(errno));
+        }
+    }
+
+    vfio_put_group(group);
+
+    return ret;
+}
+
+int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
+                         int req, void *param)
+{
+    /* We allow only certain ioctls to the container */
+    switch (req) {
+    case VFIO_CHECK_EXTENSION:
+    case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+        break;
+    default:
+        /* Return an error on unknown requests */
+        error_report("vfio: unsupported ioctl %X", req);
+        return -1;
+    }
+
+    return vfio_container_do_ioctl(as, groupid, req, param);
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6584425..6565ef2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,17 +41,7 @@
 #include "sysemu/sysemu.h"
 #include "trace.h"
 #include "hw/vfio/vfio.h"
-
-/* Extra debugging, trap acceleration paths for more logging */
-#define VFIO_ALLOW_MMAP 1
-#define VFIO_ALLOW_KVM_INTX 1
-#define VFIO_ALLOW_KVM_MSI 1
-#define VFIO_ALLOW_KVM_MSIX 1
-
-enum {
-    VFIO_DEVICE_TYPE_PCI = 0,
-    VFIO_DEVICE_TYPE_PLATFORM = 1,
-};
+#include "hw/vfio/vfio-common.h"
 
 struct VFIOPCIDevice;
 
@@ -78,17 +68,6 @@ typedef struct VFIOQuirk {
     } data;
 } VFIOQuirk;
 
-typedef struct VFIORegion {
-    struct VFIODevice *vbasedev;
-    off_t fd_offset; /* offset of region within device fd */
-    MemoryRegion mem; /* slow, read/write access */
-    MemoryRegion mmap_mem; /* direct mapped access */
-    void *mmap;
-    size_t size;
-    uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
-    uint8_t nr; /* cache the region number for debug */
-} VFIORegion;
-
 typedef struct VFIOBAR {
     VFIORegion region;
     bool ioport;
@@ -144,45 +123,6 @@ enum {
     VFIO_INT_MSIX = 3,
 };
 
-typedef struct VFIOAddressSpace {
-    AddressSpace *as;
-    QLIST_HEAD(, VFIOContainer) containers;
-    QLIST_ENTRY(VFIOAddressSpace) list;
-} VFIOAddressSpace;
-
-static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
-    QLIST_HEAD_INITIALIZER(vfio_address_spaces);
-
-struct VFIOGroup;
-
-typedef struct VFIOType1 {
-    MemoryListener listener;
-    int error;
-    bool initialized;
-} VFIOType1;
-
-typedef struct VFIOContainer {
-    VFIOAddressSpace *space;
-    int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-    struct {
-        /* enable abstraction to support various iommu backends */
-        union {
-            VFIOType1 type1;
-        };
-        void (*release)(struct VFIOContainer *);
-    } iommu_data;
-    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
-    QLIST_HEAD(, VFIOGroup) group_list;
-    QLIST_ENTRY(VFIOContainer) next;
-} VFIOContainer;
-
-typedef struct VFIOGuestIOMMU {
-    VFIOContainer *container;
-    MemoryRegion *iommu;
-    Notifier n;
-    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
-
 /* Cache of MSI-X setup plus extra mmap and memory region for split BAR map */
 typedef struct VFIOMSIXInfo {
     uint8_t table_bar;
@@ -194,29 +134,6 @@ typedef struct VFIOMSIXInfo {
     void *mmap;
 } VFIOMSIXInfo;
 
-typedef struct VFIODeviceOps VFIODeviceOps;
-
-typedef struct VFIODevice {
-    QLIST_ENTRY(VFIODevice) next;
-    struct VFIOGroup *group;
-    char *name;
-    int fd;
-    int type;
-    bool reset_works;
-    bool needs_reset;
-    VFIODeviceOps *ops;
-    unsigned int num_irqs;
-    unsigned int num_regions;
-    unsigned int flags;
-} VFIODevice;
-
-struct VFIODeviceOps {
-    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
-    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
-    void (*vfio_eoi)(VFIODevice *vdev);
-    int (*vfio_populate_device)(VFIODevice *vdev);
-};
-
 typedef struct VFIOPCIDevice {
     PCIDevice pdev;
     VFIODevice vbasedev;
@@ -248,15 +165,6 @@ typedef struct VFIOPCIDevice {
     bool rom_read_failed;
 } VFIOPCIDevice;
 
-typedef struct VFIOGroup {
-    int fd;
-    int groupid;
-    VFIOContainer *container;
-    QLIST_HEAD(, VFIODevice) device_list;
-    QLIST_ENTRY(VFIOGroup) next;
-    QLIST_ENTRY(VFIOGroup) container_next;
-} VFIOGroup;
-
 typedef struct VFIORomBlacklistEntry {
     uint16_t vendor_id;
     uint16_t device_id;
@@ -282,76 +190,14 @@ static const VFIORomBlacklistEntry romblacklist[] = {
 
 #define MSIX_CAP_LENGTH 12
 
-static QLIST_HEAD(, VFIOGroup)
-    vfio_group_list = QLIST_HEAD_INITIALIZER(vfio_group_list);
-
-#ifdef CONFIG_KVM
-/*
- * We have a single VFIO pseudo device per KVM VM.  Once created it lives
- * for the life of the VM.  Closing the file descriptor only drops our
- * reference to it and the device's reference to kvm.  Therefore once
- * initialized, this file descriptor is only released on QEMU exit and
- * we'll re-use it should another vfio device be attached before then.
- */
-static int vfio_kvm_device_fd = -1;
-#endif
-
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                                   uint32_t val, int len);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
-static void vfio_put_base_device(VFIODevice *vbasedev);
 static int vfio_populate_device(VFIODevice *vbasedev);
 
 /*
- * Common VFIO interrupt disable
- */
-static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
-        .index = index,
-        .start = 0,
-        .count = 0,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-/*
- * INTx
- */
-static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
-        .index = index,
-        .start = 0,
-        .count = 1,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-
-#ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
-{
-    struct vfio_irq_set irq_set = {
-        .argsz = sizeof(irq_set),
-        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
-        .index = index,
-        .start = 0,
-        .count = 1,
-    };
-
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
-}
-#endif
-
-/*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
  * also be a huge overhead.  We try to get the best of both worlds by
  * waiting until an interrupt to disable mmaps (subsequent transitions
@@ -1081,105 +927,6 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
     }
 }
 
-/*
- * IO Port/MMIO - Beware of the endians, VFIO is always little endian
- */
-static void vfio_region_write(void *opaque, hwaddr addr,
-                              uint64_t data, unsigned size)
-{
-    VFIORegion *region = opaque;
-    VFIODevice *vbasedev = region->vbasedev;
-    union {
-        uint8_t byte;
-        uint16_t word;
-        uint32_t dword;
-        uint64_t qword;
-    } buf;
-
-    switch (size) {
-    case 1:
-        buf.byte = data;
-        break;
-    case 2:
-        buf.word = cpu_to_le16(data);
-        break;
-    case 4:
-        buf.dword = cpu_to_le32(data);
-        break;
-    default:
-        hw_error("vfio: unsupported write size, %d bytes", size);
-        break;
-    }
-
-    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
-        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
-                     ",%d) failed: %m",
-                     __func__, vbasedev->name, region->nr,
-                     addr, data, size);
-    }
-
-    trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
-
-    /*
-     * A read or write to a BAR always signals an INTx EOI.  This will
-     * do nothing if not pending (including not in INTx mode).  We assume
-     * that a BAR access is in response to an interrupt and that BAR
-     * accesses will service the interrupt.  Unfortunately, we don't know
-     * which access will service the interrupt, so we're potentially
-     * getting quite a few host interrupts per guest interrupt.
-     */
-    vbasedev->ops->vfio_eoi(vbasedev);
-}
-
-static uint64_t vfio_region_read(void *opaque,
-                                 hwaddr addr, unsigned size)
-{
-    VFIORegion *region = opaque;
-    VFIODevice *vbasedev = region->vbasedev;
-    union {
-        uint8_t byte;
-        uint16_t word;
-        uint32_t dword;
-        uint64_t qword;
-    } buf;
-    uint64_t data = 0;
-
-    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
-        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
-                     __func__, vbasedev->name, region->nr,
-                     addr, size);
-        return (uint64_t)-1;
-    }
-
-    switch (size) {
-    case 1:
-        data = buf.byte;
-        break;
-    case 2:
-        data = le16_to_cpu(buf.word);
-        break;
-    case 4:
-        data = le32_to_cpu(buf.dword);
-        break;
-    default:
-        hw_error("vfio: unsupported read size, %d bytes", size);
-        break;
-    }
-
-    trace_vfio_region_read(vbasedev->name, region->nr, addr, size, data);
-
-    /* Same as write above */
-    vbasedev->ops->vfio_eoi(vbasedev);
-
-    return data;
-}
-
-static const MemoryRegionOps vfio_region_ops = {
-    .read = vfio_region_read,
-    .write = vfio_region_write,
-    .endianness = DEVICE_LITTLE_ENDIAN,
-};
-
 static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
     struct vfio_region_info reg_info = {
@@ -2378,305 +2125,6 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
 }
 
 /*
- * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
- */
-static int vfio_dma_unmap(VFIOContainer *container,
-                          hwaddr iova, ram_addr_t size)
-{
-    struct vfio_iommu_type1_dma_unmap unmap = {
-        .argsz = sizeof(unmap),
-        .flags = 0,
-        .iova = iova,
-        .size = size,
-    };
-
-    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-        error_report("VFIO_UNMAP_DMA: %d\n", -errno);
-        return -errno;
-    }
-
-    return 0;
-}
-
-static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
-                        ram_addr_t size, void *vaddr, bool readonly)
-{
-    struct vfio_iommu_type1_dma_map map = {
-        .argsz = sizeof(map),
-        .flags = VFIO_DMA_MAP_FLAG_READ,
-        .vaddr = (__u64)(uintptr_t)vaddr,
-        .iova = iova,
-        .size = size,
-    };
-
-    if (!readonly) {
-        map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
-    }
-
-    /*
-     * Try the mapping, if it fails with EBUSY, unmap the region and try
-     * again.  This shouldn't be necessary, but we sometimes see it in
-     * the the VGA ROM space.
-     */
-    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-        (errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
-         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
-        return 0;
-    }
-
-    error_report("VFIO_MAP_DMA: %d\n", -errno);
-    return -errno;
-}
-
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
-{
-    return (!memory_region_is_ram(section->mr) &&
-            !memory_region_is_iommu(section->mr)) ||
-           /*
-            * Sizing an enabled 64-bit BAR can cause spurious mappings to
-            * addresses in the upper part of the 64-bit address space.  These
-            * are never accessed by the CPU and beyond the address width of
-            * some IOMMU hardware.  TODO: VFIO should tell us the IOMMU width.
-            */
-           section->offset_within_address_space & (1ULL << 63);
-}
-
-static void vfio_iommu_map_notify(Notifier *n, void *data)
-{
-    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
-    VFIOContainer *container = giommu->container;
-    IOMMUTLBEntry *iotlb = data;
-    MemoryRegion *mr;
-    hwaddr xlat;
-    hwaddr len = iotlb->addr_mask + 1;
-    void *vaddr;
-    int ret;
-
-    trace_vfio_iommu_map_notify(iotlb->iova,
-                                iotlb->iova + iotlb->addr_mask);
-
-    /*
-     * The IOMMU TLB entry we have just covers translation through
-     * this IOMMU to its immediate target.  We need to translate
-     * it the rest of the way through to memory.
-     */
-    mr = address_space_translate(&address_space_memory,
-                                 iotlb->translated_addr,
-                                 &xlat, &len, iotlb->perm & IOMMU_WO);
-    if (!memory_region_is_ram(mr)) {
-        error_report("iommu map to non memory area %"HWADDR_PRIx"\n",
-                     xlat);
-        return;
-    }
-    /*
-     * Translation truncates length to the IOMMU page size,
-     * check that it did not truncate too much.
-     */
-    if (len & iotlb->addr_mask) {
-        error_report("iommu has granularity incompatible with target AS\n");
-        return;
-    }
-
-    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
-        vaddr = memory_region_get_ram_ptr(mr) + xlat;
-
-        ret = vfio_dma_map(container, iotlb->iova,
-                           iotlb->addr_mask + 1, vaddr,
-                           !(iotlb->perm & IOMMU_WO) || mr->readonly);
-        if (ret) {
-            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                         container, iotlb->iova,
-                         iotlb->addr_mask + 1, vaddr, ret);
-        }
-    } else {
-        ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
-        if (ret) {
-            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%m)",
-                         container, iotlb->iova,
-                         iotlb->addr_mask + 1, ret);
-        }
-    }
-}
-
-static void vfio_listener_region_add(MemoryListener *listener,
-                                     MemoryRegionSection *section)
-{
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
-    hwaddr iova, end;
-    Int128 llend;
-    void *vaddr;
-    int ret;
-
-    if (vfio_listener_skipped_section(section)) {
-        trace_vfio_listener_region_add_skip(
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                int128_get64(int128_sub(section->size, int128_one())));
-        return;
-    }
-
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
-        error_report("%s received unaligned region", __func__);
-        return;
-    }
-
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-    llend = int128_make64(section->offset_within_address_space);
-    llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
-
-    if (int128_ge(int128_make64(iova), llend)) {
-        return;
-    }
-
-    memory_region_ref(section->mr);
-
-    if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
-
-        trace_vfio_listener_region_add_iommu(iova,
-                    int128_get64(int128_sub(llend, int128_one())));
-        /*
-         * FIXME: We should do some checking to see if the
-         * capabilities of the host VFIO IOMMU are adequate to model
-         * the guest IOMMU
-         *
-         * FIXME: For VFIO iommu types which have KVM acceleration to
-         * avoid bouncing all map/unmaps through qemu this way, this
-         * would be the right place to wire that up (tell the KVM
-         * device emulation the VFIO iommu handles to use).
-         */
-        /*
-         * This assumes that the guest IOMMU is empty of
-         * mappings at this point.
-         *
-         * One way of doing this is:
-         * 1. Avoid sharing IOMMUs between emulated devices or different
-         * IOMMU groups.
-         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
-         * there are some mappings in IOMMU.
-         *
-         * VFIO on SPAPR does that. Other IOMMU models may do that different,
-         * they must make sure there are no existing mappings or
-         * loop through existing mappings to map them into VFIO.
-         */
-        giommu = g_malloc0(sizeof(*giommu));
-        giommu->iommu = section->mr;
-        giommu->container = container;
-        giommu->n.notify = vfio_iommu_map_notify;
-        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
-        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
-
-        return;
-    }
-
-    /* Here we assume that memory_region_is_ram(section->mr)==true */
-
-    end = int128_get64(llend);
-    vaddr = memory_region_get_ram_ptr(section->mr) +
-            section->offset_within_region +
-            (iova - section->offset_within_address_space);
-
-    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
-
-    ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
-    if (ret) {
-        error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                     container, iova, end - iova, vaddr, ret);
-
-        /*
-         * On the initfn path, store the first error in the container so we
-         * can gracefully fail.  Runtime, there's not much we can do other
-         * than throw a hardware error.
-         */
-        if (!container->iommu_data.type1.initialized) {
-            if (!container->iommu_data.type1.error) {
-                container->iommu_data.type1.error = ret;
-            }
-        } else {
-            hw_error("vfio: DMA mapping failed, unable to continue");
-        }
-    }
-}
-
-static void vfio_listener_region_del(MemoryListener *listener,
-                                     MemoryRegionSection *section)
-{
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
-    hwaddr iova, end;
-    int ret;
-
-    if (vfio_listener_skipped_section(section)) {
-        trace_vfio_listener_region_del_skip(
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                int128_get64(int128_sub(section->size, int128_one())));
-        return;
-    }
-
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
-        error_report("%s received unaligned region", __func__);
-        return;
-    }
-
-    if (memory_region_is_iommu(section->mr)) {
-        VFIOGuestIOMMU *giommu;
-
-        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
-            if (giommu->iommu == section->mr) {
-                memory_region_unregister_iommu_notifier(&giommu->n);
-                QLIST_REMOVE(giommu, giommu_next);
-                g_free(giommu);
-                break;
-            }
-        }
-
-        /*
-         * FIXME: We assume the one big unmap below is adequate to
-         * remove any individual page mappings in the IOMMU which
-         * might have been copied into VFIO. This works for a page table
-         * based IOMMU where a big unmap flattens a large range of IO-PTEs.
-         * That may not be true for all IOMMU types.
-         */
-    }
-
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-    end = (section->offset_within_address_space + int128_get64(section->size)) &
-          TARGET_PAGE_MASK;
-
-    if (iova >= end) {
-        return;
-    }
-
-    trace_vfio_listener_region_del(iova, end - 1);
-
-    ret = vfio_dma_unmap(container, iova, end - iova);
-    memory_region_unref(section->mr);
-    if (ret) {
-        error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx") = %d (%m)",
-                     container, iova, end - iova, ret);
-    }
-}
-
-static MemoryListener vfio_memory_listener = {
-    .region_add = vfio_listener_region_add,
-    .region_del = vfio_listener_region_del,
-};
-
-static void vfio_listener_release(VFIOContainer *container)
-{
-    memory_listener_unregister(&container->iommu_data.type1.listener);
-}
-
-/*
  * Interrupt setup
  */
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
@@ -2850,46 +2298,6 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
     }
 }
 
-static int vfio_mmap_region(Object *obj, VFIORegion *region,
-                            MemoryRegion *mem, MemoryRegion *submem,
-                            void **map, size_t size, off_t offset,
-                            const char *name)
-{
-    int ret = 0;
-    VFIODevice *vbasedev = region->vbasedev;
-
-    if (VFIO_ALLOW_MMAP && size && region->flags &
-        VFIO_REGION_INFO_FLAG_MMAP) {
-        int prot = 0;
-
-        if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
-            prot |= PROT_READ;
-        }
-
-        if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
-            prot |= PROT_WRITE;
-        }
-
-        *map = mmap(NULL, size, prot, MAP_SHARED,
-                    vbasedev->fd, region->fd_offset + offset);
-        if (*map == MAP_FAILED) {
-            *map = NULL;
-            ret = -errno;
-            goto empty_region;
-        }
-
-        memory_region_init_ram_ptr(submem, obj, name, size, *map);
-    } else {
-empty_region:
-        /* Create a zero sized sub-region to make cleanup easy. */
-        memory_region_init(submem, obj, name, 0);
-    }
-
-    memory_region_add_subregion(mem, offset, submem);
-
-    return ret;
-}
-
 static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
@@ -3530,345 +2938,6 @@ static VFIODeviceOps vfio_pci_ops = {
     .vfio_populate_device = vfio_populate_device,
 };
 
-static void vfio_reset_handler(void *opaque)
-{
-    VFIOGroup *group;
-    VFIODevice *vbasedev;
-
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        QLIST_FOREACH(vbasedev, &group->device_list, next) {
-            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
-        }
-    }
-
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        QLIST_FOREACH(vbasedev, &group->device_list, next) {
-            if (vbasedev->needs_reset) {
-                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
-            }
-        }
-    }
-}
-
-static void vfio_kvm_device_add_group(VFIOGroup *group)
-{
-#ifdef CONFIG_KVM
-    struct kvm_device_attr attr = {
-        .group = KVM_DEV_VFIO_GROUP,
-        .attr = KVM_DEV_VFIO_GROUP_ADD,
-        .addr = (uint64_t)(unsigned long)&group->fd,
-    };
-
-    if (!kvm_enabled()) {
-        return;
-    }
-
-    if (vfio_kvm_device_fd < 0) {
-        struct kvm_create_device cd = {
-            .type = KVM_DEV_TYPE_VFIO,
-        };
-
-        if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
-            error_report("KVM_CREATE_DEVICE: %m\n");
-            return;
-        }
-
-        vfio_kvm_device_fd = cd.fd;
-    }
-
-    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-        error_report("Failed to add group %d to KVM VFIO device: %m",
-                     group->groupid);
-    }
-#endif
-}
-
-static void vfio_kvm_device_del_group(VFIOGroup *group)
-{
-#ifdef CONFIG_KVM
-    struct kvm_device_attr attr = {
-        .group = KVM_DEV_VFIO_GROUP,
-        .attr = KVM_DEV_VFIO_GROUP_DEL,
-        .addr = (uint64_t)(unsigned long)&group->fd,
-    };
-
-    if (vfio_kvm_device_fd < 0) {
-        return;
-    }
-
-    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-        error_report("Failed to remove group %d from KVM VFIO device: %m",
-                     group->groupid);
-    }
-#endif
-}
-
-static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
-{
-    VFIOAddressSpace *space;
-
-    QLIST_FOREACH(space, &vfio_address_spaces, list) {
-        if (space->as == as) {
-            return space;
-        }
-    }
-
-    /* No suitable VFIOAddressSpace, create a new one */
-    space = g_malloc0(sizeof(*space));
-    space->as = as;
-    QLIST_INIT(&space->containers);
-
-    QLIST_INSERT_HEAD(&vfio_address_spaces, space, list);
-
-    return space;
-}
-
-static void vfio_put_address_space(VFIOAddressSpace *space)
-{
-    if (QLIST_EMPTY(&space->containers)) {
-        QLIST_REMOVE(space, list);
-        g_free(space);
-    }
-}
-
-static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
-{
-    VFIOContainer *container;
-    int ret, fd;
-    VFIOAddressSpace *space;
-
-    space = vfio_get_address_space(as);
-
-    QLIST_FOREACH(container, &space->containers, next) {
-        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
-            group->container = container;
-            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-            return 0;
-        }
-    }
-
-    fd = qemu_open("/dev/vfio/vfio", O_RDWR);
-    if (fd < 0) {
-        error_report("vfio: failed to open /dev/vfio/vfio: %m");
-        ret = -errno;
-        goto put_space_exit;
-    }
-
-    ret = ioctl(fd, VFIO_GET_API_VERSION);
-    if (ret != VFIO_API_VERSION) {
-        error_report("vfio: supported vfio version: %d, "
-                     "reported version: %d", VFIO_API_VERSION, ret);
-        ret = -EINVAL;
-        goto close_fd_exit;
-    }
-
-    container = g_malloc0(sizeof(*container));
-    container->space = space;
-    container->fd = fd;
-
-    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
-        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
-        if (ret) {
-            error_report("vfio: failed to set group container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
-        if (ret) {
-            error_report("vfio: failed to set iommu for container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 &address_space_memory);
-
-        if (container->iommu_data.type1.error) {
-            ret = container->iommu_data.type1.error;
-            error_report("vfio: memory listener initialization failed for container");
-            goto listener_release_exit;
-        }
-
-        container->iommu_data.type1.initialized = true;
-
-    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
-        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
-        if (ret) {
-            error_report("vfio: failed to set group container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
-        if (ret) {
-            error_report("vfio: failed to set iommu for container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        /*
-         * The host kernel code implementing VFIO_IOMMU_DISABLE is called
-         * when container fd is closed so we do not call it explicitly
-         * in this file.
-         */
-        ret = ioctl(fd, VFIO_IOMMU_ENABLE);
-        if (ret) {
-            error_report("vfio: failed to enable container: %m");
-            ret = -errno;
-            goto free_container_exit;
-        }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
-    } else {
-        error_report("vfio: No available IOMMU models");
-        ret = -EINVAL;
-        goto free_container_exit;
-    }
-
-    QLIST_INIT(&container->group_list);
-    QLIST_INSERT_HEAD(&space->containers, container, next);
-
-    group->container = container;
-    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-
-    return 0;
-
-listener_release_exit:
-    vfio_listener_release(container);
-
-free_container_exit:
-    g_free(container);
-
-close_fd_exit:
-    close(fd);
-
-put_space_exit:
-    vfio_put_address_space(space);
-
-    return ret;
-}
-
-static void vfio_disconnect_container(VFIOGroup *group)
-{
-    VFIOContainer *container = group->container;
-
-    if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
-        error_report("vfio: error disconnecting group %d from container",
-                     group->groupid);
-    }
-
-    QLIST_REMOVE(group, container_next);
-    group->container = NULL;
-
-    if (QLIST_EMPTY(&container->group_list)) {
-        VFIOAddressSpace *space = container->space;
-
-        if (container->iommu_data.release) {
-            container->iommu_data.release(container);
-        }
-        QLIST_REMOVE(container, next);
-        trace_vfio_disconnect_container(container->fd);
-        close(container->fd);
-        g_free(container);
-
-        vfio_put_address_space(space);
-    }
-}
-
-static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
-{
-    VFIOGroup *group;
-    char path[32];
-    struct vfio_group_status status = { .argsz = sizeof(status) };
-
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        if (group->groupid == groupid) {
-            /* Found it.  Now is it already in the right context? */
-            if (group->container->space->as == as) {
-                return group;
-            } else {
-                error_report("vfio: group %d used in multiple address spaces",
-                             group->groupid);
-                return NULL;
-            }
-        }
-    }
-
-    group = g_malloc0(sizeof(*group));
-
-    snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
-    group->fd = qemu_open(path, O_RDWR);
-    if (group->fd < 0) {
-        error_report("vfio: error opening %s: %m", path);
-        goto free_group_exit;
-    }
-
-    if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
-        error_report("vfio: error getting group status: %m");
-        goto close_fd_exit;
-    }
-
-    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
-        error_report("vfio: error, group %d is not viable, please ensure "
-                     "all devices within the iommu_group are bound to their "
-                     "vfio bus driver.", groupid);
-        goto close_fd_exit;
-    }
-
-    group->groupid = groupid;
-    QLIST_INIT(&group->device_list);
-
-    if (vfio_connect_container(group, as)) {
-        error_report("vfio: failed to setup container for group %d", groupid);
-        goto close_fd_exit;
-    }
-
-    if (QLIST_EMPTY(&vfio_group_list)) {
-        qemu_register_reset(vfio_reset_handler, NULL);
-    }
-
-    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
-
-    vfio_kvm_device_add_group(group);
-
-    return group;
-
-close_fd_exit:
-    close(group->fd);
-
-free_group_exit:
-    g_free(group);
-
-    return NULL;
-}
-
-static void vfio_put_group(VFIOGroup *group)
-{
-    if (!QLIST_EMPTY(&group->device_list)) {
-        return;
-    }
-
-    vfio_kvm_device_del_group(group);
-    vfio_disconnect_container(group);
-    QLIST_REMOVE(group, next);
-    trace_vfio_put_group(group->fd);
-    close(group->fd);
-    g_free(group);
-
-    if (QLIST_EMPTY(&vfio_group_list)) {
-        qemu_unregister_reset(vfio_reset_handler, NULL);
-    }
-}
-
 static int vfio_populate_device(VFIODevice *vbasedev)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
@@ -3993,57 +3062,6 @@ error:
     return ret;
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name,
-                           VFIODevice *vbasedev)
-{
-    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
-    int ret;
-
-    ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
-    if (ret < 0) {
-        error_report("vfio: error getting device %s from group %d: %m",
-                     name, group->groupid);
-        error_printf("Verify all devices in group %d are bound to vfio-<bus> "
-                     "or pci-stub and not already in use\n", group->groupid);
-        return ret;
-    }
-
-    vbasedev->fd = ret;
-    vbasedev->group = group;
-    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
-
-    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
-    if (ret) {
-        error_report("vfio: error getting device info: %m");
-        goto error;
-    }
-
-    vbasedev->num_irqs = dev_info.num_irqs;
-    vbasedev->num_regions = dev_info.num_regions;
-    vbasedev->flags = dev_info.flags;
-
-    trace_vfio_get_device(name, dev_info.flags,
-                          dev_info.num_regions, dev_info.num_irqs);
-
-    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
-
-    ret = vbasedev->ops->vfio_populate_device(vbasedev);
-
-error:
-    if (ret) {
-        vfio_put_base_device(vbasedev);
-    }
-    return ret;
-}
-
-void vfio_put_base_device(VFIODevice *vbasedev)
-{
-    QLIST_REMOVE(vbasedev, next);
-    vbasedev->group = NULL;
-    trace_vfio_put_base_device(vbasedev->fd);
-    close(vbasedev->fd);
-}
-
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
     g_free(vdev->vbasedev.name);
@@ -4427,47 +3445,3 @@ static void register_vfio_pci_dev_type(void)
 }
 
 type_init(register_vfio_pci_dev_type)
-
-static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
-                                   int req, void *param)
-{
-    VFIOGroup *group;
-    VFIOContainer *container;
-    int ret = -1;
-
-    group = vfio_get_group(groupid, as);
-    if (!group) {
-        error_report("vfio: group %d not registered", groupid);
-        return ret;
-    }
-
-    container = group->container;
-    if (group->container) {
-        ret = ioctl(container->fd, req, param);
-        if (ret < 0) {
-            error_report("vfio: failed to ioctl container: ret=%d, %s",
-                         ret, strerror(errno));
-        }
-    }
-
-    vfio_put_group(group);
-
-    return ret;
-}
-
-int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
-                         int req, void *param)
-{
-    /* We allow only certain ioctls to the container */
-    switch (req) {
-    case VFIO_CHECK_EXTENSION:
-    case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
-        break;
-    default:
-        /* Return an error on unknown requests */
-        error_report("vfio: unsupported ioctl %X", req);
-        return -1;
-    }
-
-    return vfio_container_do_ioctl(as, groupid, req, param);
-}
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
new file mode 100644
index 0000000..e7fc280
--- /dev/null
+++ b/include/hw/vfio/vfio-common.h
@@ -0,0 +1,151 @@
+/*
+ * common header for vfio based device assignment support
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ *  Adapted for KVM by Qumranet.
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
+ */
+#ifndef HW_VFIO_VFIO_COMMON_H
+#define HW_VFIO_VFIO_COMMON_H
+
+#include "qemu-common.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "qemu/queue.h"
+#include "qemu/notify.h"
+
+/*#define DEBUG_VFIO*/
+#ifdef DEBUG_VFIO
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+/* Extra debugging, trap acceleration paths for more logging */
+#define VFIO_ALLOW_MMAP 1
+#define VFIO_ALLOW_KVM_INTX 1
+#define VFIO_ALLOW_KVM_MSI 1
+#define VFIO_ALLOW_KVM_MSIX 1
+
+enum {
+    VFIO_DEVICE_TYPE_PCI = 0,
+};
+
+typedef struct VFIORegion {
+    struct VFIODevice *vbasedev;
+    off_t fd_offset; /* offset of region within device fd */
+    MemoryRegion mem; /* slow, read/write access */
+    MemoryRegion mmap_mem; /* direct mapped access */
+    void *mmap;
+    size_t size;
+    uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
+    uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOAddressSpace {
+    AddressSpace *as;
+    QLIST_HEAD(, VFIOContainer) containers;
+    QLIST_ENTRY(VFIOAddressSpace) list;
+} VFIOAddressSpace;
+
+struct VFIOGroup;
+
+typedef struct VFIOType1 {
+    MemoryListener listener;
+    int error;
+    bool initialized;
+} VFIOType1;
+
+typedef struct VFIOContainer {
+    VFIOAddressSpace *space;
+    int fd; /* /dev/vfio/vfio, empowered by the attached groups */
+    struct {
+        /* enable abstraction to support various iommu backends */
+        union {
+            VFIOType1 type1;
+        };
+        void (*release)(struct VFIOContainer *);
+    } iommu_data;
+    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+    QLIST_HEAD(, VFIOGroup) group_list;
+    QLIST_ENTRY(VFIOContainer) next;
+} VFIOContainer;
+
+typedef struct VFIOGuestIOMMU {
+    VFIOContainer *container;
+    MemoryRegion *iommu;
+    Notifier n;
+    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
+} VFIOGuestIOMMU;
+
+typedef struct VFIODeviceOps VFIODeviceOps;
+
+typedef struct VFIODevice {
+    QLIST_ENTRY(VFIODevice) next;
+    struct VFIOGroup *group;
+    char *name;
+    int fd;
+    int type;
+    bool reset_works;
+    bool needs_reset;
+    VFIODeviceOps *ops;
+    unsigned int num_irqs;
+    unsigned int num_regions;
+    unsigned int flags;
+} VFIODevice;
+
+struct VFIODeviceOps {
+    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
+    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+    void (*vfio_eoi)(VFIODevice *vdev);
+    int (*vfio_populate_device)(VFIODevice *vdev);
+};
+
+typedef struct VFIOGroup {
+    int fd;
+    int groupid;
+    VFIOContainer *container;
+    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_ENTRY(VFIOGroup) next;
+    QLIST_ENTRY(VFIOGroup) container_next;
+} VFIOGroup;
+
+void vfio_put_base_device(VFIODevice *vbasedev);
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
+void vfio_unmask_irqindex(VFIODevice *vbasedev, int index);
+void vfio_mask_irqindex(VFIODevice *vbasedev, int index);
+void vfio_region_write(void *opaque, hwaddr addr,
+                       uint64_t data, unsigned size);
+uint64_t vfio_region_read(void *opaque,
+                          hwaddr addr, unsigned size);
+void vfio_listener_release(VFIOContainer *container);
+int vfio_mmap_region(Object *vdev, VFIORegion *region,
+                     MemoryRegion *mem, MemoryRegion *submem,
+                     void **map, size_t size, off_t offset,
+                     const char *name);
+void vfio_reset_handler(void *opaque);
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as);
+void vfio_put_group(VFIOGroup *group);
+int vfio_get_device(VFIOGroup *group, const char *name,
+                    VFIODevice *vbasedev);
+
+extern const MemoryRegionOps vfio_region_ops;
+extern const MemoryListener vfio_memory_listener;
+extern QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list;
+extern QLIST_HEAD(vfio_as_head, VFIOAddressSpace) vfio_address_spaces;
+
+#endif /* !HW_VFIO_VFIO_COMMON_H */
diff --git a/trace-events b/trace-events
index 4d6f241..255971a 100644
--- a/trace-events
+++ b/trace-events
@@ -1414,6 +1414,7 @@ vfio_pci_reset(const char *name) " (%s)"
 vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET"
 vfio_pci_reset_pm(const char *name) "%s PCI PM Reset"
 
+# hw/vfio/vfio-common.c
 vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
 vfio_region_read(char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
 vfio_iommu_map_notify(uint64_t iova_start, uint64_t iova_end) "iommu map @ %"PRIx64" - %"PRIx64
-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (7 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 08/16] hw/vfio: create common module Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-11-05 10:29   ` Alexander Graf
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 10/16] hw/vfio: calxeda xgmac device Eric Auger
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, eric.auger, will.deacon,
	stuart.yoder, Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

Minimal VFIO platform implementation supporting
- register space user mapping,
- IRQ assignment based on eventfds handled on the QEMU side.

irqfd kernel acceleration comes in a subsequent patch.
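
For readers unfamiliar with user-side eventfd handling, here is a minimal
sketch of the signal / test-and-clear cycle that this mode relies on. It
uses only the Linux eventfd(2) interface; the demo_* helpers are
hypothetical names and are not part of this patch or of QEMU, whose
EventNotifier wrappers may differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal the eventfd, as the VFIO driver would
 * when the physical IRQ fires. Returns 0 on success, -errno on error.
 */
static int demo_eventfd_signal(int fd)
{
    uint64_t one = 1;

    if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
        return -errno;
    }
    return 0;
}

/*
 * Hypothetical helper: returns 1 if the eventfd had been signalled
 * (reading it resets the counter), 0 if it was not signalled, and -1
 * on any other error. Assumes fd was created with EFD_NONBLOCK.
 */
static int demo_eventfd_test_and_clear(int fd)
{
    uint64_t count;
    ssize_t len;

    do {
        len = read(fd, &count, sizeof(count));
    } while (len == -1 && errno == EINTR);

    if (len == (ssize_t)sizeof(count)) {
        return 1;
    }
    return (len == -1 && errno == EAGAIN) ? 0 : -1;
}
```

With an fd obtained from eventfd(0, EFD_NONBLOCK), several signals
collapse into one event: the first demo_eventfd_test_and_clear() call
reports it and a second call reports nothing pending.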

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v6 -> v7:
- compat is no longer exposed as a user option. Rationale: the
  vfio device became abstract and a specialization is needed
  anyway. The derived device must set the compat string.
- in v6 vfio_start_irq_injection was exposed in vfio-platform.h.
  A new function, vfio_register_irq_starter, replaces it. It
  registers a machine init done notifier that programs & starts
  all dynamic VFIO device IRQs. This function is supposed to be
  called by the machine file, before the creation of the platform
  bus device. A set of static helper routines is added too.
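
To illustrate why the registration order matters, here is a small
self-contained model of the notifier pattern. It is only a sketch: the
Demo* types and demo_* functions are hypothetical stand-ins, not QEMU
APIs, and it assumes notifiers are inserted at the head of the list so
that the one registered last runs first.

```c
#include <assert.h>
#include <stddef.h>

#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*notify)(DemoNotifier *n, void *data);
    DemoNotifier *next;
};

static DemoNotifier *demo_init_done_list;
static int demo_bus_irqs_bound;

/* Assumption: head insertion, so later registrations run earlier. */
static void demo_add_init_done_notifier(DemoNotifier *n)
{
    n->next = demo_init_done_list;
    demo_init_done_list = n;
}

static void demo_run_init_done_notifiers(void)
{
    for (DemoNotifier *n = demo_init_done_list; n; n = n->next) {
        n->notify(n, NULL);
    }
}

/* Stand-in for the platform bus binding device IRQs to bus IRQs. */
static void demo_bind_notify(DemoNotifier *n, void *data)
{
    (void)n;
    (void)data;
    demo_bus_irqs_bound = 1;
}

typedef struct DemoIrqStarterParams {
    int first_irq;   /* first IRQ number of the platform bus */
    int saw_bound;   /* was the binding done when we ran? */
    int started_irq; /* IRQ number computed for bus IRQ index 2 */
    DemoNotifier notifier;
} DemoIrqStarterParams;

/* Stand-in for the IRQ starter: recovers its params via container_of. */
static void demo_irq_starter_notify(DemoNotifier *n, void *data)
{
    DemoIrqStarterParams *p =
        demo_container_of(n, DemoIrqStarterParams, notifier);

    (void)data;
    p->saw_bound = demo_bus_irqs_bound;
    p->started_irq = p->first_irq + 2;
}
```

In this model, registering the starter first and the binding notifier
second means the binding runs before the starter when the list is
walked, which is the ordering the starter depends on.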

v5 -> v6:
- vfio_device property renamed into host property
- correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
  and remove PCI related comment
- remove declaration of vfio_setup_irqfd and irqfd_allowed
  property. Both belong to the next patch (irqfd)
- remove declaration of vfio_intp_interrupt in vfio-platform.h
- functions that can be static are now declared static
- remove declarations of vfio_region_ops, vfio_memory_listener,
  group_list, vfio_address_spaces. All are moved to vfio-common.h
- remove vfio_put_device declaration and definition
- print_regions removed. code moved into vfio_populate_regions
- replace DPRINTF by trace events
- new helper routine to set the trigger eventfd
- dissociate intp init from the injection enablement:
  vfio_enable_intp renamed into vfio_init_intp and new function
  named vfio_start_eventfd_injection
- injection start moved to vfio_start_irq_injection (not anymore
  in vfio_populate_interrupt)
- new start_irq_fn field in VFIOPlatformDevice corresponding to
  the function that will be used for starting injection
- user handled eventfd:
  x add mutex to protect IRQ state & list manipulation,
  x correct misleading comment in vfio_intp_interrupt.
  x Fix bugs thanks to fake interrupt modality
- VFIOPlatformDeviceClass becomes abstract
- add error_setg in vfio_platform_realize
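
The user-handled eventfd scheme (one ACTIVE IRQ at a time, others
queued as PENDING until EOI) can be modelled as follows. This is a
simplified sketch under stated assumptions: no locking, a fixed
number of IRQs, and hypothetical demo_* names that do not appear in
the patch. A trigger for an IRQ that is already active or pending is
simply rejected here.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_IRQS 4

enum demo_irq_state {
    DEMO_IRQ_INACTIVE = 0,
    DEMO_IRQ_PENDING,
    DEMO_IRQ_ACTIVE,
};

struct demo_dev {
    enum demo_irq_state state[DEMO_NR_IRQS];
    int pending[DEMO_NR_IRQS]; /* FIFO of pending IRQ indexes */
    size_t head, count;
};

/*
 * A physical IRQ fired. Returns 1 if it was injected right away,
 * 0 if it was queued behind another IRQ, -1 if it was already
 * outstanding.
 */
static int demo_trigger(struct demo_dev *d, int irq)
{
    int i, busy = 0;

    if (d->state[irq] != DEMO_IRQ_INACTIVE) {
        return -1;
    }
    for (i = 0; i < DEMO_NR_IRQS; i++) {
        if (d->state[i] != DEMO_IRQ_INACTIVE) {
            busy = 1;
            break;
        }
    }
    if (busy) {
        d->state[irq] = DEMO_IRQ_PENDING;
        d->pending[(d->head + d->count) % DEMO_NR_IRQS] = irq;
        d->count++;
        return 0;
    }
    d->state[irq] = DEMO_IRQ_ACTIVE;
    return 1;
}

/*
 * Guest completed the active IRQ. Returns the index of the pending
 * IRQ promoted to ACTIVE, or -1 if nothing was pending.
 */
static int demo_eoi(struct demo_dev *d)
{
    int i, next;

    for (i = 0; i < DEMO_NR_IRQS; i++) {
        if (d->state[i] == DEMO_IRQ_ACTIVE) {
            d->state[i] = DEMO_IRQ_INACTIVE;
            break;
        }
    }
    if (!d->count) {
        return -1;
    }
    next = d->pending[d->head];
    d->head = (d->head + 1) % DEMO_NR_IRQS;
    d->count--;
    d->state[next] = DEMO_IRQ_ACTIVE;
    return next;
}
```

Starting from a zero-initialized struct demo_dev, triggering IRQ 2 and
then IRQs 0 and 3 injects 2 and queues the other two; successive
demo_eoi() calls then promote 0 and 3 in arrival order.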

v4 -> v5:
- vfio-platform.h included first
- cleanup error handling in *populate*, vfio_get_device,
  vfio_enable_intp
- vfio_put_device not called anymore
- add some includes to follow vfio policy

v3 -> v4:
[Eric Auger]
- merge of "vfio: Add initial IRQ support in platform device"
  to get a fully functional patch, although performance is limited.
- removal of unrealize function since I currently understand
  it is only used with device hot-plug feature.

v2 -> v3:
[Eric Auger]
- further factorization between PCI and platform (VFIORegion,
  VFIODevice). same level of functionality.

<= v2:
[Kim Phillips]
- Initial Creation of the device supporting register space mapping
---
 hw/vfio/Makefile.objs           |   1 +
 hw/vfio/platform.c              | 672 ++++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-common.h   |   1 +
 include/hw/vfio/vfio-platform.h |  87 ++++++
 trace-events                    |  12 +
 5 files changed, 773 insertions(+)
 create mode 100644 hw/vfio/platform.c
 create mode 100644 include/hw/vfio/vfio-platform.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index e31f30e..c5c76fe 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,5 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
+obj-$(CONFIG_SOFTMMU) += platform.o
 endif
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
new file mode 100644
index 0000000..9f66610
--- /dev/null
+++ b/hw/vfio/platform.c
@@ -0,0 +1,672 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Kim Phillips <kim.phillips@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI device assignment support:
+ *  Copyright Red Hat, Inc. 2012
+ */
+
+#include <linux/vfio.h>
+#include <sys/ioctl.h>
+
+#include "hw/vfio/vfio-platform.h"
+#include "qemu/error-report.h"
+#include "qemu/range.h"
+#include "sysemu/sysemu.h"
+#include "exec/memory.h"
+#include "qemu/queue.h"
+#include "hw/sysbus.h"
+#include "trace.h"
+#include "hw/platform-bus.h"
+
+static void vfio_intp_interrupt(VFIOINTp *intp);
+typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
+static int vfio_set_trigger_eventfd(VFIOINTp *intp,
+                                    eventfd_user_side_handler_t handler);
+
+/*
+ * Functions only used when eventfds are handled on the user side,
+ * i.e. without irqfd
+ */
+
+/**
+ * vfio_platform_eoi - IRQ completion routine
+ * @vbasedev: the VFIO device
+ *
+ * de-asserts the active virtual IRQ and unmasks the physical IRQ
+ * (masked by the VFIO driver). Handles pending IRQs if any.
+ * eoi function is called on the first access to any MMIO region
+ * after an IRQ was triggered. It is assumed this access corresponds
+ * to the IRQ status register reset. With such a mechanism, a single
+ * IRQ can be handled at a time since there is no way to know which
+ * IRQ was completed by the guest (we would need additional details
+ * about the IRQ status register mask)
+ */
+static void vfio_platform_eoi(VFIODevice *vbasedev)
+{
+    VFIOINTp *intp;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+    qemu_mutex_lock(&vdev->intp_mutex);
+    QLIST_FOREACH(intp, &vdev->intp_list, next) {
+        if (intp->state == VFIO_IRQ_ACTIVE) {
+            trace_vfio_platform_eoi(intp->pin,
+                                event_notifier_get_fd(&intp->interrupt));
+            intp->state = VFIO_IRQ_INACTIVE;
+
+            /* deassert the virtual IRQ and unmask physical one */
+            qemu_set_irq(intp->qemuirq, 0);
+            vfio_unmask_irqindex(vbasedev, intp->pin);
+
+            /* a single IRQ can be active at a time */
+            break;
+        }
+    }
+    /* in case there are pending IRQs, handle them one at a time */
+    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
+        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
+        trace_vfio_platform_eoi_handle_pending(intp->pin);
+        qemu_mutex_unlock(&vdev->intp_mutex);
+        vfio_intp_interrupt(intp);
+        qemu_mutex_lock(&vdev->intp_mutex);
+        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
+        qemu_mutex_unlock(&vdev->intp_mutex);
+    } else {
+        qemu_mutex_unlock(&vdev->intp_mutex);
+    }
+}
+
+/**
+ * vfio_mmap_set_enabled - enable/disable the fast path mode
+ * @vdev: the VFIO platform device
+ * @enabled: the target mmap state
+ *
+ * true ~ fast path = MMIO region is mmaped (no KVM TRAP)
+ * false ~ slow path = MMIO region is trapped and region callbacks
+ * are called; the slow path makes it possible to trap the guest
+ * reset of the IRQ status register
+ */
+
+static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
+{
+    VFIORegion *region;
+    int i;
+
+    trace_vfio_platform_mmap_set_enabled(enabled);
+
+    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
+        region = vdev->regions[i];
+
+        /* register space is unmapped to trap EOI */
+        memory_region_set_enabled(&region->mmap_mem, enabled);
+    }
+}
+
+/**
+ * vfio_intp_mmap_enable - timer function, restores the fast path
+ * if there is no more active IRQ
+ * @opaque: actually points to the VFIO platform device
+ *
+ * Called on mmap timer timeout, this function checks whether the
+ * IRQ is still active and, if it is not, restores the fast path.
+ * By construction a single eventfd is handled at a time.
+ * If the IRQ is still active, the timer is restarted.
+ */
+static void vfio_intp_mmap_enable(void *opaque)
+{
+    VFIOINTp *tmp;
+    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
+
+    qemu_mutex_lock(&vdev->intp_mutex);
+    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
+        if (tmp->state == VFIO_IRQ_ACTIVE) {
+            trace_vfio_platform_intp_mmap_enable(tmp->pin);
+            /* re-program the timer to check active status later */
+            timer_mod(vdev->mmap_timer,
+                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                          vdev->mmap_timeout);
+            qemu_mutex_unlock(&vdev->intp_mutex);
+            return;
+        }
+    }
+    vfio_mmap_set_enabled(vdev, true);
+    qemu_mutex_unlock(&vdev->intp_mutex);
+}
+
+/**
+ * vfio_intp_interrupt - The user-side eventfd handler
+ * @intp: the IRQ struct pointer
+ *
+ * the function can be entered
+ * - in event handler context: this IRQ is inactive
+ *   in that case, the vIRQ is injected into the guest if there
+ *   is no other active or pending IRQ.
+ * - in IOhandler context: this IRQ is pending.
+ *   there is no ACTIVE IRQ
+ */
+static void vfio_intp_interrupt(VFIOINTp *intp)
+{
+    int ret;
+    VFIOINTp *tmp;
+    VFIOPlatformDevice *vdev = intp->vdev;
+    bool delay_handling = false;
+
+    qemu_mutex_lock(&vdev->intp_mutex);
+    if (intp->state == VFIO_IRQ_INACTIVE) {
+        QLIST_FOREACH(tmp, &vdev->intp_list, next) {
+            if (tmp->state == VFIO_IRQ_ACTIVE ||
+                tmp->state == VFIO_IRQ_PENDING) {
+                delay_handling = true;
+                break;
+            }
+        }
+    }
+    if (delay_handling) {
+        /*
+         * the new IRQ gets a pending status and is pushed in
+         * the pending queue
+         */
+        intp->state = VFIO_IRQ_PENDING;
+        trace_vfio_intp_interrupt_set_pending(intp->pin);
+        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
+                             intp, pqnext);
+        ret = event_notifier_test_and_clear(&intp->interrupt);
+        qemu_mutex_unlock(&vdev->intp_mutex);
+        return;
+    }
+
+    /* no active IRQ, the new IRQ can be forwarded to the guest */
+    trace_vfio_platform_intp_interrupt(intp->pin,
+                              event_notifier_get_fd(&intp->interrupt));
+
+    if (intp->state == VFIO_IRQ_INACTIVE) {
+        ret = event_notifier_test_and_clear(&intp->interrupt);
+        if (!ret) {
+            error_report("Error when clearing fd=%d (ret = %d)",
+                         event_notifier_get_fd(&intp->interrupt), ret);
+        }
+    } /* else this is a pending IRQ that moves to ACTIVE state */
+
+    intp->state = VFIO_IRQ_ACTIVE;
+
+    /* sets slow path */
+    vfio_mmap_set_enabled(vdev, false);
+
+    /* trigger the virtual IRQ */
+    qemu_set_irq(intp->qemuirq, 1);
+
+    /* schedule the mmap timer which will restore mmap path after EOI */
+    if (vdev->mmap_timeout) {
+        timer_mod(vdev->mmap_timer,
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                      vdev->mmap_timeout);
+    }
+    qemu_mutex_unlock(&vdev->intp_mutex);
+}
+
+/**
+ * vfio_start_eventfd_injection - starts the virtual IRQ injection using
+ * user-side handled eventfds
+ * @intp: the IRQ struct pointer
+ */
+
+static int vfio_start_eventfd_injection(VFIOINTp *intp)
+{
+    int ret;
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+
+    vfio_mask_irqindex(vbasedev, intp->pin);
+
+    ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
+    if (ret) {
+        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
+        vfio_unmask_irqindex(vbasedev, intp->pin);
+        return ret;
+    }
+    vfio_unmask_irqindex(vbasedev, intp->pin);
+    return 0;
+}
+
+/*
+ * Functions used whatever the injection method
+ */
+
+/**
+ * vfio_set_trigger_eventfd - set VFIO eventfd handling
+ * i.e. program the VFIO driver to associate a given IRQ index
+ * with a fd handler
+ *
+ * @intp: IRQ struct pointer
+ * @handler: handler to be called on eventfd trigger
+ */
+static int vfio_set_trigger_eventfd(VFIOINTp *intp,
+                                    eventfd_user_side_handler_t handler)
+{
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+    struct vfio_irq_set *irq_set;
+    int argsz, ret;
+    int32_t *pfd;
+
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+    irq_set->index = intp->pin;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    pfd = (int32_t *)&irq_set->data;
+    *pfd = event_notifier_get_fd(&intp->interrupt);
+    qemu_set_fd_handler(*pfd, (IOHandler *)handler, NULL, intp);
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    if (ret < 0) {
+        error_report("vfio: Failed to set trigger eventfd: %m");
+        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
+    }
+    g_free(irq_set); /* freed last: pfd points into irq_set->data */
+    return ret;
+}
+
+/* not implemented yet */
+static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
+{
+    return false;
+}
+
+/* not implemented yet */
+static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
+{
+    return 0;
+}
+
+/**
+ * vfio_init_intp - allocate, initialize the IRQ struct pointer
+ * and add it into the list of IRQ
+ * @vbasedev: the VFIO device
+ * @index: VFIO device IRQ index
+ */
+static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
+{
+    int ret;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
+    VFIOINTp *intp;
+
+    /* allocate and populate a new VFIOINTp structure put in a queue list */
+    intp = g_malloc0(sizeof(*intp));
+    intp->vdev = vdev;
+    intp->pin = index;
+    intp->state = VFIO_IRQ_INACTIVE;
+    sysbus_init_irq(sbdev, &intp->qemuirq);
+
+    /* Get an eventfd for trigger */
+    ret = event_notifier_init(&intp->interrupt, 0);
+    if (ret) {
+        g_free(intp);
+        error_report("vfio: Error: trigger event_notifier_init failed");
+        return NULL;
+    }
+
+    /* store the new intp in qlist */
+    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
+    return intp;
+}
+
+/**
+ * vfio_populate_device - initialize MMIO region and IRQ
+ * @vbasedev: the VFIO device
+ *
+ * query the VFIO device for exposed MMIO regions and IRQ and
+ * populate the associated fields in the device struct
+ */
+static int vfio_populate_device(VFIODevice *vbasedev)
+{
+    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+    VFIOINTp *intp;
+    int i, ret = 0;
+    VFIOPlatformDevice *vdev =
+        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
+
+    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
+        reg_info.index = i;
+        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+        if (ret) {
+            error_report("vfio: Error getting region %d info: %m", i);
+            goto error;
+        }
+        vdev->regions[i]->flags = reg_info.flags;
+        vdev->regions[i]->size = reg_info.size;
+        vdev->regions[i]->fd_offset = reg_info.offset;
+        vdev->regions[i]->nr = i;
+        vdev->regions[i]->vbasedev = vbasedev;
+
+        trace_vfio_platform_populate_regions(vdev->regions[i]->nr,
+                            (unsigned long)vdev->regions[i]->flags,
+                            (unsigned long)vdev->regions[i]->size,
+                            vdev->regions[i]->vbasedev->fd,
+                            (unsigned long)vdev->regions[i]->fd_offset);
+    }
+
+    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                                    vfio_intp_mmap_enable, vdev);
+
+    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        irq.index = i;
+
+        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+        if (ret) {
+            error_report("vfio: error getting device %s irq info: %m",
+                         vbasedev->name);
+            return ret;
+        } else {
+            trace_vfio_platform_populate_interrupts(irq.index,
+                                                    irq.count,
+                                                    irq.flags);
+            intp = vfio_init_intp(vbasedev, irq.index);
+            if (!intp) {
+                error_report("vfio: Error setting up IRQ %d", i);
+                return -1; /* ret is 0 here: the ioctl succeeded */
+            }
+        }
+    }
+    return 0;
+error:
+    return ret;
+}
+
+/*
+ * vfio_start_irq_injection - associates a virtual irq to a
+ * VFIO IRQ index and start the injection of this IRQ
+ * @s: SysBus Device
+ * @index: VFIO IRQ index
+ * @virq: the virtual IRQ number, aka gsi
+ *
+ * this function is called from the machine init done notifier
+ */
+static void vfio_start_irq_injection(SysBusDevice *s, int index, int virq)
+{
+    VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
+    VFIOINTp *intp;
+
+    QLIST_FOREACH(intp, &vdev->intp_list, next) {
+        if (intp->pin == index) {
+            intp->virtualID = virq;
+            vdev->start_irq_fn(intp);
+        }
+    }
+}
+
+/* specialized functions for VFIO Platform devices */
+static VFIODeviceOps vfio_platform_ops = {
+    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
+    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
+    .vfio_eoi = vfio_platform_eoi,
+    .vfio_populate_device = vfio_populate_device,
+};
+
+/**
+ * vfio_base_device_init - implements some of the VFIO mechanics
+ * @vbasedev: the VFIO device
+ *
+ * retrieves the group the device belongs to and get the device fd
+ * returns the VFIO device fd
+ * precondition: the device name must be initialized
+ */
+static int vfio_base_device_init(VFIODevice *vbasedev)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev_iter;
+    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
+    ssize_t len;
+    struct stat st;
+    int groupid;
+    int ret;
+
+    /* name must be set prior to the call */
+    if (!vbasedev->name) {
+        return -EINVAL;
+    }
+
+    /* Check that the host device exists */
+    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
+             vbasedev->name);
+
+    if (stat(path, &st) < 0) {
+        error_report("vfio: error: no such host device: %s", path);
+        return -errno;
+    }
+
+    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
+    len = readlink(path, iommu_group_path, sizeof(iommu_group_path));
+    if (len <= 0 || len >= sizeof(iommu_group_path)) {
+        error_report("vfio: error no iommu_group for device");
+        return len < 0 ? -errno : -ENAMETOOLONG;
+    }
+
+    iommu_group_path[len] = 0;
+    group_name = basename(iommu_group_path);
+
+    if (sscanf(group_name, "%d", &groupid) != 1) {
+        error_report("vfio: error reading %s: %m", path);
+        return -errno;
+    }
+
+    trace_vfio_platform_base_device_init(vbasedev->name, groupid);
+
+    group = vfio_get_group(groupid, &address_space_memory);
+    if (!group) {
+        error_report("vfio: failed to get group %d", groupid);
+        return -ENOENT;
+    }
+
+    snprintf(path, sizeof(path), "%s", vbasedev->name);
+
+    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
+            error_report("vfio: error: device %s is already attached", path);
+            vfio_put_group(group);
+            return -EBUSY;
+        }
+    }
+    ret = vfio_get_device(group, path, vbasedev);
+    if (ret) {
+        error_report("vfio: failed to get device %s", path);
+        vfio_put_group(group);
+    }
+    return ret;
+}
+
+/**
+ * vfio_map_region - initialize the 2 mr (mmapped on ops) for a
+ * given index
+ * @vdev: the VFIO platform device
+ * @nr: the index of the region
+ *
+ * init the top memory region and the mmapped memory region beneath
+ * VFIOPlatformDevice is used since VFIODevice is not a QOM Object
+ * and could not be passed to memory region functions
+*/
+static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
+{
+    VFIORegion *region = vdev->regions[nr];
+    unsigned size = region->size;
+    char name[64];
+
+    if (!size) {
+        return;
+    }
+
+    snprintf(name, sizeof(name), "VFIO %s region %d",
+             vdev->vbasedev.name, nr);
+
+    /* A "slow" read/write mapping underlies all regions */
+    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
+                          region, name, size);
+
+    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
+
+    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
+                         &region->mmap_mem, &region->mmap, size, 0, name)) {
+        error_report("%s unsupported. Performance may be slow", name);
+    }
+}
+
+/**
+ * vfio_platform_realize  - the device realize function
+ * @dev: device state pointer
+ * @errp: error
+ *
+ * initialize the device, its memory regions and IRQ structures
+ * IRQ are started separately
+ */
+static void vfio_platform_realize(DeviceState *dev, Error **errp)
+{
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    int i, ret;
+
+    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
+    vbasedev->ops = &vfio_platform_ops;
+    vdev->start_irq_fn = vfio_start_eventfd_injection;
+
+    trace_vfio_platform_realize(vbasedev->name, vdev->compat);
+
+    ret = vfio_base_device_init(vbasedev);
+    if (ret) {
+        error_setg(errp, "vfio: vfio_base_device_init failed for %s",
+                   vbasedev->name);
+        return;
+    }
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        vfio_map_region(vdev, i);
+        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
+    }
+}
+
+/*
+ * Mechanics to program/start irq injection on machine init done notifier:
+ * this is needed since at finalize time, the device IRQ are not yet
+ * bound to the platform bus IRQ. It is assumed here dynamic instantiation
+ * always is used. Binding to the platform bus IRQ happens on a machine
+ * init done notifier registered by the machine file. After its execution
+ * we execute a new notifier that actually starts the injection. When using
+ * irqfd, programming the injection consists in associating eventfds to
+ * GSI number, i.e. virtual IRQ number
+ */
+
+typedef struct VfioIrqStarterNotifierParams {
+    unsigned int platform_bus_first_irq;
+    Notifier notifier;
+} VfioIrqStarterNotifierParams;
+
+typedef struct VfioIrqStartParams {
+    PlatformBusDevice *pbus;
+    int platform_bus_first_irq;
+} VfioIrqStartParams;
+
+/* Start injection of IRQ for a specific VFIO device */
+static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
+{
+    int i;
+    VfioIrqStartParams *p = opaque;
+    VFIOPlatformDevice *vdev;
+    VFIODevice *vbasedev;
+    uint64_t irq_number;
+    PlatformBusDevice *pbus = p->pbus;
+    int platform_bus_first_irq = p->platform_bus_first_irq;
+
+    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
+        vdev = VFIO_PLATFORM_DEVICE(sbdev);
+        vbasedev = &vdev->vbasedev;
+        for (i = 0; i < vbasedev->num_irqs; i++) {
+            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
+                             + platform_bus_first_irq;
+            vfio_start_irq_injection(sbdev, i, irq_number);
+        }
+    }
+    return 0;
+}
+
+/* loop on all VFIO platform devices and start their IRQ injection */
+static void vfio_irq_starter_notify(Notifier *notifier, void *data)
+{
+    VfioIrqStarterNotifierParams *p =
+        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
+    DeviceState *dev =
+        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
+    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
+
+    if (pbus->done_gathering) {
+        VfioIrqStartParams params = {
+            .pbus = pbus,
+            .platform_bus_first_irq = p->platform_bus_first_irq,
+        };
+
+        foreach_dynamic_sysbus_device(vfio_irq_starter, &params);
+    }
+}
+
+/* registers the machine init done notifier that will start VFIO IRQ */
+void vfio_register_irq_starter(int platform_bus_first_irq)
+{
+    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
+
+    p->platform_bus_first_irq = platform_bus_first_irq;
+    p->notifier.notify = vfio_irq_starter_notify;
+    qemu_add_machine_init_done_notifier(&p->notifier);
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+    .name = TYPE_VFIO_PLATFORM,
+    .unmigratable = 1,
+};
+
+static Property vfio_platform_dev_properties[] = {
+    DEFINE_PROP_STRING("host", VFIOPlatformDevice, vbasedev.name),
+    DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
+                       mmap_timeout, 1100),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vfio_platform_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = vfio_platform_realize;
+    dc->props = vfio_platform_dev_properties;
+    dc->vmsd = &vfio_platform_vmstate;
+    dc->desc = "VFIO-based platform device assignment";
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo vfio_platform_dev_info = {
+    .name = TYPE_VFIO_PLATFORM,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(VFIOPlatformDevice),
+    .class_init = vfio_platform_class_init,
+    .class_size = sizeof(VFIOPlatformDeviceClass),
+    .abstract   = true,
+};
+
+static void register_vfio_platform_dev_type(void)
+{
+    type_register_static(&vfio_platform_dev_info);
+}
+
+type_init(register_vfio_platform_dev_type)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e7fc280..83c7876 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -43,6 +43,7 @@
 
 enum {
     VFIO_DEVICE_TYPE_PCI = 0,
+    VFIO_DEVICE_TYPE_PLATFORM = 1,
 };
 
 typedef struct VFIORegion {
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
new file mode 100644
index 0000000..18e6807
--- /dev/null
+++ b/include/hw/vfio/vfio-platform.h
@@ -0,0 +1,87 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Kim Phillips <kim.phillips@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI device assignment support:
+ *  Copyright Red Hat, Inc. 2012
+ */
+
+#ifndef HW_VFIO_VFIO_PLATFORM_H
+#define HW_VFIO_VFIO_PLATFORM_H
+
+#include "hw/sysbus.h"
+#include "hw/vfio/vfio-common.h"
+#include "qemu/event_notifier.h"
+#include "qemu/queue.h"
+#include "hw/irq.h"
+
+#define TYPE_VFIO_PLATFORM "vfio-platform"
+
+enum {
+    VFIO_IRQ_INACTIVE = 0,
+    VFIO_IRQ_PENDING = 1,
+    VFIO_IRQ_ACTIVE = 2,
+    /* VFIO_IRQ_ACTIVE_AND_PENDING cannot happen with VFIO */
+};
+
+typedef struct VFIOINTp {
+    QLIST_ENTRY(VFIOINTp) next; /* entry for IRQ list */
+    QSIMPLEQ_ENTRY(VFIOINTp) pqnext; /* entry for pending IRQ queue */
+    EventNotifier interrupt; /* eventfd triggered on interrupt */
+    EventNotifier unmask; /* eventfd for unmask on QEMU bypass */
+    qemu_irq qemuirq;
+    struct VFIOPlatformDevice *vdev; /* back pointer to device */
+    int state; /* inactive, pending, active */
+    bool kvm_accel; /* set when QEMU bypass through KVM enabled */
+    uint8_t pin; /* index */
+    uint8_t virtualID; /* virtual IRQ */
+} VFIOINTp;
+
+typedef int (*start_irq_fn_t)(VFIOINTp *intp);
+
+typedef struct VFIOPlatformDevice {
+    SysBusDevice sbdev;
+    VFIODevice vbasedev; /* not a QOM object */
+    VFIORegion **regions;
+    QLIST_HEAD(, VFIOINTp) intp_list; /* list of IRQ */
+    /* queue of pending IRQ */
+    QSIMPLEQ_HEAD(pending_intp_queue, VFIOINTp) pending_intp_queue;
+    char *compat; /* compatibility string */
+    uint32_t mmap_timeout; /* delay to re-enable mmaps after interrupt */
+    QEMUTimer *mmap_timer; /* enable mmaps after periods w/o interrupts */
+    start_irq_fn_t start_irq_fn;
+    QemuMutex  intp_mutex;
+} VFIOPlatformDevice;
+
+
+typedef struct VFIOPlatformDeviceClass {
+    /*< private >*/
+    SysBusDeviceClass parent_class;
+    /*< public >*/
+} VFIOPlatformDeviceClass;
+
+#define VFIO_PLATFORM_DEVICE(obj) \
+     OBJECT_CHECK(VFIOPlatformDevice, (obj), TYPE_VFIO_PLATFORM)
+#define VFIO_PLATFORM_DEVICE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(VFIOPlatformDeviceClass, (klass), TYPE_VFIO_PLATFORM)
+#define VFIO_PLATFORM_DEVICE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(VFIOPlatformDeviceClass, (obj), TYPE_VFIO_PLATFORM)
+
+/**
+ * vfio_register_irq_starter - registers a machine init done notifier that
+ * starts IRQ injection for VFIO dynamic sysbus devices attached to the
+ * platform bus.
+ *
+ * @platform_bus_first_irq: the number of the first irq assigned to the
+ *  platform bus (index in machine file global qemu_irq array)
+ */
+void vfio_register_irq_starter(int platform_bus_first_irq);
+
+#endif /*HW_VFIO_VFIO_PLATFORM_H*/
diff --git a/trace-events b/trace-events
index 255971a..54d998c 100644
--- a/trace-events
+++ b/trace-events
@@ -1428,6 +1428,18 @@ vfio_put_group(int fd) "close group->fd=%d"
 vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
 vfio_put_base_device(int fd) "close vdev->fd=%d"
 
+# hw/vfio/platform.c
+vfio_platform_eoi(int pin, int fd) "EOI IRQ pin %d (fd=%d)"
+vfio_platform_mmap_set_enabled(bool enabled) "fast path = %d"
+vfio_platform_intp_mmap_enable(int pin) "IRQ #%d still active, stay in slow path"
+vfio_platform_intp_interrupt(int pin, int fd) "Handle IRQ #%d (fd = %d)"
+vfio_platform_populate_interrupts(int pin, int count, int flags) "- IRQ index %d: count %d, flags=0x%x"
+vfio_platform_populate_regions(int region_index, unsigned long flag, unsigned long size, int fd, unsigned long offset) "- region %d flags = 0x%lx, size = 0x%lx, fd= %d, offset = 0x%lx"
+vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
+vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"
+vfio_intp_interrupt_set_pending(int index) "irq %d is set PENDING"
+vfio_platform_eoi_handle_pending(int index) "handle PENDING IRQ %d"
+
 #hw/acpi/memory_hotplug.c
 mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32
 mhp_acpi_read_addr_lo(uint32_t slot, uint32_t addr) "slot[0x%"PRIx32"] addr lo: 0x%"PRIx32
-- 
1.8.3.2

* [Qemu-devel] [PATCH v7 10/16] hw/vfio: calxeda xgmac device
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (8 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-11-05 10:26   ` Alexander Graf
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 11/16] hw/arm/virt: add support for VFIO devices Eric Auger
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

The platform device class has become abstract. The Calxeda xgmac
device can be instantiated on the command line with the following option:

-device vfio-calxeda-xgmac,host="fff51000.ethernet"
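
As an illustration of how a concrete device specializes the abstract
parent, here is a standalone sketch of the realize chaining in plain C.
It does not use QOM: DeviceSketch, ClassSketch and the *_sketch functions
are hypothetical stand-ins, and only the "calxeda,hb-xgmac" string is
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the device instance (not a QEMU type). */
typedef struct DeviceSketch {
    const char *compat;   /* compatibility string, set by the subclass */
    int realized;         /* set by the parent realize */
} DeviceSketch;

typedef int (*RealizeFn)(DeviceSketch *dev);

/* Hypothetical stand-in for the device class. */
typedef struct ClassSketch {
    RealizeFn realize;         /* entry point called by the framework */
    RealizeFn parent_realize;  /* parent implementation saved at class init */
} ClassSketch;

/* Parent realize: expects the subclass to have provided a compat string. */
static int platform_realize_sketch(DeviceSketch *dev)
{
    if (dev->compat == NULL || strlen(dev->compat) == 0) {
        return -1;
    }
    dev->realized = 1;
    return 0;
}

/* The subclass starts out with the realize inherited from its parent. */
static ClassSketch xgmac_class = { .realize = platform_realize_sketch };

/* Subclass realize: set the compat string, then chain to the parent. */
static int xgmac_realize_sketch(DeviceSketch *dev)
{
    assert(xgmac_class.parent_realize != NULL);
    dev->compat = "calxeda,hb-xgmac";
    return xgmac_class.parent_realize(dev);
}

/* Class init: save the inherited realize and install the override. */
static void xgmac_class_init_sketch(ClassSketch *k)
{
    if (k->realize == xgmac_realize_sketch) {
        return; /* already initialized, avoid chaining to ourselves */
    }
    k->parent_realize = k->realize;
    k->realize = xgmac_realize_sketch;
}
```

With this pattern the parent never needs to know which compat string a
given device uses; each concrete device only overrides realize.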

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v5 -> v6:
- reintroduced following Alex Graf's advice
- fix a bug related to compat override

v4 -> v5:
removed since device tree was moved to hw/arm/dyn_sysbus_devtree.c

v4: creation for device tree specialization
---
 hw/vfio/Makefile.objs                |  1 +
 hw/vfio/calxeda_xgmac.c              | 54 ++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-calxeda-xgmac.h | 41 +++++++++++++++++++++++++++
 3 files changed, 96 insertions(+)
 create mode 100644 hw/vfio/calxeda_xgmac.c
 create mode 100644 include/hw/vfio/vfio-calxeda-xgmac.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index c5c76fe..913ab14 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -2,4 +2,5 @@ ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 obj-$(CONFIG_SOFTMMU) += platform.o
+obj-$(CONFIG_SOFTMMU) += calxeda_xgmac.o
 endif
diff --git a/hw/vfio/calxeda_xgmac.c b/hw/vfio/calxeda_xgmac.c
new file mode 100644
index 0000000..199e076
--- /dev/null
+++ b/hw/vfio/calxeda_xgmac.c
@@ -0,0 +1,54 @@
+/*
+ * calxeda xgmac example VFIO device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Eric Auger <eric.auger@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/vfio/vfio-calxeda-xgmac.h"
+
+static void calxeda_xgmac_realize(DeviceState *dev, Error **errp)
+{
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+    VFIOCalxedaXgmacDeviceClass *k = VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(dev);
+
+    vdev->compat = g_strdup("calxeda,hb-xgmac");
+
+    k->parent_realize(dev, errp);
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+    .name = TYPE_VFIO_CALXEDA_XGMAC,
+    .unmigratable = 1,
+};
+
+static void vfio_calxeda_xgmac_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VFIOCalxedaXgmacDeviceClass *vcxc =
+        VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass);
+    vcxc->parent_realize = dc->realize;
+    dc->realize = calxeda_xgmac_realize;
+    dc->desc = "VFIO Calxeda XGMAC";
+}
+
+static const TypeInfo vfio_calxeda_xgmac_dev_info = {
+    .name = TYPE_VFIO_CALXEDA_XGMAC,
+    .parent = TYPE_VFIO_PLATFORM,
+    .instance_size = sizeof(VFIOCalxedaXgmacDevice),
+    .class_init = vfio_calxeda_xgmac_class_init,
+    .class_size = sizeof(VFIOCalxedaXgmacDeviceClass),
+};
+
+static void register_calxeda_xgmac_dev_type(void)
+{
+    type_register_static(&vfio_calxeda_xgmac_dev_info);
+}
+
+type_init(register_calxeda_xgmac_dev_type)
diff --git a/include/hw/vfio/vfio-calxeda-xgmac.h b/include/hw/vfio/vfio-calxeda-xgmac.h
new file mode 100644
index 0000000..1529cf5
--- /dev/null
+++ b/include/hw/vfio/vfio-calxeda-xgmac.h
@@ -0,0 +1,41 @@
+/*
+ * VFIO calxeda xgmac device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Eric Auger <eric.auger@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef HW_VFIO_VFIO_CALXEDA_XGMAC_H
+#define HW_VFIO_VFIO_CALXEDA_XGMAC_H
+
+#include "hw/vfio/vfio-platform.h"
+
+#define TYPE_VFIO_CALXEDA_XGMAC "vfio-calxeda-xgmac"
+
+typedef struct VFIOCalxedaXgmacDevice {
+    VFIOPlatformDevice vdev;
+} VFIOCalxedaXgmacDevice;
+
+typedef struct VFIOCalxedaXgmacDeviceClass {
+    /*< private >*/
+    VFIOPlatformDeviceClass parent_class;
+    /*< public >*/
+    DeviceRealize parent_realize;
+} VFIOCalxedaXgmacDeviceClass;
+
+#define VFIO_CALXEDA_XGMAC_DEVICE(obj) \
+     OBJECT_CHECK(VFIOCalxedaXgmacDevice, (obj), TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(VFIOCalxedaXgmacDeviceClass, (klass), \
+                        TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(VFIOCalxedaXgmacDeviceClass, (obj), \
+                      TYPE_VFIO_CALXEDA_XGMAC)
+
+#endif
-- 
1.8.3.2

* [Qemu-devel] [PATCH v7 11/16] hw/arm/virt: add support for VFIO devices
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (9 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 10/16] hw/vfio: calxeda xgmac device Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation Eric Auger
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

VFIO devices are dynamic sysbus devices. They could already be
instantiated. However, for them to be functional, IRQ injection must
be programmed and started. This programming must happen after the
sysbus devices are attached to the platform bus and their IRQs are
bound. Only then are the GSIs they are connected to known, so that
irqfd can be programmed.

Binding happens in a machine init done notifier registered by the
platform bus init. The IRQ start is done in another notifier that
must be registered before the platform bus creation.
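
The ordering constraint can be sketched with a toy notifier list. This
is a standalone illustration, not QEMU code: it assumes that notifiers
are inserted at the head of the list and therefore run in reverse
registration order, which is what the requirement above implies. All
names (NotifierSketch, add_init_done_notifier, ...) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal notifier; new entries go to the head of the list. */
typedef struct NotifierSketch NotifierSketch;
struct NotifierSketch {
    void (*notify)(NotifierSketch *n);
    NotifierSketch *next;
};

static NotifierSketch *init_done_list;

static void add_init_done_notifier(NotifierSketch *n)
{
    assert(n != NULL && n->notify != NULL);
    n->next = init_done_list;   /* head insertion: last added runs first */
    init_done_list = n;
}

static void run_init_done_notifiers(void)
{
    for (NotifierSketch *n = init_done_list; n != NULL; n = n->next) {
        n->notify(n);
    }
}

static int irqs_bound;         /* set once the IRQs have been bound */
static int injection_started;  /* set only if binding already happened */

static void bind_irqs(NotifierSketch *n)
{
    (void)n;
    irqs_bound = 1;
}

static void start_injection(NotifierSketch *n)
{
    (void)n;
    injection_started = irqs_bound;
}

/* Returns 1 when injection started after the IRQs were bound. */
static int machine_init_sketch(void)
{
    static NotifierSketch starter = { .notify = start_injection };
    static NotifierSketch binder = { .notify = bind_irqs };

    init_done_list = NULL;
    irqs_bound = 0;
    injection_started = 0;

    add_init_done_notifier(&starter);  /* registered first, runs last */
    add_init_done_notifier(&binder);   /* registered last, runs first */
    run_init_done_notifiers();
    return injection_started;
}
```

Swapping the two registrations in this sketch would start injection
before any IRQ is bound, which is the situation the patch avoids.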

This patch adds the registration of the IRQ start notifier in machvirt.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

The registration of the IRQ start notifier could also happen in
the platform bus.
---
 hw/arm/virt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3a09d58..911dbfc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -44,6 +44,7 @@
 #include "qemu/error-report.h"
 #include "hw/arm/sysbus-fdt.h"
 #include "hw/platform-bus.h"
+#include "hw/vfio/vfio-platform.h"
 
 #define NUM_VIRTIO_TRANSPORTS 32
 
@@ -546,6 +547,14 @@ static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic,
     MemoryRegion *sysmem = get_system_memory();
 
     /*
+     * Register a notifier that starts VFIO IRQ injection. The notifier
+     * must be registered before the platform bus device is created,
+     * since the latter registers another notifier that binds the dynamic
+     * sysbus devices to the platform bus.
+     */
+    vfio_register_irq_starter(system_params->platform_bus_first_irq);
+
+    /*
      * register the notifier that will update the device tree with
      * the platform bus and device tree nodes. Must be done before
      * the instantiation of the platform bus device that registers
-- 
1.8.3.2

* [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (10 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 11/16] hw/arm/virt: add support for VFIO devices Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-11-05 10:59   ` Alexander Graf
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 13/16] hw/vfio/platform: Add irqfd support Eric Auger
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

vfio-calxeda-xgmac can now be instantiated using the -device option.
The node creation function generates a very basic dt node composed
of the compatible, reg and interrupts properties.
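
For reference, the layout of the interrupts property can be sketched as
follows. This is a standalone illustration rather than the code in the
diff: each IRQ takes three big-endian 32-bit cells <0 irq 0x4>, matching
the values written by the patch. The helper names put_be32 and
encode_interrupts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CELLS_PER_IRQ 3

/* Store v as a big-endian 32-bit cell, whatever the host endianness. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/*
 * Fill buf with one <0 irq 0x4> triple per IRQ.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t encode_interrupts(uint8_t *buf, size_t buf_len,
                                const uint32_t *irqs, size_t num_irqs)
{
    size_t needed = num_irqs * CELLS_PER_IRQ * sizeof(uint32_t);

    if (buf_len < needed) {
        return 0;
    }
    for (size_t i = 0; i < num_irqs; i++) {
        uint8_t *cell = buf + i * CELLS_PER_IRQ * sizeof(uint32_t);

        put_be32(cell, 0);            /* first cell: 0, as in the patch */
        put_be32(cell + 4, irqs[i]);  /* second cell: IRQ number */
        put_be32(cell + 8, 0x4);      /* third cell: 0x4, as in the patch */
    }
    return needed;
}
```

The byte count returned here corresponds to the
num_irqs * 3 * sizeof(uint32_t) length passed to qemu_fdt_setprop in the
diff.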

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v6 -> v7:
- compat string re-formatting removed since compat string is not exposed
  anymore as a user option
- VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
  device
---
 hw/arm/sysbus-fdt.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index d5476f1..f8b310b 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -27,6 +27,8 @@
 #include "hw/platform-bus.h"
 #include "sysemu/sysemu.h"
 #include "hw/platform-bus.h"
+#include "hw/vfio/vfio-platform.h"
+#include "hw/vfio/vfio-calxeda-xgmac.h"
 
 /*
  * internal struct that contains the information to create dynamic
@@ -54,8 +56,11 @@ typedef struct NodeCreationPair {
     int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
 } NodeCreationPair;
 
+static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque);
+
 /* list of supported dynamic sysbus devices */
 NodeCreationPair add_fdt_node_functions[] = {
+        {TYPE_VFIO_CALXEDA_XGMAC, add_basic_vfio_fdt_node},
         {"", NULL}, /*last element*/
 };
 
@@ -86,6 +91,89 @@ static int add_fdt_node(SysBusDevice *sbdev, void *opaque)
 }
 
 /**
+ * add_basic_vfio_fdt_node - generates the most basic node for a VFIO device
+ *
+ * The properties set are:
+ * - compatible
+ * - reg
+ * - interrupts
+ */
+static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque)
+{
+    PlatformBusFdtData *data = opaque;
+    PlatformBusDevice *pbus = data->pbus;
+    void *fdt = data->fdt;
+    const char *parent_node = data->pbus_node_name;
+    int compat_str_len;
+    char *nodename;
+    int i, ret;
+    uint32_t *irq_attr;
+    uint64_t *reg_attr;
+    uint64_t mmio_base;
+    uint64_t irq_number;
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    Object *obj = OBJECT(sbdev);
+
+    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+                               vbasedev->name,
+                               mmio_base);
+
+    qemu_fdt_add_subnode(fdt, nodename);
+
+    compat_str_len = strlen(vdev->compat) + 1;
+    qemu_fdt_setprop(fdt, nodename, "compatible",
+                          vdev->compat, compat_str_len);
+
+    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
+        reg_attr[4*i] = 1;
+        reg_attr[4*i+1] = mmio_base;
+        reg_attr[4*i+2] = 1;
+        reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
+    }
+
+    ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
+                     vbasedev->num_regions*2, reg_attr);
+    if (ret < 0) {
+        error_report("could not set reg property of node %s", nodename);
+        goto fail;
+    }
+
+    irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        irq_number = platform_bus_get_irqn(pbus, sbdev, i)
+                         + data->irq_start;
+        irq_attr[3*i] = cpu_to_be32(0);
+        irq_attr[3*i+1] = cpu_to_be32(irq_number);
+        irq_attr[3*i+2] = cpu_to_be32(0x4);
+    }
+
+    ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+                     irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
+    if (ret < 0) {
+        error_report("could not set interrupts property of node %s",
+                     nodename);
+        goto fail;
+    }
+
+    g_free(nodename);
+    g_free(irq_attr);
+    g_free(reg_attr);
+
+    return 0;
+
+fail:
+
+    return -1;
+}
+
+/**
  * add_all_platform_bus_fdt_nodes - create all the platform bus nodes
  *
  * builds the parent platform bus node and all the nodes of dynamic
-- 
1.8.3.2

* [Qemu-devel] [PATCH v7 13/16] hw/vfio/platform: Add irqfd support
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (11 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 14/16] linux-headers: Update KVM headers from linux-next tag ToBeFilled Eric Auger
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

This patch optimizes IRQ handling by using the irqfd framework.

Instead of being handled on the user side, the eventfds are handled on
the kernel side using
- the KVM irqfd framework,
- the VFIO driver virqfd framework.

The virtual IRQ completion is trapped at the interrupt controller.
This removes the need for the fast/slow path swap.

Overall this brings significant performance improvements.

This patch depends on irqfd support in the host kernel KVM.
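
The resulting choice of injection method can be summarized by the
following standalone sketch. It mirrors the condition used in
vfio_platform_realize in the diff, but the enum and the function
choose_injection are hypothetical; the three booleans stand for
kvm_irqfds_enabled(), kvm_resamplefds_enabled() and the x-irqfd
property.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two injection methods of this patch. */
typedef enum {
    INJECT_EVENTFD,  /* eventfd handled on the user side (fallback) */
    INJECT_IRQFD     /* eventfd handled on the kernel side */
} InjectionMethod;

/*
 * irqfd is chosen only when KVM reports both irqfd and resamplefd
 * support and the user has not disabled it; otherwise fall back to
 * user-side eventfd handling.
 */
static InjectionMethod choose_injection(bool irqfds_enabled,
                                        bool resamplefds_enabled,
                                        bool irqfd_allowed)
{
    if (irqfds_enabled && resamplefds_enabled && irqfd_allowed) {
        return INJECT_IRQFD;
    }
    return INJECT_EVENTFD;
}
```

When built without CONFIG_KVM, the diff always selects the eventfd
method, which corresponds to passing false for the first two arguments.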

Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v5 -> v6
- rely on kvm_irqfds_enabled() and kvm_resamplefds_enabled()
- guard KVM code with #ifdef CONFIG_KVM

v3 -> v4:
[Alvise Rigo]
Use of VFIO Platform driver v6 unmask/virqfd feature and removal
of resamplefd handler. Physical IRQ unmasking is now done in
VFIO driver.

v3:
[Eric Auger]
initial support with resamplefd handled on QEMU side since the
unmask was not supported on VFIO platform driver v5.

Conflicts:
	hw/vfio/platform.c
---
 hw/vfio/platform.c              | 96 +++++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-platform.h |  1 +
 trace-events                    |  2 +
 3 files changed, 99 insertions(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 9f66610..bdd5c93 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -25,6 +25,7 @@
 #include "hw/sysbus.h"
 #include "trace.h"
 #include "hw/platform-bus.h"
+#include "sysemu/kvm.h"
 
 static void vfio_intp_interrupt(VFIOINTp *intp);
 typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
@@ -236,6 +237,83 @@ static int vfio_start_eventfd_injection(VFIOINTp *intp)
 }
 
 /*
+ * Functions used for irqfd
+ */
+
+#ifdef CONFIG_KVM
+
+/**
+ * vfio_set_resample_eventfd - sets the resamplefd for an IRQ
+ * @intp: the IRQ struct pointer
+ * programs the VFIO driver to unmask this IRQ when the
+ * intp->unmask eventfd is triggered
+ */
+static int vfio_set_resample_eventfd(VFIOINTp *intp)
+{
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+    struct vfio_irq_set *irq_set;
+    int argsz, ret;
+    int32_t *pfd;
+
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
+    irq_set->index = intp->pin;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    pfd = (int32_t *)&irq_set->data;
+    *pfd = event_notifier_get_fd(&intp->unmask);
+    qemu_set_fd_handler(*pfd, NULL, NULL, intp);
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    if (ret < 0) {
+        error_report("vfio: Failed to set resample eventfd: %m");
+        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
+    }
+    g_free(irq_set);
+    return ret;
+}
+
+/**
+ * vfio_start_irqfd_injection - starts irqfd injection for an IRQ
+ * programs VFIO driver with both the trigger and resamplefd
+ * programs KVM with the gsi, trigger & resample eventfds
+ */
+static int vfio_start_irqfd_injection(VFIOINTp *intp)
+{
+    struct kvm_irqfd irqfd = {
+        .fd = event_notifier_get_fd(&intp->interrupt),
+        .resamplefd = event_notifier_get_fd(&intp->unmask),
+        .gsi = intp->virtualID,
+        .flags = KVM_IRQFD_FLAG_RESAMPLE,
+    };
+
+    if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
+        error_report("vfio: Error: Failed to assign the irqfd: %m");
+        goto fail_irqfd;
+    }
+    if (vfio_set_trigger_eventfd(intp, NULL) < 0) {
+        goto fail_vfio;
+    }
+    if (vfio_set_resample_eventfd(intp) < 0) {
+        goto fail_vfio;
+    }
+
+    intp->kvm_accel = true;
+    trace_vfio_platform_start_irqfd_injection(intp->pin, intp->virtualID,
+                                     irqfd.fd, irqfd.resamplefd);
+    return 0;
+
+fail_vfio:
+    irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
+    kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
+fail_irqfd:
+    return -1;
+}
+
+#endif
+
+/*
  * Functions used whatever the injection method
  */
 
@@ -314,6 +392,13 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
         error_report("vfio: Error: trigger event_notifier_init failed ");
         return NULL;
     }
+    /* Get an eventfd for resample/unmask */
+    ret = event_notifier_init(&intp->unmask, 0);
+    if (ret) {
+        g_free(intp);
+        error_report("vfio: Error: resample event_notifier_init failed");
+        return NULL;
+    }
 
     /* store the new intp in qlist */
     QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
@@ -542,7 +627,17 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
 
     vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
     vbasedev->ops = &vfio_platform_ops;
+
+#ifdef CONFIG_KVM
+    if (kvm_irqfds_enabled() && kvm_resamplefds_enabled() &&
+        vdev->irqfd_allowed) {
+        vdev->start_irq_fn = vfio_start_irqfd_injection;
+    } else {
+        vdev->start_irq_fn = vfio_start_eventfd_injection;
+    }
+#else
     vdev->start_irq_fn = vfio_start_eventfd_injection;
+#endif
 
     trace_vfio_platform_realize(vbasedev->name, vdev->compat);
 
@@ -641,6 +736,7 @@ static Property vfio_platform_dev_properties[] = {
     DEFINE_PROP_STRING("host", VFIOPlatformDevice, vbasedev.name),
     DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
                        mmap_timeout, 1100),
+    DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index 18e6807..26ddba7 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -58,6 +58,7 @@ typedef struct VFIOPlatformDevice {
     QEMUTimer *mmap_timer; /* enable mmaps after periods w/o interrupts */
     start_irq_fn_t start_irq_fn;
     QemuMutex  intp_mutex;
+    bool irqfd_allowed; /* debug option to force irqfd on/off */
 } VFIOPlatformDevice;
 
 
diff --git a/trace-events b/trace-events
index 54d998c..a05ed80 100644
--- a/trace-events
+++ b/trace-events
@@ -1439,6 +1439,8 @@ vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d
 vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"
 vfio_intp_interrupt_set_pending(int index) "irq %d is set PENDING"
 vfio_platform_eoi_handle_pending(int index) "handle PENDING IRQ %d"
+vfio_platform_start_irqfd_injection(int index, int gsi, int fd, int resamplefd) "IRQ index=%d, gsi =%d, fd = %d, resamplefd = %d"
+vfio_start_eventfd_injection(int index, int fd) "IRQ index=%d, fd = %d"
 
 #hw/acpi/memory_hotplug.c
 mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32
-- 
1.8.3.2

* [Qemu-devel] [PATCH v7 14/16] linux-headers: Update KVM headers from linux-next tag ToBeFilled
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (12 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 13/16] hw/vfio/platform: Add irqfd support Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 15/16] hw/vfio/common: vfio_kvm_device_fd moved in the common header Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 16/16] hw/vfio/platform: add forwarded irq support Eric Auger
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

Sync up the KVM-related Linux headers from the linux-next tree using
scripts/update-linux-headers.sh.

This integrates the updated KVM-VFIO API related to forwarded IRQs.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 linux-headers/linux/kvm.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 2669938..239b380 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -947,6 +947,12 @@ struct kvm_device_attr {
 	__u64	addr;		/* userspace address of attr data */
 };
 
+struct kvm_arch_forwarded_irq {
+        __u32 fd; /* file descriptor of the VFIO device */
+        __u32 index; /* VFIO device IRQ index */
+        __u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
 #define KVM_DEV_TYPE_FSL_MPIC_20	1
 #define KVM_DEV_TYPE_FSL_MPIC_42	2
 #define KVM_DEV_TYPE_XICS		3
@@ -954,6 +960,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP			1
 #define   KVM_DEV_VFIO_GROUP_ADD			1
 #define   KVM_DEV_VFIO_GROUP_DEL			2
+#define  KVM_DEV_VFIO_DEVICE			2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ			1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ			2
 #define KVM_DEV_TYPE_ARM_VGIC_V2	5
 #define KVM_DEV_TYPE_FLIC		6
 
-- 
1.8.3.2

* [Qemu-devel] [PATCH v7 15/16] hw/vfio/common: vfio_kvm_device_fd moved in the common header
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (13 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 14/16] linux-headers: Update KVM headers from linux-next tag ToBeFilled Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 16/16] hw/vfio/platform: add forwarded irq support Eric Auger
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

The KVM-VFIO device file descriptor is now used by the platform device
for forwarded IRQ setup.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 hw/vfio/common.c              | 3 ++-
 include/hw/vfio/vfio-common.h | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fbd9e7f..99ff89a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,9 +44,10 @@ struct vfio_as_head vfio_address_spaces =
  * initialized, this file descriptor is only released on QEMU exit and
  * we'll re-use it should another vfio device be attached before then.
  */
-static int vfio_kvm_device_fd = -1;
+int vfio_kvm_device_fd = -1;
 #endif
 
+
 /*
  * Common VFIO interrupt disable
  */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 83c7876..0ae0153 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -41,6 +41,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
+#ifdef CONFIG_KVM
+extern int vfio_kvm_device_fd;
+#endif
+
+
 enum {
     VFIO_DEVICE_TYPE_PCI = 0,
     VFIO_DEVICE_TYPE_PLATFORM = 1,
-- 
1.8.3.2


* [Qemu-devel] [PATCH v7 16/16] hw/vfio/platform: add forwarded irq support
  2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
                   ` (14 preceding siblings ...)
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 15/16] hw/vfio/common: vfio_kvm_device_fd moved in the common header Eric Auger
@ 2014-10-31 14:05 ` Eric Auger
  15 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-10-31 14:05 UTC (permalink / raw)
  To: eric.auger, christoffer.dall, qemu-devel, agraf, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, eric.auger, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

Test whether the forwarded IRQ modality is available and, if so,
forward the device IRQs. Forwarding is controlled through the
KVM-VFIO device. With this modality, injection is still handled
through irqfds, but the end of interrupt is no longer trapped: as
soon as the guest completes its virtual IRQ, the corresponding
physical IRQ is completed and the same physical IRQ can fire again.

A new x-forward property allows forwarding to be forced off even
when the kernel supports it.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
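Note: the probe-then-gate logic described in the commit message can be
sketched in isolation as follows. This is a minimal illustration under
stated assumptions, not the code of this patch: the helper names
(kvm_vfio_has_forward, choose_irq_mode) and the enum are hypothetical,
and the two KVM_DEV_VFIO_DEVICE* values are assumed from the header
update in patch 14/16 of this series, since installed kernel headers
may not provide them.

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumed values, mirrored from patch 14/16 of this series. */
#ifndef KVM_DEV_VFIO_DEVICE
#define KVM_DEV_VFIO_DEVICE 2
#endif
#ifndef KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
#endif

/* Hypothetical names, used for illustration only. */
enum irq_mode {
    IRQ_MODE_IRQFD_ONLY,       /* injection through irqfd, no forwarding */
    IRQ_MODE_IRQFD_FORWARDED,  /* injection through irqfd, IRQ forwarded */
};

/* Ask the KVM-VFIO device whether it knows the FORWARD_IRQ attribute. */
static bool kvm_vfio_has_forward(int kvm_vfio_fd)
{
    struct kvm_device_attr attr = {
        .group = KVM_DEV_VFIO_DEVICE,
        .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
    };

    return ioctl(kvm_vfio_fd, KVM_HAS_DEVICE_ATTR, &attr) == 0;
}

/* Forward only if the user switch is on and the kernel supports it. */
static enum irq_mode choose_irq_mode(int kvm_vfio_fd, bool forward_allowed)
{
    if (forward_allowed && kvm_vfio_has_forward(kvm_vfio_fd)) {
        return IRQ_MODE_IRQFD_FORWARDED;
    }
    return IRQ_MODE_IRQFD_ONLY;
}
```

With an invalid descriptor such as -1 the probe fails, so the sketch
falls back to plain irqfd handling; the same happens whenever the
x-forward switch is off.
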
 hw/vfio/platform.c              | 52 +++++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-platform.h |  2 ++
 trace-events                    |  1 +
 3 files changed, 55 insertions(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index bdd5c93..f7ed209 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -237,6 +237,52 @@ static int vfio_start_eventfd_injection(VFIOINTp *intp)
 }
 
 /*
+ * Functions used with forwarding capability
+ */
+
+#ifdef CONFIG_KVM
+
+static bool has_kvm_vfio_forward_capability(void)
+{
+    struct kvm_device_attr attr = {
+         .group = KVM_DEV_VFIO_DEVICE,
+         .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ};
+
+    if (ioctl(vfio_kvm_device_fd, KVM_HAS_DEVICE_ATTR, &attr) == 0) {
+        return true;
+    } else {
+        return false;
+    }
+}
+
+static int vfio_set_forwarding(VFIOINTp *intp)
+{
+    int ret = 0;
+    struct kvm_device_attr attr = {
+         .group = KVM_DEV_VFIO_DEVICE,
+         .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ};
+
+    intp->fwd_irq = g_malloc0(sizeof(*intp->fwd_irq));
+    intp->fwd_irq->fd = intp->vdev->vbasedev.fd;
+    intp->fwd_irq->index = intp->pin;
+    intp->fwd_irq->gsi = intp->virtualID;
+
+    attr.addr = (uint64_t)(unsigned long)intp->fwd_irq;
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        error_report("Failed to forward IRQ %d through KVM VFIO device",
+                     intp->pin);
+        g_free(intp->fwd_irq);
+        return -errno;
+    }
+    trace_vfio_start_fwd_injection(intp->pin);
+
+    return ret;
+}
+
+#endif
+
+/*
  * Functions used for irqfd
  */
 
@@ -288,6 +334,11 @@ static int vfio_start_irqfd_injection(VFIOINTp *intp)
         .flags = KVM_IRQFD_FLAG_RESAMPLE,
     };
 
+    if (has_kvm_vfio_forward_capability() &&
+                 intp->vdev->forward_allowed) {
+        vfio_set_forwarding(intp);
+    }
+
     if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
         error_report("vfio: Error: Failed to assign the irqfd: %m");
         goto fail_irqfd;
@@ -737,6 +788,7 @@ static Property vfio_platform_dev_properties[] = {
     DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
                        mmap_timeout, 1100),
     DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
+    DEFINE_PROP_BOOL("x-forward", VFIOPlatformDevice, forward_allowed, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index 26ddba7..d22eb0e 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -42,6 +42,7 @@ typedef struct VFIOINTp {
     bool kvm_accel; /* set when QEMU bypass through KVM enabled */
     uint8_t pin; /* index */
     uint8_t virtualID; /* virtual IRQ */
+    struct kvm_arch_forwarded_irq *fwd_irq;
 } VFIOINTp;
 
 typedef int (*start_irq_fn_t)(VFIOINTp *intp);
@@ -59,6 +60,7 @@ typedef struct VFIOPlatformDevice {
     start_irq_fn_t start_irq_fn;
     QemuMutex  intp_mutex;
     bool irqfd_allowed; /* debug option to force irqfd on/off */
+    bool forward_allowed; /* debug option to force forwarding on/off */
 } VFIOPlatformDevice;
 
 
diff --git a/trace-events b/trace-events
index a05ed80..df3b71b 100644
--- a/trace-events
+++ b/trace-events
@@ -1429,6 +1429,7 @@ vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions,
 vfio_put_base_device(int fd) "close vdev->fd=%d"
 
 # hw/vfio/platform.c
+vfio_start_fwd_injection(int pin) "forwarding set for IRQ pin %d"
 vfio_platform_eoi(int pin, int fd) "EOI IRQ pin %d (fd=%d)"
 vfio_platform_mmap_set_enabled(bool enabled) "fast path = %d"
 vfio_platform_intp_mmap_enable(int pin) "IRQ #%d still active, stay in slow path"
-- 
1.8.3.2
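
Note: the attribute-setting step of vfio_set_forwarding() can be
sketched standalone as below, saving errno right after the failing
ioctl so that later calls cannot overwrite it. This is an assumption
laden sketch rather than the patch itself: struct fwd_irq_params is a
hypothetical stand-in for struct kvm_arch_forwarded_irq, carrying only
the three fields the patch fills (fd, index, gsi) with assumed widths,
and the KVM_DEV_VFIO_DEVICE* values are assumed from patch 14/16.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumed values, mirrored from patch 14/16 of this series. */
#ifndef KVM_DEV_VFIO_DEVICE
#define KVM_DEV_VFIO_DEVICE 2
#endif
#ifndef KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
#endif

/* Hypothetical stand-in for struct kvm_arch_forwarded_irq. */
struct fwd_irq_params {
    uint32_t fd;     /* VFIO device file descriptor */
    uint32_t index;  /* VFIO IRQ index */
    uint32_t gsi;    /* virtual IRQ number */
};

/* Returns 0 on success or a negative errno value on failure. */
static int set_forwarding(int kvm_vfio_fd, struct fwd_irq_params *p)
{
    struct kvm_device_attr attr = {
        .group = KVM_DEV_VFIO_DEVICE,
        .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
        .addr = (uint64_t)(uintptr_t)p,  /* user pointer passed as u64 */
    };

    if (ioctl(kvm_vfio_fd, KVM_SET_DEVICE_ATTR, &attr) < 0) {
        return -errno;  /* captured before any reporting or cleanup */
    }
    return 0;
}
```

A caller would report the error and free its parameter block only
after the return value has been captured.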


* Re: [Qemu-devel] [PATCH v7 10/16] hw/vfio: calxeda xgmac device
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 10/16] hw/vfio: calxeda xgmac device Eric Auger
@ 2014-11-05 10:26   ` Alexander Graf
  0 siblings, 0 replies; 43+ messages in thread
From: Alexander Graf @ 2014-11-05 10:26 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 31.10.14 15:05, Eric Auger wrote:
> The platform device class has become abstract. The device can be
> instantiated on the command line with an option such as:
> 
> -device vfio-calxeda-xgmac,host="fff51000.ethernet"
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v5 -> v6
> - back again following Alex Graf's advice
> - fix a bug related to compat override
> 
> v4 -> v5:
> removed since device tree was moved to hw/arm/dyn_sysbus_devtree.c
> 
> v4: creation for device tree specialization
> ---
>  hw/vfio/Makefile.objs                |  1 +
>  hw/vfio/calxeda_xgmac.c              | 54 ++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio-calxeda-xgmac.h | 41 +++++++++++++++++++++++++++
>  3 files changed, 96 insertions(+)
>  create mode 100644 hw/vfio/calxeda_xgmac.c
>  create mode 100644 include/hw/vfio/vfio-calxeda-xgmac.h
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index c5c76fe..913ab14 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -2,4 +2,5 @@ ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
>  obj-$(CONFIG_PCI) += pci.o
>  obj-$(CONFIG_SOFTMMU) += platform.o
> +obj-$(CONFIG_SOFTMMU) += calxeda_xgmac.o
>  endif
> diff --git a/hw/vfio/calxeda_xgmac.c b/hw/vfio/calxeda_xgmac.c
> new file mode 100644
> index 0000000..199e076
> --- /dev/null
> +++ b/hw/vfio/calxeda_xgmac.c
> @@ -0,0 +1,54 @@
> +/*
> + * calxeda xgmac example VFIO device
> + *
> + * Copyright Linaro Limited, 2014
> + *
> + * Authors:
> + *  Eric Auger <eric.auger@linaro.org>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "hw/vfio/vfio-calxeda-xgmac.h"
> +
> +static void calxeda_xgmac_realize(DeviceState *dev, Error **errp)
> +{
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
> +    VFIOCalxedaXgmacDeviceClass *k = VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(dev);
> +
> +    vdev->compat = g_strdup("calxeda,hb-xgmac");
> +
> +    k->parent_realize(dev, errp);

Since MMIO and IRQ line exposure happens in the parent, I would like to
see a comment here explaining the semantics of each region. That way
users at least have a chance to figure out what each MMIO number and
IRQ number means.
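
As an illustration of the kind of documentation meant here, the region
and IRQ indices could be named and described as in the sketch below.
The layout (one register region; three interrupts: main, power
management, low power state) is an assumption based on the Linux
calxeda-xgmac device tree binding and should be checked against it.
All identifiers are hypothetical.

```c
#include <stddef.h>

/* Assumed: the device exposes a single MMIO region. */
enum xgmac_mmio_index {
    XGMAC_MMIO_REGS = 0,  /* register window of the controller */
    XGMAC_MMIO_COUNT
};

/* Assumed order of the three interrupt lines. */
enum xgmac_irq_index {
    XGMAC_IRQ_MAIN = 0,   /* main interrupt */
    XGMAC_IRQ_PWR_MGMT,   /* power management interrupt */
    XGMAC_IRQ_LOW_POWER,  /* low power state interrupt */
    XGMAC_IRQ_COUNT
};

/* Map an IRQ index to a readable name, or NULL if out of range. */
static const char *xgmac_irq_name(unsigned int index)
{
    static const char *const names[XGMAC_IRQ_COUNT] = {
        [XGMAC_IRQ_MAIN] = "main",
        [XGMAC_IRQ_PWR_MGMT] = "power management",
        [XGMAC_IRQ_LOW_POWER] = "low power state",
    };

    return index < XGMAC_IRQ_COUNT ? names[index] : NULL;
}
```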

Also, since this device will probably get used as example code for
others, I'd like to make sure we set a proper precedent, even if it's
"trivial" in this case.


Alex


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support Eric Auger
@ 2014-11-05 10:29   ` Alexander Graf
  2014-11-05 12:03     ` Eric Auger
  2014-11-26  9:45     ` Eric Auger
  0 siblings, 2 replies; 43+ messages in thread
From: Alexander Graf @ 2014-11-05 10:29 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 31.10.14 15:05, Eric Auger wrote:
> Minimal VFIO platform implementation supporting
> - register space user mapping,
> - IRQ assignment based on eventfds handled on qemu side.
> 
> irqfd kernel acceleration comes in a subsequent patch.
> 
> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> v6 -> v7:
> - compat is not exposed anymore as a user option. Rationale is
>   the vfio device became abstract and a specialization is needed
>   anyway. The derived device must set the compat string.
> - in v6 vfio_start_irq_injection was exposed in vfio-platform.h.
>   A new function dubbed vfio_register_irq_starter replaces it. It
>   registers a machine init done notifier that programs & starts
>   all dynamic VFIO device IRQs. This function is supposed to be
>   called by the machine file. A set of static helper routines are
>   added too. It must be called before the creation of the platform
>   bus device.
> 
> v5 -> v6:
> - vfio_device property renamed into host property
> - correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
>   and remove PCI related comment
> - remove declaration of vfio_setup_irqfd and irqfd_allowed
>   property. Both belong to the next patch (irqfd)
> - remove declaration of vfio_intp_interrupt in vfio-platform.h
> - functions that can be static get this characteristic
> - remove declarations of vfio_region_ops, vfio_memory_listener,
>   group_list, vfio_address_spaces. All are moved to vfio-common.h
> - remove vfio_put_device declaration and definition
> - print_regions removed. code moved into vfio_populate_regions
> - replace DPRINTF by trace events
> - new helper routine to set the trigger eventfd
> - dissociate intp init from the injection enablement:
>   vfio_enable_intp renamed into vfio_init_intp and new function
>   named vfio_start_eventfd_injection
> - injection start moved to vfio_start_irq_injection (not anymore
>   in vfio_populate_interrupt)
> - new start_irq_fn field in VFIOPlatformDevice corresponding to
>   the function that will be used for starting injection
> - user handled eventfd:
>   x add mutex to protect IRQ state & list manipulation,
>   x correct misleading comment in vfio_intp_interrupt.
>   x Fix bugs thanks to fake interrupt modality
> - VFIOPlatformDeviceClass becomes abstract
> - add error_setg in vfio_platform_realize
> 
> v4 -> v5:
> - vfio-platform.h included first
> - cleanup error handling in *populate*, vfio_get_device,
>   vfio_enable_intp
> - vfio_put_device not called anymore
> - add some includes to follow vfio policy
> 
> v3 -> v4:
> [Eric Auger]
> - merge of "vfio: Add initial IRQ support in platform device"
>   to get a full functional patch although perfs are limited.
> - removal of unrealize function since I currently understand
>   it is only used with device hot-plug feature.
> 
> v2 -> v3:
> [Eric Auger]
> - further factorization between PCI and platform (VFIORegion,
>   VFIODevice). same level of functionality.
> 
> <= v2:
> [Kim Phillips]
> - Initial Creation of the device supporting register space mapping
> ---
>  hw/vfio/Makefile.objs           |   1 +
>  hw/vfio/platform.c              | 672 ++++++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio-common.h   |   1 +
>  include/hw/vfio/vfio-platform.h |  87 ++++++
>  trace-events                    |  12 +
>  5 files changed, 773 insertions(+)
>  create mode 100644 hw/vfio/platform.c
>  create mode 100644 include/hw/vfio/vfio-platform.h
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index e31f30e..c5c76fe 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,4 +1,5 @@
>  ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
>  obj-$(CONFIG_PCI) += pci.o
> +obj-$(CONFIG_SOFTMMU) += platform.o
>  endif
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> new file mode 100644
> index 0000000..9f66610
> --- /dev/null
> +++ b/hw/vfio/platform.c
> @@ -0,0 +1,672 @@
> +/*
> + * vfio based device assignment support - platform devices
> + *
> + * Copyright Linaro Limited, 2014
> + *
> + * Authors:
> + *  Kim Phillips <kim.phillips@linaro.org>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Based on vfio based PCI device assignment support:
> + *  Copyright Red Hat, Inc. 2012
> + */
> +
> +#include <linux/vfio.h>
> +#include <sys/ioctl.h>
> +
> +#include "hw/vfio/vfio-platform.h"
> +#include "qemu/error-report.h"
> +#include "qemu/range.h"
> +#include "sysemu/sysemu.h"
> +#include "exec/memory.h"
> +#include "qemu/queue.h"
> +#include "hw/sysbus.h"
> +#include "trace.h"
> +#include "hw/platform-bus.h"
> +
> +static void vfio_intp_interrupt(VFIOINTp *intp);
> +typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
> +                                    eventfd_user_side_handler_t handler);
> +
> +/*
> + * Functions only used when eventfds are handled on user-side
> + * i.e. without irqfd
> + */
> +
> +/**
> + * vfio_platform_eoi - IRQ completion routine
> + * @vbasedev: the VFIO device
> + *
> + * de-asserts the active virtual IRQ and unmasks the physical IRQ
> + * (masked by the VFIO driver). Handles pending IRQs if any.
> + * eoi function is called on the first access to any MMIO region
> + * after an IRQ was triggered. It is assumed this access corresponds
> + * to the IRQ status register reset. With such a mechanism, a single
> + * IRQ can be handled at a time since there is no way to know which
> + * IRQ was completed by the guest (we would need additional details
> + * about the IRQ status register mask)
> + */
> +static void vfio_platform_eoi(VFIODevice *vbasedev)
> +{
> +    VFIOINTp *intp;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +
> +    qemu_mutex_lock(&vdev->intp_mutex);
> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
> +        if (intp->state == VFIO_IRQ_ACTIVE) {
> +            trace_vfio_platform_eoi(intp->pin,
> +                                event_notifier_get_fd(&intp->interrupt));
> +            intp->state = VFIO_IRQ_INACTIVE;
> +
> +            /* deassert the virtual IRQ and unmask physical one */
> +            qemu_set_irq(intp->qemuirq, 0);
> +            vfio_unmask_irqindex(vbasedev, intp->pin);
> +
> +            /* a single IRQ can be active at a time */
> +            break;
> +        }
> +    }
> +    /* in case there are pending IRQs, handle them one at a time */
> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
> +        trace_vfio_platform_eoi_handle_pending(intp->pin);
> +        qemu_mutex_unlock(&vdev->intp_mutex);
> +        vfio_intp_interrupt(intp);
> +        qemu_mutex_lock(&vdev->intp_mutex);
> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
> +        qemu_mutex_unlock(&vdev->intp_mutex);
> +    } else {
> +        qemu_mutex_unlock(&vdev->intp_mutex);
> +    }
> +}
> +
> +/**
> + * vfio_mmap_set_enabled - enable/disable the fast path mode
> + * @vdev: the VFIO platform device
> + * @enabled: the target mmap state
> + *
> + * true ~ fast path = MMIO region is mmapped (no KVM TRAP)
> + * false ~ slow path = MMIO region is trapped and region callbacks
> + * are called; the slow path makes it possible to trap the guest's
> + * reset of the IRQ status register
> +*/
> +
> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
> +{
> +    VFIORegion *region;
> +    int i;
> +
> +    trace_vfio_platform_mmap_set_enabled(enabled);
> +
> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
> +        region = vdev->regions[i];
> +
> +        /* register space is unmapped to trap EOI */
> +        memory_region_set_enabled(&region->mmap_mem, enabled);
> +    }
> +}
> +
> +/**
> + * vfio_intp_mmap_enable - timer function, restores the fast path
> + * if there is no more active IRQ
> + * @opaque: actually points to the VFIO platform device
> + *
> + * Called on mmap timer timeout, this function checks whether an
> + * IRQ is still active and, if not, restores the fast path.
> + * By construction a single eventfd is handled at a time.
> + * If an IRQ is still active, the timer is restarted.
> + */
> +static void vfio_intp_mmap_enable(void *opaque)
> +{
> +    VFIOINTp *tmp;
> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
> +
> +    qemu_mutex_lock(&vdev->intp_mutex);
> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
> +            trace_vfio_platform_intp_mmap_enable(tmp->pin);
> +            /* re-program the timer to check active status later */
> +            timer_mod(vdev->mmap_timer,
> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                          vdev->mmap_timeout);
> +            qemu_mutex_unlock(&vdev->intp_mutex);
> +            return;
> +        }
> +    }
> +    vfio_mmap_set_enabled(vdev, true);
> +    qemu_mutex_unlock(&vdev->intp_mutex);
> +}
> +
> +/**
> + * vfio_intp_interrupt - The user-side eventfd handler
> + * @opaque: opaque pointer which in practice is the VFIOINTp*
> + *
> + * the function can be entered
> + * - in event handler context: this IRQ is inactive
> + *   in that case, the vIRQ is injected into the guest if there
> + *   is no other active or pending IRQ.
> + * - in IOhandler context: this IRQ is pending.
> + *   there is no ACTIVE IRQ
> + */
> +static void vfio_intp_interrupt(VFIOINTp *intp)
> +{
> +    int ret;
> +    VFIOINTp *tmp;
> +    VFIOPlatformDevice *vdev = intp->vdev;
> +    bool delay_handling = false;
> +
> +    qemu_mutex_lock(&vdev->intp_mutex);
> +    if (intp->state == VFIO_IRQ_INACTIVE) {
> +        QLIST_FOREACH(tmp, &vdev->intp_list, next) {
> +            if (tmp->state == VFIO_IRQ_ACTIVE ||
> +                tmp->state == VFIO_IRQ_PENDING) {
> +                delay_handling = true;
> +                break;
> +            }
> +        }
> +    }
> +    if (delay_handling) {
> +        /*
> +         * the new IRQ gets a pending status and is pushed in
> +         * the pending queue
> +         */
> +        intp->state = VFIO_IRQ_PENDING;
> +        trace_vfio_intp_interrupt_set_pending(intp->pin);
> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
> +                             intp, pqnext);
> +        ret = event_notifier_test_and_clear(&intp->interrupt);
> +        qemu_mutex_unlock(&vdev->intp_mutex);
> +        return;
> +    }
> +
> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
> +    trace_vfio_platform_intp_interrupt(intp->pin,
> +                              event_notifier_get_fd(&intp->interrupt));
> +
> +    if (intp->state == VFIO_IRQ_INACTIVE) {
> +        ret = event_notifier_test_and_clear(&intp->interrupt);
> +        if (!ret) {
> +            error_report("Error when clearing fd=%d (ret = %d)",
> +                         event_notifier_get_fd(&intp->interrupt), ret);
> +        }
> +    } /* else this is a pending IRQ that moves to ACTIVE state */
> +
> +    intp->state = VFIO_IRQ_ACTIVE;
> +
> +    /* sets slow path */
> +    vfio_mmap_set_enabled(vdev, false);
> +
> +    /* trigger the virtual IRQ */
> +    qemu_set_irq(intp->qemuirq, 1);
> +
> +    /* schedule the mmap timer which will restore mmap path after EOI */
> +    if (vdev->mmap_timeout) {
> +        timer_mod(vdev->mmap_timer,
> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                      vdev->mmap_timeout);
> +    }
> +    qemu_mutex_unlock(&vdev->intp_mutex);
> +}
> +
> +/**
> + * vfio_start_eventfd_injection - starts the virtual IRQ injection using
> + * user-side handled eventfds
> + * @intp: the IRQ struct pointer
> + */
> +
> +static int vfio_start_eventfd_injection(VFIOINTp *intp)
> +{
> +    int ret;
> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
> +
> +    vfio_mask_irqindex(vbasedev, intp->pin);
> +
> +    ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
> +    if (ret) {
> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
> +        vfio_unmask_irqindex(vbasedev, intp->pin);
> +        return ret;
> +    }
> +    vfio_unmask_irqindex(vbasedev, intp->pin);
> +    return 0;
> +}
> +
> +/*
> + * Functions used whatever the injection method
> + */
> +
> +/**
> + * vfio_set_trigger_eventfd - set VFIO eventfd handling
> + * i.e. program the VFIO driver to associate a given IRQ index
> + * with a fd handler
> + *
> + * @intp: IRQ struct pointer
> + * @handler: handler to be called on eventfd trigger
> + */
> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
> +                                    eventfd_user_side_handler_t handler)
> +{
> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
> +    struct vfio_irq_set *irq_set;
> +    int argsz, ret;
> +    int32_t *pfd;
> +
> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
> +    irq_set = g_malloc0(argsz);
> +    irq_set->argsz = argsz;
> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
> +    irq_set->index = intp->pin;
> +    irq_set->start = 0;
> +    irq_set->count = 1;
> +    pfd = (int32_t *)&irq_set->data;
> +    *pfd = event_notifier_get_fd(&intp->interrupt);
> +    qemu_set_fd_handler(*pfd, (IOHandler *)handler, NULL, intp);
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    g_free(irq_set);
> +    if (ret < 0) {
> +        error_report("vfio: Failed to set trigger eventfd: %m");
> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
> +    }
> +    return ret;
> +}
> +
> +/* not implemented yet */
> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
> +{
> +    return false;
> +}
> +
> +/* not implemented yet */
> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
> +{
> +    return 0;
> +}
> +
> +/**
> + * vfio_init_intp - allocate, initialize the IRQ struct pointer
> + * and add it into the list of IRQ
> + * @vbasedev: the VFIO device
> + * @index: VFIO device IRQ index
> + */
> +static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
> +{
> +    int ret;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
> +    VFIOINTp *intp;
> +
> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
> +    intp = g_malloc0(sizeof(*intp));
> +    intp->vdev = vdev;
> +    intp->pin = index;
> +    intp->state = VFIO_IRQ_INACTIVE;
> +    sysbus_init_irq(sbdev, &intp->qemuirq);
> +
> +    /* Get an eventfd for trigger */
> +    ret = event_notifier_init(&intp->interrupt, 0);
> +    if (ret) {
> +        g_free(intp);
> +        error_report("vfio: Error: trigger event_notifier_init failed ");
> +        return NULL;
> +    }
> +
> +    /* store the new intp in qlist */
> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
> +    return intp;
> +}
> +
> +/**
> + * vfio_populate_device - initialize MMIO region and IRQ
> + * @vbasedev: the VFIO device
> + *
> + * query the VFIO device for exposed MMIO regions and IRQ and
> + * populate the associated fields in the device struct
> + */
> +static int vfio_populate_device(VFIODevice *vbasedev)
> +{
> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> +    VFIOINTp *intp;
> +    int i, ret = 0;
> +    VFIOPlatformDevice *vdev =
> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
> +
> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
> +        reg_info.index = i;
> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> +        if (ret) {
> +            error_report("vfio: Error getting region %d info: %m", i);
> +            goto error;
> +        }
> +        vdev->regions[i]->flags = reg_info.flags;
> +        vdev->regions[i]->size = reg_info.size;
> +        vdev->regions[i]->fd_offset = reg_info.offset;
> +        vdev->regions[i]->nr = i;
> +        vdev->regions[i]->vbasedev = vbasedev;
> +
> +        trace_vfio_platform_populate_regions(vdev->regions[i]->nr,
> +                            (unsigned long)vdev->regions[i]->flags,
> +                            (unsigned long)vdev->regions[i]->size,
> +                            vdev->regions[i]->vbasedev->fd,
> +                            (unsigned long)vdev->regions[i]->fd_offset);
> +    }
> +
> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
> +                                    vfio_intp_mmap_enable, vdev);
> +
> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
> +
> +    for (i = 0; i < vbasedev->num_irqs; i++) {
> +        irq.index = i;
> +
> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
> +        if (ret) {
> +            error_printf("vfio: error getting device %s irq info",
> +                         vbasedev->name);
> +            return ret;
> +        } else {
> +            trace_vfio_platform_populate_interrupts(irq.index,
> +                                                    irq.count,
> +                                                    irq.flags);
> +            intp = vfio_init_intp(vbasedev, irq.index);
> +            if (!intp) {
> +                error_report("vfio: Error setting up IRQ %d", i);
> +                return -1;
> +            }
> +        }
> +    }
> +    return 0;
> +error:
> +    return ret;
> +}
> +
> +/*
> + * vfio_start_irq_injection - associates a virtual irq to a
> + * VFIO IRQ index and start the injection of this IRQ
> + * @s: SysBus Device
> + * @index: VFIO IRQ index
> + * @virq: the virtual IRQ number, aka gsi
> + *
> + * this function is called when the device tree is built
> + */
> +static void vfio_start_irq_injection(SysBusDevice *s, int index, int virq)
> +{
> +    VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
> +    VFIOINTp *intp;
> +
> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
> +        if (intp->pin == index) {
> +            intp->virtualID = virq;
> +            vdev->start_irq_fn(intp);
> +        }
> +    }
> +}
> +
> +/* specialized functions for VFIO Platform devices */
> +static VFIODeviceOps vfio_platform_ops = {
> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
> +    .vfio_eoi = vfio_platform_eoi,
> +    .vfio_populate_device = vfio_populate_device,
> +};
> +
> +/**
> + * vfio_base_device_init - implements some of the VFIO mechanics
> + * @vbasedev: the VFIO device
> + *
> + * retrieves the group the device belongs to and get the device fd
> + * returns the VFIO device fd
> + * precondition: the device name must be initialized
> + */
> +static int vfio_base_device_init(VFIODevice *vbasedev)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev_iter;
> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
> +    ssize_t len;
> +    struct stat st;
> +    int groupid;
> +    int ret;
> +
> +    /* name must be set prior to the call */
> +    if (!vbasedev->name) {
> +        return -EINVAL;
> +    }
> +
> +    /* Check that the host device exists */
> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
> +             vbasedev->name);
> +
> +    if (stat(path, &st) < 0) {
> +        error_report("vfio: error: no such host device: %s", path);
> +        return -errno;
> +    }
> +
> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
> +    len = readlink(path, iommu_group_path, sizeof(path));
> +    if (len <= 0 || len >= sizeof(path)) {
> +        error_report("vfio: error no iommu_group for device");
> +        return len < 0 ? -errno : -ENAMETOOLONG;
> +    }
> +
> +    iommu_group_path[len] = 0;
> +    group_name = basename(iommu_group_path);
> +
> +    if (sscanf(group_name, "%d", &groupid) != 1) {
> +        error_report("vfio: error reading %s: %m", path);
> +        return -errno;
> +    }
> +
> +    trace_vfio_platform_base_device_init(vbasedev->name, groupid);
> +
> +    group = vfio_get_group(groupid, &address_space_memory);
> +    if (!group) {
> +        error_report("vfio: failed to get group %d", groupid);
> +        return -ENOENT;
> +    }
> +
> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
> +
> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
> +            error_report("vfio: error: device %s is already attached", path);
> +            vfio_put_group(group);
> +            return -EBUSY;
> +        }
> +    }
> +    ret = vfio_get_device(group, path, vbasedev);
> +    if (ret) {
> +        error_report("vfio: failed to get device %s", path);
> +        vfio_put_group(group);
> +    }
> +    return ret;
> +}
> +
> +/**
> + * vfio_map_region - initialize the 2 mr (mmapped on ops) for a
> + * given index
> + * @vdev: the VFIO platform device
> + * @nr: the index of the region
> + *
> + * init the top memory region and the mmapped memory region beneath
> + * VFIOPlatformDevice is used since VFIODevice is not a QOM Object
> + * and could not be passed to memory region functions
> +*/
> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
> +{
> +    VFIORegion *region = vdev->regions[nr];
> +    unsigned size = region->size;
> +    char name[64];
> +
> +    if (!size) {
> +        return;
> +    }
> +
> +    snprintf(name, sizeof(name), "VFIO %s region %d",
> +             vdev->vbasedev.name, nr);
> +
> +    /* A "slow" read/write mapping underlies all regions */
> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
> +                          region, name, size);
> +
> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
> +
> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
> +        error_report("%s unsupported. Performance may be slow", name);
> +    }
> +}
> +
> +/**
> + * vfio_platform_realize  - the device realize function
> + * @dev: device state pointer
> + * @errp: error
> + *
> + * initialize the device, its memory regions and IRQ structures
> + * IRQ are started separately
> + */
> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
> +{
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    int i, ret;
> +
> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
> +    vbasedev->ops = &vfio_platform_ops;
> +    vdev->start_irq_fn = vfio_start_eventfd_injection;
> +
> +    trace_vfio_platform_realize(vbasedev->name, vdev->compat);
> +
> +    ret = vfio_base_device_init(vbasedev);
> +    if (ret) {
> +        error_setg(errp, "vfio: vfio_base_device_init failed for %s",
> +                   vbasedev->name);
> +        return;
> +    }
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        vfio_map_region(vdev, i);
> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
> +    }
> +}
> +
> +/*
> + * Mechanics to program/start irq injection on machine init done notifier:
> + * this is needed since at finalize time, the device IRQ are not yet
> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
> + * always is used. Binding to the platform bus IRQ happens on a machine
> + * init done notifier registered by the machine file. After its execution
> + * we execute a new notifier that actually starts the injection. When using
> + * irqfd, programming the injection consists in associating eventfds to
> + * GSI number, i.e. virtual IRQ number
> + */
> +
> +typedef struct VfioIrqStarterNotifierParams {
> +    unsigned int platform_bus_first_irq;
> +    Notifier notifier;
> +} VfioIrqStarterNotifierParams;
> +
> +typedef struct VfioIrqStartParams {
> +    PlatformBusDevice *pbus;
> +    int platform_bus_first_irq;
> +} VfioIrqStartParams;
> +
> +/* Start injection of IRQ for a specific VFIO device */
> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
> +{
> +    int i;
> +    VfioIrqStartParams *p = opaque;
> +    VFIOPlatformDevice *vdev;
> +    VFIODevice *vbasedev;
> +    uint64_t irq_number;
> +    PlatformBusDevice *pbus = p->pbus;
> +    int platform_bus_first_irq = p->platform_bus_first_irq;
> +
> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +        vbasedev = &vdev->vbasedev;
> +        for (i = 0; i < vbasedev->num_irqs; i++) {
> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
> +                             + platform_bus_first_irq;
> +            vfio_start_irq_injection(sbdev, i, irq_number);
> +        }
> +    }
> +    return 0;
> +}
> +
> +/* loop on all VFIO platform devices and start their IRQ injection */
> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
> +{
> +    VfioIrqStarterNotifierParams *p =
> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
> +    DeviceState *dev =
> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
> +
> +    if (pbus->done_gathering) {
> +        VfioIrqStartParams data = {
> +            .pbus = pbus,
> +            .platform_bus_first_irq = p->platform_bus_first_irq,
> +        };
> +
> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
> +    }
> +}
> +
> +/* registers the machine init done notifier that will start VFIO IRQ */
> +void vfio_register_irq_starter(int platform_bus_first_irq)
> +{
> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
> +
> +    p->platform_bus_first_irq = platform_bus_first_irq;
> +    p->notifier.notify = vfio_irq_starter_notify;
> +    qemu_add_machine_init_done_notifier(&p->notifier);

Could you add a notifier for each device instead? Then the notifier
would be part of the vfio device struct and not some dangling random
pointer :).

Of course instead of foreach_dynamic_sysbus_device() you would directly
know the device you're dealing with and only handle a single device per
notifier.
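
For illustration, the per-device approach suggested above could look roughly like the sketch below. This is only a standalone model, not QEMU code: the Notifier type is a minimal stand-in for QEMU's, and DemoVFIODevice, demo_irq_start_notify and demo_register_irq_starter are hypothetical names. Registration on the machine init done list is left out.

```c
#include <stddef.h>

/* Minimal stand-in for QEMU's Notifier type (assumption, illustration only). */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down device struct with an embedded notifier. */
typedef struct DemoVFIODevice {
    const char *name;
    int num_irqs;
    int irqs_started;            /* counts IRQs whose injection was started */
    Notifier irq_start_notifier; /* lives and dies with the device */
} DemoVFIODevice;

/* Runs once per device: the device is recovered from its own notifier. */
static void demo_irq_start_notify(Notifier *notifier, void *data)
{
    DemoVFIODevice *vdev =
        container_of(notifier, DemoVFIODevice, irq_start_notifier);
    int i;

    (void)data;
    for (i = 0; i < vdev->num_irqs; i++) {
        vdev->irqs_started++;    /* stands in for starting injection of IRQ i */
    }
}

/* Would be called from the realize function of each device. */
static void demo_register_irq_starter(DemoVFIODevice *vdev)
{
    vdev->irq_start_notifier.notify = demo_irq_start_notify;
}
```

With the notifier embedded in the device, no separate allocation is needed and the callback does not have to walk the bus to find the devices it applies to.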


Alex


* Re: [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation Eric Auger
@ 2014-11-05 10:59   ` Alexander Graf
  2014-11-05 12:31     ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-05 10:59 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 31.10.14 15:05, Eric Auger wrote:
> vfio-calxeda-xgmac now can be instantiated using the -device option.
> The node creation function generates a very basic dt node composed
> of the compat, reg and interrupts properties
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v6 -> v7:
> - compat string re-formatting removed since compat string is not exposed
>   anymore as a user option
> - VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
>   device
> ---
>  hw/arm/sysbus-fdt.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 88 insertions(+)
> 
> diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
> index d5476f1..f8b310b 100644
> --- a/hw/arm/sysbus-fdt.c
> +++ b/hw/arm/sysbus-fdt.c
> @@ -27,6 +27,8 @@
>  #include "hw/platform-bus.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/platform-bus.h"
> +#include "hw/vfio/vfio-platform.h"
> +#include "hw/vfio/vfio-calxeda-xgmac.h"
>  
>  /*
>   * internal struct that contains the information to create dynamic
> @@ -54,8 +56,11 @@ typedef struct NodeCreationPair {
>      int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
>  } NodeCreationPair;
>  
> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque);
> +
>  /* list of supported dynamic sysbus devices */
>  NodeCreationPair add_fdt_node_functions[] = {
> +        {TYPE_VFIO_CALXEDA_XGMAC, add_basic_vfio_fdt_node},
>          {"", NULL}, /*last element*/
>  };

Can you maybe place the list somewhere smartly to make sure we don't
need forward declarations? Either put it in between the "generic" and
"device specific" code or at the end of the file with a single forward
declaration for the array?

>  
> @@ -86,6 +91,89 @@ static int add_fdt_node(SysBusDevice *sbdev, void *opaque)
>  }
>  
>  /**
> + * add_basic_vfio_fdt_node - generates the most basic node for a VFIO node
> + *
> + * set properties are:
> + * - compatible string
> + * - regs
> + * - interrupts
> + */
> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque)
> +{
> +    PlatformBusFdtData *data = opaque;
> +    PlatformBusDevice *pbus = data->pbus;
> +    void *fdt = data->fdt;
> +    const char *parent_node = data->pbus_node_name;
> +    int compat_str_len;
> +    char *nodename;
> +    int i, ret;
> +    uint32_t *irq_attr;
> +    uint64_t *reg_attr;
> +    uint64_t mmio_base;
> +    uint64_t irq_number;
> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    Object *obj = OBJECT(sbdev);
> +
> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
> +
> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
> +                               vbasedev->name,
> +                               mmio_base);
> +
> +    qemu_fdt_add_subnode(fdt, nodename);
> +
> +    compat_str_len = strlen(vdev->compat) + 1;
> +    qemu_fdt_setprop(fdt, nodename, "compatible",
> +                          vdev->compat, compat_str_len);

What if there are multiple compatibles?
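
As background for the question above: a devicetree "compatible" property is a string list, i.e. several NUL-terminated strings stored back to back. A device with more than one compatible would therefore need a buffer like the one built below, with its total length passed to qemu_fdt_setprop() in place of strlen() + 1. The helper name demo_pack_stringlist is hypothetical; this is a standalone sketch, not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pack 'count' strings into 'buf' as a devicetree string list:
 * each string is followed by its own NUL terminator.
 * Returns the total length in bytes, or 0 if 'buf' is too small.
 */
static size_t demo_pack_stringlist(char *buf, size_t buf_size,
                                   const char *const *strs, size_t count)
{
    size_t off = 0, i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(strs[i]) + 1;   /* include the terminator */

        if (len > buf_size - off) {
            return 0;
        }
        memcpy(buf + off, strs[i], len);
        off += len;
    }
    return off;
}
```

The returned length covers every terminator, which is what the property size has to be for a guest to see all entries.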

> +
> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
> +        reg_attr[4*i] = 1;

What is the 1 here?

> +        reg_attr[4*i+1] = mmio_base;
> +        reg_attr[4*i+2] = 1;

and here?
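
For reference, qemu_fdt_setprop_sized_cells_from_array() takes an array of (size, value) pairs, where size is the number of 32-bit cells used to encode the value, so each "1" in the quoted code asks for one cell. The standalone sketch below models that encoding under this assumption; demo_encode_sized_cells is a hypothetical name and not the QEMU implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode 'numpairs' (ncells, value) pairs into big-endian 32-bit cells.
 * ncells must be 1 or 2. Returns the number of bytes written, or 0 on an
 * invalid cell count, a value that does not fit, or a too-small buffer.
 */
static size_t demo_encode_sized_cells(uint8_t *out, size_t out_size,
                                      const uint64_t *pairs, size_t numpairs)
{
    size_t off = 0, i;
    int shift;

    for (i = 0; i < numpairs; i++) {
        uint64_t ncells = pairs[2 * i];
        uint64_t value = pairs[2 * i + 1];

        if (ncells != 1 && ncells != 2) {
            return 0;
        }
        if (ncells == 1 && value > UINT32_MAX) {
            return 0;   /* value does not fit in a single cell */
        }
        if (ncells * 4 > out_size - off) {
            return 0;
        }
        /* most significant byte first */
        for (shift = (int)ncells * 32 - 8; shift >= 0; shift -= 8) {
            out[off++] = (uint8_t)(value >> shift);
        }
    }
    return off;
}
```

A region at 0x10000 of size 0x1000 passed as { 1, 0x10000, 1, 0x1000 } thus yields two cells, which only matches a parent node whose #address-cells and #size-cells are both 1.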

> +        reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
> +    }
> +
> +    ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
> +                     vbasedev->num_regions*2, reg_attr);
> +    if (ret < 0) {
> +        error_report("could not set reg property of node %s", nodename);
> +        goto fail;
> +    }
> +
> +    irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
> +
> +    for (i = 0; i < vbasedev->num_irqs; i++) {
> +        irq_number = platform_bus_get_irqn(pbus, sbdev , i)
> +                         + data->irq_start;
> +        irq_attr[3*i] = cpu_to_be32(0);
> +        irq_attr[3*i+1] = cpu_to_be32(irq_number);
> +        irq_attr[3*i+2] = cpu_to_be32(0x4);

Why 0x4? How do you know whether an IRQ is edge or level triggered?
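
For context on the three cells: in the ARM GIC devicetree binding an interrupt specifier is (type, number, flags), where type 0 is an SPI and the low four flag bits select the trigger (1 = rising edge, 2 = falling edge, 4 = level high, 8 = level low). The sketch below spells that out with named constants; all DEMO_* names and demo_gic_irq_spec are hypothetical, and how the trigger type would be discovered for a given VFIO IRQ remains the open question raised above.

```c
#include <stdint.h>

enum {
    DEMO_GIC_SPI = 0,            /* shared peripheral interrupt */
    DEMO_GIC_PPI = 1             /* private peripheral interrupt */
};

enum {
    DEMO_IRQ_EDGE_RISING  = 0x1,
    DEMO_IRQ_EDGE_FALLING = 0x2,
    DEMO_IRQ_LEVEL_HIGH   = 0x4,
    DEMO_IRQ_LEVEL_LOW    = 0x8
};

/* Store a 32-bit value as one big-endian devicetree cell. */
static void demo_store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Fill one 3-cell GIC interrupt specifier (12 bytes) into 'out'. */
static void demo_gic_irq_spec(uint8_t out[12], uint32_t type,
                              uint32_t number, uint32_t flags)
{
    demo_store_be32(out, type);
    demo_store_be32(out + 4, number);
    demo_store_be32(out + 8, flags);
}
```

Passing the trigger as a parameter rather than hardcoding 0x4 would let a device-specific node creation function choose edge or level per IRQ.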

I'm still not convinced we can make anything "generic" on the VFIO path.
How about you call the function xgmac specific for now, but keep the
code as dynamic as it is?


Alex

> +    }
> +
> +   ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
> +                     irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
> +    if (ret < 0) {
> +        error_report("could not set interrupts property of node %s",
> +                     nodename);
> +        goto fail;
> +    }
> +
> +    g_free(nodename);
> +    g_free(irq_attr);
> +    g_free(reg_attr);
> +
> +    return 0;
> +
> +fail:
> +
> +   return -1;
> +}
> +
> +/**
>   * add_all_platform_bus_fdt_nodes - create all the platform bus nodes
>   *
>   * builds the parent platform bus node and all the nodes of dynamic
> 


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-05 10:29   ` Alexander Graf
@ 2014-11-05 12:03     ` Eric Auger
  2014-11-05 13:05       ` Alexander Graf
  2014-11-26  9:45     ` Eric Auger
  1 sibling, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-05 12:03 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 11/05/2014 11:29 AM, Alexander Graf wrote:
> 
> 
> On 31.10.14 15:05, Eric Auger wrote:
>> Minimal VFIO platform implementation supporting
>> - register space user mapping,
>> - IRQ assignment based on eventfds handled on qemu side.
>>
>> irqfd kernel acceleration comes in a subsequent patch.
>>
>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v6 -> v7:
>> - compat is not exposed anymore as a user option. Rationale is
>>   the vfio device became abstract and a specialization is needed
>>   anyway. The derived device must set the compat string.
>> - in v6 vfio_start_irq_injection was exposed in vfio-platform.h.
>>   A new function dubbed vfio_register_irq_starter replaces it. It
>>   registers a machine init done notifier that programs & starts
>>   all dynamic VFIO device IRQs. This function is supposed to be
>>   called by the machine file. A set of static helper routines are
>>   added too. It must be called before the creation of the platform
>>   bus device.
>>
>> v5 -> v6:
>> - vfio_device property renamed into host property
>> - correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
>>   and remove PCI related comment
>> - remove declaration of vfio_setup_irqfd and irqfd_allowed
>>   property. Both belong to next patch (irqfd)
>> - remove declaration of vfio_intp_interrupt in vfio-platform.h
>> - functions that can be static get this characteristic
>> - remove declarations of vfio_region_ops, vfio_memory_listener,
>>   group_list, vfio_address_spaces. All are moved to vfio-common.h
>> - remove vfio_put_device declaration and definition
>> - print_regions removed. code moved into vfio_populate_regions
>> - replace DPRINTF by trace events
>> - new helper routine to set the trigger eventfd
>> - dissociate intp init from the injection enablement:
>>   vfio_enable_intp renamed into vfio_init_intp and new function
>>   named vfio_start_eventfd_injection
>> - injection start moved to vfio_start_irq_injection (not anymore
>>   in vfio_populate_interrupt)
>> - new start_irq_fn field in VFIOPlatformDevice corresponding to
>>   the function that will be used for starting injection
>> - user handled eventfd:
>>   x add mutex to protect IRQ state & list manipulation,
>>   x correct misleading comment in vfio_intp_interrupt.
>>   x Fix bugs thanks to fake interrupt modality
>> - VFIOPlatformDeviceClass becomes abstract
>> - add error_setg in vfio_platform_realize
>>
>> v4 -> v5:
>> - vfio-platform.h included first
>> - cleanup error handling in *populate*, vfio_get_device,
>>   vfio_enable_intp
>> - vfio_put_device not called anymore
>> - add some includes to follow vfio policy
>>
>> v3 -> v4:
>> [Eric Auger]
>> - merge of "vfio: Add initial IRQ support in platform device"
>>   to get a full functional patch although perfs are limited.
>> - removal of unrealize function since I currently understand
>>   it is only used with device hot-plug feature.
>>
>> v2 -> v3:
>> [Eric Auger]
>> - further factorization between PCI and platform (VFIORegion,
>>   VFIODevice). same level of functionality.
>>
>> <= v2:
>> [Kim Phillips]
>> - Initial Creation of the device supporting register space mapping
>> ---
>>  hw/vfio/Makefile.objs           |   1 +
>>  hw/vfio/platform.c              | 672 ++++++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio-common.h   |   1 +
>>  include/hw/vfio/vfio-platform.h |  87 ++++++
>>  trace-events                    |  12 +
>>  5 files changed, 773 insertions(+)
>>  create mode 100644 hw/vfio/platform.c
>>  create mode 100644 include/hw/vfio/vfio-platform.h
>>
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index e31f30e..c5c76fe 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,4 +1,5 @@
>>  ifeq ($(CONFIG_LINUX), y)
>>  obj-$(CONFIG_SOFTMMU) += common.o
>>  obj-$(CONFIG_PCI) += pci.o
>> +obj-$(CONFIG_SOFTMMU) += platform.o
>>  endif
>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>> new file mode 100644
>> index 0000000..9f66610
>> --- /dev/null
>> +++ b/hw/vfio/platform.c
>> @@ -0,0 +1,672 @@
>> +/*
>> + * vfio based device assignment support - platform devices
>> + *
>> + * Copyright Linaro Limited, 2014
>> + *
>> + * Authors:
>> + *  Kim Phillips <kim.phillips@linaro.org>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Based on vfio based PCI device assignment support:
>> + *  Copyright Red Hat, Inc. 2012
>> + */
>> +
>> +#include <linux/vfio.h>
>> +#include <sys/ioctl.h>
>> +
>> +#include "hw/vfio/vfio-platform.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/range.h"
>> +#include "sysemu/sysemu.h"
>> +#include "exec/memory.h"
>> +#include "qemu/queue.h"
>> +#include "hw/sysbus.h"
>> +#include "trace.h"
>> +#include "hw/platform-bus.h"
>> +
>> +static void vfio_intp_interrupt(VFIOINTp *intp);
>> +typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>> +                                    eventfd_user_side_handler_t handler);
>> +
>> +/*
>> + * Functions only used when eventfd are handled on user-side
>> + * ie. without irqfd
>> + */
>> +
>> +/**
>> + * vfio_platform_eoi - IRQ completion routine
>> + * @vbasedev: the VFIO device
>> + *
>> + * de-asserts the active virtual IRQ and unmask the physical IRQ
>> + * (masked by the  VFIO driver). Handle pending IRQs if any.
>> + * eoi function is called on the first access to any MMIO region
>> + * after an IRQ was triggered. It is assumed this access corresponds
>> + * to the IRQ status register reset. With such a mechanism, a single
>> + * IRQ can be handled at a time since there is no way to know which
>> + * IRQ was completed by the guest (we would need additional details
>> + * about the IRQ status register mask)
>> + */
>> +static void vfio_platform_eoi(VFIODevice *vbasedev)
>> +{
>> +    VFIOINTp *intp;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +
>> +    qemu_mutex_lock(&vdev->intp_mutex);
>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>> +        if (intp->state == VFIO_IRQ_ACTIVE) {
>> +            trace_vfio_platform_eoi(intp->pin,
>> +                                event_notifier_get_fd(&intp->interrupt));
>> +            intp->state = VFIO_IRQ_INACTIVE;
>> +
>> +            /* deassert the virtual IRQ and unmask physical one */
>> +            qemu_set_irq(intp->qemuirq, 0);
>> +            vfio_unmask_irqindex(vbasedev, intp->pin);
>> +
>> +            /* a single IRQ can be active at a time */
>> +            break;
>> +        }
>> +    }
>> +    /* in case there are pending IRQs, handle them one at a time */
>> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
>> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
>> +        trace_vfio_platform_eoi_handle_pending(intp->pin);
>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>> +        vfio_intp_interrupt(intp);
>> +        qemu_mutex_lock(&vdev->intp_mutex);
>> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>> +    } else {
>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>> +    }
>> +}
>> +
>> +/**
>> + * vfio_mmap_set_enabled - enable/disable the fast path mode
>> + * @vdev: the VFIO platform device
>> + * @enabled: the target mmap state
>> + *
>> + * true ~ fast path = MMIO region is mmapped (no KVM trap)
>> + * false ~ slow path = MMIO region is trapped and region callbacks
>> + * are called. The slow path makes it possible to trap the guest
>> + * reset of the IRQ status register
>> +*/
>> +
>> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
>> +{
>> +    VFIORegion *region;
>> +    int i;
>> +
>> +    trace_vfio_platform_mmap_set_enabled(enabled);
>> +
>> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
>> +        region = vdev->regions[i];
>> +
>> +        /* register space is unmapped to trap EOI */
>> +        memory_region_set_enabled(&region->mmap_mem, enabled);
>> +    }
>> +}
>> +
>> +/**
>> + * vfio_intp_mmap_enable - timer function, restores the fast path
>> + * if there is no more active IRQ
>> + * @opaque: actually points to the VFIO platform device
>> + *
>> + * Called on mmap timer timeout, this function checks whether an
>> + * IRQ is still active and, if not, restores the fast path.
>> + * By construction a single eventfd is handled at a time.
>> + * If an IRQ is still active, the timer is restarted.
>> + */
>> +static void vfio_intp_mmap_enable(void *opaque)
>> +{
>> +    VFIOINTp *tmp;
>> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
>> +
>> +    qemu_mutex_lock(&vdev->intp_mutex);
>> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
>> +            trace_vfio_platform_intp_mmap_enable(tmp->pin);
>> +            /* re-program the timer to check active status later */
>> +            timer_mod(vdev->mmap_timer,
>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> +                          vdev->mmap_timeout);
>> +            qemu_mutex_unlock(&vdev->intp_mutex);
>> +            return;
>> +        }
>> +    }
>> +    vfio_mmap_set_enabled(vdev, true);
>> +    qemu_mutex_unlock(&vdev->intp_mutex);
>> +}
>> +
>> +/**
>> + * vfio_intp_interrupt - The user-side eventfd handler
>> + * @opaque: opaque pointer which in practice is the VFIOINTp*
>> + *
>> + * the function can be entered
>> + * - in event handler context: this IRQ is inactive
>> + *   in that case, the vIRQ is injected into the guest if there
>> + *   is no other active or pending IRQ.
>> + * - in IOhandler context: this IRQ is pending.
>> + *   there is no ACTIVE IRQ
>> + */
>> +static void vfio_intp_interrupt(VFIOINTp *intp)
>> +{
>> +    int ret;
>> +    VFIOINTp *tmp;
>> +    VFIOPlatformDevice *vdev = intp->vdev;
>> +    bool delay_handling = false;
>> +
>> +    qemu_mutex_lock(&vdev->intp_mutex);
>> +    if (intp->state == VFIO_IRQ_INACTIVE) {
>> +        QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>> +            if (tmp->state == VFIO_IRQ_ACTIVE ||
>> +                tmp->state == VFIO_IRQ_PENDING) {
>> +                delay_handling = true;
>> +                break;
>> +            }
>> +        }
>> +    }
>> +    if (delay_handling) {
>> +        /*
>> +         * the new IRQ gets a pending status and is pushed in
>> +         * the pending queue
>> +         */
>> +        intp->state = VFIO_IRQ_PENDING;
>> +        trace_vfio_intp_interrupt_set_pending(intp->pin);
>> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
>> +                             intp, pqnext);
>> +        ret = event_notifier_test_and_clear(&intp->interrupt);
>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>> +        return;
>> +    }
>> +
>> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
>> +    trace_vfio_platform_intp_interrupt(intp->pin,
>> +                              event_notifier_get_fd(&intp->interrupt));
>> +
>> +    if (intp->state == VFIO_IRQ_INACTIVE) {
>> +        ret = event_notifier_test_and_clear(&intp->interrupt);
>> +        if (!ret) {
>> +            error_report("Error when clearing fd=%d (ret = %d)\n",
>> +                         event_notifier_get_fd(&intp->interrupt), ret);
>> +        }
>> +    } /* else this is a pending IRQ that moves to ACTIVE state */
>> +
>> +    intp->state = VFIO_IRQ_ACTIVE;
>> +
>> +    /* sets slow path */
>> +    vfio_mmap_set_enabled(vdev, false);
>> +
>> +    /* trigger the virtual IRQ */
>> +    qemu_set_irq(intp->qemuirq, 1);
>> +
>> +    /* schedule the mmap timer which will restore mmap path after EOI*/
>> +    if (vdev->mmap_timeout) {
>> +        timer_mod(vdev->mmap_timer,
>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> +                      vdev->mmap_timeout);
>> +    }
>> +    qemu_mutex_unlock(&vdev->intp_mutex);
>> +}
>> +
>> +/**
>> + * vfio_start_eventfd_injection - starts the virtual IRQ injection using
>> + * user-side handled eventfds
>> + * @intp: the IRQ struct pointer
>> + */
>> +
>> +static int vfio_start_eventfd_injection(VFIOINTp *intp)
>> +{
>> +    int ret;
>> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
>> +
>> +    vfio_mask_irqindex(vbasedev, intp->pin);
>> +
>> +    ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
>> +    if (ret) {
>> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
>> +        vfio_unmask_irqindex(vbasedev, intp->pin);
>> +        return ret;
>> +    }
>> +    vfio_unmask_irqindex(vbasedev, intp->pin);
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Functions used whatever the injection method
>> + */
>> +
>> +/**
>> + * vfio_set_trigger_eventfd - set VFIO eventfd handling
>> + * i.e. program the VFIO driver to associate a given IRQ index
>> + * with a fd handler
>> + *
>> + * @intp: IRQ struct pointer
>> + * @handler: handler to be called on eventfd trigger
>> + */
>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>> +                                    eventfd_user_side_handler_t handler)
>> +{
>> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
>> +    struct vfio_irq_set *irq_set;
>> +    int argsz, ret;
>> +    int32_t *pfd;
>> +
>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>> +    irq_set = g_malloc0(argsz);
>> +    irq_set->argsz = argsz;
>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
>> +    irq_set->index = intp->pin;
>> +    irq_set->start = 0;
>> +    irq_set->count = 1;
>> +    pfd = (int32_t *)&irq_set->data;
>> +    *pfd = event_notifier_get_fd(&intp->interrupt);
>> +    qemu_set_fd_handler(*pfd, (IOHandler *)handler, NULL, intp);
>> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    g_free(irq_set);
>> +    if (ret < 0) {
>> +        error_report("vfio: Failed to set trigger eventfd: %m");
>> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
>> +    }
>> +    return ret;
>> +}
>> +
>> +/* not implemented yet */
>> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
>> +{
>> +    return false;
>> +}
>> +
>> +/* not implemented yet */
>> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
>> +{
>> +    return 0;
>> +}
>> +
>> +/**
>> + * vfio_init_intp - allocate, initialize the IRQ struct pointer
>> + * and add it into the list of IRQ
>> + * @vbasedev: the VFIO device
>> + * @index: VFIO device IRQ index
>> + */
>> +static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
>> +{
>> +    int ret;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
>> +    VFIOINTp *intp;
>> +
>> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
>> +    intp = g_malloc0(sizeof(*intp));
>> +    intp->vdev = vdev;
>> +    intp->pin = index;
>> +    intp->state = VFIO_IRQ_INACTIVE;
>> +    sysbus_init_irq(sbdev, &intp->qemuirq);
>> +
>> +    /* Get an eventfd for trigger */
>> +    ret = event_notifier_init(&intp->interrupt, 0);
>> +    if (ret) {
>> +        g_free(intp);
>> +        error_report("vfio: Error: trigger event_notifier_init failed ");
>> +        return NULL;
>> +    }
>> +
>> +    /* store the new intp in qlist */
>> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
>> +    return intp;
>> +}
>> +
>> +/**
>> + * vfio_populate_device - initialize MMIO region and IRQ
>> + * @vbasedev: the VFIO device
>> + *
>> + * query the VFIO device for exposed MMIO regions and IRQ and
>> + * populate the associated fields in the device struct
>> + */
>> +static int vfio_populate_device(VFIODevice *vbasedev)
>> +{
>> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
>> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>> +    VFIOINTp *intp;
>> +    int i, ret = 0;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +
>> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
>> +        reg_info.index = i;
>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> +        if (ret) {
>> +            error_report("vfio: Error getting region %d info: %m", i);
>> +            goto error;
>> +        }
>> +        vdev->regions[i]->flags = reg_info.flags;
>> +        vdev->regions[i]->size = reg_info.size;
>> +        vdev->regions[i]->fd_offset = reg_info.offset;
>> +        vdev->regions[i]->nr = i;
>> +        vdev->regions[i]->vbasedev = vbasedev;
>> +
>> +        trace_vfio_platform_populate_regions(vdev->regions[i]->nr,
>> +                            (unsigned long)vdev->regions[i]->flags,
>> +                            (unsigned long)vdev->regions[i]->size,
>> +                            vdev->regions[i]->vbasedev->fd,
>> +                            (unsigned long)vdev->regions[i]->fd_offset);
>> +    }
>> +
>> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> +                                    vfio_intp_mmap_enable, vdev);
>> +
>> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
>> +
>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>> +        irq.index = i;
>> +
>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>> +        if (ret) {
>> +            error_printf("vfio: error getting device %s irq info",
>> +                         vbasedev->name);
>> +            return ret;
>> +        } else {
>> +            trace_vfio_platform_populate_interrupts(irq.index,
>> +                                                    irq.count,
>> +                                                    irq.flags);
>> +            intp = vfio_init_intp(vbasedev, irq.index);
>> +            if (!intp) {
>> +                error_report("vfio: Error installing IRQ %d up", i);
>> +                return ret;
>> +            }
>> +        }
>> +    }
>> +    return 0;
>> +error:
>> +    return ret;
>> +}
>> +
>> +/*
>> + * vfio_start_irq_injection - associates a virtual irq to a
>> + * VFIO IRQ index and start the injection of this IRQ
>> + * @s: SysBus Device
>> + * @index: VFIO IRQ index
>> + * @virq: the virtual IRQ number, aka gsi
>> + *
>> + * this function is called when the device tree is built
>> + */
>> +static void vfio_start_irq_injection(SysBusDevice *s, int index, int virq)
>> +{
>> +    VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
>> +    VFIOINTp *intp;
>> +
>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>> +        if (intp->pin == index) {
>> +            intp->virtualID = virq;
>> +            vdev->start_irq_fn(intp);
>> +        }
>> +    }
>> +}
>> +
>> +/* specialized functions for VFIO Platform devices */
>> +static VFIODeviceOps vfio_platform_ops = {
>> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
>> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
>> +    .vfio_eoi = vfio_platform_eoi,
>> +    .vfio_populate_device = vfio_populate_device,
>> +};
>> +
>> +/**
>> + * vfio_base_device_init - implements some of the VFIO mechanics
>> + * @vbasedev: the VFIO device
>> + *
>> + * retrieves the group the device belongs to and get the device fd
>> + * returns the VFIO device fd
>> + * precondition: the device name must be initialized
>> + */
>> +static int vfio_base_device_init(VFIODevice *vbasedev)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev_iter;
>> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>> +    ssize_t len;
>> +    struct stat st;
>> +    int groupid;
>> +    int ret;
>> +
>> +    /* name must be set prior to the call */
>> +    if (!vbasedev->name) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* Check that the host device exists */
>> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
>> +             vbasedev->name);
>> +
>> +    if (stat(path, &st) < 0) {
>> +        error_report("vfio: error: no such host device: %s", path);
>> +        return -errno;
>> +    }
>> +
>> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>> +    len = readlink(path, iommu_group_path, sizeof(path));
>> +    if (len <= 0 || len >= sizeof(path)) {
>> +        error_report("vfio: error no iommu_group for device");
>> +        return len < 0 ? -errno : -ENAMETOOLONG;
>> +    }
>> +
>> +    iommu_group_path[len] = 0;
>> +    group_name = basename(iommu_group_path);
>> +
>> +    if (sscanf(group_name, "%d", &groupid) != 1) {
>> +        error_report("vfio: error reading %s: %m", path);
>> +        return -errno;
>> +    }
>> +
>> +    trace_vfio_platform_base_device_init(vbasedev->name, groupid);
>> +
>> +    group = vfio_get_group(groupid, &address_space_memory);
>> +    if (!group) {
>> +        error_report("vfio: failed to get group %d", groupid);
>> +        return -ENOENT;
>> +    }
>> +
>> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
>> +
>> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
>> +            error_report("vfio: error: device %s is already attached", path);
>> +            vfio_put_group(group);
>> +            return -EBUSY;
>> +        }
>> +    }
>> +    ret = vfio_get_device(group, path, vbasedev);
>> +    if (ret) {
>> +        error_report("vfio: failed to get device %s", path);
>> +        vfio_put_group(group);
>> +    }
>> +    return ret;
>> +}
>> +
>> +/**
>> + * vfio_map_region - initialize the 2 mr (mmapped on ops) for a
>> + * given index
>> + * @vdev: the VFIO platform device
>> + * @nr: the index of the region
>> + *
>> + * init the top memory region and the mmapped memory region beneath
>> + * VFIOPlatformDevice is used since VFIODevice is not a QOM Object
>> + * and could not be passed to memory region functions
>> +*/
>> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
>> +{
>> +    VFIORegion *region = vdev->regions[nr];
>> +    unsigned size = region->size;
>> +    char name[64];
>> +
>> +    if (!size) {
>> +        return;
>> +    }
>> +
>> +    snprintf(name, sizeof(name), "VFIO %s region %d",
>> +             vdev->vbasedev.name, nr);
>> +
>> +    /* A "slow" read/write mapping underlies all regions */
>> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
>> +                          region, name, size);
>> +
>> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
>> +
>> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
>> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
>> +        error_report("%s unsupported. Performance may be slow", name);
>> +    }
>> +}
>> +
>> +/**
>> + * vfio_platform_realize  - the device realize function
>> + * @dev: device state pointer
>> + * @errp: error
>> + *
>> + * initialize the device, its memory regions and IRQ structures
>> + * IRQ are started separately
>> + */
>> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
>> +{
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    int i, ret;
>> +
>> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
>> +    vbasedev->ops = &vfio_platform_ops;
>> +    vdev->start_irq_fn = vfio_start_eventfd_injection;
>> +
>> +    trace_vfio_platform_realize(vbasedev->name, vdev->compat);
>> +
>> +    ret = vfio_base_device_init(vbasedev);
>> +    if (ret) {
>> +        error_setg(errp, "vfio: vfio_base_device_init failed for %s",
>> +                   vbasedev->name);
>> +        return;
>> +    }
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        vfio_map_region(vdev, i);
>> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
>> +    }
>> +}
>> +
>> +/*
>> + * Mechanics to program/start irq injection on machine init done notifier:
>> + * this is needed since at finalize time, the device IRQ are not yet
>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>> + * always is used. Binding to the platform bus IRQ happens on a machine
>> + * init done notifier registered by the machine file. After its execution
>> + * we execute a new notifier that actually starts the injection. When using
>> + * irqfd, programming the injection consists in associating eventfds to
>> + * GSI number, i.e. virtual IRQ number
>> + */
>> +
>> +typedef struct VfioIrqStarterNotifierParams {
>> +    unsigned int platform_bus_first_irq;
>> +    Notifier notifier;
>> +} VfioIrqStarterNotifierParams;
>> +
>> +typedef struct VfioIrqStartParams {
>> +    PlatformBusDevice *pbus;
>> +    int platform_bus_first_irq;
>> +} VfioIrqStartParams;
>> +
>> +/* Start injection of IRQ for a specific VFIO device */
>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>> +{
>> +    int i;
>> +    VfioIrqStartParams *p = opaque;
>> +    VFIOPlatformDevice *vdev;
>> +    VFIODevice *vbasedev;
>> +    uint64_t irq_number;
>> +    PlatformBusDevice *pbus = p->pbus;
>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>> +
>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +        vbasedev = &vdev->vbasedev;
>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>> +                             + platform_bus_first_irq;
>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* loop on all VFIO platform devices and start their IRQ injection */
>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>> +{
>> +    VfioIrqStarterNotifierParams *p =
>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>> +    DeviceState *dev =
>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>> +
>> +    if (pbus->done_gathering) {
>> +        VfioIrqStartParams data = {
>> +            .pbus = pbus,
>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>> +        };
>> +
>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>> +    }
>> +}
>> +
>> +/* registers the machine init done notifier that will start VFIO IRQ */
>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>> +{
>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>> +
>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>> +    p->notifier.notify = vfio_irq_starter_notify;
>> +    qemu_add_machine_init_done_notifier(&p->notifier);
> 
> Could you add a notifier for each device instead? Then the notifier
> would be part of the vfio device struct and not some dangling random
> pointer :).
> 
> Of course instead of foreach_dynamic_sysbus_device() you would directly
> know the device you're dealing with and only handle a single device per
> notifier.

Hi Alex,

Indeed I can do that and put the foreach in the machine file instead.
This means however more code in virt.c, in the create_platform_bus
function. If Peter agrees with that I will proceed.
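
For reference, a minimal sketch of the per-device notifier idea, using
simplified stand-ins for QEMU's Notifier and container_of so that it
compiles on its own. All Demo*/demo_* names are hypothetical and not part
of the patch; registration with the machine init done list is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for QEMU's Notifier and container_of (assumed shapes). */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device: the notifier is a member, not a separate allocation. */
typedef struct DemoVfioDevice {
    const char *name;
    int num_irqs;
    int irqs_started;            /* filled in by the notifier callback */
    Notifier irq_start_notifier;
} DemoVfioDevice;

/* One call per device: the owning device is recovered from the notifier. */
static void demo_irq_start_notify(Notifier *notifier, void *data)
{
    DemoVfioDevice *vdev =
        container_of(notifier, DemoVfioDevice, irq_start_notifier);

    (void)data;
    /* A real device would start injection for each IRQ index here. */
    vdev->irqs_started = vdev->num_irqs;
}

/* Would run at realize time; only wires up the callback in this sketch. */
void demo_register_irq_starter(DemoVfioDevice *vdev)
{
    assert(vdev != NULL);
    vdev->irq_start_notifier.notify = demo_irq_start_notify;
}
```

With this layout there is no separately allocated params struct to track,
and each callback handles exactly one device.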

I'll take the opportunity to ask a question about qemu_irq that I have
not dared to ask yet ;-). Wouldn't it make sense to add an accessor to
retrieve the IRQ number (the n field)? I currently go through some
gymnastics to pass the platform bus first irq, and it would definitely
be simpler to retrieve n directly from qemu_irq. Besides, I think we have
the same need when setting up irqfd for vhost net, to associate the gsi
with the guest notifier.
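
To make the proposal concrete, here is a rough sketch of such an
accessor. DemoIRQState is a simplified stand-in for the structure behind
qemu_irq (only the fields discussed here), and demo_irq_get_n is a
hypothetical name, not an existing QEMU function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the state behind a qemu_irq (assumed layout). */
typedef struct DemoIRQState {
    void (*handler)(void *opaque, int n, int level);
    void *opaque;
    int n;                       /* line number within the owning controller */
} DemoIRQState;

typedef DemoIRQState *demo_irq;

/* Hypothetical accessor: line number, or -1 for an unconnected IRQ. */
int demo_irq_get_n(demo_irq irq)
{
    if (irq == NULL) {
        return -1;
    }
    assert(irq->n >= 0);         /* line numbers are non-negative here */
    return irq->n;
}
```

A caller holding the IRQ line could then read the number directly instead
of having an offset passed down to it.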

Thank you in advance

Best Regards

Eric
> 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation
  2014-11-05 10:59   ` Alexander Graf
@ 2014-11-05 12:31     ` Eric Auger
  2014-11-05 22:23       ` Alexander Graf
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-05 12:31 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 11/05/2014 11:59 AM, Alexander Graf wrote:
> 
> 
> On 31.10.14 15:05, Eric Auger wrote:
>> vfio-calxeda-xgmac now can be instantiated using the -device option.
>> The node creation function generates a very basic dt node composed
>> of the compat, reg and interrupts properties
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v6 -> v7:
>> - compat string re-formatting removed since compat string is not exposed
>>   anymore as a user option
>> - VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
>>   device
>> ---
>>  hw/arm/sysbus-fdt.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 88 insertions(+)
>>
>> diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
>> index d5476f1..f8b310b 100644
>> --- a/hw/arm/sysbus-fdt.c
>> +++ b/hw/arm/sysbus-fdt.c
>> @@ -27,6 +27,8 @@
>>  #include "hw/platform-bus.h"
>>  #include "sysemu/sysemu.h"
>>  #include "hw/platform-bus.h"
>> +#include "hw/vfio/vfio-platform.h"
>> +#include "hw/vfio/vfio-calxeda-xgmac.h"
>>  
>>  /*
>>   * internal struct that contains the information to create dynamic
>> @@ -54,8 +56,11 @@ typedef struct NodeCreationPair {
>>      int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
>>  } NodeCreationPair;
>>  
>> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque);
>> +
>>  /* list of supported dynamic sysbus devices */
>>  NodeCreationPair add_fdt_node_functions[] = {
>> +        {TYPE_VFIO_CALXEDA_XGMAC, add_basic_vfio_fdt_node},
>>          {"", NULL}, /*last element*/
>>  };
> 
> Can you maybe place the list somewhere smartly to make sure we don't
> need forward declarations? Either put it in between the "generic" and
> "device specific" code or at the end of the file with a single forward
> declaration for the array?

sure
> 
>>  
>> @@ -86,6 +91,89 @@ static int add_fdt_node(SysBusDevice *sbdev, void *opaque)
>>  }
>>  
>>  /**
>> + * add_basic_vfio_fdt_node - generates the most basic node for a VFIO node
>> + *
>> + * set properties are:
>> + * - compatible string
>> + * - regs
>> + * - interrupts
>> + */
>> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque)
>> +{
>> +    PlatformBusFdtData *data = opaque;
>> +    PlatformBusDevice *pbus = data->pbus;
>> +    void *fdt = data->fdt;
>> +    const char *parent_node = data->pbus_node_name;
>> +    int compat_str_len;
>> +    char *nodename;
>> +    int i, ret;
>> +    uint32_t *irq_attr;
>> +    uint64_t *reg_attr;
>> +    uint64_t mmio_base;
>> +    uint64_t irq_number;
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    Object *obj = OBJECT(sbdev);
>> +
>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>> +
>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>> +                               vbasedev->name,
>> +                               mmio_base);
>> +
>> +    qemu_fdt_add_subnode(fdt, nodename);
>> +
>> +    compat_str_len = strlen(vdev->compat) + 1;
>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>> +                          vdev->compat, compat_str_len);
> 
> What if there are multiple compatibles?
My purpose here was absolutely not to revive the proposal of a generic
node creation; I understand that it is not realistic. I rather tried to
put some common property creation in this function, but you're right,
even the interrupts property depends on the device.

About your question, I think the specialized VFIO device would set its
compat string including the various substrings. This was done in the
past for PL330 which required arm,pl330;arm,primecell.
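
As an illustration only: a device tree "compatible" property holding
several entries is a list of NUL-terminated strings laid end to end. The
sketch below converts a ';'-separated string such as the PL330 one above
into that layout. demo_compat_to_stringlist is a hypothetical helper, and
the ';' separator is an assumption taken from the way the string is
written in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert "a;b" into the stringlist "a\0b\0".
 * Returns a malloc'ed buffer (caller frees) and stores the property
 * length, counting every terminating NUL, in *len.
 */
char *demo_compat_to_stringlist(const char *compat, size_t *len)
{
    size_t n, i;
    char *buf;

    assert(compat != NULL && len != NULL);

    n = strlen(compat) + 1;      /* keep the final NUL */
    buf = malloc(n);
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, compat, n);

    /* Every separator becomes the terminator of the preceding entry. */
    for (i = 0; i + 1 < n; i++) {
        if (buf[i] == ';') {
            buf[i] = '\0';
        }
    }
    *len = n;
    return buf;
}
```

The returned length is what a property setter would need, since strlen()
on the converted buffer stops at the first entry.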

> 
>> +
>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
>> +        reg_attr[4*i] = 1;
> 
> What is the 1 here?
address-cells? Since the bus is < 4GB, one 32-bit cell is enough to
specify the base address. But since you already put #size-cells in the
parent node, maybe I can remove it.

> 
>> +        reg_attr[4*i+1] = mmio_base;
>> +        reg_attr[4*i+2] = 1;
> 
> and here?
size-cells for this reg. Same remark as above.
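
To spell out the layout being discussed: the array holds
(number of cells, value) pairs, and each value ends up as that many
big-endian 32-bit cells in the property. Each region contributes two such
pairs (base, size), i.e. four array entries. The sketch below is a
standalone illustration of that convention as I understand it;
demo_pack_sized_cells is a hypothetical helper, not the QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 32-bit cell in big-endian byte order. */
static void demo_put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/*
 * pairs[] holds npairs (ncells, value) couples; ncells must be 1 or 2.
 * Returns the number of bytes written to out, or -1 on bad input or if
 * out is too small.
 */
int demo_pack_sized_cells(const uint64_t *pairs, int npairs,
                          uint8_t *out, size_t out_size)
{
    size_t off = 0;
    int i;

    assert(pairs != NULL && out != NULL);

    for (i = 0; i < npairs; i++) {
        uint64_t ncells = pairs[2 * i];
        uint64_t value = pairs[2 * i + 1];

        if (ncells < 1 || ncells > 2 || off + 4 * ncells > out_size) {
            return -1;
        }
        if (ncells == 1 && value > UINT32_MAX) {
            return -1;           /* value does not fit in a single cell */
        }
        if (ncells == 2) {
            demo_put_be32(out + off, (uint32_t)(value >> 32));
            off += 4;
        }
        demo_put_be32(out + off, (uint32_t)value);
        off += 4;
    }
    return (int)off;
}
```

For one region at 0x1000 of size 0x200, the pairs { 1, 0x1000, 1, 0x200 }
produce two cells, i.e. 8 bytes.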
> 
>> +        reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
>> +    }
>> +
>> +    ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
>> +                     vbasedev->num_regions*2, reg_attr);
>> +    if (ret < 0) {
>> +        error_report("could not set reg property of node %s", nodename);
>> +        goto fail;
>> +    }
>> +
>> +    irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
>> +
>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>> +        irq_number = platform_bus_get_irqn(pbus, sbdev , i)
>> +                         + data->irq_start;
>> +        irq_attr[3*i] = cpu_to_be32(0);
>> +        irq_attr[3*i+1] = cpu_to_be32(irq_number);
>> +        irq_attr[3*i+2] = cpu_to_be32(0x4);
> 
> Why 0x4? How do you know whether an IRQ is edge or level triggered?

This is indeed device specific. In the future I might be able to read
the host dt using Antonios' patch, but I did not want to add a new
feature now and add extra dependencies, as per the discussion we had
with Alex W.
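
For context, a small sketch of how one 3-cell interrupt specifier is laid
out under the ARM GIC device tree binding: interrupt type, interrupt
number, trigger flags, each as a big-endian 32-bit cell. The 0x4 in the
patch is the "active high, level-sensitive" flag. The DemoIrqDesc struct
and demo_encode_irq are hypothetical names, standing in for whatever
per-device description would supply the trigger type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cell values from the ARM GIC device tree binding. */
enum {
    DEMO_GIC_SPI = 0,
    DEMO_GIC_PPI = 1,
    DEMO_IRQ_EDGE_RISING = 0x1,
    DEMO_IRQ_EDGE_FALLING = 0x2,
    DEMO_IRQ_LEVEL_HIGH = 0x4,
    DEMO_IRQ_LEVEL_LOW = 0x8
};

/* Hypothetical per-device description of one interrupt line. */
typedef struct DemoIrqDesc {
    uint32_t type;               /* DEMO_GIC_SPI or DEMO_GIC_PPI */
    uint32_t number;             /* interrupt number for that type */
    uint32_t trigger;            /* one of the DEMO_IRQ_* flags */
} DemoIrqDesc;

/* Store one 32-bit cell in big-endian byte order. */
static void demo_store_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Write one 3-cell specifier (12 bytes) into out. */
void demo_encode_irq(const DemoIrqDesc *d, uint8_t out[12])
{
    assert(d != NULL && out != NULL);
    demo_store_be32(out, d->type);
    demo_store_be32(out + 4, d->number);
    demo_store_be32(out + 8, d->trigger);
}
```

An edge-triggered device would only differ in the third cell, which is
why the flag cannot be hard-coded for every VFIO platform device.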
> 
> I'm still not convinced we can make anything "generic" on the VFIO path.
> How about you call the function xgmac specific for now, but keep the
> code as dynamic as it is?

yes I will.
> 
> 
> Alex
> 
>> +    }
>> +
>> +    ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
>> +                     irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
>> +    if (ret < 0) {
>> +        error_report("could not set interrupts property of node %s",
>> +                     nodename);
>> +        goto fail;
>> +    }
>> +
>> +    g_free(nodename);
>> +    g_free(irq_attr);
>> +    g_free(reg_attr);
>> +
>> +    return 0;
>> +
>> +fail:
>> +
>> +    return -1;
>> +}
>> +
>> +/**
>>   * add_all_platform_bus_fdt_nodes - create all the platform bus nodes
>>   *
>>   * builds the parent platform bus node and all the nodes of dynamic
>>


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-05 12:03     ` Eric Auger
@ 2014-11-05 13:05       ` Alexander Graf
  0 siblings, 0 replies; 43+ messages in thread
From: Alexander Graf @ 2014-11-05 13:05 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 05.11.14 13:03, Eric Auger wrote:
> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>
>>
>> On 31.10.14 15:05, Eric Auger wrote:
>>> Minimal VFIO platform implementation supporting
>>> - register space user mapping,
>>> - IRQ assignment based on eventfds handled on qemu side.
>>>
>>> irqfd kernel acceleration comes in a subsequent patch.
>>>
>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>> v6 -> v7:
>>> - compat is not exposed anymore as a user option. Rationale is
>>>   the vfio device became abstract and a specialization is needed
>>>   anyway. The derived device must set the compat string.
>>> - in v6 vfio_start_irq_injection was exposed in vfio-platform.h.
>>>   A new function dubbed vfio_register_irq_starter replaces it. It
>>>   registers a machine init done notifier that programs & starts
>>>   all dynamic VFIO device IRQs. This function is supposed to be
>>>   called by the machine file. A set of static helper routines are
>>>   added too. It must be called before the creation of the platform
>>>   bus device.
>>>
>>> v5 -> v6:
>>> - vfio_device property renamed into host property
>>> - correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
>>>   and remove PCI related comment
>>> - remove declaration of vfio_setup_irqfd and irqfd_allowed
>>>   property. Both belong to next patch (irqfd)
>>> - remove declaration of vfio_intp_interrupt in vfio-platform.h
>>> - functions that can be static get this characteristic
>>> - remove declarations of vfio_region_ops, vfio_memory_listener,
>>>   group_list, vfio_address_spaces. All are moved to vfio-common.h
>>> - remove vfio_put_device declaration and definition
>>> - print_regions removed. code moved into vfio_populate_regions
>>> - replace DPRINTF by trace events
>>> - new helper routine to set the trigger eventfd
>>> - dissociate intp init from the injection enablement:
>>>   vfio_enable_intp renamed into vfio_init_intp and new function
>>>   named vfio_start_eventfd_injection
>>> - injection start moved to vfio_start_irq_injection (not anymore
>>>   in vfio_populate_interrupt)
>>> - new start_irq_fn field in VFIOPlatformDevice corresponding to
>>>   the function that will be used for starting injection
>>> - user handled eventfd:
>>>   x add mutex to protect IRQ state & list manipulation,
>>>   x correct misleading comment in vfio_intp_interrupt.
>>>   x Fix bugs thanks to fake interrupt modality
>>> - VFIOPlatformDeviceClass becomes abstract
>>> - add error_setg in vfio_platform_realize
>>>
>>> v4 -> v5:
>>> - vfio-platform.h included first
>>> - cleanup error handling in *populate*, vfio_get_device,
>>>   vfio_enable_intp
>>> - vfio_put_device not called anymore
>>> - add some includes to follow vfio policy
>>>
>>> v3 -> v4:
>>> [Eric Auger]
>>> - merge of "vfio: Add initial IRQ support in platform device"
>>>   to get a full functional patch although perfs are limited.
>>> - removal of unrealize function since I currently understand
>>>   it is only used with device hot-plug feature.
>>>
>>> v2 -> v3:
>>> [Eric Auger]
>>> - further factorization between PCI and platform (VFIORegion,
>>>   VFIODevice). same level of functionality.
>>>
>>> <= v2:
>>> [Kim Phillips]
>>> - Initial Creation of the device supporting register space mapping
>>> ---
>>>  hw/vfio/Makefile.objs           |   1 +
>>>  hw/vfio/platform.c              | 672 ++++++++++++++++++++++++++++++++++++++++
>>>  include/hw/vfio/vfio-common.h   |   1 +
>>>  include/hw/vfio/vfio-platform.h |  87 ++++++
>>>  trace-events                    |  12 +
>>>  5 files changed, 773 insertions(+)
>>>  create mode 100644 hw/vfio/platform.c
>>>  create mode 100644 include/hw/vfio/vfio-platform.h
>>>
>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>>> index e31f30e..c5c76fe 100644
>>> --- a/hw/vfio/Makefile.objs
>>> +++ b/hw/vfio/Makefile.objs
>>> @@ -1,4 +1,5 @@
>>>  ifeq ($(CONFIG_LINUX), y)
>>>  obj-$(CONFIG_SOFTMMU) += common.o
>>>  obj-$(CONFIG_PCI) += pci.o
>>> +obj-$(CONFIG_SOFTMMU) += platform.o
>>>  endif
>>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>>> new file mode 100644
>>> index 0000000..9f66610
>>> --- /dev/null
>>> +++ b/hw/vfio/platform.c
>>> @@ -0,0 +1,672 @@
>>> +/*
>>> + * vfio based device assignment support - platform devices
>>> + *
>>> + * Copyright Linaro Limited, 2014
>>> + *
>>> + * Authors:
>>> + *  Kim Phillips <kim.phillips@linaro.org>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>> + * the COPYING file in the top-level directory.
>>> + *
>>> + * Based on vfio based PCI device assignment support:
>>> + *  Copyright Red Hat, Inc. 2012
>>> + */
>>> +
>>> +#include <linux/vfio.h>
>>> +#include <sys/ioctl.h>
>>> +
>>> +#include "hw/vfio/vfio-platform.h"
>>> +#include "qemu/error-report.h"
>>> +#include "qemu/range.h"
>>> +#include "sysemu/sysemu.h"
>>> +#include "exec/memory.h"
>>> +#include "qemu/queue.h"
>>> +#include "hw/sysbus.h"
>>> +#include "trace.h"
>>> +#include "hw/platform-bus.h"
>>> +
>>> +static void vfio_intp_interrupt(VFIOINTp *intp);
>>> +typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
>>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>>> +                                    eventfd_user_side_handler_t handler);
>>> +
>>> +/*
>>> + * Functions only used when eventfd are handled on user-side
>>> + * ie. without irqfd
>>> + */
>>> +
>>> +/**
>>> + * vfio_platform_eoi - IRQ completion routine
>>> + * @vbasedev: the VFIO device
>>> + *
>>> + * de-asserts the active virtual IRQ and unmask the physical IRQ
>>> + * (masked by the  VFIO driver). Handle pending IRQs if any.
>>> + * eoi function is called on the first access to any MMIO region
>>> + * after an IRQ was triggered. It is assumed this access corresponds
>>> + * to the IRQ status register reset. With such a mechanism, a single
>>> + * IRQ can be handled at a time since there is no way to know which
>>> + * IRQ was completed by the guest (we would need additional details
>>> + * about the IRQ status register mask)
>>> + */
>>> +static void vfio_platform_eoi(VFIODevice *vbasedev)
>>> +{
>>> +    VFIOINTp *intp;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +
>>> +    qemu_mutex_lock(&vdev->intp_mutex);
>>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>>> +        if (intp->state == VFIO_IRQ_ACTIVE) {
>>> +            trace_vfio_platform_eoi(intp->pin,
>>> +                                event_notifier_get_fd(&intp->interrupt));
>>> +            intp->state = VFIO_IRQ_INACTIVE;
>>> +
>>> +            /* deassert the virtual IRQ and unmask physical one */
>>> +            qemu_set_irq(intp->qemuirq, 0);
>>> +            vfio_unmask_irqindex(vbasedev, intp->pin);
>>> +
>>> +            /* a single IRQ can be active at a time */
>>> +            break;
>>> +        }
>>> +    }
>>> +    /* in case there are pending IRQs, handle them one at a time */
>>> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
>>> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
>>> +        trace_vfio_platform_eoi_handle_pending(intp->pin);
>>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>>> +        vfio_intp_interrupt(intp);
>>> +        qemu_mutex_lock(&vdev->intp_mutex);
>>> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
>>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>>> +    } else {
>>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * vfio_mmap_set_enabled - enable/disable the fast path mode
>>> + * @vdev: the VFIO platform device
>>> + * @enabled: the target mmap state
>>> + *
>>> + * true ~ fast path = MMIO region is mmaped (no KVM TRAP)
>>> + * false ~ slow path = MMIO region is trapped and region callbacks
>>> + * are called; the slow path allows trapping the guest's reset of
>>> + * the IRQ status register
>>> +*/
>>> +
>>> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
>>> +{
>>> +    VFIORegion *region;
>>> +    int i;
>>> +
>>> +    trace_vfio_platform_mmap_set_enabled(enabled);
>>> +
>>> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
>>> +        region = vdev->regions[i];
>>> +
>>> +        /* register space is unmapped to trap EOI */
>>> +        memory_region_set_enabled(&region->mmap_mem, enabled);
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * vfio_intp_mmap_enable - timer function, restores the fast path
>>> + * if there is no more active IRQ
>>> + * @opaque: actually points to the VFIO platform device
>>> + *
>>> + * Called on mmap timer timeout, this function checks whether the
>>> + * IRQ is still active and in the negative restores the fast path.
>>> + * by construction a single eventfd is handled at a time.
>>> + * if the IRQ is still active, the timer is restarted.
>>> + */
>>> +static void vfio_intp_mmap_enable(void *opaque)
>>> +{
>>> +    VFIOINTp *tmp;
>>> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
>>> +
>>> +    qemu_mutex_lock(&vdev->intp_mutex);
>>> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>>> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
>>> +            trace_vfio_platform_intp_mmap_enable(tmp->pin);
>>> +            /* re-program the timer to check active status later */
>>> +            timer_mod(vdev->mmap_timer,
>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>> +                          vdev->mmap_timeout);
>>> +            qemu_mutex_unlock(&vdev->intp_mutex);
>>> +            return;
>>> +        }
>>> +    }
>>> +    vfio_mmap_set_enabled(vdev, true);
>>> +    qemu_mutex_unlock(&vdev->intp_mutex);
>>> +}
>>> +
>>> +/**
>>> + * vfio_intp_interrupt - The user-side eventfd handler
>>> + * @opaque: opaque pointer which in practice is the VFIOINTp*
>>> + *
>>> + * the function can be entered
>>> + * - in event handler context: this IRQ is inactive
>>> + *   in that case, the vIRQ is injected into the guest if there
>>> + *   is no other active or pending IRQ.
>>> + * - in IOhandler context: this IRQ is pending.
>>> + *   there is no ACTIVE IRQ
>>> + */
>>> +static void vfio_intp_interrupt(VFIOINTp *intp)
>>> +{
>>> +    int ret;
>>> +    VFIOINTp *tmp;
>>> +    VFIOPlatformDevice *vdev = intp->vdev;
>>> +    bool delay_handling = false;
>>> +
>>> +    qemu_mutex_lock(&vdev->intp_mutex);
>>> +    if (intp->state == VFIO_IRQ_INACTIVE) {
>>> +        QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>>> +            if (tmp->state == VFIO_IRQ_ACTIVE ||
>>> +                tmp->state == VFIO_IRQ_PENDING) {
>>> +                delay_handling = true;
>>> +                break;
>>> +            }
>>> +        }
>>> +    }
>>> +    if (delay_handling) {
>>> +        /*
>>> +         * the new IRQ gets a pending status and is pushed in
>>> +         * the pending queue
>>> +         */
>>> +        intp->state = VFIO_IRQ_PENDING;
>>> +        trace_vfio_intp_interrupt_set_pending(intp->pin);
>>> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
>>> +                             intp, pqnext);
>>> +        ret = event_notifier_test_and_clear(&intp->interrupt);
>>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>>> +        return;
>>> +    }
>>> +
>>> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
>>> +    trace_vfio_platform_intp_interrupt(intp->pin,
>>> +                              event_notifier_get_fd(&intp->interrupt));
>>> +
>>> +    if (intp->state == VFIO_IRQ_INACTIVE) {
>>> +        ret = event_notifier_test_and_clear(&intp->interrupt);
>>> +        if (!ret) {
>>> +            error_report("Error when clearing fd=%d (ret = %d)",
>>> +                         event_notifier_get_fd(&intp->interrupt), ret);
>>> +        }
>>> +    } /* else this is a pending IRQ that moves to ACTIVE state */
>>> +
>>> +    intp->state = VFIO_IRQ_ACTIVE;
>>> +
>>> +    /* sets slow path */
>>> +    vfio_mmap_set_enabled(vdev, false);
>>> +
>>> +    /* trigger the virtual IRQ */
>>> +    qemu_set_irq(intp->qemuirq, 1);
>>> +
>>> +    /* schedule the mmap timer which will restore mmap path after EOI*/
>>> +    if (vdev->mmap_timeout) {
>>> +        timer_mod(vdev->mmap_timer,
>>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>> +                      vdev->mmap_timeout);
>>> +    }
>>> +    qemu_mutex_unlock(&vdev->intp_mutex);
>>> +}
>>> +
>>> +/**
>>> + * vfio_start_eventfd_injection - starts the virtual IRQ injection using
>>> + * user-side handled eventfds
>>> + * @intp: the IRQ struct pointer
>>> + */
>>> +
>>> +static int vfio_start_eventfd_injection(VFIOINTp *intp)
>>> +{
>>> +    int ret;
>>> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
>>> +
>>> +    vfio_mask_irqindex(vbasedev, intp->pin);
>>> +
>>> +    ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
>>> +    if (ret) {
>>> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
>>> +        vfio_unmask_irqindex(vbasedev, intp->pin);
>>> +        return ret;
>>> +    }
>>> +    vfio_unmask_irqindex(vbasedev, intp->pin);
>>> +    return 0;
>>> +}
>>> +
>>> +/*
>>> + * Functions used whatever the injection method
>>> + */
>>> +
>>> +/**
>>> + * vfio_set_trigger_eventfd - set VFIO eventfd handling
>>> + * i.e. program the VFIO driver to associate a given IRQ index
>>> + * with a fd handler
>>> + *
>>> + * @intp: IRQ struct pointer
>>> + * @handler: handler to be called on eventfd trigger
>>> + */
>>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>>> +                                    eventfd_user_side_handler_t handler)
>>> +{
>>> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
>>> +    struct vfio_irq_set *irq_set;
>>> +    int argsz, ret;
>>> +    int32_t *pfd;
>>> +
>>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>>> +    irq_set = g_malloc0(argsz);
>>> +    irq_set->argsz = argsz;
>>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
>>> +    irq_set->index = intp->pin;
>>> +    irq_set->start = 0;
>>> +    irq_set->count = 1;
>>> +    pfd = (int32_t *)&irq_set->data;
>>> +    *pfd = event_notifier_get_fd(&intp->interrupt);
>>> +    qemu_set_fd_handler(*pfd, (IOHandler *)handler, NULL, intp);
>>> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>> +    g_free(irq_set);
>>> +    if (ret < 0) {
>>> +        error_report("vfio: Failed to set trigger eventfd: %m");
>>> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +/* not implemented yet */
>>> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
>>> +{
>>> +    return false;
>>> +}
>>> +
>>> +/* not implemented yet */
>>> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +/**
>>> + * vfio_init_intp - allocate, initialize the IRQ struct pointer
>>> + * and add it into the list of IRQ
>>> + * @vbasedev: the VFIO device
>>> + * @index: VFIO device IRQ index
>>> + */
>>> +static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
>>> +{
>>> +    int ret;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
>>> +    VFIOINTp *intp;
>>> +
>>> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
>>> +    intp = g_malloc0(sizeof(*intp));
>>> +    intp->vdev = vdev;
>>> +    intp->pin = index;
>>> +    intp->state = VFIO_IRQ_INACTIVE;
>>> +    sysbus_init_irq(sbdev, &intp->qemuirq);
>>> +
>>> +    /* Get an eventfd for trigger */
>>> +    ret = event_notifier_init(&intp->interrupt, 0);
>>> +    if (ret) {
>>> +        g_free(intp);
>>> +        error_report("vfio: Error: trigger event_notifier_init failed ");
>>> +        return NULL;
>>> +    }
>>> +
>>> +    /* store the new intp in qlist */
>>> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
>>> +    return intp;
>>> +}
>>> +
>>> +/**
>>> + * vfio_populate_device - initialize MMIO region and IRQ
>>> + * @vbasedev: the VFIO device
>>> + *
>>> + * query the VFIO device for exposed MMIO regions and IRQ and
>>> + * populate the associated fields in the device struct
>>> + */
>>> +static int vfio_populate_device(VFIODevice *vbasedev)
>>> +{
>>> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
>>> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>>> +    VFIOINTp *intp;
>>> +    int i, ret = 0;
>>> +    VFIOPlatformDevice *vdev =
>>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>>> +
>>> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
>>> +        reg_info.index = i;
>>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>>> +        if (ret) {
>>> +            error_report("vfio: Error getting region %d info: %m", i);
>>> +            goto error;
>>> +        }
>>> +        vdev->regions[i]->flags = reg_info.flags;
>>> +        vdev->regions[i]->size = reg_info.size;
>>> +        vdev->regions[i]->fd_offset = reg_info.offset;
>>> +        vdev->regions[i]->nr = i;
>>> +        vdev->regions[i]->vbasedev = vbasedev;
>>> +
>>> +        trace_vfio_platform_populate_regions(vdev->regions[i]->nr,
>>> +                            (unsigned long)vdev->regions[i]->flags,
>>> +                            (unsigned long)vdev->regions[i]->size,
>>> +                            vdev->regions[i]->vbasedev->fd,
>>> +                            (unsigned long)vdev->regions[i]->fd_offset);
>>> +    }
>>> +
>>> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>>> +                                    vfio_intp_mmap_enable, vdev);
>>> +
>>> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
>>> +
>>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>>> +        irq.index = i;
>>> +
>>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>>> +        if (ret) {
>>> +            error_printf("vfio: error getting device %s irq info",
>>> +                         vbasedev->name);
>>> +            return ret;
>>> +        } else {
>>> +            trace_vfio_platform_populate_interrupts(irq.index,
>>> +                                                    irq.count,
>>> +                                                    irq.flags);
>>> +            intp = vfio_init_intp(vbasedev, irq.index);
>>> +            if (!intp) {
>>> +                error_report("vfio: Error installing IRQ %d up", i);
>>> +                return ret;
>>> +            }
>>> +        }
>>> +    }
>>> +    return 0;
>>> +error:
>>> +    return ret;
>>> +}
>>> +
>>> +/*
>>> + * vfio_start_irq_injection - associates a virtual irq to a
>>> + * VFIO IRQ index and start the injection of this IRQ
>>> + * @s: SysBus Device
>>> + * @index: VFIO IRQ index
>>> + * @virq: the virtual IRQ number, aka gsi
>>> + *
>>> + * this function is called when the device tree is built
>>> + */
>>> +static void vfio_start_irq_injection(SysBusDevice *s, int index, int virq)
>>> +{
>>> +    VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
>>> +    VFIOINTp *intp;
>>> +
>>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>>> +        if (intp->pin == index) {
>>> +            intp->virtualID = virq;
>>> +            vdev->start_irq_fn(intp);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +/* specialized functions for VFIO Platform devices */
>>> +static VFIODeviceOps vfio_platform_ops = {
>>> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
>>> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
>>> +    .vfio_eoi = vfio_platform_eoi,
>>> +    .vfio_populate_device = vfio_populate_device,
>>> +};
>>> +
>>> +/**
>>> + * vfio_base_device_init - implements some of the VFIO mechanics
>>> + * @vbasedev: the VFIO device
>>> + *
>>> + * retrieves the group the device belongs to and get the device fd
>>> + * returns the VFIO device fd
>>> + * precondition: the device name must be initialized
>>> + */
>>> +static int vfio_base_device_init(VFIODevice *vbasedev)
>>> +{
>>> +    VFIOGroup *group;
>>> +    VFIODevice *vbasedev_iter;
>>> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>>> +    ssize_t len;
>>> +    struct stat st;
>>> +    int groupid;
>>> +    int ret;
>>> +
>>> +    /* name must be set prior to the call */
>>> +    if (!vbasedev->name) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /* Check that the host device exists */
>>> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
>>> +             vbasedev->name);
>>> +
>>> +    if (stat(path, &st) < 0) {
>>> +        error_report("vfio: error: no such host device: %s", path);
>>> +        return -errno;
>>> +    }
>>> +
>>> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>>> +    len = readlink(path, iommu_group_path, sizeof(path));
>>> +    if (len <= 0 || len >= sizeof(path)) {
>>> +        error_report("vfio: error no iommu_group for device");
>>> +        return len < 0 ? -errno : ENAMETOOLONG;
>>> +    }
>>> +
>>> +    iommu_group_path[len] = 0;
>>> +    group_name = basename(iommu_group_path);
>>> +
>>> +    if (sscanf(group_name, "%d", &groupid) != 1) {
>>> +        error_report("vfio: error reading %s: %m", path);
>>> +        return -errno;
>>> +    }
>>> +
>>> +    trace_vfio_platform_base_device_init(vbasedev->name, groupid);
>>> +
>>> +    group = vfio_get_group(groupid, &address_space_memory);
>>> +    if (!group) {
>>> +        error_report("vfio: failed to get group %d", groupid);
>>> +        return -ENOENT;
>>> +    }
>>> +
>>> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
>>> +
>>> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>>> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
>>> +            error_report("vfio: error: device %s is already attached", path);
>>> +            vfio_put_group(group);
>>> +            return -EBUSY;
>>> +        }
>>> +    }
>>> +    ret = vfio_get_device(group, path, vbasedev);
>>> +    if (ret) {
>>> +        error_report("vfio: failed to get device %s", path);
>>> +        vfio_put_group(group);
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +/**
>>> + * vfio_map_region - initialize the 2 mr (mmapped on ops) for a
>>> + * given index
>>> + * @vdev: the VFIO platform device
>>> + * @nr: the index of the region
>>> + *
>>> + * init the top memory region and the mmapped memory region beneath
>>> + * VFIOPlatformDevice is used since VFIODevice is not a QOM Object
>>> + * and could not be passed to memory region functions
>>> +*/
>>> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
>>> +{
>>> +    VFIORegion *region = vdev->regions[nr];
>>> +    unsigned size = region->size;
>>> +    char name[64];
>>> +
>>> +    if (!size) {
>>> +        return;
>>> +    }
>>> +
>>> +    snprintf(name, sizeof(name), "VFIO %s region %d",
>>> +             vdev->vbasedev.name, nr);
>>> +
>>> +    /* A "slow" read/write mapping underlies all regions */
>>> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
>>> +                          region, name, size);
>>> +
>>> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
>>> +
>>> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
>>> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
>>> +        error_report("%s unsupported. Performance may be slow", name);
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * vfio_platform_realize  - the device realize function
>>> + * @dev: device state pointer
>>> + * @errp: error
>>> + *
>>> + * initialize the device, its memory regions and IRQ structures
>>> + * IRQ are started separately
>>> + */
>>> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
>>> +{
>>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
>>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>> +    int i, ret;
>>> +
>>> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
>>> +    vbasedev->ops = &vfio_platform_ops;
>>> +    vdev->start_irq_fn = vfio_start_eventfd_injection;
>>> +
>>> +    trace_vfio_platform_realize(vbasedev->name, vdev->compat);
>>> +
>>> +    ret = vfio_base_device_init(vbasedev);
>>> +    if (ret) {
>>> +        error_setg(errp, "vfio: vfio_base_device_init failed for %s",
>>> +                   vbasedev->name);
>>> +        return;
>>> +    }
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        vfio_map_region(vdev, i);
>>> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
>>> +    }
>>> +}
>>> +
>>> +/*
>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>> + * this is needed since at finalize time, the device IRQ are not yet
>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>> + * init done notifier registered by the machine file. After its execution
>>> + * we execute a new notifier that actually starts the injection. When using
>>> + * irqfd, programming the injection consists in associating eventfds to
>>> + * GSI number, i.e. virtual IRQ number
>>> + */
>>> +
>>> +typedef struct VfioIrqStarterNotifierParams {
>>> +    unsigned int platform_bus_first_irq;
>>> +    Notifier notifier;
>>> +} VfioIrqStarterNotifierParams;
>>> +
>>> +typedef struct VfioIrqStartParams {
>>> +    PlatformBusDevice *pbus;
>>> +    int platform_bus_first_irq;
>>> +} VfioIrqStartParams;
>>> +
>>> +/* Start injection of IRQ for a specific VFIO device */
>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>> +{
>>> +    int i;
>>> +    VfioIrqStartParams *p = opaque;
>>> +    VFIOPlatformDevice *vdev;
>>> +    VFIODevice *vbasedev;
>>> +    uint64_t irq_number;
>>> +    PlatformBusDevice *pbus = p->pbus;
>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>> +
>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>> +        vbasedev = &vdev->vbasedev;
>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>> +                             + platform_bus_first_irq;
>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>> +{
>>> +    VfioIrqStarterNotifierParams *p =
>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>> +    DeviceState *dev =
>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>> +
>>> +    if (pbus->done_gathering) {
>>> +        VfioIrqStartParams data = {
>>> +            .pbus = pbus,
>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>> +        };
>>> +
>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>> +    }
>>> +}
>>> +
>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>> +{
>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>> +
>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>
>> Could you add a notifier for each device instead? Then the notifier
>> would be part of the vfio device struct and not some dangling random
>> pointer :).
>>
>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>> know the device you're dealing with and only handle a single device per
>> notifier.
> 
> Hi Alex,
> 
> Indeed I can do that and put the foreach in the machine file instead.
> This means however more code in virt.c, in the create_platform_bus
> function. If Peter agrees with that I will proceed.
> 
> I take the opportunity to ask a question I did not dare to ask yet about
> qemu_irq ;-). Wouldn't it make sense to create an accessor to retrieve
> the IRQ number (n field)? Currently I have to jump through hoops to
> pass the platform bus first irq, and it would definitely be simpler to
> retrieve n directly from qemu_irq. Besides, I think we also have this
> need when setting up irqfd for vhost net to associate the gsi with the
> guest notifier.

No, a qemu_irq object only knows the connection it establishes. The
bigger picture of what number it has is bus / machine specific. That's
what I added the easy platform_bus_get_irqn() helper for ;).


Alex
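
A minimal sketch of the per-device notifier idea discussed above, not the
code that was eventually merged: the notifier is embedded in the device
structure and the callback recovers its device with container_of, so no
separately allocated parameter block is needed. DemoPlatformDevice,
demo_container_of and the demo_* functions are hypothetical stand-ins;
only the shape of the Notifier callback mirrors the quoted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for QEMU's Notifier; only the callback field is modeled. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
};

/* Hypothetical, trimmed-down device: the notifier lives inside it. */
typedef struct DemoPlatformDevice {
    const char *name;
    int num_irqs;
    int started_irqs;      /* counts IRQs whose injection was started */
    Notifier irq_starter;  /* embedded, so nothing is left dangling */
} DemoPlatformDevice;

/* Local equivalent of container_of, written out for self-containment. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback: handles exactly one device, the one owning the notifier. */
static void demo_irq_starter_notify(Notifier *notifier, void *data)
{
    DemoPlatformDevice *dev =
        demo_container_of(notifier, DemoPlatformDevice, irq_starter);
    int i;

    (void)data;
    for (i = 0; i < dev->num_irqs; i++) {
        /* a real device would start injection of IRQ i here */
        dev->started_irqs++;
    }
}

/* Hook up the embedded notifier; registration itself is left out. */
static void demo_register_irq_starter(DemoPlatformDevice *dev)
{
    assert(dev != NULL);
    dev->irq_starter.notify = demo_irq_starter_notify;
}
```

In QEMU the embedded notifier would presumably be handed to
qemu_add_machine_init_done_notifier(), as the quoted patch does for its
single global notifier.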


* Re: [Qemu-devel] [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice
  2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice Eric Auger
@ 2014-11-05 17:35   ` Alex Williamson
  2014-11-06  8:38     ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alex Williamson @ 2014-11-05 17:35 UTC (permalink / raw)
  To: Eric Auger
  Cc: joel.schopp, kim.phillips, eric.auger, a.rigo, peter.maydell,
	manish.jaggi, patches, will.deacon, qemu-devel, agraf,
	Bharat.Bhushan, stuart.yoder, a.motakis, pbonzini, kvmarm,
	christoffer.dall

Hi Eric,

On Fri, 2014-10-31 at 14:05 +0000, Eric Auger wrote:
> Introduce the VFIODevice struct that is going to be shared by
> VFIOPCIDevice and VFIOPlatformDevice.
> 
> Additional fields will be added there later on for review
> convenience.
> 
> the group's device_list becomes a list of VFIODevice
> 
> This requires reworking the reset_handler, which becomes generic and
> calls VFIODevice ops that are specialized in each parent object.
> Also functions that iterate on this list must take care that the
> devices can be something else than VFIOPCIDevice. The type is used
> to discriminate them.
> 
> We take this opportunity to change the prototype of
> vfio_unmask_intx, vfio_mask_intx, vfio_disable_irqindex which now
> apply to VFIODevice. They are renamed as *_irqindex.
> The index is passed as a parameter to anticipate their usage for
> platform IRQs.

I cringe when reviewers tell me this, so I apologize in advance, but
there are logically at least 4 separate things happening in this patch:

1) VFIODevice
2) VFIODeviceOps
3) irqindex conversions
4) strcmp(name) vs comparing ssss:bb:dd.f

I don't really see any dependencies between them, and
I think they'd also be easier to review as 4 separate patches.  More
below...
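
To illustrate item 4, here is a small sketch of why comparing names can
stand in for comparing ssss:bb:dd.f fields, assuming every name is
produced by the same canonical format string (the "%04x:%02x:%02x.%x"
form used in the error messages of the quoted diff). DemoHostAddress and
the demo_* helpers are hypothetical and are not the types in pci.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a PCI host address. */
typedef struct DemoHostAddress {
    unsigned int domain, bus, slot, function;
} DemoHostAddress;

/* Field-by-field comparison, the style used before the patch. */
static int demo_host_match(const DemoHostAddress *a, const DemoHostAddress *b)
{
    return a->domain == b->domain && a->bus == b->bus &&
           a->slot == b->slot && a->function == b->function;
}

/* Canonical ssss:bb:dd.f name; strcmp() on two such names is
 * equivalent to demo_host_match() on the underlying addresses. */
static void demo_format_name(const DemoHostAddress *h, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%04x:%02x:%02x.%x",
             h->domain, h->bus, h->slot, h->function);
}
```

The equivalence only holds while all names come from one formatter; a
name typed in a different but equally valid spelling would compare as
different, which is one reason this change is easier to review alone.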

> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v4->v5:
> - fix style issues
> - in vfio_initfn, rework allocation of vdev->vbasedev.name and
>   replace snprintf by g_strdup_printf
> ---
>  hw/vfio/pci.c | 241 +++++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 147 insertions(+), 94 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 93181bf..0531744 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -48,6 +48,11 @@
>  #define VFIO_ALLOW_KVM_MSI 1
>  #define VFIO_ALLOW_KVM_MSIX 1
>  
> +enum {
> +    VFIO_DEVICE_TYPE_PCI = 0,
> +    VFIO_DEVICE_TYPE_PLATFORM = 1,

VFIO_DEVICE_TYPE_PLATFORM gets dropped in patch 8 and re-added in patch
9.  Let's remove it here and let its first appearance be in patch 9.

> +};
> +
>  struct VFIOPCIDevice;
>  
>  typedef struct VFIOQuirk {
> @@ -185,9 +190,27 @@ typedef struct VFIOMSIXInfo {
>      void *mmap;
>  } VFIOMSIXInfo;
>  
> +typedef struct VFIODeviceOps VFIODeviceOps;
> +
> +typedef struct VFIODevice {
> +    QLIST_ENTRY(VFIODevice) next;
> +    struct VFIOGroup *group;
> +    char *name;
> +    int fd;
> +    int type;
> +    bool reset_works;
> +    bool needs_reset;
> +    VFIODeviceOps *ops;
> +} VFIODevice;
> +
> +struct VFIODeviceOps {
> +    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
> +    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
> +};
> +
>  typedef struct VFIOPCIDevice {
>      PCIDevice pdev;
> -    int fd;
> +    VFIODevice vbasedev;
>      VFIOINTx intx;
>      unsigned int config_size;
>      uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
> @@ -203,20 +226,16 @@ typedef struct VFIOPCIDevice {
>      VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
>      VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
>      PCIHostDeviceAddress host;
> -    QLIST_ENTRY(VFIOPCIDevice) next;
> -    struct VFIOGroup *group;
>      EventNotifier err_notifier;
>      uint32_t features;
>  #define VFIO_FEATURE_ENABLE_VGA_BIT 0
>  #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
>      int32_t bootindex;
>      uint8_t pm_cap;
> -    bool reset_works;
>      bool has_vga;
>      bool pci_aer;
>      bool has_flr;
>      bool has_pm_reset;
> -    bool needs_reset;
>      bool rom_read_failed;
>  } VFIOPCIDevice;
>  
> @@ -224,7 +243,7 @@ typedef struct VFIOGroup {
>      int fd;
>      int groupid;
>      VFIOContainer *container;
> -    QLIST_HEAD(, VFIOPCIDevice) device_list;
> +    QLIST_HEAD(, VFIODevice) device_list;
>      QLIST_ENTRY(VFIOGroup) next;
>      QLIST_ENTRY(VFIOGroup) container_next;
>  } VFIOGroup;
> @@ -277,7 +296,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>  /*
>   * Common VFIO interrupt disable
>   */
> -static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
> +static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
>  {
>      struct vfio_irq_set irq_set = {
>          .argsz = sizeof(irq_set),
> @@ -287,37 +306,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
>          .count = 0,
>      };
>  
> -    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>  }
>  
>  /*
>   * INTx
>   */
> -static void vfio_unmask_intx(VFIOPCIDevice *vdev)
> +static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
>  {
>      struct vfio_irq_set irq_set = {
>          .argsz = sizeof(irq_set),
>          .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
> -        .index = VFIO_PCI_INTX_IRQ_INDEX,
> +        .index = index,
>          .start = 0,
>          .count = 1,
>      };

We're turning these into a generic function, but the function assumes a
single start/count.  Do we want to reflect that in the name or args?
For instance, maybe it should be vfio_unmask_simple_irqindex() to
reflect the common use case of a single interrupt per index and we can
create another function later for a more complete specification should
we ever need it.  Thanks,

Alex
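
One possible shape for the split suggested above, as a sketch only: a
general helper that takes start and count, plus a "simple" wrapper for
the common single-interrupt-per-index case. The demo_* names are
hypothetical; struct vfio_irq_set, the VFIO_IRQ_SET_* flags and
VFIO_DEVICE_SET_IRQS come from <linux/vfio.h>, so this assumes a Linux
host with the kernel headers installed.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fill an UNMASK request covering sub-interrupts [start, start + count). */
static void demo_fill_unmask(struct vfio_irq_set *irq_set, unsigned int index,
                             unsigned int start, unsigned int count)
{
    assert(irq_set != NULL);
    memset(irq_set, 0, sizeof(*irq_set));
    irq_set->argsz = sizeof(*irq_set);
    irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
    irq_set->index = index;
    irq_set->start = start;
    irq_set->count = count;
}

/* General form: the caller states the range explicitly. */
static int demo_unmask_irqindex_range(int fd, unsigned int index,
                                      unsigned int start, unsigned int count)
{
    struct vfio_irq_set irq_set;

    demo_fill_unmask(&irq_set, index, start, count);
    return ioctl(fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}

/* Simple form: one interrupt per index, i.e. start = 0 and count = 1. */
static int demo_unmask_simple_irqindex(int fd, unsigned int index)
{
    return demo_unmask_irqindex_range(fd, index, 0, 1);
}
```

Unlike the void helpers in the quoted diff, this sketch returns the
ioctl result, so a caller can notice a failed unmask.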

>  
> -    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>  }
>  
>  #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
> -static void vfio_mask_intx(VFIOPCIDevice *vdev)
> +static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
>  {
>      struct vfio_irq_set irq_set = {
>          .argsz = sizeof(irq_set),
>          .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
> -        .index = VFIO_PCI_INTX_IRQ_INDEX,
> +        .index = index,
>          .start = 0,
>          .count = 1,
>      };
>  
> -    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>  }
>  #endif
>  
> @@ -381,7 +400,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
>  
>      vdev->intx.pending = false;
>      pci_irq_deassert(&vdev->pdev);
> -    vfio_unmask_intx(vdev);
> +    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>  }
>  
>  static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
> @@ -404,7 +423,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>  
>      /* Get to a known interrupt state */
>      qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
> -    vfio_mask_intx(vdev);
> +    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>      vdev->intx.pending = false;
>      pci_irq_deassert(&vdev->pdev);
>  
> @@ -434,7 +453,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>  
>      *pfd = irqfd.resamplefd;
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>      g_free(irq_set);
>      if (ret) {
>          error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
> @@ -442,7 +461,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>      }
>  
>      /* Let'em rip */
> -    vfio_unmask_intx(vdev);
> +    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>  
>      vdev->intx.kvm_accel = true;
>  
> @@ -458,7 +477,7 @@ fail_irqfd:
>      event_notifier_cleanup(&vdev->intx.unmask);
>  fail:
>      qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
> -    vfio_unmask_intx(vdev);
> +    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>  #endif
>  }
>  
> @@ -479,7 +498,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
>       * Get to a known state, hardware masked, QEMU ready to accept new
>       * interrupts, QEMU IRQ de-asserted.
>       */
> -    vfio_mask_intx(vdev);
> +    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>      vdev->intx.pending = false;
>      pci_irq_deassert(&vdev->pdev);
>  
> @@ -497,7 +516,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
>      vdev->intx.kvm_accel = false;
>  
>      /* If we've missed an event, let it re-fire through QEMU */
> -    vfio_unmask_intx(vdev);
> +    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>  
>      trace_vfio_disable_intx_kvm(vdev->host.domain, vdev->host.bus,
>                                  vdev->host.slot, vdev->host.function);
> @@ -583,7 +602,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
>      *pfd = event_notifier_get_fd(&vdev->intx.interrupt);
>      qemu_set_fd_handler(*pfd, vfio_intx_interrupt, NULL, vdev);
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>      g_free(irq_set);
>      if (ret) {
>          error_report("vfio: Error: Failed to setup INTx fd: %m");
> @@ -608,7 +627,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
>  
>      timer_del(vdev->intx.mmap_timer);
>      vfio_disable_intx_kvm(vdev);
> -    vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
> +    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>      vdev->intx.pending = false;
>      pci_irq_deassert(&vdev->pdev);
>      vfio_mmap_set_enabled(vdev, true);
> @@ -698,7 +717,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>          fds[i] = fd;
>      }
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>  
>      g_free(irq_set);
>  
> @@ -795,7 +814,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>       * increase them as needed.
>       */
>      if (vdev->nr_vectors < nr + 1) {
> -        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
> +        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>          vdev->nr_vectors = nr + 1;
>          ret = vfio_enable_vectors(vdev, true);
>          if (ret) {
> @@ -823,7 +842,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>              *pfd = event_notifier_get_fd(&vector->interrupt);
>          }
>  
> -        ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>          g_free(irq_set);
>          if (ret) {
>              error_report("vfio: failed to modify vector, %d", ret);
> @@ -874,7 +893,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>  
>          *pfd = event_notifier_get_fd(&vector->interrupt);
>  
> -        ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +        ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>  
>          g_free(irq_set);
>      }
> @@ -1033,7 +1052,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
>      }
>  
>      if (vdev->nr_vectors) {
> -        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
> +        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>      }
>  
>      vfio_disable_msi_common(vdev);
> @@ -1044,7 +1063,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
>  
>  static void vfio_disable_msi(VFIOPCIDevice *vdev)
>  {
> -    vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
> +    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
>      vfio_disable_msi_common(vdev);
>  
>      trace_vfio_disable_msi(vdev->host.domain, vdev->host.bus,
> @@ -1188,7 +1207,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>      off_t off = 0;
>      size_t bytes;
>  
> -    if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
> +    if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
>          error_report("vfio: Error getting ROM info: %m");
>          return;
>      }
> @@ -1218,7 +1237,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>      memset(vdev->rom, 0xff, size);
>  
>      while (size) {
> -        bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off);
> +        bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
> +                      size, vdev->rom_offset + off);
>          if (bytes == 0) {
>              break;
>          } else if (bytes > 0) {
> @@ -1312,6 +1332,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>      off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
>      DeviceState *dev = DEVICE(vdev);
>      char name[32];
> +    int fd = vdev->vbasedev.fd;
>  
>      if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
>          /* Since pci handles romfile, just print a message and return */
> @@ -1330,10 +1351,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>       * Use the same size ROM BAR as the physical device.  The contents
>       * will get filled in later when the guest tries to read it.
>       */
> -    if (pread(vdev->fd, &orig, 4, offset) != 4 ||
> -        pwrite(vdev->fd, &size, 4, offset) != 4 ||
> -        pread(vdev->fd, &size, 4, offset) != 4 ||
> -        pwrite(vdev->fd, &orig, 4, offset) != 4) {
> +    if (pread(fd, &orig, 4, offset) != 4 ||
> +        pwrite(fd, &size, 4, offset) != 4 ||
> +        pread(fd, &size, 4, offset) != 4 ||
> +        pwrite(fd, &orig, 4, offset) != 4) {
>          error_report("%s(%04x:%02x:%02x.%x) failed: %m",
>                       __func__, vdev->host.domain, vdev->host.bus,
>                       vdev->host.slot, vdev->host.function);
> @@ -2345,7 +2366,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>      if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
>          ssize_t ret;
>  
> -        ret = pread(vdev->fd, &phys_val, len, vdev->config_offset + addr);
> +        ret = pread(vdev->vbasedev.fd, &phys_val, len,
> +                    vdev->config_offset + addr);
>          if (ret != len) {
>              error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
>                           __func__, vdev->host.domain, vdev->host.bus,
> @@ -2375,7 +2397,8 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
>                                  addr, val, len);
>  
>      /* Write everything to VFIO, let it filter out what we can't write */
> -    if (pwrite(vdev->fd, &val_le, len, vdev->config_offset + addr) != len) {
> +    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
> +                != len) {
>          error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
>                       __func__, vdev->host.domain, vdev->host.bus,
>                       vdev->host.slot, vdev->host.function, addr, val, len);
> @@ -2743,7 +2766,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
>      bool msi_64bit, msi_maskbit;
>      int ret, entries;
>  
> -    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
> +    if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
>                vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
>          return -errno;
>      }
> @@ -2782,23 +2805,24 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
>      uint8_t pos;
>      uint16_t ctrl;
>      uint32_t table, pba;
> +    int fd = vdev->vbasedev.fd;
>  
>      pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
>      if (!pos) {
>          return 0;
>      }
>  
> -    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
> +    if (pread(fd, &ctrl, sizeof(ctrl),
>                vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
>          return -errno;
>      }
>  
> -    if (pread(vdev->fd, &table, sizeof(table),
> +    if (pread(fd, &table, sizeof(table),
>                vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
>          return -errno;
>      }
>  
> -    if (pread(vdev->fd, &pba, sizeof(pba),
> +    if (pread(fd, &pba, sizeof(pba),
>                vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
>          return -errno;
>      }
> @@ -2950,7 +2974,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
>               vdev->host.function, nr);
>  
>      /* Determine what type of BAR this is for registration */
> -    ret = pread(vdev->fd, &pci_bar, sizeof(pci_bar),
> +    ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
>                  vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
>      if (ret != sizeof(pci_bar)) {
>          error_report("vfio: Failed to read BAR %d (%m)", nr);
> @@ -3365,12 +3389,12 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>                               single ? "one" : "multi");
>  
>      vfio_pci_pre_reset(vdev);
> -    vdev->needs_reset = false;
> +    vdev->vbasedev.needs_reset = false;
>  
>      info = g_malloc0(sizeof(*info));
>      info->argsz = sizeof(*info);
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>      if (ret && errno != ENOSPC) {
>          ret = -errno;
>          if (!vdev->has_pm_reset) {
> @@ -3386,7 +3410,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>      info->argsz = sizeof(*info) + (count * sizeof(*devices));
>      devices = &info->devices[0];
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>      if (ret) {
>          ret = -errno;
>          error_report("vfio: hot reset info failed: %m");
> @@ -3402,6 +3426,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>      for (i = 0; i < info->count; i++) {
>          PCIHostDeviceAddress host;
>          VFIOPCIDevice *tmp;
> +        VFIODevice *vbasedev_iter;
>  
>          host.domain = devices[i].segment;
>          host.bus = devices[i].bus;
> @@ -3433,7 +3458,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>          }
>  
>          /* Prep dependent devices for reset and clear our marker. */
> -        QLIST_FOREACH(tmp, &group->device_list, next) {
> +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> +            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> +                continue;
> +            }
> +            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
>              if (vfio_pci_host_match(&host, &tmp->host)) {
>                  if (single) {
>                      error_report("vfio: found another in-use device "
> @@ -3443,7 +3472,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>                      goto out_single;
>                  }
>                  vfio_pci_pre_reset(tmp);
> -                tmp->needs_reset = false;
> +                tmp->vbasedev.needs_reset = false;
>                  multi = true;
>                  break;
>              }
> @@ -3482,7 +3511,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>      }
>  
>      /* Bus reset! */
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
>      g_free(reset);
>  
>      trace_vfio_pci_hot_reset_result(vdev->host.domain,
> @@ -3496,6 +3525,7 @@ out:
>      for (i = 0; i < info->count; i++) {
>          PCIHostDeviceAddress host;
>          VFIOPCIDevice *tmp;
> +        VFIODevice *vbasedev_iter;
>  
>          host.domain = devices[i].segment;
>          host.bus = devices[i].bus;
> @@ -3516,7 +3546,11 @@ out:
>              break;
>          }
>  
> -        QLIST_FOREACH(tmp, &group->device_list, next) {
> +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> +            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> +                continue;
> +            }
> +            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
>              if (vfio_pci_host_match(&host, &tmp->host)) {
>                  vfio_pci_post_reset(tmp);
>                  break;
> @@ -3550,28 +3584,41 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
>      return vfio_pci_hot_reset(vdev, true);
>  }
>  
> -static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
> +static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
>  {
> +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>      return vfio_pci_hot_reset(vdev, false);
>  }
>  
> -static void vfio_pci_reset_handler(void *opaque)
> +static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
> +{
> +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> +        vbasedev->needs_reset = true;
> +    }
> +    return vbasedev->needs_reset;
> +}
> +
> +static VFIODeviceOps vfio_pci_ops = {
> +    .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
> +    .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
> +};
> +
> +static void vfio_reset_handler(void *opaque)
>  {
>      VFIOGroup *group;
> -    VFIOPCIDevice *vdev;
> +    VFIODevice *vbasedev;
>  
>      QLIST_FOREACH(group, &group_list, next) {
> -        QLIST_FOREACH(vdev, &group->device_list, next) {
> -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> -                vdev->needs_reset = true;
> -            }
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
>          }
>      }
>  
>      QLIST_FOREACH(group, &group_list, next) {
> -        QLIST_FOREACH(vdev, &group->device_list, next) {
> -            if (vdev->needs_reset) {
> -                vfio_pci_hot_reset_multi(vdev);
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (vbasedev->needs_reset) {
> +                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
>              }
>          }
>      }
> @@ -3860,7 +3907,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
>      }
>  
>      if (QLIST_EMPTY(&group_list)) {
> -        qemu_register_reset(vfio_pci_reset_handler, NULL);
> +        qemu_register_reset(vfio_reset_handler, NULL);
>      }
>  
>      QLIST_INSERT_HEAD(&group_list, group, next);
> @@ -3892,7 +3939,7 @@ static void vfio_put_group(VFIOGroup *group)
>      g_free(group);
>  
>      if (QLIST_EMPTY(&group_list)) {
> -        qemu_unregister_reset(vfio_pci_reset_handler, NULL);
> +        qemu_unregister_reset(vfio_reset_handler, NULL);
>      }
>  }
>  
> @@ -3913,12 +3960,12 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>          return ret;
>      }
>  
> -    vdev->fd = ret;
> -    vdev->group = group;
> -    QLIST_INSERT_HEAD(&group->device_list, vdev, next);
> +    vdev->vbasedev.fd = ret;
> +    vdev->vbasedev.group = group;
> +    QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
>  
>      /* Sanity check device */
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
>      if (ret) {
>          error_report("vfio: error getting device info: %m");
>          goto error;
> @@ -3932,7 +3979,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>          goto error;
>      }
>  
> -    vdev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
> +    vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>  
>      if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
>          error_report("vfio: unexpected number of io regions %u",
> @@ -3948,7 +3995,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>      for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
>          reg_info.index = i;
>  
> -        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> +        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>          if (ret) {
>              error_report("vfio: Error getting region %d info: %m", i);
>              goto error;
> @@ -3962,14 +4009,14 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>          vdev->bars[i].flags = reg_info.flags;
>          vdev->bars[i].size = reg_info.size;
>          vdev->bars[i].fd_offset = reg_info.offset;
> -        vdev->bars[i].fd = vdev->fd;
> +        vdev->bars[i].fd = vdev->vbasedev.fd;
>          vdev->bars[i].nr = i;
>          QLIST_INIT(&vdev->bars[i].quirks);
>      }
>  
>      reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>      if (ret) {
>          error_report("vfio: Error getting config info: %m");
>          goto error;
> @@ -3992,7 +4039,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>              .index = VFIO_PCI_VGA_REGION_INDEX,
>           };
>  
> -        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
> +        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
>          if (ret) {
>              error_report(
>                  "vfio: Device does not support requested feature x-vga");
> @@ -4009,7 +4056,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>          }
>  
>          vdev->vga.fd_offset = vga_info.offset;
> -        vdev->vga.fd = vdev->fd;
> +        vdev->vga.fd = vdev->vbasedev.fd;
>  
>          vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
>          vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
> @@ -4027,7 +4074,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>      }
>      irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>      if (ret) {
>          /* This can fail for an old kernel or legacy PCI dev */
>          trace_vfio_get_device_get_irq_info_failure();
> @@ -4043,19 +4090,20 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>  
>  error:
>      if (ret) {
> -        QLIST_REMOVE(vdev, next);
> -        vdev->group = NULL;
> -        close(vdev->fd);
> +        QLIST_REMOVE(&vdev->vbasedev, next);
> +        vdev->vbasedev.group = NULL;
> +        close(vdev->vbasedev.fd);
>      }
>      return ret;
>  }
>  
>  static void vfio_put_device(VFIOPCIDevice *vdev)
>  {
> -    QLIST_REMOVE(vdev, next);
> -    vdev->group = NULL;
> -    trace_vfio_put_device(vdev->fd);
> -    close(vdev->fd);
> +    QLIST_REMOVE(&vdev->vbasedev, next);
> +    vdev->vbasedev.group = NULL;
> +    trace_vfio_put_device(vdev->vbasedev.fd);
> +    close(vdev->vbasedev.fd);
> +    g_free(vdev->vbasedev.name);
>      if (vdev->msix) {
>          g_free(vdev->msix);
>          vdev->msix = NULL;
> @@ -4124,7 +4172,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
>      *pfd = event_notifier_get_fd(&vdev->err_notifier);
>      qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>      if (ret) {
>          error_report("vfio: Failed to set up error notification");
>          qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
> @@ -4157,7 +4205,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>      pfd = (int32_t *)&irq_set->data;
>      *pfd = -1;
>  
> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>      if (ret) {
>          error_report("vfio: Failed to de-assign error fd: %m");
>      }
> @@ -4169,7 +4217,8 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>  
>  static int vfio_initfn(PCIDevice *pdev)
>  {
> -    VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIODevice *vbasedev_iter;
>      VFIOGroup *group;
>      char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>      ssize_t len;
> @@ -4187,6 +4236,13 @@ static int vfio_initfn(PCIDevice *pdev)
>          return -errno;
>      }
>  
> +    vdev->vbasedev.ops = &vfio_pci_ops;
> +
> +    vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
> +    vdev->vbasedev.name = g_strdup_printf("%04x:%02x:%02x.%01x",
> +            vdev->host.domain, vdev->host.bus, vdev->host.slot,
> +            vdev->host.function);
> +
>      strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>  
>      len = readlink(path, iommu_group_path, sizeof(path));
> @@ -4216,12 +4272,8 @@ static int vfio_initfn(PCIDevice *pdev)
>              vdev->host.domain, vdev->host.bus, vdev->host.slot,
>              vdev->host.function);
>  
> -    QLIST_FOREACH(pvdev, &group->device_list, next) {
> -        if (pvdev->host.domain == vdev->host.domain &&
> -            pvdev->host.bus == vdev->host.bus &&
> -            pvdev->host.slot == vdev->host.slot &&
> -            pvdev->host.function == vdev->host.function) {
> -
> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> +        if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
>              error_report("vfio: error: device %s is already attached", path);
>              vfio_put_group(group);
>              return -EBUSY;
> @@ -4236,7 +4288,7 @@ static int vfio_initfn(PCIDevice *pdev)
>      }
>  
>      /* Get a copy of config space */
> -    ret = pread(vdev->fd, vdev->pdev.config,
> +    ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>                  vdev->config_offset);
>      if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
> @@ -4323,7 +4375,7 @@ out_put:
>  static void vfio_exitfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> -    VFIOGroup *group = vdev->group;
> +    VFIOGroup *group = vdev->vbasedev.group;
>  
>      vfio_unregister_err_notifier(vdev);
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> @@ -4349,8 +4401,9 @@ static void vfio_pci_reset(DeviceState *dev)
>  
>      vfio_pci_pre_reset(vdev);
>  
> -    if (vdev->reset_works && (vdev->has_flr || !vdev->has_pm_reset) &&
> -        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
> +    if (vdev->vbasedev.reset_works &&
> +        (vdev->has_flr || !vdev->has_pm_reset) &&
> +        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
>          trace_vfio_pci_reset_flr(vdev->host.domain, vdev->host.bus,
>                                    vdev->host.slot, vdev->host.function);
>          goto post_reset;
> @@ -4362,8 +4415,8 @@ static void vfio_pci_reset(DeviceState *dev)
>      }
>  
>      /* If nothing else works and the device supports PM reset, use it */
> -    if (vdev->reset_works && vdev->has_pm_reset &&
> -        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
> +    if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
> +        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
>          trace_vfio_pci_reset_pm(vdev->host.domain, vdev->host.bus,
>                                  vdev->host.slot, vdev->host.function);
>          goto post_reset;
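
The embedded-base-struct pattern used by the hunks above can be sketched in isolation. This is a minimal illustration, not QEMU code: `BaseDev`, `PciDev`, `DEV_TYPE_*`, `resets_done` and `reset_all` are hypothetical names, a plain linked list stands in for QLIST, and `container_of` is redefined locally. It shows how a generic handler can walk a list of base objects and dispatch through an ops table, while each callback recovers its own enclosing type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for QEMU's container_of(): recover the enclosing
 * structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum { DEV_TYPE_PCI = 0, DEV_TYPE_OTHER = 1 };   /* hypothetical type tags */

typedef struct BaseDev BaseDev;

typedef struct BaseDevOps {
    bool (*compute_needs_reset)(BaseDev *base);
    int (*hot_reset_multi)(BaseDev *base);
} BaseDevOps;

struct BaseDev {
    BaseDev *next;          /* plain singly linked list instead of QLIST */
    int type;
    bool reset_works;
    bool needs_reset;
    const BaseDevOps *ops;
};

typedef struct PciDev {
    int config_size;        /* bus-specific state may precede the base */
    BaseDev base;
    bool has_flr;
    bool has_pm_reset;
    int resets_done;        /* counter that makes the sketch observable */
} PciDev;

static bool pci_compute_needs_reset(BaseDev *base)
{
    PciDev *pdev = container_of(base, PciDev, base);

    /* Same condition as vfio_pci_compute_needs_reset() in the patch. */
    if (!base->reset_works || (!pdev->has_flr && pdev->has_pm_reset)) {
        base->needs_reset = true;
    }
    return base->needs_reset;
}

static int pci_hot_reset_multi(BaseDev *base)
{
    PciDev *pdev = container_of(base, PciDev, base);

    /* The upcast is only valid for objects tagged as PCI. */
    assert(base->type == DEV_TYPE_PCI);
    base->needs_reset = false;
    pdev->resets_done++;
    return 0;
}

static const BaseDevOps pci_ops = {
    .compute_needs_reset = pci_compute_needs_reset,
    .hot_reset_multi = pci_hot_reset_multi,
};

/* Two passes, as in the patch: first mark, then reset what was marked. */
static void reset_all(BaseDev *head)
{
    BaseDev *d;

    for (d = head; d; d = d->next) {
        d->ops->compute_needs_reset(d);
    }
    for (d = head; d; d = d->next) {
        if (d->needs_reset) {
            d->ops->hot_reset_multi(d);
        }
    }
}
```

Because the generic handler only touches `BaseDev` fields and the ops table, a second device type could join the same list without changing `reset_all`.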


* Re: [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation
  2014-11-05 12:31     ` Eric Auger
@ 2014-11-05 22:23       ` Alexander Graf
  2014-11-06  8:57         ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-05 22:23 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 05.11.14 13:31, Eric Auger wrote:
> On 11/05/2014 11:59 AM, Alexander Graf wrote:
>>
>>
>> On 31.10.14 15:05, Eric Auger wrote:
>>> vfio-calxeda-xgmac can now be instantiated using the -device option.
>>> The node creation function generates a very basic dt node composed
>>> of the compat, reg and interrupts properties.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>>
>>> v6 -> v7:
>>> - compat string re-formatting removed since compat string is not exposed
>>>   anymore as a user option
>>> - VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
>>>   device
>>> ---
>>>  hw/arm/sysbus-fdt.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 88 insertions(+)
>>>
>>> diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
>>> index d5476f1..f8b310b 100644
>>> --- a/hw/arm/sysbus-fdt.c
>>> +++ b/hw/arm/sysbus-fdt.c
>>> @@ -27,6 +27,8 @@
>>>  #include "hw/platform-bus.h"
>>>  #include "sysemu/sysemu.h"
>>>  #include "hw/platform-bus.h"
>>> +#include "hw/vfio/vfio-platform.h"
>>> +#include "hw/vfio/vfio-calxeda-xgmac.h"
>>>  
>>>  /*
>>>   * internal struct that contains the information to create dynamic
>>> @@ -54,8 +56,11 @@ typedef struct NodeCreationPair {
>>>      int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
>>>  } NodeCreationPair;
>>>  
>>> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque);
>>> +
>>>  /* list of supported dynamic sysbus devices */
>>>  NodeCreationPair add_fdt_node_functions[] = {
>>> +        {TYPE_VFIO_CALXEDA_XGMAC, add_basic_vfio_fdt_node},
>>>          {"", NULL}, /*last element*/
>>>  };
>>
>> Can you maybe place the list somewhere smartly to make sure we don't
>> need forward declarations? Either put it in between the "generic" and
>> "device specific" code or at the end of the file with a single forward
>> declaration for the array?
> 
> sure
>>
>>>  
>>> @@ -86,6 +91,89 @@ static int add_fdt_node(SysBusDevice *sbdev, void *opaque)
>>>  }
>>>  
>>>  /**
>>> + * add_basic_vfio_fdt_node - generates the most basic node for a VFIO node
>>> + *
>>> + * set properties are:
>>> + * - compatible string
>>> + * - regs
>>> + * - interrupts
>>> + */
>>> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque)
>>> +{
>>> +    PlatformBusFdtData *data = opaque;
>>> +    PlatformBusDevice *pbus = data->pbus;
>>> +    void *fdt = data->fdt;
>>> +    const char *parent_node = data->pbus_node_name;
>>> +    int compat_str_len;
>>> +    char *nodename;
>>> +    int i, ret;
>>> +    uint32_t *irq_attr;
>>> +    uint64_t *reg_attr;
>>> +    uint64_t mmio_base;
>>> +    uint64_t irq_number;
>>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>> +    Object *obj = OBJECT(sbdev);
>>> +
>>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>>> +
>>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>>> +                               vbasedev->name,
>>> +                               mmio_base);
>>> +
>>> +    qemu_fdt_add_subnode(fdt, nodename);
>>> +
>>> +    compat_str_len = strlen(vdev->compat) + 1;
>>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>>> +                          vdev->compat, compat_str_len);
>>
>> What if there are multiple compatibles?
> My purpose here was absolutely not to come back again on a proposal
> where we could have a generic node creation. I understand that it is not
> realistic. I rather tried to put some common property creation in this
> function but you're right, even the interrupt prop depends on the device.
> 
> About your question, I think the specialized VFIO device would set its
> compat string including the various substrings. This was done in the
> past for PL330 which required arm,pl330;arm,primecell.
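
For reference, a device tree `compatible` property holding several entries is a list of NUL-terminated strings stored back to back, and the property length covers all of them. The following is a minimal sketch of that packing, assuming a caller-provided buffer; `pack_string_list` is a hypothetical helper, not an existing QEMU or libfdt function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pack `count` strings into `buf` as a device tree string list: each
 * string is copied together with its terminating NUL, back to back.
 * Returns the total property length in bytes, or 0 if `buf` is too small.
 */
static size_t pack_string_list(char *buf, size_t buf_size,
                               const char *const *strs, size_t count)
{
    size_t used = 0;

    assert(buf != NULL && strs != NULL);
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(strs[i]) + 1;   /* include the NUL */

        if (len > buf_size - used) {
            return 0;
        }
        memcpy(buf + used, strs[i], len);
        used += len;
    }
    return used;
}
```

With such a buffer, the length handed to `qemu_fdt_setprop()` would be the returned total. The `strlen(vdev->compat) + 1` computed in the patch covers only the first string of a multi-entry list.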
> 
>>
>>> +
>>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>>> +
>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>> +        mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
>>> +        reg_attr[4*i] = 1;
>>
>> What is the 1 here?
> address-cells? since the bus is < 4GB, 1 32b reg is required to specify
> the base address. But since you put #size-cells already in the parent
> node maybe I can remove it.

I'm confused. Shouldn't the reg look like [ <addr> <size> ... ]?

  http://www.devicetree.org/Device_Tree_Usage#Memory_Mapped_Devices

The number of cells is defined separately via #address-cells or #size-cells.
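
A minimal sketch of that layout, assuming the parent's #address-cells and #size-cells are each 1 or 2: every `reg` entry is the address followed by the size, each stored as big-endian 32-bit cells, with no per-entry cell counts in between. `put_cells` and `put_reg_entry` are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store `value` as `ncells` big-endian 32-bit cells at `out`, most
 * significant cell first. Returns the number of bytes written. */
static size_t put_cells(uint8_t *out, uint64_t value, unsigned ncells)
{
    assert(ncells == 1 || ncells == 2);
    for (unsigned c = 0; c < ncells; c++) {
        uint32_t cell = (uint32_t)(value >> (32 * (ncells - 1 - c)));

        out[4 * c + 0] = (uint8_t)(cell >> 24);
        out[4 * c + 1] = (uint8_t)(cell >> 16);
        out[4 * c + 2] = (uint8_t)(cell >> 8);
        out[4 * c + 3] = (uint8_t)cell;
    }
    return 4 * (size_t)ncells;
}

/* Write one <addr size> entry of a reg property. The cell counts come
 * from the parent node; they are not part of the entry itself. */
static size_t put_reg_entry(uint8_t *out, uint64_t addr, uint64_t size,
                            unsigned address_cells, unsigned size_cells)
{
    size_t n = put_cells(out, addr, address_cells);

    return n + put_cells(out + n, size, size_cells);
}
```

A device with several regions would call `put_reg_entry` once per region, advancing the output pointer by the returned byte count each time.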

> 
>>
>>> +        reg_attr[4*i+1] = mmio_base;
>>> +        reg_attr[4*i+2] = 1;
>>
>> and here?
> size-cells for this reg. same remark as above


Alex


* Re: [Qemu-devel] [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice
  2014-11-05 17:35   ` Alex Williamson
@ 2014-11-06  8:38     ` Eric Auger
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-11-06  8:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: joel.schopp, kim.phillips, eric.auger, a.rigo, peter.maydell,
	manish.jaggi, patches, will.deacon, qemu-devel, agraf,
	Bharat.Bhushan, stuart.yoder, a.motakis, pbonzini, kvmarm,
	christoffer.dall

On 11/05/2014 06:35 PM, Alex Williamson wrote:
> Hi Eric,
> 
> On Fri, 2014-10-31 at 14:05 +0000, Eric Auger wrote:
>> Introduce the VFIODevice struct that is going to be shared by
>> VFIOPCIDevice and VFIOPlatformDevice.
>>
>> Additional fields will be added there later on for review
>> convenience.
>>
>> The group's device_list becomes a list of VFIODevice.
>>
>> This requires reworking the reset_handler, which becomes generic and
>> calls VFIODevice ops that are specialized in each parent object.
>> Functions that iterate over this list must also account for devices
>> that are not VFIOPCIDevice. The type field is used to discriminate
>> them.
>>
>> We take this opportunity to change the prototype of
>> vfio_unmask_intx, vfio_mask_intx and vfio_disable_irqindex, which now
>> apply to VFIODevice. They are renamed as *_irqindex.
>> The index is passed as a parameter to anticipate their use for
>> platform IRQs.
> 
> I cringe when reviewers tell me this, so I apologize in advance, but
> there are logically at least 4 separate things happening in this patch:
> 
> 1) VFIODevice
> 2) VFIODeviceOps
> 3) irqindex conversions
> 4) strcmp(name) vs comparing ssss:bb:dd.f
> 
> I don't really see any dependencies between them, and
> I think they'd also be easier to review as 4 separate patches.  More
> below...

Hi Alex,

No problem, I am going to split it.
> 
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v4->v5:
>> - fix style issues
>> - in vfio_initfn, rework allocation of vdev->vbasedev.name and
>>   replace snprintf by g_strdup_printf
>> ---
>>  hw/vfio/pci.c | 241 +++++++++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 147 insertions(+), 94 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 93181bf..0531744 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -48,6 +48,11 @@
>>  #define VFIO_ALLOW_KVM_MSI 1
>>  #define VFIO_ALLOW_KVM_MSIX 1
>>  
>> +enum {
>> +    VFIO_DEVICE_TYPE_PCI = 0,
>> +    VFIO_DEVICE_TYPE_PLATFORM = 1,
> 
> VFIO_DEVICE_TYPE_PLATFORM gets dropped in patch 8 and re-added in patch
> 9.  Let's remove it here and let its first appearance be in patch 9.

yes sure. My bad.
> 
>> +};
>> +
>>  struct VFIOPCIDevice;
>>  
>>  typedef struct VFIOQuirk {
>> @@ -185,9 +190,27 @@ typedef struct VFIOMSIXInfo {
>>      void *mmap;
>>  } VFIOMSIXInfo;
>>  
>> +typedef struct VFIODeviceOps VFIODeviceOps;
>> +
>> +typedef struct VFIODevice {
>> +    QLIST_ENTRY(VFIODevice) next;
>> +    struct VFIOGroup *group;
>> +    char *name;
>> +    int fd;
>> +    int type;
>> +    bool reset_works;
>> +    bool needs_reset;
>> +    VFIODeviceOps *ops;
>> +} VFIODevice;
>> +
>> +struct VFIODeviceOps {
>> +    bool (*vfio_compute_needs_reset)(VFIODevice *vdev);
>> +    int (*vfio_hot_reset_multi)(VFIODevice *vdev);
>> +};
>> +
>>  typedef struct VFIOPCIDevice {
>>      PCIDevice pdev;
>> -    int fd;
>> +    VFIODevice vbasedev;
>>      VFIOINTx intx;
>>      unsigned int config_size;
>>      uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
>> @@ -203,20 +226,16 @@ typedef struct VFIOPCIDevice {
>>      VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
>>      VFIOVGA vga; /* 0xa0000, 0x3b0, 0x3c0 */
>>      PCIHostDeviceAddress host;
>> -    QLIST_ENTRY(VFIOPCIDevice) next;
>> -    struct VFIOGroup *group;
>>      EventNotifier err_notifier;
>>      uint32_t features;
>>  #define VFIO_FEATURE_ENABLE_VGA_BIT 0
>>  #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
>>      int32_t bootindex;
>>      uint8_t pm_cap;
>> -    bool reset_works;
>>      bool has_vga;
>>      bool pci_aer;
>>      bool has_flr;
>>      bool has_pm_reset;
>> -    bool needs_reset;
>>      bool rom_read_failed;
>>  } VFIOPCIDevice;
>>  
>> @@ -224,7 +243,7 @@ typedef struct VFIOGroup {
>>      int fd;
>>      int groupid;
>>      VFIOContainer *container;
>> -    QLIST_HEAD(, VFIOPCIDevice) device_list;
>> +    QLIST_HEAD(, VFIODevice) device_list;
>>      QLIST_ENTRY(VFIOGroup) next;
>>      QLIST_ENTRY(VFIOGroup) container_next;
>>  } VFIOGroup;
>> @@ -277,7 +296,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>>  /*
>>   * Common VFIO interrupt disable
>>   */
>> -static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
>> +static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
>>  {
>>      struct vfio_irq_set irq_set = {
>>          .argsz = sizeof(irq_set),
>> @@ -287,37 +306,37 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
>>          .count = 0,
>>      };
>>  
>> -    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>  }
>>  
>>  /*
>>   * INTx
>>   */
>> -static void vfio_unmask_intx(VFIOPCIDevice *vdev)
>> +static void vfio_unmask_irqindex(VFIODevice *vbasedev, int index)
>>  {
>>      struct vfio_irq_set irq_set = {
>>          .argsz = sizeof(irq_set),
>>          .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
>> -        .index = VFIO_PCI_INTX_IRQ_INDEX,
>> +        .index = index,
>>          .start = 0,
>>          .count = 1,
>>      };
> 
> We're turning these into a generic function, but the function assumes a
> single start/count.  Do we want to reflect that in the name or args?
> For instance, maybe it should be vfio_unmask_simple_irqindex() to
> reflect the common use case of a single interrupt per index and we can
> create another function later for a more complete specification should
> we ever need it.  Thanks,
OK
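
One possible shape for that split, as a sketch only: a general builder that takes start and count, and a "simple" wrapper fixed at a single interrupt per index. `struct irq_set_req`, the `IRQ_SET_*` macros and both function names are local stand-ins that mirror the fields used in the patch; real code would use `struct vfio_irq_set` and the flags from <linux/vfio.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in mirroring the fields of struct vfio_irq_set used in
 * the patch; an assumption made to keep the sketch self-contained. */
struct irq_set_req {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t start;
    uint32_t count;
};

/* Stand-in flag bits for this sketch. */
#define IRQ_SET_DATA_NONE     (1u << 0)
#define IRQ_SET_ACTION_UNMASK (1u << 4)

/* General form: unmask `count` interrupts of `index` from `start`. */
static void build_unmask(struct irq_set_req *req, uint32_t index,
                         uint32_t start, uint32_t count)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->argsz = sizeof(*req);
    req->flags = IRQ_SET_DATA_NONE | IRQ_SET_ACTION_UNMASK;
    req->index = index;
    req->start = start;
    req->count = count;
}

/* Simple form: the common case of a single interrupt per index. */
static void build_unmask_simple(struct irq_set_req *req, uint32_t index)
{
    build_unmask(req, index, 0, 1);
}
```

In the patch the filled structure is then passed to `ioctl(fd, VFIO_DEVICE_SET_IRQS, &irq_set)`. Keeping the general builder separate means a later caller with several interrupts per index would not need a new code path.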

Thanks for your time

Best Regards

Eric
> 
> Alex
> 
>>  
>> -    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>  }
>>  
>>  #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
>> -static void vfio_mask_intx(VFIOPCIDevice *vdev)
>> +static void vfio_mask_irqindex(VFIODevice *vbasedev, int index)
>>  {
>>      struct vfio_irq_set irq_set = {
>>          .argsz = sizeof(irq_set),
>>          .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
>> -        .index = VFIO_PCI_INTX_IRQ_INDEX,
>> +        .index = index,
>>          .start = 0,
>>          .count = 1,
>>      };
>>  
>> -    ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>> +    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
>>  }
>>  #endif
>>  
>> @@ -381,7 +400,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
>>  
>>      vdev->intx.pending = false;
>>      pci_irq_deassert(&vdev->pdev);
>> -    vfio_unmask_intx(vdev);
>> +    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>  }
>>  
>>  static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>> @@ -404,7 +423,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>>  
>>      /* Get to a known interrupt state */
>>      qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
>> -    vfio_mask_intx(vdev);
>> +    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>      vdev->intx.pending = false;
>>      pci_irq_deassert(&vdev->pdev);
>>  
>> @@ -434,7 +453,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>>  
>>      *pfd = irqfd.resamplefd;
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>      g_free(irq_set);
>>      if (ret) {
>>          error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
>> @@ -442,7 +461,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
>>      }
>>  
>>      /* Let'em rip */
>> -    vfio_unmask_intx(vdev);
>> +    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>  
>>      vdev->intx.kvm_accel = true;
>>  
>> @@ -458,7 +477,7 @@ fail_irqfd:
>>      event_notifier_cleanup(&vdev->intx.unmask);
>>  fail:
>>      qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
>> -    vfio_unmask_intx(vdev);
>> +    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>  #endif
>>  }
>>  
>> @@ -479,7 +498,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
>>       * Get to a known state, hardware masked, QEMU ready to accept new
>>       * interrupts, QEMU IRQ de-asserted.
>>       */
>> -    vfio_mask_intx(vdev);
>> +    vfio_mask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>      vdev->intx.pending = false;
>>      pci_irq_deassert(&vdev->pdev);
>>  
>> @@ -497,7 +516,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
>>      vdev->intx.kvm_accel = false;
>>  
>>      /* If we've missed an event, let it re-fire through QEMU */
>> -    vfio_unmask_intx(vdev);
>> +    vfio_unmask_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>  
>>      trace_vfio_disable_intx_kvm(vdev->host.domain, vdev->host.bus,
>>                                  vdev->host.slot, vdev->host.function);
>> @@ -583,7 +602,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
>>      *pfd = event_notifier_get_fd(&vdev->intx.interrupt);
>>      qemu_set_fd_handler(*pfd, vfio_intx_interrupt, NULL, vdev);
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>      g_free(irq_set);
>>      if (ret) {
>>          error_report("vfio: Error: Failed to setup INTx fd: %m");
>> @@ -608,7 +627,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
>>  
>>      timer_del(vdev->intx.mmap_timer);
>>      vfio_disable_intx_kvm(vdev);
>> -    vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
>> +    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>>      vdev->intx.pending = false;
>>      pci_irq_deassert(&vdev->pdev);
>>      vfio_mmap_set_enabled(vdev, true);
>> @@ -698,7 +717,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>>          fds[i] = fd;
>>      }
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>  
>>      g_free(irq_set);
>>  
>> @@ -795,7 +814,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>       * increase them as needed.
>>       */
>>      if (vdev->nr_vectors < nr + 1) {
>> -        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
>> +        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>>          vdev->nr_vectors = nr + 1;
>>          ret = vfio_enable_vectors(vdev, true);
>>          if (ret) {
>> @@ -823,7 +842,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>              *pfd = event_notifier_get_fd(&vector->interrupt);
>>          }
>>  
>> -        ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>          g_free(irq_set);
>>          if (ret) {
>>              error_report("vfio: failed to modify vector, %d", ret);
>> @@ -874,7 +893,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>>  
>>          *pfd = event_notifier_get_fd(&vector->interrupt);
>>  
>> -        ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +        ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>  
>>          g_free(irq_set);
>>      }
>> @@ -1033,7 +1052,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
>>      }
>>  
>>      if (vdev->nr_vectors) {
>> -        vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX);
>> +        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>>      }
>>  
>>      vfio_disable_msi_common(vdev);
>> @@ -1044,7 +1063,7 @@ static void vfio_disable_msix(VFIOPCIDevice *vdev)
>>  
>>  static void vfio_disable_msi(VFIOPCIDevice *vdev)
>>  {
>> -    vfio_disable_irqindex(vdev, VFIO_PCI_MSI_IRQ_INDEX);
>> +    vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
>>      vfio_disable_msi_common(vdev);
>>  
>>      trace_vfio_disable_msi(vdev->host.domain, vdev->host.bus,
>> @@ -1188,7 +1207,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>>      off_t off = 0;
>>      size_t bytes;
>>  
>> -    if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
>> +    if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
>>          error_report("vfio: Error getting ROM info: %m");
>>          return;
>>      }
>> @@ -1218,7 +1237,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>>      memset(vdev->rom, 0xff, size);
>>  
>>      while (size) {
>> -        bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off);
>> +        bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
>> +                      size, vdev->rom_offset + off);
>>          if (bytes == 0) {
>>              break;
>>          } else if (bytes > 0) {
>> @@ -1312,6 +1332,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>>      off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
>>      DeviceState *dev = DEVICE(vdev);
>>      char name[32];
>> +    int fd = vdev->vbasedev.fd;
>>  
>>      if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
>>          /* Since pci handles romfile, just print a message and return */
>> @@ -1330,10 +1351,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>>       * Use the same size ROM BAR as the physical device.  The contents
>>       * will get filled in later when the guest tries to read it.
>>       */
>> -    if (pread(vdev->fd, &orig, 4, offset) != 4 ||
>> -        pwrite(vdev->fd, &size, 4, offset) != 4 ||
>> -        pread(vdev->fd, &size, 4, offset) != 4 ||
>> -        pwrite(vdev->fd, &orig, 4, offset) != 4) {
>> +    if (pread(fd, &orig, 4, offset) != 4 ||
>> +        pwrite(fd, &size, 4, offset) != 4 ||
>> +        pread(fd, &size, 4, offset) != 4 ||
>> +        pwrite(fd, &orig, 4, offset) != 4) {
>>          error_report("%s(%04x:%02x:%02x.%x) failed: %m",
>>                       __func__, vdev->host.domain, vdev->host.bus,
>>                       vdev->host.slot, vdev->host.function);
>> @@ -2345,7 +2366,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>>      if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
>>          ssize_t ret;
>>  
>> -        ret = pread(vdev->fd, &phys_val, len, vdev->config_offset + addr);
>> +        ret = pread(vdev->vbasedev.fd, &phys_val, len,
>> +                    vdev->config_offset + addr);
>>          if (ret != len) {
>>              error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %m",
>>                           __func__, vdev->host.domain, vdev->host.bus,
>> @@ -2375,7 +2397,8 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
>>                                  addr, val, len);
>>  
>>      /* Write everything to VFIO, let it filter out what we can't write */
>> -    if (pwrite(vdev->fd, &val_le, len, vdev->config_offset + addr) != len) {
>> +    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
>> +                != len) {
>>          error_report("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %m",
>>                       __func__, vdev->host.domain, vdev->host.bus,
>>                       vdev->host.slot, vdev->host.function, addr, val, len);
>> @@ -2743,7 +2766,7 @@ static int vfio_setup_msi(VFIOPCIDevice *vdev, int pos)
>>      bool msi_64bit, msi_maskbit;
>>      int ret, entries;
>>  
>> -    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
>> +    if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
>>                vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
>>          return -errno;
>>      }
>> @@ -2782,23 +2805,24 @@ static int vfio_early_setup_msix(VFIOPCIDevice *vdev)
>>      uint8_t pos;
>>      uint16_t ctrl;
>>      uint32_t table, pba;
>> +    int fd = vdev->vbasedev.fd;
>>  
>>      pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
>>      if (!pos) {
>>          return 0;
>>      }
>>  
>> -    if (pread(vdev->fd, &ctrl, sizeof(ctrl),
>> +    if (pread(fd, &ctrl, sizeof(ctrl),
>>                vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
>>          return -errno;
>>      }
>>  
>> -    if (pread(vdev->fd, &table, sizeof(table),
>> +    if (pread(fd, &table, sizeof(table),
>>                vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
>>          return -errno;
>>      }
>>  
>> -    if (pread(vdev->fd, &pba, sizeof(pba),
>> +    if (pread(fd, &pba, sizeof(pba),
>>                vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
>>          return -errno;
>>      }
>> @@ -2950,7 +2974,7 @@ static void vfio_map_bar(VFIOPCIDevice *vdev, int nr)
>>               vdev->host.function, nr);
>>  
>>      /* Determine what type of BAR this is for registration */
>> -    ret = pread(vdev->fd, &pci_bar, sizeof(pci_bar),
>> +    ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
>>                  vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
>>      if (ret != sizeof(pci_bar)) {
>>          error_report("vfio: Failed to read BAR %d (%m)", nr);
>> @@ -3365,12 +3389,12 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>>                               single ? "one" : "multi");
>>  
>>      vfio_pci_pre_reset(vdev);
>> -    vdev->needs_reset = false;
>> +    vdev->vbasedev.needs_reset = false;
>>  
>>      info = g_malloc0(sizeof(*info));
>>      info->argsz = sizeof(*info);
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>>      if (ret && errno != ENOSPC) {
>>          ret = -errno;
>>          if (!vdev->has_pm_reset) {
>> @@ -3386,7 +3410,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>>      info->argsz = sizeof(*info) + (count * sizeof(*devices));
>>      devices = &info->devices[0];
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>>      if (ret) {
>>          ret = -errno;
>>          error_report("vfio: hot reset info failed: %m");
>> @@ -3402,6 +3426,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>>      for (i = 0; i < info->count; i++) {
>>          PCIHostDeviceAddress host;
>>          VFIOPCIDevice *tmp;
>> +        VFIODevice *vbasedev_iter;
>>  
>>          host.domain = devices[i].segment;
>>          host.bus = devices[i].bus;
>> @@ -3433,7 +3458,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>>          }
>>  
>>          /* Prep dependent devices for reset and clear our marker. */
>> -        QLIST_FOREACH(tmp, &group->device_list, next) {
>> +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> +            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
>> +                continue;
>> +            }
>> +            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
>>              if (vfio_pci_host_match(&host, &tmp->host)) {
>>                  if (single) {
>>                      error_report("vfio: found another in-use device "
>> @@ -3443,7 +3472,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>>                      goto out_single;
>>                  }
>>                  vfio_pci_pre_reset(tmp);
>> -                tmp->needs_reset = false;
>> +                tmp->vbasedev.needs_reset = false;
>>                  multi = true;
>>                  break;
>>              }
>> @@ -3482,7 +3511,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>>      }
>>  
>>      /* Bus reset! */
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
>>      g_free(reset);
>>  
>>      trace_vfio_pci_hot_reset_result(vdev->host.domain,
>> @@ -3496,6 +3525,7 @@ out:
>>      for (i = 0; i < info->count; i++) {
>>          PCIHostDeviceAddress host;
>>          VFIOPCIDevice *tmp;
>> +        VFIODevice *vbasedev_iter;
>>  
>>          host.domain = devices[i].segment;
>>          host.bus = devices[i].bus;
>> @@ -3516,7 +3546,11 @@ out:
>>              break;
>>          }
>>  
>> -        QLIST_FOREACH(tmp, &group->device_list, next) {
>> +        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> +            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
>> +                continue;
>> +            }
>> +            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
>>              if (vfio_pci_host_match(&host, &tmp->host)) {
>>                  vfio_pci_post_reset(tmp);
>>                  break;
>> @@ -3550,28 +3584,41 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
>>      return vfio_pci_hot_reset(vdev, true);
>>  }
>>  
>> -static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
>> +static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
>>  {
>> +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>      return vfio_pci_hot_reset(vdev, false);
>>  }
>>  
>> -static void vfio_pci_reset_handler(void *opaque)
>> +static bool vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
>> +{
>> +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +    if (!vbasedev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
>> +        vbasedev->needs_reset = true;
>> +    }
>> +    return vbasedev->needs_reset;
>> +}
>> +
>> +static VFIODeviceOps vfio_pci_ops = {
>> +    .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
>> +    .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
>> +};
>> +
>> +static void vfio_reset_handler(void *opaque)
>>  {
>>      VFIOGroup *group;
>> -    VFIOPCIDevice *vdev;
>> +    VFIODevice *vbasedev;
>>  
>>      QLIST_FOREACH(group, &group_list, next) {
>> -        QLIST_FOREACH(vdev, &group->device_list, next) {
>> -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
>> -                vdev->needs_reset = true;
>> -            }
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            vbasedev->ops->vfio_compute_needs_reset(vbasedev);
>>          }
>>      }
>>  
>>      QLIST_FOREACH(group, &group_list, next) {
>> -        QLIST_FOREACH(vdev, &group->device_list, next) {
>> -            if (vdev->needs_reset) {
>> -                vfio_pci_hot_reset_multi(vdev);
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            if (vbasedev->needs_reset) {
>> +                vbasedev->ops->vfio_hot_reset_multi(vbasedev);
>>              }
>>          }
>>      }
>> @@ -3860,7 +3907,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
>>      }
>>  
>>      if (QLIST_EMPTY(&group_list)) {
>> -        qemu_register_reset(vfio_pci_reset_handler, NULL);
>> +        qemu_register_reset(vfio_reset_handler, NULL);
>>      }
>>  
>>      QLIST_INSERT_HEAD(&group_list, group, next);
>> @@ -3892,7 +3939,7 @@ static void vfio_put_group(VFIOGroup *group)
>>      g_free(group);
>>  
>>      if (QLIST_EMPTY(&group_list)) {
>> -        qemu_unregister_reset(vfio_pci_reset_handler, NULL);
>> +        qemu_unregister_reset(vfio_reset_handler, NULL);
>>      }
>>  }
>>  
>> @@ -3913,12 +3960,12 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>          return ret;
>>      }
>>  
>> -    vdev->fd = ret;
>> -    vdev->group = group;
>> -    QLIST_INSERT_HEAD(&group->device_list, vdev, next);
>> +    vdev->vbasedev.fd = ret;
>> +    vdev->vbasedev.group = group;
>> +    QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
>>  
>>      /* Sanity check device */
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &dev_info);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
>>      if (ret) {
>>          error_report("vfio: error getting device info: %m");
>>          goto error;
>> @@ -3932,7 +3979,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>          goto error;
>>      }
>>  
>> -    vdev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>> +    vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>>  
>>      if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
>>          error_report("vfio: unexpected number of io regions %u",
>> @@ -3948,7 +3995,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>      for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
>>          reg_info.index = i;
>>  
>> -        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> +        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>>          if (ret) {
>>              error_report("vfio: Error getting region %d info: %m", i);
>>              goto error;
>> @@ -3962,14 +4009,14 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>          vdev->bars[i].flags = reg_info.flags;
>>          vdev->bars[i].size = reg_info.size;
>>          vdev->bars[i].fd_offset = reg_info.offset;
>> -        vdev->bars[i].fd = vdev->fd;
>> +        vdev->bars[i].fd = vdev->vbasedev.fd;
>>          vdev->bars[i].nr = i;
>>          QLIST_INIT(&vdev->bars[i].quirks);
>>      }
>>  
>>      reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>>      if (ret) {
>>          error_report("vfio: Error getting config info: %m");
>>          goto error;
>> @@ -3992,7 +4039,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>              .index = VFIO_PCI_VGA_REGION_INDEX,
>>           };
>>  
>> -        ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
>> +        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &vga_info);
>>          if (ret) {
>>              error_report(
>>                  "vfio: Device does not support requested feature x-vga");
>> @@ -4009,7 +4056,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>          }
>>  
>>          vdev->vga.fd_offset = vga_info.offset;
>> -        vdev->vga.fd = vdev->fd;
>> +        vdev->vga.fd = vdev->vbasedev.fd;
>>  
>>          vdev->vga.region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
>>          vdev->vga.region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
>> @@ -4027,7 +4074,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>      }
>>      irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>>      if (ret) {
>>          /* This can fail for an old kernel or legacy PCI dev */
>>          trace_vfio_get_device_get_irq_info_failure();
>> @@ -4043,19 +4090,20 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
>>  
>>  error:
>>      if (ret) {
>> -        QLIST_REMOVE(vdev, next);
>> -        vdev->group = NULL;
>> -        close(vdev->fd);
>> +        QLIST_REMOVE(&vdev->vbasedev, next);
>> +        vdev->vbasedev.group = NULL;
>> +        close(vdev->vbasedev.fd);
>>      }
>>      return ret;
>>  }
>>  
>>  static void vfio_put_device(VFIOPCIDevice *vdev)
>>  {
>> -    QLIST_REMOVE(vdev, next);
>> -    vdev->group = NULL;
>> -    trace_vfio_put_device(vdev->fd);
>> -    close(vdev->fd);
>> +    QLIST_REMOVE(&vdev->vbasedev, next);
>> +    vdev->vbasedev.group = NULL;
>> +    trace_vfio_put_device(vdev->vbasedev.fd);
>> +    close(vdev->vbasedev.fd);
>> +    g_free(vdev->vbasedev.name);
>>      if (vdev->msix) {
>>          g_free(vdev->msix);
>>          vdev->msix = NULL;
>> @@ -4124,7 +4172,7 @@ static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
>>      *pfd = event_notifier_get_fd(&vdev->err_notifier);
>>      qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>      if (ret) {
>>          error_report("vfio: Failed to set up error notification");
>>          qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
>> @@ -4157,7 +4205,7 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>>      pfd = (int32_t *)&irq_set->data;
>>      *pfd = -1;
>>  
>> -    ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>      if (ret) {
>>          error_report("vfio: Failed to de-assign error fd: %m");
>>      }
>> @@ -4169,7 +4217,8 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>>  
>>  static int vfio_initfn(PCIDevice *pdev)
>>  {
>> -    VFIOPCIDevice *pvdev, *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> +    VFIODevice *vbasedev_iter;
>>      VFIOGroup *group;
>>      char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>>      ssize_t len;
>> @@ -4187,6 +4236,13 @@ static int vfio_initfn(PCIDevice *pdev)
>>          return -errno;
>>      }
>>  
>> +    vdev->vbasedev.ops = &vfio_pci_ops;
>> +
>> +    vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
>> +    vdev->vbasedev.name = g_strdup_printf("%04x:%02x:%02x.%01x",
>> +            vdev->host.domain, vdev->host.bus, vdev->host.slot,
>> +            vdev->host.function);
>> +
>>      strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>>  
>>      len = readlink(path, iommu_group_path, sizeof(path));
>> @@ -4216,12 +4272,8 @@ static int vfio_initfn(PCIDevice *pdev)
>>              vdev->host.domain, vdev->host.bus, vdev->host.slot,
>>              vdev->host.function);
>>  
>> -    QLIST_FOREACH(pvdev, &group->device_list, next) {
>> -        if (pvdev->host.domain == vdev->host.domain &&
>> -            pvdev->host.bus == vdev->host.bus &&
>> -            pvdev->host.slot == vdev->host.slot &&
>> -            pvdev->host.function == vdev->host.function) {
>> -
>> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> +        if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
>>              error_report("vfio: error: device %s is already attached", path);
>>              vfio_put_group(group);
>>              return -EBUSY;
>> @@ -4236,7 +4288,7 @@ static int vfio_initfn(PCIDevice *pdev)
>>      }
>>  
>>      /* Get a copy of config space */
>> -    ret = pread(vdev->fd, vdev->pdev.config,
>> +    ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>>                  vdev->config_offset);
>>      if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
>> @@ -4323,7 +4375,7 @@ out_put:
>>  static void vfio_exitfn(PCIDevice *pdev)
>>  {
>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> -    VFIOGroup *group = vdev->group;
>> +    VFIOGroup *group = vdev->vbasedev.group;
>>  
>>      vfio_unregister_err_notifier(vdev);
>>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>> @@ -4349,8 +4401,9 @@ static void vfio_pci_reset(DeviceState *dev)
>>  
>>      vfio_pci_pre_reset(vdev);
>>  
>> -    if (vdev->reset_works && (vdev->has_flr || !vdev->has_pm_reset) &&
>> -        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
>> +    if (vdev->vbasedev.reset_works &&
>> +        (vdev->has_flr || !vdev->has_pm_reset) &&
>> +        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
>>          trace_vfio_pci_reset_flr(vdev->host.domain, vdev->host.bus,
>>                                    vdev->host.slot, vdev->host.function);
>>          goto post_reset;
>> @@ -4362,8 +4415,8 @@ static void vfio_pci_reset(DeviceState *dev)
>>      }
>>  
>>      /* If nothing else works and the device supports PM reset, use it */
>> -    if (vdev->reset_works && vdev->has_pm_reset &&
>> -        !ioctl(vdev->fd, VFIO_DEVICE_RESET)) {
>> +    if (vdev->vbasedev.reset_works && vdev->has_pm_reset &&
>> +        !ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)) {
>>          trace_vfio_pci_reset_pm(vdev->host.domain, vdev->host.bus,
>>                                  vdev->host.slot, vdev->host.function);
>>          goto post_reset;
> 
> 
> 


* Re: [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation
  2014-11-05 22:23       ` Alexander Graf
@ 2014-11-06  8:57         ` Eric Auger
  2014-11-06 12:34           ` Alexander Graf
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-06  8:57 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 11/05/2014 11:23 PM, Alexander Graf wrote:
> 
> 
> On 05.11.14 13:31, Eric Auger wrote:
>> On 11/05/2014 11:59 AM, Alexander Graf wrote:
>>>
>>>
>>> On 31.10.14 15:05, Eric Auger wrote:
>>>> vfio-calxeda-xgmac now can be instantiated using the -device option.
>>>> The node creation function generates a very basic dt node composed
>>>> of the compat, reg and interrupts properties
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>
>>>> ---
>>>>
>>>> v6 -> v7:
>>>> - compat string re-formatting removed since compat string is not exposed
>>>>   anymore as a user option
>>>> - VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
>>>>   device
>>>> ---
>>>>  hw/arm/sysbus-fdt.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 88 insertions(+)
>>>>
>>>> diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
>>>> index d5476f1..f8b310b 100644
>>>> --- a/hw/arm/sysbus-fdt.c
>>>> +++ b/hw/arm/sysbus-fdt.c
>>>> @@ -27,6 +27,8 @@
>>>>  #include "hw/platform-bus.h"
>>>>  #include "sysemu/sysemu.h"
>>>>  #include "hw/platform-bus.h"
>>>> +#include "hw/vfio/vfio-platform.h"
>>>> +#include "hw/vfio/vfio-calxeda-xgmac.h"
>>>>  
>>>>  /*
>>>>   * internal struct that contains the information to create dynamic
>>>> @@ -54,8 +56,11 @@ typedef struct NodeCreationPair {
>>>>      int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
>>>>  } NodeCreationPair;
>>>>  
>>>> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque);
>>>> +
>>>>  /* list of supported dynamic sysbus devices */
>>>>  NodeCreationPair add_fdt_node_functions[] = {
>>>> +        {TYPE_VFIO_CALXEDA_XGMAC, add_basic_vfio_fdt_node},
>>>>          {"", NULL}, /*last element*/
>>>>  };
>>>
>>> Can you maybe place the list somewhere smartly to make sure we don't
>>> need forward declarations? Either put it in between the "generic" and
>>> "device specific" code or at the end of the file with a single forward
>>> declaration for the array?
>>
>> sure
>>>
>>>>  
>>>> @@ -86,6 +91,89 @@ static int add_fdt_node(SysBusDevice *sbdev, void *opaque)
>>>>  }
>>>>  
>>>>  /**
>>>> + * add_basic_vfio_fdt_node - generates the most basic node for a VFIO node
>>>> + *
>>>> + * set properties are:
>>>> + * - compatible string
>>>> + * - regs
>>>> + * - interrupts
>>>> + */
>>>> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque)
>>>> +{
>>>> +    PlatformBusFdtData *data = opaque;
>>>> +    PlatformBusDevice *pbus = data->pbus;
>>>> +    void *fdt = data->fdt;
>>>> +    const char *parent_node = data->pbus_node_name;
>>>> +    int compat_str_len;
>>>> +    char *nodename;
>>>> +    int i, ret;
>>>> +    uint32_t *irq_attr;
>>>> +    uint64_t *reg_attr;
>>>> +    uint64_t mmio_base;
>>>> +    uint64_t irq_number;
>>>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>>> +    Object *obj = OBJECT(sbdev);
>>>> +
>>>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>>>> +
>>>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>>>> +                               vbasedev->name,
>>>> +                               mmio_base);
>>>> +
>>>> +    qemu_fdt_add_subnode(fdt, nodename);
>>>> +
>>>> +    compat_str_len = strlen(vdev->compat) + 1;
>>>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>>>> +                          vdev->compat, compat_str_len);
>>>
>>> What if there are multiple compatibles?
>> My purpose here was absolutely not to come back again on a proposal
>> where we could have a generic node creation. I understand that it is not
>> realistic. I rather tried to put some common property creation in this
>> function but you're right even the interrupt prop depend on the device.
>>
>> About your question, I think the specialized VFIO device would set its
>> compat string including the various substrings. This was done in the
>> past for PL330 which required arm,pl330;arm,primecell.
>>
>>>
>>>> +
>>>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>>>> +
>>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>>> +        mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
>>>> +        reg_attr[4*i] = 1;
>>>
>>> What is the 1 here?
>> address-cells? since the bus is < 4GB, 1 32b reg is required to specify
>> the base address. But since you put #size-cells already in the parent
>> node maybe I can remove it.
> 
> I'm confused. Shouldn't the reg look like [ <addr> <size> ... ]?
> 
>   http://www.devicetree.org/Device_Tree_Usage#Memory_Mapped_Devices
> 
> The number of cells is defined separately via #address-cells or #size-cells.

Hi Alex,

Sorry, my answer was misleading: I was mixing up how
qemu_fdt_setprop_sized_cells_from_array is called with the dts syntax it
produces. The "1" values are the number of cells used for the addr value
and for the size value, respectively. The arguments of
qemu_fdt_setprop_sized_cells_from_array are (number-of-cells, value)
pairs; see the reminder below. Because the platform bus node has
#size-cells = <0x1> and #address-cells = <0x1>, I have to use 1. As a
result the guest dt will look like this:

/ {
    #address-cells = <1>;
    #size-cells = <1>;

    ...

    serial@101f0000 {
        compatible = "arm,pl011";
        reg = <0x101f0000 0x1000 >;
../..

I hope this clarifies.

Best Regards

Eric

 * qemu_fdt_setprop_sized_cells_from_array:
 * @fdt: device tree blob
 * @node_path: node to set property on
 * @property: property to set
 * @numvalues: number of values
 * @values: array of number-of-cells, value pairs
 *
 * Set the specified property on the specified node in the device tree
 * to be an array of cells. The values of the cells are specified via
 * the values list, which alternates between "number of cells used by
 * this value" and "value".
 * number-of-cells must be either 1 or 2 (other values will result in
 * an error being returned). If a value is too large to fit in the
 * number of cells specified for it, an error is returned.
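
To make the (number-of-cells, value) convention described in the comment
above concrete, here is a minimal standalone sketch. It is not QEMU's
implementation: pack_sized_cells is a hypothetical helper that only mimics
the documented behaviour (1 or 2 cells per value, error if the value does
not fit), emitting big-endian 32-bit cells as a flattened device tree
stores them.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code: expand an array of
 * (number-of-cells, value) pairs into big-endian 32-bit cells.
 * Returns the number of bytes written, or -1 on error.
 */
static int pack_sized_cells(const uint64_t *pairs, size_t numvalues,
                            uint8_t *out, size_t out_len)
{
    size_t off = 0;

    for (size_t i = 0; i < numvalues; i++) {
        uint64_t ncells = pairs[2 * i];
        uint64_t value = pairs[2 * i + 1];

        if (ncells != 1 && ncells != 2) {
            return -1;              /* only 1 or 2 cells are allowed */
        }
        if (ncells == 1 && value > UINT32_MAX) {
            return -1;              /* value does not fit in one cell */
        }
        if (off + ncells * 4 > out_len) {
            return -1;              /* output buffer too small */
        }
        /* emit the most significant byte first */
        for (int shift = (int)ncells * 32 - 8; shift >= 0; shift -= 8) {
            out[off++] = (uint8_t)(value >> shift);
        }
    }
    return (int)off;
}
```

With the pairs { 1, 0x101f0000, 1, 0x1000 } and numvalues = 2, this sketch
writes 8 bytes, matching reg = <0x101f0000 0x1000> under
#address-cells = <1> and #size-cells = <1>.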

> 
>>
>>>
>>>> +        reg_attr[4*i+1] = mmio_base;
>>>> +        reg_attr[4*i+2] = 1;
>>>
>>> and here?
>> size-cells for this reg. same remark as above
> 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation
  2014-11-06  8:57         ` Eric Auger
@ 2014-11-06 12:34           ` Alexander Graf
  0 siblings, 0 replies; 43+ messages in thread
From: Alexander Graf @ 2014-11-06 12:34 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 06.11.14 09:57, Eric Auger wrote:
> On 11/05/2014 11:23 PM, Alexander Graf wrote:
>>
>>
>> On 05.11.14 13:31, Eric Auger wrote:
>>> On 11/05/2014 11:59 AM, Alexander Graf wrote:
>>>>
>>>>
>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>> vfio-calxeda-xgmac now can be instantiated using the -device option.
>>>>> The node creation function generates a very basic dt node composed
>>>>> of the compat, reg and interrupts properties
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>
>>>>> ---
>>>>>
>>>>> v6 -> v7:
>>>>> - compat string re-formatting removed since compat string is not exposed
>>>>>   anymore as a user option
>>>>> - VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
>>>>>   device
>>>>> ---
>>>>>  hw/arm/sysbus-fdt.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 88 insertions(+)
>>>>>
>>>>> diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
>>>>> index d5476f1..f8b310b 100644
>>>>> --- a/hw/arm/sysbus-fdt.c
>>>>> +++ b/hw/arm/sysbus-fdt.c
>>>>> @@ -27,6 +27,8 @@
>>>>>  #include "hw/platform-bus.h"
>>>>>  #include "sysemu/sysemu.h"
>>>>>  #include "hw/platform-bus.h"
>>>>> +#include "hw/vfio/vfio-platform.h"
>>>>> +#include "hw/vfio/vfio-calxeda-xgmac.h"
>>>>>  
>>>>>  /*
>>>>>   * internal struct that contains the information to create dynamic
>>>>> @@ -54,8 +56,11 @@ typedef struct NodeCreationPair {
>>>>>      int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
>>>>>  } NodeCreationPair;
>>>>>  
>>>>> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque);
>>>>> +
>>>>>  /* list of supported dynamic sysbus devices */
>>>>>  NodeCreationPair add_fdt_node_functions[] = {
>>>>> +        {TYPE_VFIO_CALXEDA_XGMAC, add_basic_vfio_fdt_node},
>>>>>          {"", NULL}, /*last element*/
>>>>>  };
>>>>
>>>> Can you maybe place the list somewhere smartly to make sure we don't
>>>> need forward declarations? Either put it in between the "generic" and
>>>> "device specific" code or at the end of the file with a single forward
>>>> declaration for the array?
>>>
>>> sure
>>>>
>>>>>  
>>>>> @@ -86,6 +91,89 @@ static int add_fdt_node(SysBusDevice *sbdev, void *opaque)
>>>>>  }
>>>>>  
>>>>>  /**
>>>>> + * add_basic_vfio_fdt_node - generates the most basic node for a VFIO node
>>>>> + *
>>>>> + * set properties are:
>>>>> + * - compatible string
>>>>> + * - regs
>>>>> + * - interrupts
>>>>> + */
>>>>> +static int add_basic_vfio_fdt_node(SysBusDevice *sbdev, void *opaque)
>>>>> +{
>>>>> +    PlatformBusFdtData *data = opaque;
>>>>> +    PlatformBusDevice *pbus = data->pbus;
>>>>> +    void *fdt = data->fdt;
>>>>> +    const char *parent_node = data->pbus_node_name;
>>>>> +    int compat_str_len;
>>>>> +    char *nodename;
>>>>> +    int i, ret;
>>>>> +    uint32_t *irq_attr;
>>>>> +    uint64_t *reg_attr;
>>>>> +    uint64_t mmio_base;
>>>>> +    uint64_t irq_number;
>>>>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>>>>> +    Object *obj = OBJECT(sbdev);
>>>>> +
>>>>> +    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>>>>> +
>>>>> +    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>>>>> +                               vbasedev->name,
>>>>> +                               mmio_base);
>>>>> +
>>>>> +    qemu_fdt_add_subnode(fdt, nodename);
>>>>> +
>>>>> +    compat_str_len = strlen(vdev->compat) + 1;
>>>>> +    qemu_fdt_setprop(fdt, nodename, "compatible",
>>>>> +                          vdev->compat, compat_str_len);
>>>>
>>>> What if there are multiple compatibles?
>>> My purpose here was absolutely not to come back again on a proposal
>>> where we could have a generic node creation. I understand that it is not
>>> realistic. I rather tried to put some common property creation in this
>>> function but you're right even the interrupt prop depend on the device.
>>>
>>> About your question, I think the specialized VFIO device would set its
>>> compat string including the various substrings. This was done in the
>>> past for PL330 which required arm,pl330;arm,primecell.
>>>
>>>>
>>>>> +
>>>>> +    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>>>>> +
>>>>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>>>>> +        mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
>>>>> +        reg_attr[4*i] = 1;
>>>>
>>>> What is the 1 here?
>>> address-cells? since the bus is < 4GB, 1 32b reg is required to specify
>>> the base address. But since you put #size-cells already in the parent
>>> node maybe I can remove it.
>>
>> I'm confused. Shouldn't the reg look like [ <addr> <size> ... ]?
>>
>>   http://www.devicetree.org/Device_Tree_Usage#Memory_Mapped_Devices
>>
>> The number of cells is defined separately via #address-cells or #size-cells.
> 
> Hi Alex,
> 
> Sorry, my answer was misleading: I was mixing up how
> qemu_fdt_setprop_sized_cells_from_array is called with the dts syntax it
> produces. The "1" values are the number of cells used for the addr value
> and for the size value, respectively. The arguments of
> qemu_fdt_setprop_sized_cells_from_array are (number-of-cells, value)
> pairs; see the reminder below. Because the platform bus node has
> #size-cells = <0x1> and #address-cells = <0x1>, I have to use 1. As a
> result the guest dt will look like this:
> 
> / {
>     #address-cells = <1>;
>     #size-cells = <1>;
> 
>     ...
> 
>     serial@101f0000 {
>         compatible = "arm,pl011";
>         reg = <0x101f0000 0x1000 >;
> ../..
> 
> I hope this clarifies.
> 
> Best Regards
> 
> Eric
> 
>  * qemu_fdt_setprop_sized_cells_from_array:
>  * @fdt: device tree blob
>  * @node_path: node to set property on
>  * @property: property to set
>  * @numvalues: number of values
>  * @values: array of number-of-cells, value pairs
>  *
>  * Set the specified property on the specified node in the device tree
>  * to be an array of cells. The values of the cells are specified via
>  * the values list, which alternates between "number of cells used by
>  * this value" and "value".
>  * number-of-cells must be either 1 or 2 (other values will result in
>  * an error being returned). If a value is too large to fit in the
>  * number of cells specified for it, an error is returned.

Ah, what a horrible API :). Maybe we should start introducing functions
that are aware of what #address-cells and #size-cells are and just
directly set "reg" from an array of uint64_ts.

But this is out of scope for this patch set. Sorry for the fuss.
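
For illustration only, a cell-aware helper of the kind suggested above
might look roughly like the sketch below. Everything in it is an
assumption: encode_reg and put_cells are hypothetical names, no such
function exists in QEMU as far as this thread shows, and the sketch only
handles cell counts of 1 or 2. The caller passes plain (addr, size)
uint64_t pairs and the helper applies #address-cells and #size-cells
itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Write 'value' as 'ncells' big-endian 32-bit cells; -1 if it cannot. */
static int put_cells(uint8_t *out, uint32_t ncells, uint64_t value)
{
    if (ncells != 1 && ncells != 2) {
        return -1;                  /* sketch supports 1 or 2 cells only */
    }
    if (ncells == 1 && value > UINT32_MAX) {
        return -1;                  /* value does not fit in one cell */
    }
    for (uint32_t i = 0; i < ncells * 4; i++) {
        out[i] = (uint8_t)(value >> (8 * (ncells * 4 - 1 - i)));
    }
    return 0;
}

/*
 * Hypothetical helper: encode nregs (addr, size) pairs as a "reg"
 * property body. Returns the number of bytes written, or -1 on error.
 */
static int encode_reg(uint32_t addr_cells, uint32_t size_cells,
                      const uint64_t *regs, size_t nregs,
                      uint8_t *out, size_t out_len)
{
    size_t entry = (size_t)(addr_cells + size_cells) * 4;
    size_t off = 0;

    if (nregs * entry > out_len) {
        return -1;                  /* output buffer too small */
    }
    for (size_t i = 0; i < nregs; i++) {
        if (put_cells(out + off, addr_cells, regs[2 * i]) < 0) {
            return -1;
        }
        off += addr_cells * 4;
        if (put_cells(out + off, size_cells, regs[2 * i + 1]) < 0) {
            return -1;
        }
        off += size_cells * 4;
    }
    return (int)off;
}
```

The point of the design is that the caller no longer repeats the cell
count next to every value; it is stated once per property, in the same
way the parent node states it once in the device tree.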


Alex


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-05 10:29   ` Alexander Graf
  2014-11-05 12:03     ` Eric Auger
@ 2014-11-26  9:45     ` Eric Auger
  2014-11-26 10:24       ` Alexander Graf
  1 sibling, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-26  9:45 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 11/05/2014 11:29 AM, Alexander Graf wrote:
> 
> 
> On 31.10.14 15:05, Eric Auger wrote:
>> Minimal VFIO platform implementation supporting
>> - register space user mapping,
>> - IRQ assignment based on eventfds handled on qemu side.
>>
>> irqfd kernel acceleration comes in a subsequent patch.
>>
>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>> v6 -> v7:
>> - compat is not exposed anymore as a user option. Rationale is
>>   the vfio device became abstract and a specialization is needed
>>   anyway. The derived device must set the compat string.
>> - in v6 vfio_start_irq_injection was exposed in vfio-platform.h.
>>   A new function dubbed vfio_register_irq_starter replaces it. It
>>   registers a machine init done notifier that programs & starts
>>   all dynamic VFIO device IRQs. This function is supposed to be
>>   called by the machine file. A set of static helper routines are
>>   added too. It must be called before the creation of the platform
>>   bus device.
>>
>> v5 -> v6:
>> - vfio_device property renamed into host property
>> - correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
>>   and remove PCI related comment
>> - remove declaration of vfio_setup_irqfd and irqfd_allowed
>>   property. Both belong to the next patch (irqfd)
>> - remove declaration of vfio_intp_interrupt in vfio-platform.h
>> - functions that can be static get this characteristic
>> - remove declarations of vfio_region_ops, vfio_memory_listener,
>>   group_list, vfio_address_spaces. All are moved to vfio-common.h
>> - remove vfio_put_device declaration and definition
>> - print_regions removed. code moved into vfio_populate_regions
>> - replace DPRINTF by trace events
>> - new helper routine to set the trigger eventfd
>> - dissociate intp init from the injection enablement:
>>   vfio_enable_intp renamed into vfio_init_intp and new function
>>   named vfio_start_eventfd_injection
>> - injection start moved to vfio_start_irq_injection (not anymore
>>   in vfio_populate_interrupt)
>> - new start_irq_fn field in VFIOPlatformDevice corresponding to
>>   the function that will be used for starting injection
>> - user handled eventfd:
>>   x add mutex to protect IRQ state & list manipulation,
>>   x correct misleading comment in vfio_intp_interrupt.
>>   x Fix bugs thanks to fake interrupt modality
>> - VFIOPlatformDeviceClass becomes abstract
>> - add error_setg in vfio_platform_realize
>>
>> v4 -> v5:
>> - vfio-platform.h included first
>> - cleanup error handling in *populate*, vfio_get_device,
>>   vfio_enable_intp
>> - vfio_put_device not called anymore
>> - add some includes to follow vfio policy
>>
>> v3 -> v4:
>> [Eric Auger]
>> - merge of "vfio: Add initial IRQ support in platform device"
>>   to get a full functional patch although perfs are limited.
>> - removal of unrealize function since I currently understand
>>   it is only used with device hot-plug feature.
>>
>> v2 -> v3:
>> [Eric Auger]
>> - further factorization between PCI and platform (VFIORegion,
>>   VFIODevice). same level of functionality.
>>
>> <= v2:
>> [Kim Phillips]
>> - Initial Creation of the device supporting register space mapping
>> ---
>>  hw/vfio/Makefile.objs           |   1 +
>>  hw/vfio/platform.c              | 672 ++++++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio-common.h   |   1 +
>>  include/hw/vfio/vfio-platform.h |  87 ++++++
>>  trace-events                    |  12 +
>>  5 files changed, 773 insertions(+)
>>  create mode 100644 hw/vfio/platform.c
>>  create mode 100644 include/hw/vfio/vfio-platform.h
>>
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index e31f30e..c5c76fe 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,4 +1,5 @@
>>  ifeq ($(CONFIG_LINUX), y)
>>  obj-$(CONFIG_SOFTMMU) += common.o
>>  obj-$(CONFIG_PCI) += pci.o
>> +obj-$(CONFIG_SOFTMMU) += platform.o
>>  endif
>> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
>> new file mode 100644
>> index 0000000..9f66610
>> --- /dev/null
>> +++ b/hw/vfio/platform.c
>> @@ -0,0 +1,672 @@
>> +/*
>> + * vfio based device assignment support - platform devices
>> + *
>> + * Copyright Linaro Limited, 2014
>> + *
>> + * Authors:
>> + *  Kim Phillips <kim.phillips@linaro.org>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Based on vfio based PCI device assignment support:
>> + *  Copyright Red Hat, Inc. 2012
>> + */
>> +
>> +#include <linux/vfio.h>
>> +#include <sys/ioctl.h>
>> +
>> +#include "hw/vfio/vfio-platform.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/range.h"
>> +#include "sysemu/sysemu.h"
>> +#include "exec/memory.h"
>> +#include "qemu/queue.h"
>> +#include "hw/sysbus.h"
>> +#include "trace.h"
>> +#include "hw/platform-bus.h"
>> +
>> +static void vfio_intp_interrupt(VFIOINTp *intp);
>> +typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>> +                                    eventfd_user_side_handler_t handler);
>> +
>> +/*
>> + * Functions only used when eventfd are handled on user-side
>> + * ie. without irqfd
>> + */
>> +
>> +/**
>> + * vfio_platform_eoi - IRQ completion routine
>> + * @vbasedev: the VFIO device
>> + *
>> + * de-asserts the active virtual IRQ and unmasks the physical IRQ
>> + * (masked by the VFIO driver). Handles pending IRQs if any.
>> + * eoi function is called on the first access to any MMIO region
>> + * after an IRQ was triggered. It is assumed this access corresponds
>> + * to the IRQ status register reset. With such a mechanism, a single
>> + * IRQ can be handled at a time since there is no way to know which
>> + * IRQ was completed by the guest (we would need additional details
>> + * about the IRQ status register mask)
>> + */
>> +static void vfio_platform_eoi(VFIODevice *vbasedev)
>> +{
>> +    VFIOINTp *intp;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +
>> +    qemu_mutex_lock(&vdev->intp_mutex);
>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>> +        if (intp->state == VFIO_IRQ_ACTIVE) {
>> +            trace_vfio_platform_eoi(intp->pin,
>> +                                event_notifier_get_fd(&intp->interrupt));
>> +            intp->state = VFIO_IRQ_INACTIVE;
>> +
>> +            /* deassert the virtual IRQ and unmask physical one */
>> +            qemu_set_irq(intp->qemuirq, 0);
>> +            vfio_unmask_irqindex(vbasedev, intp->pin);
>> +
>> +            /* a single IRQ can be active at a time */
>> +            break;
>> +        }
>> +    }
>> +    /* in case there are pending IRQs, handle them one at a time */
>> +    if (!QSIMPLEQ_EMPTY(&vdev->pending_intp_queue)) {
>> +        intp = QSIMPLEQ_FIRST(&vdev->pending_intp_queue);
>> +        trace_vfio_platform_eoi_handle_pending(intp->pin);
>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>> +        vfio_intp_interrupt(intp);
>> +        qemu_mutex_lock(&vdev->intp_mutex);
>> +        QSIMPLEQ_REMOVE_HEAD(&vdev->pending_intp_queue, pqnext);
>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>> +    } else {
>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>> +    }
>> +}
>> +
>> +/**
>> + * vfio_mmap_set_enabled - enable/disable the fast path mode
>> + * @vdev: the VFIO platform device
>> + * @enabled: the target mmap state
>> + *
>> + * true ~ fast path = MMIO region is mmaped (no KVM TRAP)
>> + * false ~ slow path = MMIO region is trapped and region callbacks
>> + * are called; the slow path makes it possible to trap the guest's
>> + * reset of the IRQ status register
>> +*/
>> +
>> +static void vfio_mmap_set_enabled(VFIOPlatformDevice *vdev, bool enabled)
>> +{
>> +    VFIORegion *region;
>> +    int i;
>> +
>> +    trace_vfio_platform_mmap_set_enabled(enabled);
>> +
>> +    for (i = 0; i < vdev->vbasedev.num_regions; i++) {
>> +        region = vdev->regions[i];
>> +
>> +        /* register space is unmapped to trap EOI */
>> +        memory_region_set_enabled(&region->mmap_mem, enabled);
>> +    }
>> +}
>> +
>> +/**
>> + * vfio_intp_mmap_enable - timer function, restores the fast path
>> + * if there is no more active IRQ
>> + * @opaque: actually points to the VFIO platform device
>> + *
>> + * Called on mmap timer timeout, this function checks whether the
>> + * IRQ is still active and, if not, restores the fast path.
>> + * By construction a single eventfd is handled at a time.
>> + * If the IRQ is still active, the timer is restarted.
>> + */
>> +static void vfio_intp_mmap_enable(void *opaque)
>> +{
>> +    VFIOINTp *tmp;
>> +    VFIOPlatformDevice *vdev = (VFIOPlatformDevice *)opaque;
>> +
>> +    qemu_mutex_lock(&vdev->intp_mutex);
>> +    QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>> +        if (tmp->state == VFIO_IRQ_ACTIVE) {
>> +            trace_vfio_platform_intp_mmap_enable(tmp->pin);
>> +            /* re-program the timer to check active status later */
>> +            timer_mod(vdev->mmap_timer,
>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> +                          vdev->mmap_timeout);
>> +            qemu_mutex_unlock(&vdev->intp_mutex);
>> +            return;
>> +        }
>> +    }
>> +    vfio_mmap_set_enabled(vdev, true);
>> +    qemu_mutex_unlock(&vdev->intp_mutex);
>> +}
>> +
>> +/**
>> + * vfio_intp_interrupt - The user-side eventfd handler
>> + * @intp: the IRQ struct pointer (VFIOINTp *)
>> + *
>> + * the function can be entered
>> + * - in event handler context: this IRQ is inactive
>> + *   in that case, the vIRQ is injected into the guest if there
>> + *   is no other active or pending IRQ.
>> + * - in IOhandler context: this IRQ is pending.
>> + *   there is no ACTIVE IRQ
>> + */
>> +static void vfio_intp_interrupt(VFIOINTp *intp)
>> +{
>> +    int ret;
>> +    VFIOINTp *tmp;
>> +    VFIOPlatformDevice *vdev = intp->vdev;
>> +    bool delay_handling = false;
>> +
>> +    qemu_mutex_lock(&vdev->intp_mutex);
>> +    if (intp->state == VFIO_IRQ_INACTIVE) {
>> +        QLIST_FOREACH(tmp, &vdev->intp_list, next) {
>> +            if (tmp->state == VFIO_IRQ_ACTIVE ||
>> +                tmp->state == VFIO_IRQ_PENDING) {
>> +                delay_handling = true;
>> +                break;
>> +            }
>> +        }
>> +    }
>> +    if (delay_handling) {
>> +        /*
>> +         * the new IRQ gets a pending status and is pushed in
>> +         * the pending queue
>> +         */
>> +        intp->state = VFIO_IRQ_PENDING;
>> +        trace_vfio_intp_interrupt_set_pending(intp->pin);
>> +        QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
>> +                             intp, pqnext);
>> +        ret = event_notifier_test_and_clear(&intp->interrupt);
>> +        qemu_mutex_unlock(&vdev->intp_mutex);
>> +        return;
>> +    }
>> +
>> +    /* no active IRQ, the new IRQ can be forwarded to the guest */
>> +    trace_vfio_platform_intp_interrupt(intp->pin,
>> +                              event_notifier_get_fd(&intp->interrupt));
>> +
>> +    if (intp->state == VFIO_IRQ_INACTIVE) {
>> +        ret = event_notifier_test_and_clear(&intp->interrupt);
>> +        if (!ret) {
>> +            error_report("Error when clearing fd=%d (ret = %d)",
>> +                         event_notifier_get_fd(&intp->interrupt), ret);
>> +        }
>> +    } /* else this is a pending IRQ that moves to ACTIVE state */
>> +
>> +    intp->state = VFIO_IRQ_ACTIVE;
>> +
>> +    /* sets slow path */
>> +    vfio_mmap_set_enabled(vdev, false);
>> +
>> +    /* trigger the virtual IRQ */
>> +    qemu_set_irq(intp->qemuirq, 1);
>> +
>> +    /* schedule the mmap timer which will restore mmap path after EOI */
>> +    if (vdev->mmap_timeout) {
>> +        timer_mod(vdev->mmap_timer,
>> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> +                      vdev->mmap_timeout);
>> +    }
>> +    qemu_mutex_unlock(&vdev->intp_mutex);
>> +}
>> +
>> +/**
>> + * vfio_start_eventfd_injection - starts the virtual IRQ injection using
>> + * user-side handled eventfds
>> + * @intp: the IRQ struct pointer
>> + */
>> +
>> +static int vfio_start_eventfd_injection(VFIOINTp *intp)
>> +{
>> +    int ret;
>> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
>> +
>> +    vfio_mask_irqindex(vbasedev, intp->pin);
>> +
>> +    ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
>> +    if (ret) {
>> +        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
>> +        vfio_unmask_irqindex(vbasedev, intp->pin);
>> +        return ret;
>> +    }
>> +    vfio_unmask_irqindex(vbasedev, intp->pin);
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Functions used whatever the injection method
>> + */
>> +
>> +/**
>> + * vfio_set_trigger_eventfd - set VFIO eventfd handling
>> + * i.e. program the VFIO driver to associate a given IRQ index
>> + * with a fd handler
>> + *
>> + * @intp: IRQ struct pointer
>> + * @handler: handler to be called on eventfd trigger
>> + */
>> +static int vfio_set_trigger_eventfd(VFIOINTp *intp,
>> +                                    eventfd_user_side_handler_t handler)
>> +{
>> +    VFIODevice *vbasedev = &intp->vdev->vbasedev;
>> +    struct vfio_irq_set *irq_set;
>> +    int argsz, ret;
>> +    int32_t *pfd;
>> +
>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>> +    irq_set = g_malloc0(argsz);
>> +    irq_set->argsz = argsz;
>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
>> +    irq_set->index = intp->pin;
>> +    irq_set->start = 0;
>> +    irq_set->count = 1;
>> +    pfd = (int32_t *)&irq_set->data;
>> +    *pfd = event_notifier_get_fd(&intp->interrupt);
>> +    qemu_set_fd_handler(*pfd, (IOHandler *)handler, NULL, intp);
>> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    g_free(irq_set);
>> +    if (ret < 0) {
>> +        error_report("vfio: Failed to set trigger eventfd: %m");
>> +        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
>> +    }
>> +    return ret;
>> +}
>> +
>> +/* not implemented yet */
>> +static bool vfio_platform_compute_needs_reset(VFIODevice *vdev)
>> +{
>> +    return false;
>> +}
>> +
>> +/* not implemented yet */
>> +static int vfio_platform_hot_reset_multi(VFIODevice *vdev)
>> +{
>> +    return 0;
>> +}
>> +
>> +/**
>> + * vfio_init_intp - allocate, initialize the IRQ struct pointer
>> + * and add it into the list of IRQ
>> + * @vbasedev: the VFIO device
>> + * @index: VFIO device IRQ index
>> + */
>> +static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
>> +{
>> +    int ret;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(vdev);
>> +    VFIOINTp *intp;
>> +
>> +    /* allocate and populate a new VFIOINTp structure put in a queue list */
>> +    intp = g_malloc0(sizeof(*intp));
>> +    intp->vdev = vdev;
>> +    intp->pin = index;
>> +    intp->state = VFIO_IRQ_INACTIVE;
>> +    sysbus_init_irq(sbdev, &intp->qemuirq);
>> +
>> +    /* Get an eventfd for trigger */
>> +    ret = event_notifier_init(&intp->interrupt, 0);
>> +    if (ret) {
>> +        g_free(intp);
>> +        error_report("vfio: Error: trigger event_notifier_init failed ");
>> +        return NULL;
>> +    }
>> +
>> +    /* store the new intp in qlist */
>> +    QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
>> +    return intp;
>> +}
>> +
>> +/**
>> + * vfio_populate_device - initialize MMIO region and IRQ
>> + * @vbasedev: the VFIO device
>> + *
>> + * query the VFIO device for exposed MMIO regions and IRQ and
>> + * populate the associated fields in the device struct
>> + */
>> +static int vfio_populate_device(VFIODevice *vbasedev)
>> +{
>> +    struct vfio_irq_info irq = { .argsz = sizeof(irq) };
>> +    struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
>> +    VFIOINTp *intp;
>> +    int i, ret = 0;
>> +    VFIOPlatformDevice *vdev =
>> +        container_of(vbasedev, VFIOPlatformDevice, vbasedev);
>> +
>> +    vdev->regions = g_malloc0(sizeof(VFIORegion *) * vbasedev->num_regions);
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        vdev->regions[i] = g_malloc0(sizeof(VFIORegion));
>> +        reg_info.index = i;
>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
>> +        if (ret) {
>> +            error_report("vfio: Error getting region %d info: %m", i);
>> +            goto error;
>> +        }
>> +        vdev->regions[i]->flags = reg_info.flags;
>> +        vdev->regions[i]->size = reg_info.size;
>> +        vdev->regions[i]->fd_offset = reg_info.offset;
>> +        vdev->regions[i]->nr = i;
>> +        vdev->regions[i]->vbasedev = vbasedev;
>> +
>> +        trace_vfio_platform_populate_regions(vdev->regions[i]->nr,
>> +                            (unsigned long)vdev->regions[i]->flags,
>> +                            (unsigned long)vdev->regions[i]->size,
>> +                            vdev->regions[i]->vbasedev->fd,
>> +                            (unsigned long)vdev->regions[i]->fd_offset);
>> +    }
>> +
>> +    vdev->mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> +                                    vfio_intp_mmap_enable, vdev);
>> +
>> +    QSIMPLEQ_INIT(&vdev->pending_intp_queue);
>> +
>> +    for (i = 0; i < vbasedev->num_irqs; i++) {
>> +        irq.index = i;
>> +
>> +        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>> +        if (ret) {
>> +            error_report("vfio: error getting device %s irq info",
>> +                         vbasedev->name);
>> +            return ret;
>> +        } else {
>> +            trace_vfio_platform_populate_interrupts(irq.index,
>> +                                                    irq.count,
>> +                                                    irq.flags);
>> +            intp = vfio_init_intp(vbasedev, irq.index);
>> +            if (!intp) {
>> +                error_report("vfio: Error setting up IRQ %d", i);
>> +                return -1;
>> +            }
>> +        }
>> +    }
>> +    return 0;
>> +error:
>> +    return ret;
>> +}
>> +
>> +/*
>> + * vfio_start_irq_injection - associates a virtual irq to a
>> + * VFIO IRQ index and start the injection of this IRQ
>> + * @s: SysBus Device
>> + * @index: VFIO IRQ index
>> + * @virq: the virtual IRQ number, aka gsi
>> + *
>> + * this function is called when the device tree is built
>> + */
>> +static void vfio_start_irq_injection(SysBusDevice *s, int index, int virq)
>> +{
>> +    VFIOPlatformDevice *vdev = container_of(s, VFIOPlatformDevice, sbdev);
>> +    VFIOINTp *intp;
>> +
>> +    QLIST_FOREACH(intp, &vdev->intp_list, next) {
>> +        if (intp->pin == index) {
>> +            intp->virtualID = virq;
>> +            vdev->start_irq_fn(intp);
>> +        }
>> +    }
>> +}
>> +
>> +/* specialized functions for VFIO Platform devices */
>> +static VFIODeviceOps vfio_platform_ops = {
>> +    .vfio_compute_needs_reset = vfio_platform_compute_needs_reset,
>> +    .vfio_hot_reset_multi = vfio_platform_hot_reset_multi,
>> +    .vfio_eoi = vfio_platform_eoi,
>> +    .vfio_populate_device = vfio_populate_device,
>> +};
>> +
>> +/**
>> + * vfio_base_device_init - implements some of the VFIO mechanics
>> + * @vbasedev: the VFIO device
>> + *
>> + * retrieves the group the device belongs to and get the device fd
>> + * returns the VFIO device fd
>> + * precondition: the device name must be initialized
>> + */
>> +static int vfio_base_device_init(VFIODevice *vbasedev)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev_iter;
>> +    char path[PATH_MAX], iommu_group_path[PATH_MAX], *group_name;
>> +    ssize_t len;
>> +    struct stat st;
>> +    int groupid;
>> +    int ret;
>> +
>> +    /* name must be set prior to the call */
>> +    if (!vbasedev->name) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* Check that the host device exists */
>> +    snprintf(path, sizeof(path), "/sys/bus/platform/devices/%s/",
>> +             vbasedev->name);
>> +
>> +    if (stat(path, &st) < 0) {
>> +        error_report("vfio: error: no such host device: %s", path);
>> +        return -errno;
>> +    }
>> +
>> +    strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
>> +    len = readlink(path, iommu_group_path, sizeof(path));
>> +    if (len <= 0 || len >= sizeof(path)) {
>> +        error_report("vfio: error no iommu_group for device");
>> +        return len < 0 ? -errno : -ENAMETOOLONG;
>> +    }
>> +
>> +    iommu_group_path[len] = 0;
>> +    group_name = basename(iommu_group_path);
>> +
>> +    if (sscanf(group_name, "%d", &groupid) != 1) {
>> +        error_report("vfio: error reading %s: %m", path);
>> +        return -errno;
>> +    }
>> +
>> +    trace_vfio_platform_base_device_init(vbasedev->name, groupid);
>> +
>> +    group = vfio_get_group(groupid, &address_space_memory);
>> +    if (!group) {
>> +        error_report("vfio: failed to get group %d", groupid);
>> +        return -ENOENT;
>> +    }
>> +
>> +    snprintf(path, sizeof(path), "%s", vbasedev->name);
>> +
>> +    QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
>> +        if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
>> +            error_report("vfio: error: device %s is already attached", path);
>> +            vfio_put_group(group);
>> +            return -EBUSY;
>> +        }
>> +    }
>> +    ret = vfio_get_device(group, path, vbasedev);
>> +    if (ret) {
>> +        error_report("vfio: failed to get device %s", path);
>> +        vfio_put_group(group);
>> +    }
>> +    return ret;
>> +}
>> +
>> +/**
>> + * vfio_map_region - initialize the 2 mr (mmapped on ops) for a
>> + * given index
>> + * @vdev: the VFIO platform device
>> + * @nr: the index of the region
>> + *
>> + * init the top memory region and the mmapped memory region beneath
>> + * VFIOPlatformDevice is used since VFIODevice is not a QOM Object
>> + * and could not be passed to memory region functions
>> +*/
>> +static void vfio_map_region(VFIOPlatformDevice *vdev, int nr)
>> +{
>> +    VFIORegion *region = vdev->regions[nr];
>> +    unsigned size = region->size;
>> +    char name[64];
>> +
>> +    if (!size) {
>> +        return;
>> +    }
>> +
>> +    snprintf(name, sizeof(name), "VFIO %s region %d",
>> +             vdev->vbasedev.name, nr);
>> +
>> +    /* A "slow" read/write mapping underlies all regions */
>> +    memory_region_init_io(&region->mem, OBJECT(vdev), &vfio_region_ops,
>> +                          region, name, size);
>> +
>> +    strncat(name, " mmap", sizeof(name) - strlen(name) - 1);
>> +
>> +    if (vfio_mmap_region(OBJECT(vdev), region, &region->mem,
>> +                         &region->mmap_mem, &region->mmap, size, 0, name)) {
>> +        error_report("%s unsupported. Performance may be slow", name);
>> +    }
>> +}
>> +
>> +/**
>> + * vfio_platform_realize  - the device realize function
>> + * @dev: device state pointer
>> + * @errp: error
>> + *
>> + * initialize the device, its memory regions and IRQ structures
>> + * IRQ are started separately
>> + */
>> +static void vfio_platform_realize(DeviceState *dev, Error **errp)
>> +{
>> +    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
>> +    SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
>> +    VFIODevice *vbasedev = &vdev->vbasedev;
>> +    int i, ret;
>> +
>> +    vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
>> +    vbasedev->ops = &vfio_platform_ops;
>> +    vdev->start_irq_fn = vfio_start_eventfd_injection;
>> +
>> +    trace_vfio_platform_realize(vbasedev->name, vdev->compat);
>> +
>> +    ret = vfio_base_device_init(vbasedev);
>> +    if (ret) {
>> +        error_setg(errp, "vfio: vfio_base_device_init failed for %s",
>> +                   vbasedev->name);
>> +        return;
>> +    }
>> +
>> +    for (i = 0; i < vbasedev->num_regions; i++) {
>> +        vfio_map_region(vdev, i);
>> +        sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
>> +    }
>> +}
>> +
>> +/*
>> + * Mechanics to program/start irq injection on machine init done notifier:
>> + * this is needed since at finalize time, the device IRQ are not yet
>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>> + * always is used. Binding to the platform bus IRQ happens on a machine
>> + * init done notifier registered by the machine file. After its execution
>> + * we execute a new notifier that actually starts the injection. When using
>> + * irqfd, programming the injection consists of associating eventfds with
>> + * a GSI number, i.e. a virtual IRQ number
>> + */
>> +
>> +typedef struct VfioIrqStarterNotifierParams {
>> +    unsigned int platform_bus_first_irq;
>> +    Notifier notifier;
>> +} VfioIrqStarterNotifierParams;
>> +
>> +typedef struct VfioIrqStartParams {
>> +    PlatformBusDevice *pbus;
>> +    int platform_bus_first_irq;
>> +} VfioIrqStartParams;
>> +
>> +/* Start injection of IRQ for a specific VFIO device */
>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>> +{
>> +    int i;
>> +    VfioIrqStartParams *p = opaque;
>> +    VFIOPlatformDevice *vdev;
>> +    VFIODevice *vbasedev;
>> +    uint64_t irq_number;
>> +    PlatformBusDevice *pbus = p->pbus;
>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>> +
>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +        vbasedev = &vdev->vbasedev;
>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>> +                             + platform_bus_first_irq;
>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +/* loop on all VFIO platform devices and start their IRQ injection */
>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>> +{
>> +    VfioIrqStarterNotifierParams *p =
>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>> +    DeviceState *dev =
>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>> +
>> +    if (pbus->done_gathering) {
>> +        VfioIrqStartParams data = {
>> +            .pbus = pbus,
>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>> +        };
>> +
>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>> +    }
>> +}
>> +
>> +/* registers the machine init done notifier that will start VFIO IRQ */
>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>> +{
>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>> +
>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>> +    p->notifier.notify = vfio_irq_starter_notify;
>> +    qemu_add_machine_init_done_notifier(&p->notifier);
> 
> Could you add a notifier for each device instead? Then the notifier
> would be part of the vfio device struct and not some dangling random
> pointer :).
> 
> Of course instead of foreach_dynamic_sysbus_device() you would directly
> know the device you're dealing with and only handle a single device per
> notifier.

Hi Alex,

I don't see how to practically follow your request:

- at machine init time, VFIO devices are not yet instantiated, so I
cannot call foreach_dynamic_sysbus_device() there - I was definitely
wrong in my first reply :-().

- I can't register a per-VFIO-device notifier in the VFIO device
finalize function because the latter is called after the platform bus
instantiation. So the IRQ binding notifier (registered in the platform
bus finalize fn) would be called after the IRQ starter notifier.

- then, to simplify things a bit, I could use qemu_register_reset in
place of a machine init done notifier (this would relax the call order
constraint), but the problem is then how to pass the platform bus first
irq (all the more so since you asked for it to be part of a const struct).
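
The ordering concern in the second point can be modelled in isolation.
The C below is a stand-alone sketch, not QEMU code: OrderNotifier,
order_register() and order_run() are hypothetical names, and it assumes
notifiers run most-recently-registered first, which is what the argument
above implies. Under that assumption, a starter notifier registered after
the binder runs before it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a notifier: a label plus list linkage. */
typedef struct OrderNotifier OrderNotifier;
struct OrderNotifier {
    const char *label;
    OrderNotifier *next;
};

static OrderNotifier *order_head;
static char order_log[64];

/* Assumption: new notifiers are pushed at the head of the list. */
static void order_register(OrderNotifier *n)
{
    n->next = order_head;
    order_head = n;
}

/* Walk the list from the head and record the order of the calls. */
static const char *order_run(void)
{
    order_log[0] = '\0';
    for (OrderNotifier *n = order_head; n; n = n->next) {
        strncat(order_log, n->label,
                sizeof(order_log) - strlen(order_log) - 1);
        strncat(order_log, ";",
                sizeof(order_log) - strlen(order_log) - 1);
    }
    return order_log;
}
```

Registering a "bind" notifier first (platform bus) and a "start" notifier
second (VFIO device) makes order_run() visit "start" before "bind", i.e.
injection would be started before the IRQs are bound.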

Am I missing something?

Best Regards

Eric
> 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-26  9:45     ` Eric Auger
@ 2014-11-26 10:24       ` Alexander Graf
  2014-11-26 10:48         ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-26 10:24 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 26.11.14 10:45, Eric Auger wrote:
> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>
>>
>> On 31.10.14 15:05, Eric Auger wrote:
>>> Minimal VFIO platform implementation supporting
>>> - register space user mapping,
>>> - IRQ assignment based on eventfds handled on qemu side.
>>>
>>> irqfd kernel acceleration comes in a subsequent patch.
>>>
>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>

[...]

>>> +/*
>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>> + * this is needed since at finalize time, the device IRQ are not yet
>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>> + * init done notifier registered by the machine file. After its execution
>>> + * we execute a new notifier that actually starts the injection. When using
>>> + * irqfd, programming the injection consists of associating eventfds with
>>> + * a GSI number, i.e. a virtual IRQ number
>>> + */
>>> +
>>> +typedef struct VfioIrqStarterNotifierParams {
>>> +    unsigned int platform_bus_first_irq;
>>> +    Notifier notifier;
>>> +} VfioIrqStarterNotifierParams;
>>> +
>>> +typedef struct VfioIrqStartParams {
>>> +    PlatformBusDevice *pbus;
>>> +    int platform_bus_first_irq;
>>> +} VfioIrqStartParams;
>>> +
>>> +/* Start injection of IRQ for a specific VFIO device */
>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>> +{
>>> +    int i;
>>> +    VfioIrqStartParams *p = opaque;
>>> +    VFIOPlatformDevice *vdev;
>>> +    VFIODevice *vbasedev;
>>> +    uint64_t irq_number;
>>> +    PlatformBusDevice *pbus = p->pbus;
>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>> +
>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>> +        vbasedev = &vdev->vbasedev;
>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>> +                             + platform_bus_first_irq;
>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>> +{
>>> +    VfioIrqStarterNotifierParams *p =
>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>> +    DeviceState *dev =
>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>> +
>>> +    if (pbus->done_gathering) {
>>> +        VfioIrqStartParams data = {
>>> +            .pbus = pbus,
>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>> +        };
>>> +
>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>> +    }
>>> +}
>>> +
>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>> +{
>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>> +
>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>
>> Could you add a notifier for each device instead? Then the notifier
>> would be part of the vfio device struct and not some dangling random
>> pointer :).
>>
>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>> know the device you're dealing with and only handle a single device per
>> notifier.
> 
> Hi Alex,
> 
> I don't see how to practically follow your request:
> 
> - at machine init time, VFIO devices are not yet instantiated, so I
> cannot call foreach_dynamic_sysbus_device() there - I was definitely
> wrong in my first reply :-().
> 
> - I can't register a per-VFIO-device notifier in the VFIO device
> finalize function because the latter is called after the platform bus
> instantiation. So the IRQ binding notifier (registered in the platform
> bus finalize fn) would be called after the IRQ starter notifier.
> 
> - then, to simplify things a bit, I could use qemu_register_reset in
> place of a machine init done notifier (this would relax the call order
> constraint), but the problem is then how to pass the platform bus first
> irq (all the more so since you asked for it to be part of a const struct).
> 
> Am I missing something?

So the basic idea is that the device itself calls
qemu_add_machine_init_done_notifier() in its realize function. The
Notifier struct would be part of the device state which means you can
cast yourself into the VFIO device state.

At that point the IRQ allocation should have already happened, so your
IRQ objects are populated. You can then ask the KVM GIC to convert that
qemu_irq object to a GIC IRQ ID that you can then use in your ioctl I
suppose.
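
To illustrate the embedded-notifier idea, here is a minimal standalone
sketch. It is not QEMU code: the Notifier type is a simplified stand-in,
and SketchVFIODevice, sketch_container_of(), sketch_realize() and
sketch_irq_starter_notify() are hypothetical names used for illustration
only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a notifier: just the callback pointer
 * (assumption: a real notifier also carries list linkage). */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
};

/* Hypothetical, trimmed-down device state with the notifier embedded,
 * so no separately allocated parameter struct is needed. */
typedef struct SketchVFIODevice {
    int num_irqs;
    int irqs_started;
    Notifier irq_starter;
} SketchVFIODevice;

/* Recover the enclosing struct from a pointer to one of its members. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback: goes from the Notifier back to the device that owns it. */
static void sketch_irq_starter_notify(Notifier *notifier, void *data)
{
    SketchVFIODevice *vdev;

    (void)data;
    assert(notifier != NULL);
    vdev = sketch_container_of(notifier, SketchVFIODevice, irq_starter);
    vdev->irqs_started = vdev->num_irqs;
}

/* What a realize function would set up; real code would then hand
 * &vdev->irq_starter to qemu_add_machine_init_done_notifier(). */
static void sketch_realize(SketchVFIODevice *vdev, int num_irqs)
{
    vdev->num_irqs = num_irqs;
    vdev->irqs_started = 0;
    vdev->irq_starter.notify = sketch_irq_starter_notify;
}
```

The point of the design is that each device only ever handles itself, so
no foreach_dynamic_sysbus_device() walk is needed in the callback.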


Alex


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-26 10:24       ` Alexander Graf
@ 2014-11-26 10:48         ` Eric Auger
  2014-11-26 11:20           ` Alexander Graf
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-26 10:48 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 11/26/2014 11:24 AM, Alexander Graf wrote:
> 
> 
> On 26.11.14 10:45, Eric Auger wrote:
>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>
>>>
>>> On 31.10.14 15:05, Eric Auger wrote:
>>>> Minimal VFIO platform implementation supporting
>>>> - register space user mapping,
>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>
>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>
>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> [...]
> 
>>>> +/*
>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>> + * init done notifier registered by the machine file. After its execution
>>>> + * we execute a new notifier that actually starts the injection. When using
>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>> + * GSI number,ie. virtual IRQ number
>>>> + */
>>>> +
>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>> +    unsigned int platform_bus_first_irq;
>>>> +    Notifier notifier;
>>>> +} VfioIrqStarterNotifierParams;
>>>> +
>>>> +typedef struct VfioIrqStartParams {
>>>> +    PlatformBusDevice *pbus;
>>>> +    int platform_bus_first_irq;
>>>> +} VfioIrqStartParams;
>>>> +
>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>> +{
>>>> +    int i;
>>>> +    VfioIrqStartParams *p = opaque;
>>>> +    VFIOPlatformDevice *vdev;
>>>> +    VFIODevice *vbasedev;
>>>> +    uint64_t irq_number;
>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>> +
>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>> +        vbasedev = &vdev->vbasedev;
>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>> +                             + platform_bus_first_irq;
>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>> +        }
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>> +{
>>>> +    VfioIrqStarterNotifierParams *p =
>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>> +    DeviceState *dev =
>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>> +
>>>> +    if (pbus->done_gathering) {
>>>> +        VfioIrqStartParams data = {
>>>> +            .pbus = pbus,
>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>> +        };
>>>> +
>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>> +    }
>>>> +}
>>>> +
>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>> +{
>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>> +
>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>
>>> Could you add a notifier for each device instead? Then the notifier
>>> would be part of the vfio device struct and not some dangling random
>>> pointer :).
>>>
>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>> know the device you're dealing with and only handle a single device per
>>> notifier.
>>
>> Hi Alex,
>>
>> I don't see how to practically follow your request:
>>
>> - at machine init time, VFIO devices are not yet instantiated so I
>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>> wrong in my first reply :-().
>>
>> - I can't register a per VFIO device notifier in the VFIO device
>> finalize function because this latter is called after the platform bus
>> instantiation. So the IRQ binding notifier (registered in platform bus
>> finalize fn) would be called after the IRQ starter notifier.
>>
>> - then to simplify things a bit I could use a qemu_register_reset in
>> place of a machine init done notifier (would relax the call order
>> constraint) but the problem consists in passing the platform bus first
>> irq (all the more so you requested it became part of a const struct)
>>
>> Do I miss something?
> 
> So the basic idea is that the device itself calls
> qemu_add_machine_init_done_notifier() in its realize function. The
> Notifier struct would be part of the device state which means you can
> cast yourself into the VFIO device state.

Hmm, the VFIO device is instantiated from the command line, so after
machine init. This means the platform bus binding notifier is registered
first (in platform bus realize) and the VFIO IRQ starter notifiers are
registered afterwards (in VFIO realize). Since notifiers are executed in
the reverse order of their registration, this would fail. Am I wrong?
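
The ordering concern can be modelled with a small standalone sketch. It
assumes, as stated above, that a newly added notifier goes to the head
of the list and that the list is walked from the head. All names here
(SketchNotifier, sketch_add_notifier(), sketch_boot(), ...) are
hypothetical and are not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchNotifier SketchNotifier;
struct SketchNotifier {
    void (*notify)(SketchNotifier *n);
    SketchNotifier *next;
};

static SketchNotifier *sketch_list_head;
static int sketch_irqs_bound;           /* set by the bus notifier */
static int sketch_start_saw_bound = -1; /* what the starter observed */

/* Head insertion: the last registered notifier runs first. */
static void sketch_add_notifier(SketchNotifier *n)
{
    assert(n != NULL);
    n->next = sketch_list_head;
    sketch_list_head = n;
}

static void sketch_run_notifiers(void)
{
    SketchNotifier *n;

    for (n = sketch_list_head; n != NULL; n = n->next) {
        n->notify(n);
    }
}

/* Stand-in for the platform bus IRQ binding notifier. */
static void sketch_bind_irqs(SketchNotifier *n)
{
    (void)n;
    sketch_irqs_bound = 1;
}

/* Stand-in for the VFIO IRQ starter: records whether binding was done. */
static void sketch_start_irqs(SketchNotifier *n)
{
    (void)n;
    sketch_start_saw_bound = sketch_irqs_bound;
}

static SketchNotifier sketch_bus_notifier = { sketch_bind_irqs, NULL };
static SketchNotifier sketch_vfio_notifier = { sketch_start_irqs, NULL };

/* Registration order: bus first (machine init), VFIO device second
 * (command line). Returns what the starter saw: 0 means it ran before
 * the binding happened. */
static int sketch_boot(void)
{
    sketch_list_head = NULL;
    sketch_irqs_bound = 0;
    sketch_start_saw_bound = -1;
    sketch_add_notifier(&sketch_bus_notifier);
    sketch_add_notifier(&sketch_vfio_notifier);
    sketch_run_notifiers();
    return sketch_start_saw_bound;
}
```

In this model the starter runs while the IRQs are still unbound, which
is the failure described above.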
> 
> At that point the IRQ allocation should have already happened, so your
> IRQ objects are populated. You can then ask the KVM GIC to convert that
> qemu_irq object to a GIC IRQ ID that you can then use in your ioctl I
> suppose.
> 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-26 10:48         ` Eric Auger
@ 2014-11-26 11:20           ` Alexander Graf
  2014-11-26 14:46             ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-26 11:20 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 26.11.14 11:48, Eric Auger wrote:
> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>
>>
>> On 26.11.14 10:45, Eric Auger wrote:
>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>
>>>>
>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>> Minimal VFIO platform implementation supporting
>>>>> - register space user mapping,
>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>
>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>
>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> [...]
>>
>>>>> +/*
>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>> + * init done notifier registered by the machine file. After its execution
>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>> + * GSI number,ie. virtual IRQ number
>>>>> + */
>>>>> +
>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>> +    unsigned int platform_bus_first_irq;
>>>>> +    Notifier notifier;
>>>>> +} VfioIrqStarterNotifierParams;
>>>>> +
>>>>> +typedef struct VfioIrqStartParams {
>>>>> +    PlatformBusDevice *pbus;
>>>>> +    int platform_bus_first_irq;
>>>>> +} VfioIrqStartParams;
>>>>> +
>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>> +{
>>>>> +    int i;
>>>>> +    VfioIrqStartParams *p = opaque;
>>>>> +    VFIOPlatformDevice *vdev;
>>>>> +    VFIODevice *vbasedev;
>>>>> +    uint64_t irq_number;
>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>> +
>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>> +        vbasedev = &vdev->vbasedev;
>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>> +                             + platform_bus_first_irq;
>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>> +        }
>>>>> +    }
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>> +{
>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>> +    DeviceState *dev =
>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>> +
>>>>> +    if (pbus->done_gathering) {
>>>>> +        VfioIrqStartParams data = {
>>>>> +            .pbus = pbus,
>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>> +        };
>>>>> +
>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>> +{
>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>> +
>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>
>>>> Could you add a notifier for each device instead? Then the notifier
>>>> would be part of the vfio device struct and not some dangling random
>>>> pointer :).
>>>>
>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>> know the device you're dealing with and only handle a single device per
>>>> notifier.
>>>
>>> Hi Alex,
>>>
>>> I don't see how to practically follow your request:
>>>
>>> - at machine init time, VFIO devices are not yet instantiated so I
>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>> wrong in my first reply :-().
>>>
>>> - I can't register a per VFIO device notifier in the VFIO device
>>> finalize function because this latter is called after the platform bus
>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>> finalize fn) would be called after the IRQ starter notifier.
>>>
>>> - then to simplify things a bit I could use a qemu_register_reset in
>>> place of a machine init done notifier (would relax the call order
>>> constraint) but the problem consists in passing the platform bus first
>>> irq (all the more so you requested it became part of a const struct)
>>>
>>> Do I miss something?
>>
>> So the basic idea is that the device itself calls
>> qemu_add_machine_init_done_notifier() in its realize function. The
>> Notifier struct would be part of the device state which means you can
>> cast yourself into the VFIO device state.
> 
> humm, the vfio device is instantiated in the cmd line so after the
> machine init. This means 1st the platform bus binding notifier is
> registered (in platform bus realize) and then VFIO irq starter notifiers
> are registered (in VFIO realize). Notifiers being executed in the
> reverse order of their registration, this would fail. Am I wrong?

Bleks. Ok, I see 2 ways out of this:

  1) Create a TailNotifier and convert the machine_init_done notifiers
to this

  2) Add an "irq now populated" notifier function callback in a new
PlatformBusDeviceClass struct that you use to describe the
PlatformBusDevice class. Call all children's notifiers from the
machine_init notifier in the platform bus.

The more I think about it, the more I prefer option 2.
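
A rough standalone sketch of one possible reading of option 2: the bus
binds the IRQs in its own init-done handler and then calls an optional
per-class hook on each child. This is not the actual
PlatformBusDeviceClass; every name below (Opt2DeviceClass,
irqs_populated, opt2_bus_init_done(), ...) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Opt2Device Opt2Device;

/* Hypothetical class struct: one optional hook per device type. */
typedef struct Opt2DeviceClass {
    void (*irqs_populated)(Opt2Device *dev);
} Opt2DeviceClass;

struct Opt2Device {
    const Opt2DeviceClass *klass;
    int irq_bound;  /* set by the bus once the IRQ is wired up */
    int started_ok; /* set by the hook: 1 if it ran after binding */
};

#define OPT2_MAX_CHILDREN 4

typedef struct Opt2Bus {
    Opt2Device *children[OPT2_MAX_CHILDREN];
    size_t num_children;
} Opt2Bus;

/* Hook for the VFIO-like device: only succeeds if the IRQ is bound. */
static void opt2_vfio_irqs_populated(Opt2Device *dev)
{
    dev->started_ok = dev->irq_bound;
}

static const Opt2DeviceClass opt2_vfio_class = { opt2_vfio_irqs_populated };
static const Opt2DeviceClass opt2_plain_class = { NULL };

/* Stand-in for the bus's machine-init-done handler: bind first, then
 * notify, so the order no longer depends on notifier registration. */
static void opt2_bus_init_done(Opt2Bus *bus)
{
    size_t i;

    assert(bus->num_children <= OPT2_MAX_CHILDREN);
    for (i = 0; i < bus->num_children; i++) {
        bus->children[i]->irq_bound = 1;
    }
    for (i = 0; i < bus->num_children; i++) {
        Opt2Device *dev = bus->children[i];

        if (dev->klass->irqs_populated != NULL) {
            dev->klass->irqs_populated(dev);
        }
    }
}
```

Devices whose class leaves the hook unset are simply skipped, so only
types that need the "IRQs are populated" event pay for it.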


Alex


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-26 11:20           ` Alexander Graf
@ 2014-11-26 14:46             ` Eric Auger
  2014-11-27 14:05               ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-26 14:46 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 11/26/2014 12:20 PM, Alexander Graf wrote:
> 
> 
> On 26.11.14 11:48, Eric Auger wrote:
>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>
>>>
>>> On 26.11.14 10:45, Eric Auger wrote:
>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>> Minimal VFIO platform implementation supporting
>>>>>> - register space user mapping,
>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>
>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>
>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> [...]
>>>
>>>>>> +/*
>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>> + */
>>>>>> +
>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>> +    Notifier notifier;
>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>> +
>>>>>> +typedef struct VfioIrqStartParams {
>>>>>> +    PlatformBusDevice *pbus;
>>>>>> +    int platform_bus_first_irq;
>>>>>> +} VfioIrqStartParams;
>>>>>> +
>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>> +{
>>>>>> +    int i;
>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>> +    VFIODevice *vbasedev;
>>>>>> +    uint64_t irq_number;
>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>> +
>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>> +                             + platform_bus_first_irq;
>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>> +        }
>>>>>> +    }
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>> +{
>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>> +    DeviceState *dev =
>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>> +
>>>>>> +    if (pbus->done_gathering) {
>>>>>> +        VfioIrqStartParams data = {
>>>>>> +            .pbus = pbus,
>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>> +        };
>>>>>> +
>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>> +{
>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>> +
>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>
>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>> would be part of the vfio device struct and not some dangling random
>>>>> pointer :).
>>>>>
>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>> know the device you're dealing with and only handle a single device per
>>>>> notifier.
>>>>
>>>> Hi Alex,
>>>>
>>>> I don't see how to practically follow your request:
>>>>
>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>> wrong in my first reply :-().
>>>>
>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>> finalize function because this latter is called after the platform bus
>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>
>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>> place of a machine init done notifier (would relax the call order
>>>> constraint) but the problem consists in passing the platform bus first
>>>> irq (all the more so you requested it became part of a const struct)
>>>>
>>>> Do I miss something?
>>>
>>> So the basic idea is that the device itself calls
>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>> Notifier struct would be part of the device state which means you can
>>> cast yourself into the VFIO device state.
>>
>> humm, the vfio device is instantiated in the cmd line so after the
>> machine init. This means 1st the platform bus binding notifier is
>> registered (in platform bus realize) and then VFIO irq starter notifiers
>> are registered (in VFIO realize). Notifiers being executed in the
>> reverse order of their registration, this would fail. Am I wrong?
> 
> Bleks. Ok, I see 2 ways out of this:
> 
>   1) Create a TailNotifier and convert the machine_init_done notifiers
> to this
> 
>   2) Add an "irq now populated" notifier function callback in a new
> PlatformBusDeviceClass struct that you use to describe the
> PlatformBusDevice class. Call all children's notifiers from the
> machine_init notifier in the platform bus.
> 
> The more I think about it, the more I prefer option 2 I think.
Hi Alex,

OK, I'll work on option 2).

Thanks for your guidance

Eric
> 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-26 14:46             ` Eric Auger
@ 2014-11-27 14:05               ` Eric Auger
  2014-11-27 14:35                 ` Alexander Graf
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-27 14:05 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 11/26/2014 03:46 PM, Eric Auger wrote:
> On 11/26/2014 12:20 PM, Alexander Graf wrote:
>>
>>
>> On 26.11.14 11:48, Eric Auger wrote:
>>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>>
>>>>
>>>> On 26.11.14 10:45, Eric Auger wrote:
>>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>>> Minimal VFIO platform implementation supporting
>>>>>>> - register space user mapping,
>>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>>
>>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>>
>>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>
>>>> [...]
>>>>
>>>>>>> +/*
>>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>>> + */
>>>>>>> +
>>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>>> +    Notifier notifier;
>>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>>> +
>>>>>>> +typedef struct VfioIrqStartParams {
>>>>>>> +    PlatformBusDevice *pbus;
>>>>>>> +    int platform_bus_first_irq;
>>>>>>> +} VfioIrqStartParams;
>>>>>>> +
>>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>>> +{
>>>>>>> +    int i;
>>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>>> +    VFIODevice *vbasedev;
>>>>>>> +    uint64_t irq_number;
>>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>>> +
>>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>>> +                             + platform_bus_first_irq;
>>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>>> +{
>>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>>> +    DeviceState *dev =
>>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>>> +
>>>>>>> +    if (pbus->done_gathering) {
>>>>>>> +        VfioIrqStartParams data = {
>>>>>>> +            .pbus = pbus,
>>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>>> +        };
>>>>>>> +
>>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>>> +{
>>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>>> +
>>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>>
>>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>>> would be part of the vfio device struct and not some dangling random
>>>>>> pointer :).
>>>>>>
>>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>>> know the device you're dealing with and only handle a single device per
>>>>>> notifier.
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> I don't see how to practically follow your request:
>>>>>
>>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>>> wrong in my first reply :-().
>>>>>
>>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>>> finalize function because this latter is called after the platform bus
>>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>>
>>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>>> place of a machine init done notifier (would relax the call order
>>>>> constraint) but the problem consists in passing the platform bus first
>>>>> irq (all the more so you requested it became part of a const struct)
>>>>>
>>>>> Do I miss something?
>>>>
>>>> So the basic idea is that the device itself calls
>>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>>> Notifier struct would be part of the device state which means you can
>>>> cast yourself into the VFIO device state.
>>>
>>> humm, the vfio device is instantiated in the cmd line so after the
>>> machine init. This means 1st the platform bus binding notifier is
>>> registered (in platform bus realize) and then VFIO irq starter notifiers
>>> are registered (in VFIO realize). Notifiers being executed in the
>>> reverse order of their registration, this would fail. Am I wrong?
>>
>> Bleks. Ok, I see 2 ways out of this:
>>
>>   1) Create a TailNotifier and convert the machine_init_done notifiers
>> to this
>>
>>   2) Add an "irq now populated" notifier function callback in a new
>> PlatformBusDeviceClass struct that you use to describe the
>> PlatformBusDevice class. Call all children's notifiers from the
>> machine_init notifier in the platform bus.
>>
>> The more I think about it, the more I prefer option 2 I think.
> Hi Alex,
> 
> ok I work on 2)

Hi Alex,

I believe I understand your proposal, but the issue is passing the
platform bus first_irq parameter, which is needed to compute the absolute
IRQ number (= irqfd GSI). The VFIO device does not have this info, and
the platform bus doesn't have it either. Only the machine file has it.

The "irq now populated" callback would be called from the platform bus
platform_bus_init_notify or link_sysbus_device, I guess, which are
already executed in a machine-init-done notifier. The callback would
need to be called with the sbdev and first_irq parameters to fulfill its
task (VFIO type check, irqfd setup). So I need to pass first_irq to the
platform bus. Do you agree? Can I add an API?
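
As a sketch of what such an API could look like: the machine code tells
the bus its first IRQ once, and the bus adds it to the bus-relative IRQ
index to get the absolute number, as the patch does with
platform_bus_get_irqn() + platform_bus_first_irq. FiBus,
fi_bus_set_first_irq() and fi_bus_gsi() are hypothetical names and do
not exist in the series.

```c
#include <assert.h>

/* Hypothetical bus state holding the machine-provided base IRQ. */
typedef struct FiBus {
    int first_irq;
    int first_irq_set;
} FiBus;

/* Called once by the machine file, the only place that knows the base. */
static void fi_bus_set_first_irq(FiBus *bus, int first_irq)
{
    bus->first_irq = first_irq;
    bus->first_irq_set = 1;
}

/* Absolute IRQ number (irqfd GSI) = base + bus-relative IRQ index. */
static int fi_bus_gsi(const FiBus *bus, int relative_irqn)
{
    assert(bus->first_irq_set);
    return bus->first_irq + relative_irqn;
}
```

With that in place the callback would only need the sbdev; the base
would come from the bus itself.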

Besides, there would be a single callback per platform bus. Wouldn't it
be worthwhile to add infrastructure to add/remove miscellaneous
"binding_done" notifiers and call all registered functions in
link_sysbus_device? This does not change the issue of passing the
first_irq param ;-)
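
A minimal sketch of such an add/remove infrastructure, assuming a small
fixed-size table and that first_irq is stored in the bus and handed to
every callback. BdBus, bd_add(), bd_remove(), bd_fire() and
bd_record_first_irq() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define BD_MAX 8

typedef void (*BindingDoneFn)(void *opaque, int first_irq);

typedef struct BdEntry {
    BindingDoneFn fn;
    void *opaque;
} BdEntry;

/* Hypothetical bus state: registered callbacks plus the base IRQ. */
typedef struct BdBus {
    BdEntry entries[BD_MAX];
    size_t count;
    int first_irq;
} BdBus;

/* Register a callback; returns 0 on success, -1 if the table is full. */
static int bd_add(BdBus *bus, BindingDoneFn fn, void *opaque)
{
    assert(fn != NULL);
    if (bus->count == BD_MAX) {
        return -1;
    }
    bus->entries[bus->count].fn = fn;
    bus->entries[bus->count].opaque = opaque;
    bus->count++;
    return 0;
}

/* Unregister the first matching entry; returns -1 if not found. */
static int bd_remove(BdBus *bus, BindingDoneFn fn, void *opaque)
{
    size_t i, j;

    for (i = 0; i < bus->count; i++) {
        if (bus->entries[i].fn == fn && bus->entries[i].opaque == opaque) {
            for (j = i + 1; j < bus->count; j++) {
                bus->entries[j - 1] = bus->entries[j];
            }
            bus->count--;
            return 0;
        }
    }
    return -1;
}

/* To be called once the bus has bound all IRQs; runs callbacks in
 * registration order. */
static void bd_fire(BdBus *bus)
{
    size_t i;

    for (i = 0; i < bus->count; i++) {
        bus->entries[i].fn(bus->entries[i].opaque, bus->first_irq);
    }
}

/* Example callback: stores the base IRQ it was given. */
static void bd_record_first_irq(void *opaque, int first_irq)
{
    *(int *)opaque = first_irq;
}
```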

Eric

> 
> Thanks for your guidance
> 
> Eric
>>
>>
>> Alex
>>
> 


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 14:05               ` Eric Auger
@ 2014-11-27 14:35                 ` Alexander Graf
  2014-11-27 15:14                   ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-27 14:35 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 27.11.14 15:05, Eric Auger wrote:
> On 11/26/2014 03:46 PM, Eric Auger wrote:
>> On 11/26/2014 12:20 PM, Alexander Graf wrote:
>>>
>>>
>>> On 26.11.14 11:48, Eric Auger wrote:
>>>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 26.11.14 10:45, Eric Auger wrote:
>>>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>>>> Minimal VFIO platform implementation supporting
>>>>>>>> - register space user mapping,
>>>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>>>
>>>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>>>
>>>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>
>>>>> [...]
>>>>>
>>>>>>>> +/*
>>>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>>>> +    Notifier notifier;
>>>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>>>> +
>>>>>>>> +typedef struct VfioIrqStartParams {
>>>>>>>> +    PlatformBusDevice *pbus;
>>>>>>>> +    int platform_bus_first_irq;
>>>>>>>> +} VfioIrqStartParams;
>>>>>>>> +
>>>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>>>> +{
>>>>>>>> +    int i;
>>>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>>>> +    VFIODevice *vbasedev;
>>>>>>>> +    uint64_t irq_number;
>>>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>>>> +
>>>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>>>> +                             + platform_bus_first_irq;
>>>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>>>> +{
>>>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>>>> +    DeviceState *dev =
>>>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>>>> +
>>>>>>>> +    if (pbus->done_gathering) {
>>>>>>>> +        VfioIrqStartParams data = {
>>>>>>>> +            .pbus = pbus,
>>>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>>>> +        };
>>>>>>>> +
>>>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>>>> +{
>>>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>>>> +
>>>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>>>
>>>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>>>> would be part of the vfio device struct and not some dangling random
>>>>>>> pointer :).
>>>>>>>
>>>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>>>> know the device you're dealing with and only handle a single device per
>>>>>>> notifier.
>>>>>>
>>>>>> Hi Alex,
>>>>>>
>>>>>> I don't see how to practically follow your request:
>>>>>>
>>>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>>>> wrong in my first reply :-().
>>>>>>
>>>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>>>> finalize function because this latter is called after the platform bus
>>>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>>>
>>>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>>>> place of a machine init done notifier (would relax the call order
>>>>>> constraint) but the problem consists in passing the platform bus first
>>>>>> irq (all the more so you requested it became part of a const struct)
>>>>>>
>>>>>> Do I miss something?
>>>>>
>>>>> So the basic idea is that the device itself calls
>>>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>>>> Notifier struct would be part of the device state which means you can
>>>>> cast yourself into the VFIO device state.
>>>>
>>>> humm, the vfio device is instantiated in the cmd line so after the
>>>> machine init. This means 1st the platform bus binding notifier is
>>>> registered (in platform bus realize) and then VFIO irq starter notifiers
>>>> are registered (in VFIO realize). Notifiers being executed in the
>>>> reverse order of their registration, this would fail. Am I wrong?
>>>
>>> Bleks. Ok, I see 2 ways out of this:
>>>
>>>   1) Create a TailNotifier and convert the machine_init_done notifiers
>>> to this
>>>
>>>   2) Add an "irq now populated" notifier function callback in a new
>>> PlatformBusDeviceClass struct that you use to describe the
>>> PlatformBusDevice class. Call all children's notifiers from the
>>> machine_init notifier in the platform bus.
>>>
>>> The more I think about it, the more I prefer option 2 I think.
>> Hi Alex,
>>
>> ok I work on 2)
> 
> Hi Alex,
> 
> I believe I understand your proposal but the issue is to pass the
> platform bus first_irq parameter which is needed to compute the absolute
> IRQ number (=irqfd GSI). VFIO device does not have this info. Platform
> bus doesn't have it either. Only machine file has the info.

Well, the GIC should have this info as well. That's why I was trying to
point out that you want to ask the GIC about the absolute IRQ number on
its own number space.

You need to make the connection with the GIC anyway, no? So you need to
somehow get awareness of the GIC device. Or are you hijacking the global
GSI number space?

> 
> The "irq now populated" notifier function callback would be called in
> platform bus platform_bus_init_notify or link_sysbus_device I guess,
> already executed in a machine-init-done notifier. The callback would
> need to be called with sbdev and first_irq param to fulfill its task
> (check of VFIO type, IRQFD setup). So I need to pass first_irq to
> platform_bus. Do you agree? Can I add an API?
> 
> Besides there would be a single callback per platform bus. Wouldn't it
> be worth to add an infrastructure to add/remove misc "binding_done"
> notifiers and call all registered functions in link_sysbus_device?

Usually the "realize" function is good enough for 99% of the devices out
there. We're just special because we do lazy binding of IRQs on the
platform bus :).



Alex


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 14:35                 ` Alexander Graf
@ 2014-11-27 15:14                   ` Eric Auger
  2014-11-27 15:28                     ` Alexander Graf
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-27 15:14 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm

On 11/27/2014 03:35 PM, Alexander Graf wrote:
> 
> 
> On 27.11.14 15:05, Eric Auger wrote:
>> On 11/26/2014 03:46 PM, Eric Auger wrote:
>>> On 11/26/2014 12:20 PM, Alexander Graf wrote:
>>>>
>>>>
>>>> On 26.11.14 11:48, Eric Auger wrote:
>>>>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>> On 26.11.14 10:45, Eric Auger wrote:
>>>>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>>>>> Minimal VFIO platform implementation supporting
>>>>>>>>> - register space user mapping,
>>>>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>>>>
>>>>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>>> +/*
>>>>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>>>>> +    Notifier notifier;
>>>>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>>>>> +
>>>>>>>>> +typedef struct VfioIrqStartParams {
>>>>>>>>> +    PlatformBusDevice *pbus;
>>>>>>>>> +    int platform_bus_first_irq;
>>>>>>>>> +} VfioIrqStartParams;
>>>>>>>>> +
>>>>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>>>>> +{
>>>>>>>>> +    int i;
>>>>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>>>>> +    VFIODevice *vbasedev;
>>>>>>>>> +    uint64_t irq_number;
>>>>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>>>>> +
>>>>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>>>>> +                             + platform_bus_first_irq;
>>>>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>>>>> +        }
>>>>>>>>> +    }
>>>>>>>>> +    return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>>>>> +{
>>>>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>>>>> +    DeviceState *dev =
>>>>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>>>>> +
>>>>>>>>> +    if (pbus->done_gathering) {
>>>>>>>>> +        VfioIrqStartParams data = {
>>>>>>>>> +            .pbus = pbus,
>>>>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>>>>> +        };
>>>>>>>>> +
>>>>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>>>>> +{
>>>>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>>>>> +
>>>>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>>>>
>>>>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>>>>> would be part of the vfio device struct and not some dangling random
>>>>>>>> pointer :).
>>>>>>>>
>>>>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>>>>> know the device you're dealing with and only handle a single device per
>>>>>>>> notifier.
>>>>>>>
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> I don't see how to practically follow your request:
>>>>>>>
>>>>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>>>>> wrong in my first reply :-().
>>>>>>>
>>>>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>>>>> finalize function because this latter is called after the platform bus
>>>>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>>>>
>>>>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>>>>> place of a machine init done notifier (would relax the call order
>>>>>>> constraint) but the problem consists in passing the platform bus first
>>>>>>> irq (all the more so you requested it became part of a const struct)
>>>>>>>
>>>>>>> Do I miss something?
>>>>>>
>>>>>> So the basic idea is that the device itself calls
>>>>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>>>>> Notifier struct would be part of the device state which means you can
>>>>>> cast yourself into the VFIO device state.
>>>>>
>>>>> humm, the vfio device is instantiated in the cmd line so after the
>>>>> machine init. This means 1st the platform bus binding notifier is
>>>>> registered (in platform bus realize) and then VFIO irq starter notifiers
>>>>> are registered (in VFIO realize). Notifiers being executed in the
>>>>> reverse order of their registration, this would fail. Am I wrong?
>>>>
>>>> Bleks. Ok, I see 2 ways out of this:
>>>>
>>>>   1) Create a TailNotifier and convert the machine_init_done notifiers
>>>> to this
>>>>
>>>>   2) Add an "irq now populated" notifier function callback in a new
>>>> PlatformBusDeviceClass struct that you use to describe the
>>>> PlatformBusDevice class. Call all children's notifiers from the
>>>> machine_init notifier in the platform bus.
>>>>
>>>> The more I think about it, the more I prefer option 2 I think.
>>> Hi Alex,
>>>
>>> ok I work on 2)
>>
>> Hi Alex,
>>
>> I believe I understand your proposal but the issue is to pass the
>> platform bus first_irq parameter which is needed to compute the absolute
>> IRQ number (=irqfd GSI). VFIO device does not have this info. Platform
>> bus doesn't have it either. Only machine file has the info.
> 
> Well, the GIC should have this info as well. That's why I was trying to
> point out that you want to ask the GIC about the absolute IRQ number on
> its own number space.
> 
> You need to make the connection with the GIC anyway, no? So you need to
> somehow get awareness of the GIC device. Or are you hijacking the global
> GSI number space?

Hi Alex,

Well OK I believe I understand your idea: in the vfio device, loop on all
GIC gpios using qdev_get_gpio_in(gicdev, i) and identify the i that
matches the qemu_irq I want to kick off. That would be feasible if VFIO
had a handle to the GIC DeviceState (gicdev), which is not currently the
case. So we move the problem to passing the gicdev to vfio ;-)

VFIO being mostly generic, we could only do that in the derived VFIO
device (the famous calxeda xgmac device) or some intermediate vfio arm
device - let's be crazy!? ;-) - . The GIC derives from the standard sysbus
device (there is no kind of generic interrupt controller device I could
recognize when parsing the QOM tree), so I don't see any other solution
to retrieve the intc handle after machine creation.

I can try that. In that case, do you agree with adding/removing sysbus
binding_done notifiers in the platform bus and dropping the callback in
the platform bus class? I would call all registered notifiers at the end
of platform_bus_init_notify.
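
For illustration only, the binding_done notifier idea could be sketched
roughly as below. This is standalone C, not QEMU code: PlatformBusSketch,
BindingDoneNotifier, VfioDeviceSketch and all the function names are
hypothetical stand-ins. The one design point it tries to show is that the
notifiers are appended at the tail and fired at the end of the bus init
notify step, so they run only after the IRQ binding is done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not QEMU types. */
typedef struct PlatformBusSketch PlatformBusSketch;
typedef struct BindingDoneNotifier BindingDoneNotifier;

struct BindingDoneNotifier {
    void (*notify)(BindingDoneNotifier *n, PlatformBusSketch *bus);
    BindingDoneNotifier *next;
};

struct PlatformBusSketch {
    int irqs_bound;               /* set once the lazy IRQ binding is done */
    BindingDoneNotifier *head;    /* notifiers, in registration order */
    BindingDoneNotifier **tail;   /* slot where the next notifier is linked */
};

static void platform_bus_sketch_init(PlatformBusSketch *bus)
{
    bus->irqs_bound = 0;
    bus->head = NULL;
    bus->tail = &bus->head;
}

/* Would be called from a device's realize function. */
static void platform_bus_sketch_add_notifier(PlatformBusSketch *bus,
                                             BindingDoneNotifier *n)
{
    n->next = NULL;
    *bus->tail = n;
    bus->tail = &n->next;
}

static void platform_bus_sketch_remove_notifier(PlatformBusSketch *bus,
                                                BindingDoneNotifier *n)
{
    BindingDoneNotifier **pp = &bus->head;

    while (*pp && *pp != n) {
        pp = &(*pp)->next;
    }
    if (*pp) {
        *pp = n->next;
        if (bus->tail == &n->next) {
            bus->tail = pp;       /* removed the last element */
        }
        n->next = NULL;
    }
}

/* Stand-in for the end of platform_bus_init_notify. */
static void platform_bus_sketch_init_notify(PlatformBusSketch *bus)
{
    BindingDoneNotifier *n;

    /* ... device IRQs would be bound to platform bus IRQs here ... */
    bus->irqs_bound = 1;

    for (n = bus->head; n; n = n->next) {
        n->notify(n, bus);
    }
}

/* A device embeds its notifier, so no separately allocated struct. */
typedef struct VfioDeviceSketch {
    BindingDoneNotifier binding_done;
    int injection_started;
} VfioDeviceSketch;

static void vfio_sketch_binding_done(BindingDoneNotifier *n,
                                     PlatformBusSketch *bus)
{
    VfioDeviceSketch *vdev = (VfioDeviceSketch *)
        ((char *)n - offsetof(VfioDeviceSketch, binding_done));

    assert(bus->irqs_bound);      /* binding must already have happened */
    vdev->injection_started = 1;  /* irqfd/eventfd setup would go here */
}
```

Each device recovers its own state from the embedded notifier, so there
is no need to walk all dynamic sysbus devices.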

Thanks

Best Regards

Eric

> 
>>
>> The "irq now populated" notifier function callback would be called in
>> platform bus platform_bus_init_notify or link_sysbus_device I guess,
>> already executed in a machine-init-done notifier. The callback would
>> need to be called with sbdev and first_irq param to fulfill its task
>> (check of VFIO type, IRQFD setup). So I need to pass first_irq to
>> platform_bus. Do you agree? Can I add an API?
>>
>> Besides there would be a single callback per platform bus. Wouldn't it
>> be worth to add an infrastructure to add/remove misc "binding_done"
>> notifiers and call all registered functions in link_sysbus_device?
> 
> Usually the "realize" function is good enough for 99% of the devices out
> there. We're just special because we do lazy binding of IRQs on the
> platform bus :).
> 
> 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 15:14                   ` Eric Auger
@ 2014-11-27 15:28                     ` Alexander Graf
  2014-11-27 15:55                       ` Alexander Graf
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-27 15:28 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: peter.maydell, patches, Kim Phillips, will.deacon, stuart.yoder,
	Bharat.Bhushan, alex.williamson, a.motakis, kvmarm



On 27.11.14 16:14, Eric Auger wrote:
> On 11/27/2014 03:35 PM, Alexander Graf wrote:
>>
>>
>> On 27.11.14 15:05, Eric Auger wrote:
>>> On 11/26/2014 03:46 PM, Eric Auger wrote:
>>>> On 11/26/2014 12:20 PM, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 26.11.14 11:48, Eric Auger wrote:
>>>>>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 26.11.14 10:45, Eric Auger wrote:
>>>>>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>>>>>> Minimal VFIO platform implementation supporting
>>>>>>>>>> - register space user mapping,
>>>>>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>>>>>
>>>>>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>>> +/*
>>>>>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>>>>>> + */
>>>>>>>>>> +
>>>>>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>>>>>> +    Notifier notifier;
>>>>>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>>>>>> +
>>>>>>>>>> +typedef struct VfioIrqStartParams {
>>>>>>>>>> +    PlatformBusDevice *pbus;
>>>>>>>>>> +    int platform_bus_first_irq;
>>>>>>>>>> +} VfioIrqStartParams;
>>>>>>>>>> +
>>>>>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>>>>>> +{
>>>>>>>>>> +    int i;
>>>>>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>>>>>> +    VFIODevice *vbasedev;
>>>>>>>>>> +    uint64_t irq_number;
>>>>>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>>>>>> +
>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>>>>>> +                             + platform_bus_first_irq;
>>>>>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>>>>>> +        }
>>>>>>>>>> +    }
>>>>>>>>>> +    return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>>>>>> +{
>>>>>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>>>>>> +    DeviceState *dev =
>>>>>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>>>>>> +
>>>>>>>>>> +    if (pbus->done_gathering) {
>>>>>>>>>> +        VfioIrqStartParams data = {
>>>>>>>>>> +            .pbus = pbus,
>>>>>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>>>>>> +        };
>>>>>>>>>> +
>>>>>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>>>>>> +{
>>>>>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>>>>>> +
>>>>>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>>>>>
>>>>>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>>>>>> would be part of the vfio device struct and not some dangling random
>>>>>>>>> pointer :).
>>>>>>>>>
>>>>>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>>>>>> know the device you're dealing with and only handle a single device per
>>>>>>>>> notifier.
>>>>>>>>
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> I don't see how to practically follow your request:
>>>>>>>>
>>>>>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>>>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>>>>>> wrong in my first reply :-().
>>>>>>>>
>>>>>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>>>>>> finalize function because this latter is called after the platform bus
>>>>>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>>>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>>>>>
>>>>>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>>>>>> place of a machine init done notifier (would relax the call order
>>>>>>>> constraint) but the problem consists in passing the platform bus first
>>>>>>>> irq (all the more so you requested it became part of a const struct)
>>>>>>>>
>>>>>>>> Do I miss something?
>>>>>>>
>>>>>>> So the basic idea is that the device itself calls
>>>>>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>>>>>> Notifier struct would be part of the device state which means you can
>>>>>>> cast yourself into the VFIO device state.
>>>>>>
>>>>>> humm, the vfio device is instantiated in the cmd line so after the
>>>>>> machine init. This means 1st the platform bus binding notifier is
>>>>>> registered (in platform bus realize) and then VFIO irq starter notifiers
>>>>>> are registered (in VFIO realize). Notifiers being executed in the
>>>>>> reverse order of their registration, this would fail. Am I wrong?
>>>>>
>>>>> Bleks. Ok, I see 2 ways out of this:
>>>>>
>>>>>   1) Create a TailNotifier and convert the machine_init_done notifiers
>>>>> to this
>>>>>
>>>>>   2) Add an "irq now populated" notifier function callback in a new
>>>>> PlatformBusDeviceClass struct that you use to describe the
>>>>> PlatformBusDevice class. Call all children's notifiers from the
>>>>> machine_init notifier in the platform bus.
>>>>>
>>>>> The more I think about it, the more I prefer option 2 I think.
>>>> Hi Alex,
>>>>
>>>> ok I work on 2)
>>>
>>> Hi Alex,
>>>
>>> I believe I understand your proposal but the issue is to pass the
>>> platform bus first_irq parameter which is needed to compute the absolute
>>> IRQ number (=irqfd GSI). VFIO device does not have this info. Platform
>>> bus doesn't have it either. Only machine file has the info.
>>
>> Well, the GIC should have this info as well. That's why I was trying to
>> point out that you want to ask the GIC about the absolute IRQ number on
>> its own number space.
>>
>> You need to make the connection with the GIC anyway, no? So you need to
>> somehow get awareness of the GIC device. Or are you hijacking the global
>> GSI number space?
> 
> Hi Alex,
> 
> Well OK I believe I understand your idea: in the vfio device, loop on all
> GIC gpios using qdev_get_gpio_in(gicdev, i) and identify the i that
> matches the qemu_irq I want to kick off. That would be feasible if VFIO
> had a handle to the GIC DeviceState (gicdev), which is not currently the
> case. So we move the problem to passing the gicdev to vfio ;-)

That should be easy - make it a link property. In fact, this would be
one of those cases where not generalizing the code would've been a good
idea.

If device creation lived in the machine file, the machine could
automatically set the link. Maybe you can still get there somehow? You
could add a machine callback in the device allocation function.

I would also do the lookup the other way around. The GPIO / IRQ number
mapping is reasonably local to the GIC device, so I'd rather call a GIC
function to find the ID:

  kvm_gic_get_irq_gsi(s->gic_link, qdev_get_gpio_in(s, i));
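
A GIC-side lookup of that kind might look roughly like the sketch below.
It is standalone C with hypothetical stand-in types and names (IrqLine,
GicSketch, gic_sketch_get_gpio_in, gic_sketch_get_irq_gsi); it does not
show how kvm_gic_get_irq_gsi() is or would be implemented in QEMU, and
treating the GSI as equal to the GIC input index is an assumption made
only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the object a qemu_irq handle refers to. */
typedef struct IrqLine {
    int level;
} IrqLine;

#define GIC_SKETCH_NUM_IRQS 8

/* Hypothetical stand-in for the GIC device state. */
typedef struct GicSketch {
    IrqLine inputs[GIC_SKETCH_NUM_IRQS];
} GicSketch;

/* Rough analogue of fetching input line n of the interrupt controller. */
static IrqLine *gic_sketch_get_gpio_in(GicSketch *gic, int n)
{
    assert(gic != NULL);
    if (n < 0 || n >= GIC_SKETCH_NUM_IRQS) {
        return NULL;
    }
    return &gic->inputs[n];
}

/*
 * Translate an input line into its number in the GIC's own number space.
 * Returns -1 when the line does not belong to this GIC.
 * Assumption: the GSI is simply the input index.
 */
static int gic_sketch_get_irq_gsi(GicSketch *gic, const IrqLine *line)
{
    int i;

    assert(gic != NULL);
    for (i = 0; i < GIC_SKETCH_NUM_IRQS; i++) {
        if (&gic->inputs[i] == line) {
            return i;
        }
    }
    return -1;
}
```

The point of keeping the loop inside the GIC is that the device never
needs to know first_irq or any other machine-level offset.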

> VFIO being mostly generic, we could only do that in the derived VFIO
> device (the famous calxeda xgmac device) or some intermediate vfio arm
> device - let's be crazy!? ;-) - . The GIC derives from the standard sysbus
> device (there is no kind of generic interrupt controller device I could
> recognize when parsing the QOM tree), so I don't see any other solution
> to retrieve the intc handle after machine creation.
> 
> I can try that. In that case, do you agree with adding/removing sysbus
> binding_done notifiers in the platform bus and dropping the callback in
> the platform bus class? I would call all registered notifiers at the end
> of platform_bus_init_notify.

Not sure I understand what you're asking for :).


Alex


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 15:28                     ` Alexander Graf
@ 2014-11-27 15:55                       ` Alexander Graf
  2014-11-27 17:13                         ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-27 15:55 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder,
	alex.williamson, kvmarm



On 27.11.14 16:28, Alexander Graf wrote:
> 
> 
> On 27.11.14 16:14, Eric Auger wrote:
>> On 11/27/2014 03:35 PM, Alexander Graf wrote:
>>>
>>>
>>> On 27.11.14 15:05, Eric Auger wrote:
>>>> On 11/26/2014 03:46 PM, Eric Auger wrote:
>>>>> On 11/26/2014 12:20 PM, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>> On 26.11.14 11:48, Eric Auger wrote:
>>>>>>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 26.11.14 10:45, Eric Auger wrote:
>>>>>>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>>>>>>> Minimal VFIO platform implementation supporting
>>>>>>>>>>> - register space user mapping,
>>>>>>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>>>>>>
>>>>>>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>>>> +/*
>>>>>>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>>>>>>> + */
>>>>>>>>>>> +
>>>>>>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>>>>>>> +    Notifier notifier;
>>>>>>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>>>>>>> +
>>>>>>>>>>> +typedef struct VfioIrqStartParams {
>>>>>>>>>>> +    PlatformBusDevice *pbus;
>>>>>>>>>>> +    int platform_bus_first_irq;
>>>>>>>>>>> +} VfioIrqStartParams;
>>>>>>>>>>> +
>>>>>>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>>>>>>> +{
>>>>>>>>>>> +    int i;
>>>>>>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>>>>>>> +    VFIODevice *vbasedev;
>>>>>>>>>>> +    uint64_t irq_number;
>>>>>>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>>>>>>> +
>>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>>>>>>> +                             + platform_bus_first_irq;
>>>>>>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>>>>>>> +        }
>>>>>>>>>>> +    }
>>>>>>>>>>> +    return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>>>>>>> +{
>>>>>>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>>>>>>> +    DeviceState *dev =
>>>>>>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>>>>>>> +
>>>>>>>>>>> +    if (pbus->done_gathering) {
>>>>>>>>>>> +        VfioIrqStartParams data = {
>>>>>>>>>>> +            .pbus = pbus,
>>>>>>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>>>>>>> +        };
>>>>>>>>>>> +
>>>>>>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>>>>>>> +    }
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>>>>>>> +{
>>>>>>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>>>>>>> +
>>>>>>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>>>>>>
>>>>>>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>>>>>>> would be part of the vfio device struct and not some dangling random
>>>>>>>>>> pointer :).
>>>>>>>>>>
>>>>>>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>>>>>>> know the device you're dealing with and only handle a single device per
>>>>>>>>>> notifier.
>>>>>>>>>
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> I don't see how to practically follow your request:
>>>>>>>>>
>>>>>>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>>>>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>>>>>>> wrong in my first reply :-().
>>>>>>>>>
>>>>>>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>>>>>>> finalize function because this latter is called after the platform bus
>>>>>>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>>>>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>>>>>>
>>>>>>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>>>>>>> place of a machine init done notifier (would relax the call order
>>>>>>>>> constraint) but the problem consists in passing the platform bus first
>>>>>>>>> irq (all the more so you requested it became part of a const struct)
>>>>>>>>>
>>>>>>>>> Do I miss something?
>>>>>>>>
>>>>>>>> So the basic idea is that the device itself calls
>>>>>>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>>>>>>> Notifier struct would be part of the device state which means you can
>>>>>>>> cast yourself into the VFIO device state.
>>>>>>>
>>>>>>> humm, the vfio device is instantiated in the cmd line so after the
>>>>>>> machine init. This means 1st the platform bus binding notifier is
>>>>>>> registered (in platform bus realize) and then VFIO irq starter notifiers
>>>>>>> are registered (in VFIO realize). Notifiers being executed in the
>>>>>>> reverse order of their registration, this would fail. Am I wrong?
>>>>>>
>>>>>> Bleks. Ok, I see 2 ways out of this:
>>>>>>
>>>>>>   1) Create a TailNotifier and convert the machine_init_done notifiers
>>>>>> to this
>>>>>>
>>>>>>   2) Add an "irq now populated" notifier function callback in a new
>>>>>> PlatformBusDeviceClass struct that you use to describe the
>>>>>> PlatformBusDevice class. Call all children's notifiers from the
>>>>>> machine_init notifier in the platform bus.
>>>>>>
>>>>>> The more I think about it, the more I prefer option 2 I think.
>>>>> Hi Alex,
>>>>>
>>>>> ok I work on 2)
>>>>
>>>> Hi Alex,
>>>>
>>>> I believe I understand your proposal but the issue is to pass the
>>>> platform bus first_irq parameter which is needed to compute the absolute
>>>> IRQ number (=irqfd GSI). VFIO device does not have this info. Platform
>>>> bus doesn't have it either. Only machine file has the info.
>>>
>>> Well, the GIC should have this info as well. That's why I was trying to
>>> point out that you want to ask the GIC about the absolute IRQ number on
>>> its own number space.
>>>
>>> You need to make the connection with the GIC anyway, no? So you need to
>>> somehow get awareness of the GIC device. Or are you hijacking the global
>>> GSI number space?
>>
>> Hi Alex,
>>
>> Well OK I believe I understand your idea: in vfio device, loop on all
>> gic gpios using   qdev_get_gpio_in(gicdev, i) and identify i that
>> matches the qemu_irq I want to kick off. That would be feasible if VFIO
>> has a handle to the GIC DeviceState (gicdev), which is not currently the
>> case. So we move the problem to passing the gicdev to vfio ;-)
> 
> That should be easy - make it a link property. In fact, this would be
> one of those cases where not generalizing the code would've been a good
> idea.
> 
> If device creation would live in the machine file, the machine could
> automatically set the link. Maybe you can still get there somehow? You
> could add a machine callback in the device allocation function.

If this gets too messy, I think doing a machine attribute would work as
well here. Check out the way we pass the e500-ccsr object on e500:


http://git.qemu.org/?p=qemu.git;a=blob;f=hw/pci-host/ppce500.c;h=1b4c0f00236e8005c261da527d416fe6a053b353;hb=HEAD#l337


http://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/e500.c;h=2832fc0da444d89737768f7c4dcb0638e2625750;hb=HEAD#l873

I think doing an actual link would be cleaner, but at least the above
gets you to an acceptable state that can still be improved with links
later - the basic idea is the same :).


Alex


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 15:55                       ` Alexander Graf
@ 2014-11-27 17:13                         ` Eric Auger
  2014-11-27 17:24                           ` Alexander Graf
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-27 17:13 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder,
	alex.williamson, kvmarm

On 11/27/2014 04:55 PM, Alexander Graf wrote:
> 
> 
> On 27.11.14 16:28, Alexander Graf wrote:
>>
>>
>> On 27.11.14 16:14, Eric Auger wrote:
>>> On 11/27/2014 03:35 PM, Alexander Graf wrote:
>>>>
>>>>
>>>> On 27.11.14 15:05, Eric Auger wrote:
>>>>> On 11/26/2014 03:46 PM, Eric Auger wrote:
>>>>>> On 11/26/2014 12:20 PM, Alexander Graf wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 26.11.14 11:48, Eric Auger wrote:
>>>>>>>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 26.11.14 10:45, Eric Auger wrote:
>>>>>>>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>>>>>>>> Minimal VFIO platform implementation supporting
>>>>>>>>>>>> - register space user mapping,
>>>>>>>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>>>>>>>
>>>>>>>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>>>> +/*
>>>>>>>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>>>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>>>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>>>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>>>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>>>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>>>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>>>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>>>>>>>> + */
>>>>>>>>>>>> +
>>>>>>>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>>>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>>>>>>>> +    Notifier notifier;
>>>>>>>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>>>>>>>> +
>>>>>>>>>>>> +typedef struct VfioIrqStartParams {
>>>>>>>>>>>> +    PlatformBusDevice *pbus;
>>>>>>>>>>>> +    int platform_bus_first_irq;
>>>>>>>>>>>> +} VfioIrqStartParams;
>>>>>>>>>>>> +
>>>>>>>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>>>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    int i;
>>>>>>>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>>>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>>>>>>>> +    VFIODevice *vbasedev;
>>>>>>>>>>>> +    uint64_t irq_number;
>>>>>>>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>>>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>>>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>>>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>>>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>>>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>>>>>>>> +                             + platform_bus_first_irq;
>>>>>>>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>>>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>>>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>>>>>>>> +    DeviceState *dev =
>>>>>>>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>>>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (pbus->done_gathering) {
>>>>>>>>>>>> +        VfioIrqStartParams data = {
>>>>>>>>>>>> +            .pbus = pbus,
>>>>>>>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>>>>>>>> +        };
>>>>>>>>>>>> +
>>>>>>>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>>>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>>>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>>>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>>>>>>>
>>>>>>>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>>>>>>>> would be part of the vfio device struct and not some dangling random
>>>>>>>>>>> pointer :).
>>>>>>>>>>>
>>>>>>>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>>>>>>>> know the device you're dealing with and only handle a single device per
>>>>>>>>>>> notifier.
>>>>>>>>>>
>>>>>>>>>> Hi Alex,
>>>>>>>>>>
>>>>>>>>>> I don't see how to practically follow your request:
>>>>>>>>>>
>>>>>>>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>>>>>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>>>>>>>> wrong in my first reply :-().
>>>>>>>>>>
>>>>>>>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>>>>>>>> finalize function because this latter is called after the platform bus
>>>>>>>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>>>>>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>>>>>>>
>>>>>>>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>>>>>>>> place of a machine init done notifier (would relax the call order
>>>>>>>>>> constraint) but the problem consists in passing the platform bus first
>>>>>>>>>> irq (all the more so you requested it became part of a const struct)
>>>>>>>>>>
>>>>>>>>>> Do I miss something?
>>>>>>>>>
>>>>>>>>> So the basic idea is that the device itself calls
>>>>>>>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>>>>>>>> Notifier struct would be part of the device state which means you can
>>>>>>>>> cast yourself into the VFIO device state.
>>>>>>>>
>>>>>>>> humm, the vfio device is instantiated in the cmd line so after the
>>>>>>>> machine init. This means 1st the platform bus binding notifier is
>>>>>>>> registered (in platform bus realize) and then VFIO irq starter notifiers
>>>>>>>> are registered (in VFIO realize). Notifiers being executed in the
>>>>>>>> reverse order of their registration, this would fail. Am I wrong?
>>>>>>>
>>>>>>> Bleks. Ok, I see 2 ways out of this:
>>>>>>>
>>>>>>>   1) Create a TailNotifier and convert the machine_init_done notifiers
>>>>>>> to this
>>>>>>>
>>>>>>>   2) Add an "irq now populated" notifier function callback in a new
>>>>>>> PlatformBusDeviceClass struct that you use to describe the
>>>>>>> PlatformBusDevice class. Call all children's notifiers from the
>>>>>>> machine_init notifier in the platform bus.
>>>>>>>
>>>>>>> The more I think about it, the more I prefer option 2 I think.
>>>>>> Hi Alex,
>>>>>>
>>>>>> ok I work on 2)
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> I believe I understand your proposal but the issue is to pass the
>>>>> platform bus first_irq parameter which is needed to compute the absolute
>>>>> IRQ number (=irqfd GSI). VFIO device does not have this info. Platform
>>>>> bus doesn't have it either. Only machine file has the info.
>>>>
>>>> Well, the GIC should have this info as well. That's why I was trying to
>>>> point out that you want to ask the GIC about the absolute IRQ number on
>>>> its own number space.
>>>>
>>>> You need to make the connection with the GIC anyway, no? So you need to
>>>> somehow get awareness of the GIC device. Or are you hijacking the global
>>>> GSI number space?
>>>
>>> Hi Alex,
>>>
>>> Well OK I believe I understand your idea: in vfio device, loop on all
>>> gic gpios using   qdev_get_gpio_in(gicdev, i) and identify i that
>>> matches the qemu_irq I want to kick off. That would be feasible if VFIO
>>> has a handle to the GIC DeviceState (gicdev), which is not currently the
>>> case. So we move the problem to passing the gicdev to vfio ;-)
>>
>> That should be easy - make it a link property. In fact, this would be
>> one of those cases where not generalizing the code would've been a good
>> idea.
In that case the machine (init done) callback would be used to pass the
vgic handle to each VFIO device, and it would be registered by the machine
file, wouldn't it? Aren't we then in exactly the state you wanted to
improve initially, where the notifier is registered by the machine file
and does not belong to the VFIO device? We would just be replacing the
first_irq param by a vgic_handle, which eventually ends up as a link.

This notifier still cannot be registered by the VFIO device finalize fn
since the VFIO device has no handle to the interrupt controller. It is a
kind of chicken & egg problem.
>>
>> If device creation would live in the machine file, the machine could
>> automatically set the link. Maybe you can still get there somehow? You
>> could add a machine callback in the device allocation function.
> 
> If this gets too messy, I think doing a machine attribute would work as
> well here. Check out the way we pass the e500-ccsr object on e500:
> 
> 
> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/pci-host/ppce500.c;h=1b4c0f00236e8005c261da527d416fe6a053b353;hb=HEAD#l337
> 
> 
> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/e500.c;h=2832fc0da444d89737768f7c4dcb0638e2625750;hb=HEAD#l873

looks OK indeed
> 
> I think doing an actual link would be cleaner, but at least the above
> gets you to an acceptable state that can still be improved with links
> later - the basic idea is the same :).


and why not "simply" a qemu_register_reset passing the vgic handle as
opaque? In principle this removes the original notifier "dangling pointer"
issue, and also the new problem of the static const not being compatible
with the reset function prototype. qemu_register_reset seems simpler than a
machine init done notifier and brings the benefit of being called later.
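
To make the ordering argument concrete, here is a toy model in plain C. It
is only a sketch: CallbackList, simulate() and the other names are made up
for illustration and are not QEMU's Notifier or reset API. It assumes, as
discussed in this thread, that machine init done notifiers run in reverse
registration order and that reset handlers run after all of them.

```c
#include <stddef.h>

/* Toy model only: these are NOT QEMU's Notifier/NotifierList types. */
#define MAX_CB 8

typedef void (*callback_fn)(void *opaque);

typedef struct CallbackList {
    callback_fn fn[MAX_CB];
    void *opaque[MAX_CB];
    size_t count;
} CallbackList;

/* Append a callback; returns 0 on success, -1 if the list is full. */
static int list_add(CallbackList *l, callback_fn fn, void *opaque)
{
    if (l->count >= MAX_CB) {
        return -1;
    }
    l->fn[l->count] = fn;
    l->opaque[l->count] = opaque;
    l->count++;
    return 0;
}

/* Run newest first: the order assumed here for init done notifiers. */
static void run_reverse(CallbackList *l)
{
    for (size_t i = l->count; i > 0; i--) {
        l->fn[i - 1](l->opaque[i - 1]);
    }
}

/* Run oldest first: used here for the (hypothetical) reset handlers. */
static void run_forward(CallbackList *l)
{
    for (size_t i = 0; i < l->count; i++) {
        l->fn[i](l->opaque[i]);
    }
}

typedef struct Model {
    int irq_bound;       /* set once the platform bus has bound the IRQs */
    int start_saw_bound; /* what the VFIO start step observed when it ran */
} Model;

static void bind_irqs(void *opaque)
{
    ((Model *)opaque)->irq_bound = 1;
}

static void start_injection(void *opaque)
{
    Model *m = opaque;
    m->start_saw_bound = m->irq_bound;
}

/*
 * Returns 1 if the start step ran after the binding step, 0 otherwise.
 * use_reset selects whether the start step is a reset handler or a
 * second init done notifier registered after the binding notifier.
 */
static int simulate(int use_reset)
{
    CallbackList init_done = {0};
    CallbackList reset = {0};
    Model m = {0, 0};

    list_add(&init_done, bind_irqs, &m);           /* platform bus realize */
    if (use_reset) {
        list_add(&reset, start_injection, &m);     /* VFIO realize */
    } else {
        list_add(&init_done, start_injection, &m); /* VFIO realize */
    }

    run_reverse(&init_done); /* machine init done */
    run_forward(&reset);     /* first system reset, afterwards */
    return m.start_saw_bound;
}
```

In this model simulate(0) returns 0 (the starter runs before the binding)
and simulate(1) returns 1, which is the "called later" benefit mentioned
above.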

Best Regards

Eric

> 
> 
> Alex
> 


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 17:13                         ` Eric Auger
@ 2014-11-27 17:24                           ` Alexander Graf
  2014-11-27 17:34                             ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-27 17:24 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder,
	alex.williamson, kvmarm



On 27.11.14 18:13, Eric Auger wrote:
> On 11/27/2014 04:55 PM, Alexander Graf wrote:
>>
>>
>> On 27.11.14 16:28, Alexander Graf wrote:
>>>
>>>
>>> On 27.11.14 16:14, Eric Auger wrote:
>>>> On 11/27/2014 03:35 PM, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 27.11.14 15:05, Eric Auger wrote:
>>>>>> On 11/26/2014 03:46 PM, Eric Auger wrote:
>>>>>>> On 11/26/2014 12:20 PM, Alexander Graf wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 26.11.14 11:48, Eric Auger wrote:
>>>>>>>>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 26.11.14 10:45, Eric Auger wrote:
>>>>>>>>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>>>>>>>>> Minimal VFIO platform implementation supporting
>>>>>>>>>>>>> - register space user mapping,
>>>>>>>>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>>>>>>>>
>>>>>>>>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>>>>
>>>>>>>>>> [...]
>>>>>>>>>>
>>>>>>>>>>>>> +/*
>>>>>>>>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>>>>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>>>>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>>>>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>>>>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>>>>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>>>>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>>>>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>>>>>>>>> + */
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>>>>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>>>>>>>>> +    Notifier notifier;
>>>>>>>>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +typedef struct VfioIrqStartParams {
>>>>>>>>>>>>> +    PlatformBusDevice *pbus;
>>>>>>>>>>>>> +    int platform_bus_first_irq;
>>>>>>>>>>>>> +} VfioIrqStartParams;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>>>>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    int i;
>>>>>>>>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>>>>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>>>>>>>>> +    VFIODevice *vbasedev;
>>>>>>>>>>>>> +    uint64_t irq_number;
>>>>>>>>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>>>>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>>>>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>>>>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>>>>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>>>>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>>>>>>>>> +                             + platform_bus_first_irq;
>>>>>>>>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>>>>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>>>>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>>>>>>>>> +    DeviceState *dev =
>>>>>>>>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>>>>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +    if (pbus->done_gathering) {
>>>>>>>>>>>>> +        VfioIrqStartParams data = {
>>>>>>>>>>>>> +            .pbus = pbus,
>>>>>>>>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>>>>>>>>> +        };
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>>>>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>>>>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>>>>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>>>>>>>>
>>>>>>>>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>>>>>>>>> would be part of the vfio device struct and not some dangling random
>>>>>>>>>>>> pointer :).
>>>>>>>>>>>>
>>>>>>>>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>>>>>>>>> know the device you're dealing with and only handle a single device per
>>>>>>>>>>>> notifier.
>>>>>>>>>>>
>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>
>>>>>>>>>>> I don't see how to practically follow your request:
>>>>>>>>>>>
>>>>>>>>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>>>>>>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>>>>>>>>> wrong in my first reply :-().
>>>>>>>>>>>
>>>>>>>>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>>>>>>>>> finalize function because this latter is called after the platform bus
>>>>>>>>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>>>>>>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>>>>>>>>
>>>>>>>>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>>>>>>>>> place of a machine init done notifier (would relax the call order
>>>>>>>>>>> constraint) but the problem consists in passing the platform bus first
>>>>>>>>>>> irq (all the more so you requested it became part of a const struct)
>>>>>>>>>>>
>>>>>>>>>>> Do I miss something?
>>>>>>>>>>
>>>>>>>>>> So the basic idea is that the device itself calls
>>>>>>>>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>>>>>>>>> Notifier struct would be part of the device state which means you can
>>>>>>>>>> cast yourself into the VFIO device state.
>>>>>>>>>
>>>>>>>>> humm, the vfio device is instantiated in the cmd line so after the
>>>>>>>>> machine init. This means 1st the platform bus binding notifier is
>>>>>>>>> registered (in platform bus realize) and then VFIO irq starter notifiers
>>>>>>>>> are registered (in VFIO realize). Notifiers being executed in the
>>>>>>>>> reverse order of their registration, this would fail. Am I wrong?
>>>>>>>>
>>>>>>>> Bleks. Ok, I see 2 ways out of this:
>>>>>>>>
>>>>>>>>   1) Create a TailNotifier and convert the machine_init_done notifiers
>>>>>>>> to this
>>>>>>>>
>>>>>>>>   2) Add an "irq now populated" notifier function callback in a new
>>>>>>>> PlatformBusDeviceClass struct that you use to describe the
>>>>>>>> PlatformBusDevice class. Call all children's notifiers from the
>>>>>>>> machine_init notifier in the platform bus.
>>>>>>>>
>>>>>>>> The more I think about it, the more I prefer option 2 I think.
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> ok I work on 2)
>>>>>>
>>>>>> Hi Alex,
>>>>>>
>>>>>> I believe I understand your proposal but the issue is to pass the
>>>>>> platform bus first_irq parameter which is needed to compute the absolute
>>>>>> IRQ number (=irqfd GSI). VFIO device does not have this info. Platform
>>>>>> bus doesn't have it either. Only machine file has the info.
>>>>>
>>>>> Well, the GIC should have this info as well. That's why I was trying to
>>>>> point out that you want to ask the GIC about the absolute IRQ number on
>>>>> its own number space.
>>>>>
>>>>> You need to make the connection with the GIC anyway, no? So you need to
>>>>> somehow get awareness of the GIC device. Or are you hijacking the global
>>>>> GSI number space?
>>>>
>>>> Hi Alex,
>>>>
>>>> Well OK I believe I understand your idea: in vfio device, loop on all
>>>> gic gpios using   qdev_get_gpio_in(gicdev, i) and identify i that
>>>> matches the qemu_irq I want to kick off. That would be feasible if VFIO
>>>> has a handle to the GIC DeviceState (gicdev), which is not currently the
>>>> case. So we move the problem to passing the gicdev to vfio ;-)
>>>
>>> That should be easy - make it a link property. In fact, this would be
>>> one of those cases where not generalizing the code would've been a good
>>> idea.
> In that case the machine (init done) callback would be used to pass the
> vgic handle to each VFIO device, and it would be registered by the machine
> file, wouldn't it? Aren't we then in exactly the state you wanted to
> improve initially, where the notifier is registered by the machine file
> and does not belong to the VFIO device? We would just be replacing the
> first_irq param by a vgic_handle, which eventually ends up as a link.
> 
> This notifier still cannot be registered by the VFIO device finalize fn
> since the VFIO device has no handle to the interrupt controller. It is a
> kind of chicken & egg problem.
>>>
>>> If device creation would live in the machine file, the machine could
>>> automatically set the link. Maybe you can still get there somehow? You
>>> could add a machine callback in the device allocation function.
>>
>> If this gets too messy, I think doing a machine attribute would work as
>> well here. Check out the way we pass the e500-ccsr object on e500:
>>
>>
>> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/pci-host/ppce500.c;h=1b4c0f00236e8005c261da527d416fe6a053b353;hb=HEAD#l337
>>
>>
>> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/e500.c;h=2832fc0da444d89737768f7c4dcb0638e2625750;hb=HEAD#l873
> 
> looks OK indeed
>>
>> I think doing an actual link would be cleaner, but at least the above
>> gets you to an acceptable state that can still be improved with links
>> later - the basic idea is the same :).
> 
> 
> and why not "simply" a qemu_register_reset passing the vgic handle as
> opaque?

Who would register this reset callback? It would have to be someone who
knows both the VFIO device and the vGIC device.

The reset idea could work as a replacement for the notifier though. So you
could have the VFIO device register a reset callback in which it asks
the vgic for the number and registers the IRQ with KVM.
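
A rough sketch of that flow, written as a self-contained toy in C rather
than against the real QEMU API. ToyGic, ToyVfioDevice, toy_gic_find_input()
and toy_vfio_reset() are hypothetical names; the lookup mirrors the "loop
on all gic inputs and find the matching line" idea from earlier in the
thread, and the KVM/irqfd programming is only indicated by a comment.

```c
#include <stddef.h>

/* Toy stand-ins; none of these are QEMU types or functions. */
#define GIC_NUM_INPUTS 16
#define DEV_MAX_IRQS   4

typedef struct IrqLine {
    int level;
} IrqLine;

typedef struct ToyGic {
    IrqLine inputs[GIC_NUM_INPUTS];
} ToyGic;

typedef struct ToyVfioDevice {
    ToyGic *gic;                 /* handle set by the machine (the "link") */
    IrqLine *irq[DEV_MAX_IRQS];  /* GIC lines this device was wired to */
    int num_irqs;
    int gsi[DEV_MAX_IRQS];       /* resolved numbers, -1 if not found */
} ToyVfioDevice;

/* Return the index of 'line' in the GIC's own input space, or -1. */
static int toy_gic_find_input(const ToyGic *gic, const IrqLine *line)
{
    for (int i = 0; i < GIC_NUM_INPUTS; i++) {
        if (&gic->inputs[i] == line) {
            return i;
        }
    }
    return -1;
}

/*
 * Reset callback registered by the device itself; 'opaque' is the
 * device, so no separately allocated parameter struct is needed.
 */
static void toy_vfio_reset(void *opaque)
{
    ToyVfioDevice *dev = opaque;

    for (int i = 0; i < dev->num_irqs; i++) {
        dev->gsi[i] = toy_gic_find_input(dev->gic, dev->irq[i]);
        /* A real device would register eventfd <-> GSI with KVM here. */
    }
}
```

The point of the sketch is that the device only needs the GIC handle: the
first_irq offset never has to be passed in from the machine file. Whether
the index found this way is directly usable as the irqfd GSI is an
assumption of the toy.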


Alex


* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 17:24                           ` Alexander Graf
@ 2014-11-27 17:34                             ` Eric Auger
  2014-11-27 17:51                               ` Alexander Graf
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Auger @ 2014-11-27 17:34 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder,
	alex.williamson, kvmarm

On 11/27/2014 06:24 PM, Alexander Graf wrote:
> 
> 
> On 27.11.14 18:13, Eric Auger wrote:
>> On 11/27/2014 04:55 PM, Alexander Graf wrote:
>>>
>>>
>>> On 27.11.14 16:28, Alexander Graf wrote:
>>>>
>>>>
>>>> On 27.11.14 16:14, Eric Auger wrote:
>>>>> On 11/27/2014 03:35 PM, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>> On 27.11.14 15:05, Eric Auger wrote:
>>>>>>> On 11/26/2014 03:46 PM, Eric Auger wrote:
>>>>>>>> On 11/26/2014 12:20 PM, Alexander Graf wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 26.11.14 11:48, Eric Auger wrote:
>>>>>>>>>> On 11/26/2014 11:24 AM, Alexander Graf wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 26.11.14 10:45, Eric Auger wrote:
>>>>>>>>>>>> On 11/05/2014 11:29 AM, Alexander Graf wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 31.10.14 15:05, Eric Auger wrote:
>>>>>>>>>>>>>> Minimal VFIO platform implementation supporting
>>>>>>>>>>>>>> - register space user mapping,
>>>>>>>>>>>>>> - IRQ assignment based on eventfds handled on qemu side.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> irqfd kernel acceleration comes in a subsequent patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
>>>>>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>>>>>>>>>
>>>>>>>>>>> [...]
>>>>>>>>>>>
>>>>>>>>>>>>>> +/*
>>>>>>>>>>>>>> + * Mechanics to program/start irq injection on machine init done notifier:
>>>>>>>>>>>>>> + * this is needed since at finalize time, the device IRQ are not yet
>>>>>>>>>>>>>> + * bound to the platform bus IRQ. It is assumed here dynamic instantiation
>>>>>>>>>>>>>> + * always is used. Binding to the platform bus IRQ happens on a machine
>>>>>>>>>>>>>> + * init done notifier registered by the machine file. After its execution
>>>>>>>>>>>>>> + * we execute a new notifier that actually starts the injection. When using
>>>>>>>>>>>>>> + * irqfd, programming the injection consists in associating eventfds to
>>>>>>>>>>>>>> + * GSI number,ie. virtual IRQ number
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +typedef struct VfioIrqStarterNotifierParams {
>>>>>>>>>>>>>> +    unsigned int platform_bus_first_irq;
>>>>>>>>>>>>>> +    Notifier notifier;
>>>>>>>>>>>>>> +} VfioIrqStarterNotifierParams;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +typedef struct VfioIrqStartParams {
>>>>>>>>>>>>>> +    PlatformBusDevice *pbus;
>>>>>>>>>>>>>> +    int platform_bus_first_irq;
>>>>>>>>>>>>>> +} VfioIrqStartParams;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/* Start injection of IRQ for a specific VFIO device */
>>>>>>>>>>>>>> +static int vfio_irq_starter(SysBusDevice *sbdev, void *opaque)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +    int i;
>>>>>>>>>>>>>> +    VfioIrqStartParams *p = opaque;
>>>>>>>>>>>>>> +    VFIOPlatformDevice *vdev;
>>>>>>>>>>>>>> +    VFIODevice *vbasedev;
>>>>>>>>>>>>>> +    uint64_t irq_number;
>>>>>>>>>>>>>> +    PlatformBusDevice *pbus = p->pbus;
>>>>>>>>>>>>>> +    int platform_bus_first_irq = p->platform_bus_first_irq;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(sbdev), TYPE_VFIO_PLATFORM)) {
>>>>>>>>>>>>>> +        vdev = VFIO_PLATFORM_DEVICE(sbdev);
>>>>>>>>>>>>>> +        vbasedev = &vdev->vbasedev;
>>>>>>>>>>>>>> +        for (i = 0; i < vbasedev->num_irqs; i++) {
>>>>>>>>>>>>>> +            irq_number = platform_bus_get_irqn(pbus, sbdev, i)
>>>>>>>>>>>>>> +                             + platform_bus_first_irq;
>>>>>>>>>>>>>> +            vfio_start_irq_injection(sbdev, i, irq_number);
>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/* loop on all VFIO platform devices and start their IRQ injection */
>>>>>>>>>>>>>> +static void vfio_irq_starter_notify(Notifier *notifier, void *data)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +    VfioIrqStarterNotifierParams *p =
>>>>>>>>>>>>>> +        container_of(notifier, VfioIrqStarterNotifierParams, notifier);
>>>>>>>>>>>>>> +    DeviceState *dev =
>>>>>>>>>>>>>> +        qdev_find_recursive(sysbus_get_default(), TYPE_PLATFORM_BUS_DEVICE);
>>>>>>>>>>>>>> +    PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(dev);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    if (pbus->done_gathering) {
>>>>>>>>>>>>>> +        VfioIrqStartParams data = {
>>>>>>>>>>>>>> +            .pbus = pbus,
>>>>>>>>>>>>>> +            .platform_bus_first_irq = p->platform_bus_first_irq,
>>>>>>>>>>>>>> +        };
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +        foreach_dynamic_sysbus_device(vfio_irq_starter, &data);
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/* registers the machine init done notifier that will start VFIO IRQ */
>>>>>>>>>>>>>> +void vfio_register_irq_starter(int platform_bus_first_irq)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +    VfioIrqStarterNotifierParams *p = g_new(VfioIrqStarterNotifierParams, 1);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    p->platform_bus_first_irq = platform_bus_first_irq;
>>>>>>>>>>>>>> +    p->notifier.notify = vfio_irq_starter_notify;
>>>>>>>>>>>>>> +    qemu_add_machine_init_done_notifier(&p->notifier);
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you add a notifier for each device instead? Then the notifier
>>>>>>>>>>>>> would be part of the vfio device struct and not some dangling random
>>>>>>>>>>>>> pointer :).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Of course instead of foreach_dynamic_sysbus_device() you would directly
>>>>>>>>>>>>> know the device you're dealing with and only handle a single device per
>>>>>>>>>>>>> notifier.
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>
>>>>>>>>>>>> I don't see how to practically follow your request:
>>>>>>>>>>>>
>>>>>>>>>>>> - at machine init time, VFIO devices are not yet instantiated so I
>>>>>>>>>>>> cannot call foreach_dynamic_sysbus_device() there - I was definitively
>>>>>>>>>>>> wrong in my first reply :-().
>>>>>>>>>>>>
>>>>>>>>>>>> - I can't register a per VFIO device notifier in the VFIO device
>>>>>>>>>>>> finalize function because this latter is called after the platform bus
>>>>>>>>>>>> instantiation. So the IRQ binding notifier (registered in platform bus
>>>>>>>>>>>> finalize fn) would be called after the IRQ starter notifier.
>>>>>>>>>>>>
>>>>>>>>>>>> - then to simplify things a bit I could use a qemu_register_reset in
>>>>>>>>>>>> place of a machine init done notifier (would relax the call order
>>>>>>>>>>>> constraint) but the problem consists in passing the platform bus first
>>>>>>>>>>>> irq (all the more so you requested it became part of a const struct)
>>>>>>>>>>>>
>>>>>>>>>>>> Do I miss something?
>>>>>>>>>>>
>>>>>>>>>>> So the basic idea is that the device itself calls
>>>>>>>>>>> qemu_add_machine_init_done_notifier() in its realize function. The
>>>>>>>>>>> Notifier struct would be part of the device state which means you can
>>>>>>>>>>> cast yourself into the VFIO device state.
>>>>>>>>>>
>>>>>>>>>> humm, the vfio device is instantiated in the cmd line so after the
>>>>>>>>>> machine init. This means 1st the platform bus binding notifier is
>>>>>>>>>> registered (in platform bus realize) and then VFIO irq starter notifiers
>>>>>>>>>> are registered (in VFIO realize). Notifiers being executed in the
>>>>>>>>>> reverse order of their registration, this would fail. Am I wrong?
>>>>>>>>>
>>>>>>>>> Bleks. Ok, I see 2 ways out of this:
>>>>>>>>>
>>>>>>>>>   1) Create a TailNotifier and convert the machine_init_done notifiers
>>>>>>>>> to this
>>>>>>>>>
>>>>>>>>>   2) Add an "irq now populated" notifier function callback in a new
>>>>>>>>> PlatformBusDeviceClass struct that you use to describe the
>>>>>>>>> PlatformBusDevice class. Call all children's notifiers from the
>>>>>>>>> machine_init notifier in the platform bus.
>>>>>>>>>
>>>>>>>>> The more I think about it, the more I prefer option 2 I think.
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> OK, I'll work on 2)
>>>>>>>
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> I believe I understand your proposal but the issue is to pass the
>>>>>>> platform bus first_irq parameter which is needed to compute the absolute
>>>>>>> IRQ number (=irqfd GSI). VFIO device does not have this info. Platform
>>>>>>> bus doesn't have it either. Only machine file has the info.
>>>>>>
>>>>>> Well, the GIC should have this info as well. That's why I was trying to
>>>>>> point out that you want to ask the GIC about the absolute IRQ number on
>>>>>> its own number space.
>>>>>>
>>>>>> You need to make the connection with the GIC anyway, no? So you need to
>>>>>> somehow get awareness of the GIC device. Or are you hijacking the global
>>>>>> GSI number space?
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Well OK I believe I understand your idea: in vfio device, loop on all
>>>>> gic gpios using qdev_get_gpio_in(gicdev, i) and identify the i that
>>>>> matches the qemu_irq I want to kick off. That would be feasible if VFIO
>>>>> had a handle to the GIC DeviceState (gicdev), which is not currently the
>>>>> case. So we move the problem to passing the gicdev to vfio ;-)
>>>>
>>>> That should be easy - make it a link property. In fact, this would be
>>>> one of those cases where not generalizing the code would've been a good
>>>> idea.
>> In that case the machine (init done) callback would be used to pass the
>> vgic handle to each vfio device. Registered by the machine file, isn't
>> it. Aren't we exactly at the same state you wanted to improve initially
>> where the notifier is registered by the machine file, not belonging to
>> the VFIO device, just replacing first_irq param by vgic_handle which
>> eventually ends up as a link.
>>
>> This notifier still cannot be registered by the VFIO device finalize fn
>> since the VFIO device has no handle to the interrupt controller. kind of
>> chicken & egg problem.
>>>>
>>>> If device creation would live in the machine file, the machine could
>>>> automatically set the link. Maybe you can still get there somehow? You
>>>> could add a machine callback in the device allocation function.
>>>
>>> If this gets too messy, I think doing a machine attribute would work as
>>> well here. Check out the way we pass the e500-ccsr object on e500:
>>>
>>>
>>> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/pci-host/ppce500.c;h=1b4c0f00236e8005c261da527d416fe6a053b353;hb=HEAD#l337
>>>
>>>
>>> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/e500.c;h=2832fc0da444d89737768f7c4dcb0638e2625750;hb=HEAD#l873
>>
>> looks OK indeed
>>>
>>> I think doing an actual link would be cleaner, but at least the above
>>> gets you to an acceptable state that can still be improved with links
>>> later - the basic idea is the same :).
>>
>>
>> and why not "simply" a qemu_register_reset passing the vgic handle as
>> opaque.
> 
> Who would register this reset callback? It'd have to be someone who
> knows both the VFIO device as well as the vGIC device.
the machine file would. reset callback implemented in vfio-platform.c,
looping on all instances. ~ as today for the notifier but without the
dangling pointer. not sure you will like it though ;-)
> 
> The reset idea could work as replacement for the notifier though. So you
> could have the VFIO device register a reset callback in which it asks
> the vgic for the number and registers the IRQ with KVM.
arghh, still the problem of passing the vgic handle. I used the reset cb
registration by the machine file to do that. Of course if we use your
machine property trick we can do the registration by the VFIO driver
itself.

Eric
> 
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread
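
The ordering concern in the quoted discussion above (notifiers run in reverse order of registration, and a Notifier embedded in the device state lets the callback recover its device) can be modeled with a minimal standalone sketch. This is not QEMU code: the types and functions below (NotifierList, notifier_list_add, DemoVFIODevice, DemoPlatformBus, demo_run) are simplified, hypothetical stand-ins written for illustration, assuming a list that inserts at the head as the thread describes.

```c
#include <assert.h>
#include <stddef.h>

/* Standalone model, NOT QEMU code: a notifier list that inserts at the
 * head, so callbacks fire in reverse order of registration. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *n, void *data);
    Notifier *next;
};

typedef struct {
    Notifier *head;
} NotifierList;

static void notifier_list_add(NotifierList *list, Notifier *n)
{
    assert(list != NULL && n != NULL);
    n->next = list->head;   /* newest notifier becomes the head */
    list->head = n;
}

static void notifier_list_notify(NotifierList *list, void *data)
{
    for (Notifier *n = list->head; n != NULL; n = n->next) {
        n->notify(n, data);
    }
}

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device state with an embedded notifier. */
typedef struct {
    int irq_bound;          /* set by the bus "IRQ binding" notifier */
    int started_ok;         /* 1 if the IRQ was bound when we started */
    Notifier machine_done;
} DemoVFIODevice;

/* Hypothetical platform bus with a single child device. */
typedef struct {
    DemoVFIODevice *child;
    Notifier machine_done;
} DemoPlatformBus;

static void bus_bind_irqs(Notifier *n, void *data)
{
    DemoPlatformBus *bus = container_of(n, DemoPlatformBus, machine_done);
    (void)data;
    bus->child->irq_bound = 1;
}

static void vfio_start_irqs(Notifier *n, void *data)
{
    DemoVFIODevice *dev = container_of(n, DemoVFIODevice, machine_done);
    (void)data;
    dev->started_ok = dev->irq_bound;   /* fails if binding has not run */
}

/* Returns 1 if the IRQ starter ran after the IRQ binding, 0 otherwise. */
static int demo_run(int bus_registers_first)
{
    NotifierList list = { NULL };
    DemoVFIODevice dev = { 0, 0, { vfio_start_irqs, NULL } };
    DemoPlatformBus bus = { &dev, { bus_bind_irqs, NULL } };

    if (bus_registers_first) {
        notifier_list_add(&list, &bus.machine_done);
        notifier_list_add(&list, &dev.machine_done);
    } else {
        notifier_list_add(&list, &dev.machine_done);
        notifier_list_add(&list, &bus.machine_done);
    }
    notifier_list_notify(&list, NULL);
    return dev.started_ok;
}
```

In this model, demo_run(1) corresponds to the case raised in the thread: the bus registers first, the device (created from the command line) registers later, and the device's starter therefore runs before the binding has happened.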

* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 17:34                             ` Eric Auger
@ 2014-11-27 17:51                               ` Alexander Graf
  2014-11-27 17:54                                 ` Eric Auger
  0 siblings, 1 reply; 43+ messages in thread
From: Alexander Graf @ 2014-11-27 17:51 UTC (permalink / raw)
  To: Eric Auger, eric.auger, christoffer.dall, qemu-devel, pbonzini,
	kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder,
	alex.williamson, kvmarm



On 27.11.14 18:34, Eric Auger wrote:
> On 11/27/2014 06:24 PM, Alexander Graf wrote:
>>
>>
>> On 27.11.14 18:13, Eric Auger wrote:
>>> On 11/27/2014 04:55 PM, Alexander Graf wrote:
>>>>
>>>>
>>>> On 27.11.14 16:28, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 27.11.14 16:14, Eric Auger wrote:
>>>>>> On 11/27/2014 03:35 PM, Alexander Graf wrote:
>>>>>>>
>>>>>>>

[...]

>>>>>
>>>>> That should be easy - make it a link property. In fact, this would be
>>>>> one of those cases where not generalizing the code would've been a good
>>>>> idea.
>>> In that case the machine (init done) callback would be used to pass the
>>> vgic handle to each vfio device. Registered by the machine file, isn't
>>> it. Aren't we exactly at the same state you wanted to improve initially
>>> where the notifier is registered by the machine file, not belonging to
>>> the VFIO device, just replacing first_irq param by vgic_handle which
>>> eventually ends up as a link.
>>>
>>> This notifier still cannot be registered by the VFIO device finalize fn
>>> since the VFIO device has no handle to the interrupt controller. kind of
>>> chicken & egg problem.
>>>>>
>>>>> If device creation would live in the machine file, the machine could
>>>>> automatically set the link. Maybe you can still get there somehow? You
>>>>> could add a machine callback in the device allocation function.
>>>>
>>>> If this gets too messy, I think doing a machine attribute would work as
>>>> well here. Check out the way we pass the e500-ccsr object on e500:
>>>>
>>>>
>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/pci-host/ppce500.c;h=1b4c0f00236e8005c261da527d416fe6a053b353;hb=HEAD#l337
>>>>
>>>>
>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/e500.c;h=2832fc0da444d89737768f7c4dcb0638e2625750;hb=HEAD#l873
>>>
>>> looks OK indeed
>>>>
>>>> I think doing an actual link would be cleaner, but at least the above
>>>> gets you to an acceptable state that can still be improved with links
>>>> later - the basic idea is the same :).
>>>
>>>
>>> and why not "simply" a qemu_register_reset passing the vgic handle as
>>> opaque.
>>
>> Who would register this reset callback? It'd have to be someone who
>> knows both the VFIO device as well as the vGIC device.
> the machine file would. reset callback implemented in vfio-platform.c,
> looping on all instances. ~ as today for the notifier but without the
> dangling pointer. not sure you will like it though ;-)

Ah, so you would do the actual VFIO call inside the machine file? Or
would you call a VFIO function when you see that a device is VFIO and
trigger the connection at that point? That would work too I suppose.

>>
>> The reset idea could work as replacement for the notifier though. So you
>> could have the VFIO device register a reset callback in which it asks
>> the vgic for the number and registers the IRQ with KVM.
> arghh, still the problem of passing the vgic handle. I used the reset cb
> registration by the machine file to do that. Of course if we use your
> machine property trick we can do the registration by the VFIO driver
> itself.

Yup, either way works IMHO :).


Alex

^ permalink raw reply	[flat|nested] 43+ messages in thread
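
The reset-callback variant discussed above (the device registers a reset handler, and inside it asks the interrupt controller which input its IRQ line corresponds to) can be sketched as a standalone model. This is not QEMU code: demo_register_reset, DemoGic, DemoPassthroughDev and the other names are hypothetical, and the sketch assumes the device already holds a handle to the controller, for example through a link or machine property as suggested in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Standalone model, NOT QEMU code: reset handlers taking an opaque pointer. */
#define DEMO_MAX_RESET 8
#define DEMO_GIC_INPUTS 16

typedef void DemoResetHandler(void *opaque);

typedef struct {
    DemoResetHandler *func;
    void *opaque;
} DemoResetEntry;

static DemoResetEntry demo_reset_handlers[DEMO_MAX_RESET];
static size_t demo_reset_count;

static int demo_register_reset(DemoResetHandler *func, void *opaque)
{
    if (demo_reset_count == DEMO_MAX_RESET) {
        return -1;          /* table full */
    }
    demo_reset_handlers[demo_reset_count].func = func;
    demo_reset_handlers[demo_reset_count].opaque = opaque;
    demo_reset_count++;
    return 0;
}

static void demo_system_reset(void)
{
    for (size_t i = 0; i < demo_reset_count; i++) {
        demo_reset_handlers[i].func(demo_reset_handlers[i].opaque);
    }
}

/* Hypothetical interrupt controller exposing numbered input lines. */
typedef struct { int unused; } DemoIrqLine;

typedef struct {
    DemoIrqLine inputs[DEMO_GIC_INPUTS];
} DemoGic;

static DemoIrqLine *demo_gic_get_input(DemoGic *gic, int n)
{
    assert(n >= 0 && n < DEMO_GIC_INPUTS);
    return &gic->inputs[n];
}

/* Loop over all controller inputs and return the index matching irq. */
static int demo_gic_find_input(DemoGic *gic, const DemoIrqLine *irq)
{
    for (int i = 0; i < DEMO_GIC_INPUTS; i++) {
        if (demo_gic_get_input(gic, i) == irq) {
            return i;
        }
    }
    return -1;
}

/* Hypothetical passthrough device holding a link to the controller. */
typedef struct {
    DemoGic *gic;           /* assumed to be set via a link/machine property */
    DemoIrqLine *irq;       /* line the device is wired to */
    int gsi;                /* absolute IRQ number, -1 until resolved */
} DemoPassthroughDev;

static void demo_dev_reset(void *opaque)
{
    DemoPassthroughDev *dev = opaque;
    dev->gsi = demo_gic_find_input(dev->gic, dev->irq);
}

/* Wire a device to the given input, run a reset, return the resolved number. */
static int demo_resolve_gsi(int wired_input)
{
    static DemoGic gic;
    DemoPassthroughDev dev = { &gic, demo_gic_get_input(&gic, wired_input), -1 };

    demo_reset_count = 0;
    demo_register_reset(demo_dev_reset, &dev);
    demo_system_reset();
    return dev.gsi;
}
```

The point of the sketch is only that the handler needs both the device and the controller handle at reset time; whether the machine file or the device registers the handler is the design question left open in the thread.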

* Re: [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support
  2014-11-27 17:51                               ` Alexander Graf
@ 2014-11-27 17:54                                 ` Eric Auger
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Auger @ 2014-11-27 17:54 UTC (permalink / raw)
  To: Alexander Graf, eric.auger, christoffer.dall, qemu-devel,
	pbonzini, kim.phillips, a.rigo, manish.jaggi, joel.schopp
  Cc: patches, Kim Phillips, will.deacon, stuart.yoder,
	alex.williamson, kvmarm

On 11/27/2014 06:51 PM, Alexander Graf wrote:
> 
> 
> On 27.11.14 18:34, Eric Auger wrote:
>> On 11/27/2014 06:24 PM, Alexander Graf wrote:
>>>
>>>
>>> On 27.11.14 18:13, Eric Auger wrote:
>>>> On 11/27/2014 04:55 PM, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 27.11.14 16:28, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>> On 27.11.14 16:14, Eric Auger wrote:
>>>>>>> On 11/27/2014 03:35 PM, Alexander Graf wrote:
>>>>>>>>
>>>>>>>>
> 
> [...]
> 
>>>>>>
>>>>>> That should be easy - make it a link property. In fact, this would be
>>>>>> one of those cases where not generalizing the code would've been a good
>>>>>> idea.
>>>> In that case the machine (init done) callback would be used to pass the
>>>> vgic handle to each vfio device. Registered by the machine file, isn't
>>>> it. Aren't we exactly at the same state you wanted to improve initially
>>>> where the notifier is registered by the machine file, not belonging to
>>>> the VFIO device, just replacing first_irq param by vgic_handle which
>>>> eventually ends up as a link.
>>>>
>>>> This notifier still cannot be registered by the VFIO device finalize fn
>>>> since the VFIO device has no handle to the interrupt controller. kind of
>>>> chicken & egg problem.
>>>>>>
>>>>>> If device creation would live in the machine file, the machine could
>>>>>> automatically set the link. Maybe you can still get there somehow? You
>>>>>> could add a machine callback in the device allocation function.
>>>>>
>>>>> If this gets too messy, I think doing a machine attribute would work as
>>>>> well here. Check out the way we pass the e500-ccsr object on e500:
>>>>>
>>>>>
>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/pci-host/ppce500.c;h=1b4c0f00236e8005c261da527d416fe6a053b353;hb=HEAD#l337
>>>>>
>>>>>
>>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/e500.c;h=2832fc0da444d89737768f7c4dcb0638e2625750;hb=HEAD#l873
>>>>
>>>> looks OK indeed
>>>>>
>>>>> I think doing an actual link would be cleaner, but at least the above
>>>>> gets you to an acceptable state that can still be improved with links
>>>>> later - the basic idea is the same :).
>>>>
>>>>
>>>> and why not "simply" a qemu_register_reset passing the vgic handle as
>>>> opaque.
>>>
>>> Who would register this reset callback? It'd have to be someone who
>>> knows both the VFIO device as well as the vGIC device.
>> the machine file would. reset callback implemented in vfio-platform.c,
>> looping on all instances. ~ as today for the notifier but without the
>> dangling pointer. not sure you will like it though ;-)
> 
> Ah, so you would do the actual VFIO call inside the machine file?
Yes, in the machine file.
> Or
> would you call a VFIO function when you see that a device is VFIO and
> trigger the connection at that point? That would work too I suppose.
> 
>>>
>>> The reset idea could work as replacement for the notifier though. So you
>>> could have the VFIO device register a reset callback in which it asks
>>> the vgic for the number and registers the IRQ with KVM.
>> arghh, still the problem of passing the vgic handle. I used the reset cb
>> registration by the machine file to do that. Of course if we use your
>> machine property trick we can do the registration by the VFIO driver
>> itself.
> 
> Yup, either way works IMHO :).
OK I suggest I do my next patch as is and you will tell me... it will be
easy to revert to machine prop anyway.

Thanks for your time!

Eric
> 
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2014-11-27 17:55 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-31 14:05 [Qemu-devel] [PATCH v7 00/16] KVM platform device passthrough Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 01/16] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 02/16] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice Eric Auger
2014-11-05 17:35   ` Alex Williamson
2014-11-06  8:38     ` Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 04/16] hw/vfio/pci: Introduce VFIORegion Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 05/16] hw/vfio/pci: split vfio_get_device Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 06/16] hw/vfio/pci: rename group_list into vfio_group_list Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 07/16] hw/vfio/pci: use name field in format strings Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 08/16] hw/vfio: create common module Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 09/16] hw/vfio/platform: add vfio-platform support Eric Auger
2014-11-05 10:29   ` Alexander Graf
2014-11-05 12:03     ` Eric Auger
2014-11-05 13:05       ` Alexander Graf
2014-11-26  9:45     ` Eric Auger
2014-11-26 10:24       ` Alexander Graf
2014-11-26 10:48         ` Eric Auger
2014-11-26 11:20           ` Alexander Graf
2014-11-26 14:46             ` Eric Auger
2014-11-27 14:05               ` Eric Auger
2014-11-27 14:35                 ` Alexander Graf
2014-11-27 15:14                   ` Eric Auger
2014-11-27 15:28                     ` Alexander Graf
2014-11-27 15:55                       ` Alexander Graf
2014-11-27 17:13                         ` Eric Auger
2014-11-27 17:24                           ` Alexander Graf
2014-11-27 17:34                             ` Eric Auger
2014-11-27 17:51                               ` Alexander Graf
2014-11-27 17:54                                 ` Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 10/16] hw/vfio: calxeda xgmac device Eric Auger
2014-11-05 10:26   ` Alexander Graf
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 11/16] hw/arm/virt: add support for VFIO devices Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 12/16] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation Eric Auger
2014-11-05 10:59   ` Alexander Graf
2014-11-05 12:31     ` Eric Auger
2014-11-05 22:23       ` Alexander Graf
2014-11-06  8:57         ` Eric Auger
2014-11-06 12:34           ` Alexander Graf
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 13/16] hw/vfio/platform: Add irqfd support Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 14/16] linux-headers: Update KVM headers from linux-next tag ToBeFilled Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 15/16] hw/vfio/common: vfio_kvm_device_fd moved in the common header Eric Auger
2014-10-31 14:05 ` [Qemu-devel] [PATCH v7 16/16] hw/vfio/platform: add forwarded irq support Eric Auger
