* [PATCH 0/5] vfio/quirks: ioeventfd support
@ 2018-02-28 20:45 ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2018-02-28 20:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: eric.auger, alex.williamson, peterx, kvm

This is the QEMU counterpart to https://lkml.org/lkml/2018/2/28/1222

As described in the third patch, we have a use case for taking
advantage of existing KVM ioeventfd support for accelerating the
MSI-ACK behavior of NVIDIA GPUs.  This series adds generic
infrastructure within vfio quirks for making use of ioeventfds and
specifically enables it for this purpose.  The first three patches
provide a performance improvement on their own and do not depend on
the additional acceleration added by the remainder of the patches to
be worthwhile.  The Linux header update in patch 4 is not intended
to be a full refresh; the kernel API is not yet upstream, so the
update is included only for testing and review.  The intention is to
commit the series in separate chunks: patches 1-3 once we have review
consensus, and patches 4-5 as RFC until the kernel API is upstream.

RFC->v1:
 * Cap the number of dynamically added ioeventfds to 10 such that
   pathological driver behavior cannot consume too many file handles.
 * Added a reset hook and cleanup mechanism to drop dynamically added
   ioeventfds on device reset.
 * Added comments and removed info_report.
 * Folded the ioeventfd infrastructure patch into the usage patch,
   since the infrastructure cannot stand on its own without setup,
   which requires consumers.

Thanks,

Alex

---

Alex Williamson (5):
      vfio/quirks: Add common quirk alloc helper
      vfio/quirks: Add quirk reset callback
      vfio/quirks: ioeventfd quirk acceleration
      vfio: Update linux header
      vfio/quirks: Enable ioeventfd quirks to be handled by vfio directly


 hw/vfio/pci-quirks.c       |  255 +++++++++++++++++++++++++++++++++++++++-----
 hw/vfio/pci.c              |    2 
 hw/vfio/pci.h              |   17 +++
 linux-headers/linux/vfio.h |   27 +++++
 4 files changed, 272 insertions(+), 29 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] vfio/quirks: Add common quirk alloc helper
  2018-02-28 20:45 ` [Qemu-devel] " Alex Williamson
@ 2018-02-28 20:45   ` Alex Williamson
  -1 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2018-02-28 20:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: eric.auger, alex.williamson, peterx, kvm

This will later be used to include list initialization.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/pci-quirks.c |   48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index e5779a7ad35b..cc3a74ed992a 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -275,6 +275,15 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static VFIOQuirk *vfio_quirk_alloc(int nr_mem)
+{
+    VFIOQuirk *quirk = g_new0(VFIOQuirk, 1);
+    quirk->mem = g_new0(MemoryRegion, nr_mem);
+    quirk->nr_mem = nr_mem;
+
+    return quirk;
+}
+
 static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
 {
     VFIOQuirk *quirk;
@@ -288,9 +297,7 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 1);
-    quirk->nr_mem = 1;
+    quirk = vfio_quirk_alloc(1);
 
     memory_region_init_io(quirk->mem, OBJECT(vdev), &vfio_ati_3c3_quirk, vdev,
                           "vfio-ati-3c3-quirk", 1);
@@ -323,9 +330,7 @@ static void vfio_probe_ati_bar4_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 2);
-    quirk->nr_mem = 2;
+    quirk = vfio_quirk_alloc(2);
     window = quirk->data = g_malloc0(sizeof(*window) +
                                      sizeof(VFIOConfigWindowMatch));
     window->vdev = vdev;
@@ -371,10 +376,9 @@ static void vfio_probe_ati_bar2_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
+    quirk = vfio_quirk_alloc(1);
     mirror = quirk->data = g_malloc0(sizeof(*mirror));
-    mirror->mem = quirk->mem = g_new0(MemoryRegion, 1);
-    quirk->nr_mem = 1;
+    mirror->mem = quirk->mem;
     mirror->vdev = vdev;
     mirror->offset = 0x4000;
     mirror->bar = nr;
@@ -548,10 +552,8 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice *vdev)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
+    quirk = vfio_quirk_alloc(2);
     quirk->data = data = g_malloc0(sizeof(*data));
-    quirk->mem = g_new0(MemoryRegion, 2);
-    quirk->nr_mem = 2;
     data->vdev = vdev;
 
     memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_nvidia_3d4_quirk,
@@ -667,9 +669,7 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 4);
-    quirk->nr_mem = 4;
+    quirk = vfio_quirk_alloc(4);
     bar5 = quirk->data = g_malloc0(sizeof(*bar5) +
                                    (sizeof(VFIOConfigWindowMatch) * 2));
     window = &bar5->window;
@@ -762,10 +762,9 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
+    quirk = vfio_quirk_alloc(1);
     mirror = quirk->data = g_malloc0(sizeof(*mirror));
-    mirror->mem = quirk->mem = g_new0(MemoryRegion, 1);
-    quirk->nr_mem = 1;
+    mirror->mem = quirk->mem;
     mirror->vdev = vdev;
     mirror->offset = 0x88000;
     mirror->bar = nr;
@@ -781,10 +780,9 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
 
     /* The 0x1800 offset mirror only seems to get used by legacy VGA */
     if (vdev->vga) {
-        quirk = g_malloc0(sizeof(*quirk));
+        quirk = vfio_quirk_alloc(1);
         mirror = quirk->data = g_malloc0(sizeof(*mirror));
-        mirror->mem = quirk->mem = g_new0(MemoryRegion, 1);
-        quirk->nr_mem = 1;
+        mirror->mem = quirk->mem;
         mirror->vdev = vdev;
         mirror->offset = 0x1800;
         mirror->bar = nr;
@@ -945,9 +943,7 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 2);
-    quirk->nr_mem = 2;
+    quirk = vfio_quirk_alloc(2);
     quirk->data = rtl = g_malloc0(sizeof(*rtl));
     rtl->vdev = vdev;
 
@@ -1507,9 +1503,7 @@ static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     }
 
     /* Setup our quirk to munge GTT addresses to the VM allocated buffer */
-    quirk = g_malloc0(sizeof(*quirk));
-    quirk->mem = g_new0(MemoryRegion, 2);
-    quirk->nr_mem = 2;
+    quirk = vfio_quirk_alloc(2);
     igd = quirk->data = g_malloc0(sizeof(*igd));
     igd->vdev = vdev;
     igd->index = ~0;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/5] vfio/quirks: Add quirk reset callback
  2018-02-28 20:45 ` [Qemu-devel] " Alex Williamson
@ 2018-02-28 20:45   ` Alex Williamson
  -1 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2018-02-28 20:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: eric.auger, alex.williamson, peterx, kvm

Quirks can be self-modifying, so provide a hook that allows them to
clean up on device reset if desired.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/pci-quirks.c |   15 +++++++++++++++
 hw/vfio/pci.c        |    2 ++
 hw/vfio/pci.h        |    2 ++
 3 files changed, 19 insertions(+)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index cc3a74ed992a..f0947cbf152f 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1694,6 +1694,21 @@ void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr)
 /*
  * Reset quirks
  */
+void vfio_quirk_reset(VFIOPCIDevice *vdev)
+{
+    int i;
+
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        VFIOQuirk *quirk;
+        VFIOBAR *bar = &vdev->bars[i];
+
+        QLIST_FOREACH(quirk, &bar->quirks, next) {
+            if (quirk->reset) {
+                quirk->reset(vdev, quirk);
+            }
+        }
+    }
+}
 
 /*
  * AMD Radeon PCI config reset, based on Linux:
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 033cc8dea1b9..e412fa8dc705 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2185,6 +2185,8 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
                          vdev->vbasedev.name, nr);
         }
     }
+
+    vfio_quirk_reset(vdev);
 }
 
 static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f4aa13e021fa..0be41a70be1d 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -29,6 +29,7 @@ typedef struct VFIOQuirk {
     void *data;
     int nr_mem;
     MemoryRegion *mem;
+    void (*reset)(struct VFIOPCIDevice *vdev, struct VFIOQuirk *quirk);
 } VFIOQuirk;
 
 typedef struct VFIOBAR {
@@ -165,6 +166,7 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr);
 void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr);
 void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev);
 int vfio_add_virt_caps(VFIOPCIDevice *vdev, Error **errp);
+void vfio_quirk_reset(VFIOPCIDevice *vdev);
 
 extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread
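
For readers who want to try the reset-hook pattern from the patch above
outside of QEMU, the following is a minimal standalone sketch.  It is
not QEMU code: Quirk, Device, NR_BARS, quirk_add(), quirk_reset_all()
and counter_reset() are hypothetical stand-ins for VFIOQuirk, the
per-BAR quirk lists and vfio_quirk_reset(), and a plain singly linked
list replaces QLIST.  NR_BARS = 6 is an assumption standing in for
PCI_ROM_SLOT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for VFIOQuirk; not the QEMU definition. */
typedef struct Quirk Quirk;
struct Quirk {
    Quirk *next;                 /* singly linked list in place of QLIST */
    void *data;                  /* quirk-private state */
    void (*reset)(Quirk *quirk); /* optional hook, may be NULL */
};

#define NR_BARS 6 /* assumption: one quirk list per standard PCI BAR */

/* Hypothetical stand-in for the device and its per-BAR quirk lists. */
typedef struct Device {
    Quirk *bars[NR_BARS];
} Device;

/* Push a quirk onto the front of one BAR's list. */
static void quirk_add(Device *dev, int bar, Quirk *quirk)
{
    assert(bar >= 0 && bar < NR_BARS);
    quirk->next = dev->bars[bar];
    dev->bars[bar] = quirk;
}

/*
 * Walk every BAR and call the reset hook of each quirk that has one.
 * Returns the number of hooks invoked, purely for demonstration.
 */
static int quirk_reset_all(Device *dev)
{
    int called = 0;

    for (int i = 0; i < NR_BARS; i++) {
        for (Quirk *q = dev->bars[i]; q != NULL; q = q->next) {
            if (q->reset) {
                q->reset(q);
                called++;
            }
        }
    }
    return called;
}

/* Example hook: clear an int counter stored in quirk->data. */
static void counter_reset(Quirk *quirk)
{
    *(int *)quirk->data = 0;
}
```

A quirk that never modifies itself simply leaves reset as NULL and is
skipped, which is why the hook can be added without touching existing
quirks.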

* [PATCH 3/5] vfio/quirks: ioeventfd quirk acceleration
  2018-02-28 20:45 ` [Qemu-devel] " Alex Williamson
@ 2018-02-28 20:45   ` Alex Williamson
  -1 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2018-02-28 20:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: eric.auger, alex.williamson, peterx, kvm

The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found
in device MMIO space.  Normally PCI config space is considered a slow
path and further optimization is unnecessary; however, NVIDIA uses a
register here to enable the MSI interrupt to re-trigger.  Exiting to
QEMU for this MSI-ACK handling can therefore rate limit our interrupt
handling.  Fortunately the MSI-ACK write is easily detected since the
quirk MemoryRegion otherwise has very few accesses, so simply looking
for consecutive writes with the same data is sufficient.  The
threshold of 10 consecutive writes with the same data and size is
chosen arbitrarily.  We configure the KVM ioeventfd with data match,
so there's no risk of triggering for the wrong data or size.  We do
risk that pathological driver behavior might consume all of QEMU's
file descriptors, so we cap ourselves to 10 ioeventfds for this
purpose.

In support of the above, generic ioeventfd infrastructure is added
for vfio quirks.  This automatically initializes an ioeventfd list
per quirk, disables and frees ioeventfds on exit, and allows
ioeventfds marked as dynamic to be dropped on device reset.  The
rationale for this latter feature is that useful ioeventfds may
depend on specific driver behavior and since we necessarily place a
cap on our use of ioeventfds, a machine reset is a reasonable point
at which to assume a new driver and re-profile.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/pci-quirks.c |  159 +++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/vfio/pci.h        |   14 ++++
 2 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f0947cbf152f..e01e2f0f69df 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -12,6 +12,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
@@ -202,6 +203,7 @@ typedef struct VFIOConfigMirrorQuirk {
     uint32_t offset;
     uint8_t bar;
     MemoryRegion *mem;
+    uint8_t data[];
 } VFIOConfigMirrorQuirk;
 
 static uint64_t vfio_generic_quirk_mirror_read(void *opaque,
@@ -278,12 +280,84 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
 static VFIOQuirk *vfio_quirk_alloc(int nr_mem)
 {
     VFIOQuirk *quirk = g_new0(VFIOQuirk, 1);
+    QLIST_INIT(&quirk->ioeventfds);
     quirk->mem = g_new0(MemoryRegion, nr_mem);
     quirk->nr_mem = nr_mem;
 
     return quirk;
 }
 
+static void vfio_ioeventfd_exit(VFIOIOEventFD *ioeventfd)
+{
+    QLIST_REMOVE(ioeventfd, next);
+    memory_region_del_eventfd(ioeventfd->mr, ioeventfd->addr, ioeventfd->size,
+                              ioeventfd->match_data, ioeventfd->data,
+                              &ioeventfd->e);
+    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e), NULL, NULL, NULL);
+    event_notifier_cleanup(&ioeventfd->e);
+    g_free(ioeventfd);
+}
+
+static void vfio_drop_dynamic_eventfds(VFIOPCIDevice *vdev, VFIOQuirk *quirk)
+{
+    VFIOIOEventFD *ioeventfd, *tmp;
+
+    QLIST_FOREACH_SAFE(ioeventfd, &quirk->ioeventfds, next, tmp) {
+        if (ioeventfd->dynamic) {
+            vfio_ioeventfd_exit(ioeventfd);
+        }
+    }
+}
+
+static void vfio_ioeventfd_handler(void *opaque)
+{
+    VFIOIOEventFD *ioeventfd = opaque;
+
+    if (event_notifier_test_and_clear(&ioeventfd->e)) {
+        vfio_region_write(ioeventfd->region, ioeventfd->region_addr,
+                          ioeventfd->data, ioeventfd->size);
+    }
+}
+
+static VFIOIOEventFD *vfio_ioeventfd_init(VFIOPCIDevice *vdev,
+                                          MemoryRegion *mr, hwaddr addr,
+                                          unsigned size, uint64_t data,
+                                          VFIORegion *region,
+                                          hwaddr region_addr, bool dynamic)
+{
+    VFIOIOEventFD *ioeventfd = g_malloc0(sizeof(*ioeventfd));
+
+    if (event_notifier_init(&ioeventfd->e, 0)) {
+        g_free(ioeventfd);
+        return NULL;
+    }
+
+    /*
+     * MemoryRegion and relative offset, plus additional ioeventfd setup
+     * parameters for configuring and later tearing down KVM ioeventfd.
+     */
+    ioeventfd->mr = mr;
+    ioeventfd->addr = addr;
+    ioeventfd->size = size;
+    ioeventfd->data = data;
+    ioeventfd->match_data = true;
+    ioeventfd->dynamic = dynamic;
+    /*
+     * VFIORegion and relative offset for implementing the userspace
+     * handler.  data & size fields shared for both uses.
+     */
+    ioeventfd->region = region;
+    ioeventfd->region_addr = region_addr;
+
+    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
+                        vfio_ioeventfd_handler, NULL, ioeventfd);
+    memory_region_add_eventfd(ioeventfd->mr, ioeventfd->addr,
+                              ioeventfd->size, ioeventfd->match_data,
+                              ioeventfd->data, &ioeventfd->e);
+
+    return ioeventfd;
+}
+
 static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
 {
     VFIOQuirk *quirk;
@@ -719,6 +793,17 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
     trace_vfio_quirk_nvidia_bar5_probe(vdev->vbasedev.name);
 }
 
+typedef struct LastDataSet {
+    hwaddr addr;
+    uint64_t data;
+    unsigned size;
+    int hits;
+    int added;
+} LastDataSet;
+
+#define MAX_DYN_IOEVENTFD 10
+#define HITS_FOR_IOEVENTFD 10
+
 /*
  * Finally, BAR0 itself.  We want to redirect any accesses to either
  * 0x1800 or 0x88000 through the PCI config space access functions.
@@ -729,6 +814,7 @@ static void vfio_nvidia_quirk_mirror_write(void *opaque, hwaddr addr,
     VFIOConfigMirrorQuirk *mirror = opaque;
     VFIOPCIDevice *vdev = mirror->vdev;
     PCIDevice *pdev = &vdev->pdev;
+    LastDataSet *last = (LastDataSet *)&mirror->data;
 
     vfio_generic_quirk_mirror_write(opaque, addr, data, size);
 
@@ -743,6 +829,59 @@ static void vfio_nvidia_quirk_mirror_write(void *opaque, hwaddr addr,
                           addr + mirror->offset, data, size);
         trace_vfio_quirk_nvidia_bar0_msi_ack(vdev->vbasedev.name);
     }
+
+    /*
+     * Automatically add an ioeventfd to handle any repeated write with the
+     * same data and size above the standard PCI config space header.  This is
+     * primarily expected to accelerate the MSI-ACK behavior, such as noted
+     * above.  Current hardware/drivers should trigger an ioeventfd at config
+     * offset 0x704 (region offset 0x88704), with data 0x0, size 4.
+     *
+     * The criteria of 10 successive hits is arbitrary but reliably adds the
+     * MSI-ACK region.  Note that as some writes are bypassed via the ioeventfd,
+     * the remaining ones have a greater chance of being seen successively.
+     * To avoid the pathological case of burning up all of QEMU's open file
+     * handles, arbitrarily limit this algorithm to adding no more than 10
+     * ioeventfds, print an error if we would have added an 11th, and then
+     * stop counting.
+     */
+    if (addr > PCI_STD_HEADER_SIZEOF && last->added < MAX_DYN_IOEVENTFD + 1) {
+        if (addr != last->addr || data != last->data || size != last->size) {
+            last->addr = addr;
+            last->data = data;
+            last->size = size;
+            last->hits = 1;
+        } else if (++last->hits >= HITS_FOR_IOEVENTFD) {
+            if (last->added < MAX_DYN_IOEVENTFD) {
+                VFIOIOEventFD *ioeventfd;
+                ioeventfd = vfio_ioeventfd_init(vdev, mirror->mem, addr, size,
+                                        data, &vdev->bars[mirror->bar].region,
+                                        mirror->offset + addr, true);
+                if (ioeventfd) {
+                    VFIOQuirk *quirk;
+
+                    QLIST_FOREACH(quirk,
+                                  &vdev->bars[mirror->bar].quirks, next) {
+                        if (quirk->data == mirror) {
+                            QLIST_INSERT_HEAD(&quirk->ioeventfds,
+                                              ioeventfd, next);
+                            break;
+                        }
+                    }
+
+                    assert(quirk != NULL); /* Check not found */
+
+                    last->added++;
+                }
+            } else {
+                last->added++;
+
+                error_report("NVIDIA ioeventfd queue full for %s, unable to "
+                             "accelerate 0x%"HWADDR_PRIx", data 0x%"PRIx64", "
+                             "size %u", vdev->vbasedev.name, addr, data, size);
+            }
+        }
+    }
 }
 
 static const MemoryRegionOps vfio_nvidia_mirror_quirk = {
@@ -751,6 +890,16 @@ static const MemoryRegionOps vfio_nvidia_mirror_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static void vfio_nvidia_bar0_quirk_reset(VFIOPCIDevice *vdev, VFIOQuirk *quirk)
+{
+    VFIOConfigMirrorQuirk *mirror = quirk->data;
+    LastDataSet *last = (LastDataSet *)&mirror->data;
+
+    memset(last, 0, sizeof(*last));
+
+    vfio_drop_dynamic_eventfds(vdev, quirk);
+}
+
 static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
 {
     VFIOQuirk *quirk;
@@ -763,7 +912,8 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     }
 
     quirk = vfio_quirk_alloc(1);
-    mirror = quirk->data = g_malloc0(sizeof(*mirror));
+    quirk->reset = vfio_nvidia_bar0_quirk_reset;
+    mirror = quirk->data = g_malloc0(sizeof(*mirror) + sizeof(LastDataSet));
     mirror->mem = quirk->mem;
     mirror->vdev = vdev;
     mirror->offset = 0x88000;
@@ -781,7 +931,8 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     /* The 0x1800 offset mirror only seems to get used by legacy VGA */
     if (vdev->vga) {
         quirk = vfio_quirk_alloc(1);
-        mirror = quirk->data = g_malloc0(sizeof(*mirror));
+        quirk->reset = vfio_nvidia_bar0_quirk_reset;
+        mirror = quirk->data = g_malloc0(sizeof(*mirror) + sizeof(LastDataSet));
         mirror->mem = quirk->mem;
         mirror->vdev = vdev;
         mirror->offset = 0x1800;
@@ -1668,6 +1819,10 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr)
     int i;
 
     QLIST_FOREACH(quirk, &bar->quirks, next) {
+        while (!QLIST_EMPTY(&quirk->ioeventfds)) {
+            vfio_ioeventfd_exit(QLIST_FIRST(&quirk->ioeventfds));
+        }
+
         for (i = 0; i < quirk->nr_mem; i++) {
             memory_region_del_subregion(bar->region.mem, &quirk->mem[i]);
         }
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 0be41a70be1d..de651993b57a 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -24,9 +24,23 @@
 
 struct VFIOPCIDevice;
 
+typedef struct VFIOIOEventFD {
+    QLIST_ENTRY(VFIOIOEventFD) next;
+    MemoryRegion *mr;
+    hwaddr addr;
+    unsigned size;
+    uint64_t data;
+    EventNotifier e;
+    VFIORegion *region;
+    hwaddr region_addr;
+    bool match_data;
+    bool dynamic;
+} VFIOIOEventFD;
+
 typedef struct VFIOQuirk {
     QLIST_ENTRY(VFIOQuirk) next;
     void *data;
+    QLIST_HEAD(, VFIOIOEventFD) ioeventfds;
     int nr_mem;
     MemoryRegion *mem;
     void (*reset)(struct VFIOPCIDevice *vdev, struct VFIOQuirk *quirk);

^ permalink raw reply related	[flat|nested] 20+ messages in thread
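
The detection heuristic in the patch above (count consecutive identical
writes, act on the 10th, stop after 10 additions) can be sketched on
its own as follows.  This is a simplified illustration, not the patch
logic verbatim: WriteTracker, tracker_observe() and tracker_reset() are
hypothetical names, the PCI_STD_HEADER_SIZEOF check and the error
report for an 11th candidate are omitted, and the hit counter is
restarted after each trigger, an assumption made because no real
ioeventfd is registered here to absorb the subsequent writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DYN 10     /* cap on accelerated writes, as in the patch */
#define HITS_NEEDED 10 /* consecutive identical writes required */

/* Hypothetical analogue of LastDataSet from the patch. */
typedef struct WriteTracker {
    uint64_t addr;
    uint64_t data;
    unsigned size;
    int hits;  /* length of the current run of identical writes */
    int added; /* how many times the threshold has been reached */
} WriteTracker;

/*
 * Record one write.  Returns true when this write completes a run of
 * HITS_NEEDED identical writes and the cap still has room, which is
 * the point where a caller would register an accelerated path.
 */
static bool tracker_observe(WriteTracker *t, uint64_t addr,
                            uint64_t data, unsigned size)
{
    assert(t != NULL);

    if (t->added >= MAX_DYN) {
        return false; /* cap reached, stop profiling */
    }

    if (t->hits == 0 || addr != t->addr ||
        data != t->data || size != t->size) {
        /* A different write starts a new run. */
        t->addr = addr;
        t->data = data;
        t->size = size;
        t->hits = 1;
        return false;
    }

    if (++t->hits < HITS_NEEDED) {
        return false;
    }

    t->added++;
    t->hits = 0; /* sketch-only: restart the run after a trigger */
    return true;
}

/* Forget everything, as a device reset would, so profiling restarts. */
static void tracker_reset(WriteTracker *t)
{
    *t = (WriteTracker){0};
}
```

Resetting the tracker together with dropping the dynamically added
entries mirrors the rationale in the commit message: a reset is a
reasonable point to assume a new driver and profile again within the
same cap.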

* [Qemu-devel] [PATCH 3/5] vfio/quirks: ioeventfd quirk acceleration
@ 2018-02-28 20:45   ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2018-02-28 20:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: kvm, alex.williamson, peterx, eric.auger

The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found
in device MMIO space.  Normally PCI config space is considered a slow
path and further optimization is unnecessary, however NVIDIA uses a
register here to enable the MSI interrupt to re-trigger.  Exiting to
QEMU for this MSI-ACK handling can therefore rate limit our interrupt
handling.  Fortunately the MSI-ACK write is easily detected since the
quirk MemoryRegion otherwise has very few accesses, so simply looking
for consecutive writes with the same data is sufficient, in this case
10 consecutive writes with the same data and size is arbitrarily
chosen.  We configure the KVM ioeventfd with data match, so there's
no risk of triggering for the wrong data or size, but we do risk that
pathological driver behavior might consume all of QEMU's file
descriptors, so we cap ourselves to 10 ioeventfds for this purpose.

In support of the above, generic ioeventfd infrastructure is added
for vfio quirks.  This automatically initializes an ioeventfd list
per quirk, disables and frees ioeventfds on exit, and allows
ioeventfds marked as dynamic to be dropped on device reset.  The
rationale for this latter feature is that useful ioeventfds may
depend on specific driver behavior and since we necessarily place a
cap on our use of ioeventfds, a machine reset is a reasonable point
at which to assume a new driver and re-profile.
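
As a rough illustration of the per-quirk list and the "dynamic" flag
described above, here is a standalone sketch using a plain singly
linked list rather than QEMU's QLIST macros.  Entry, entry_add and
entries_drop are hypothetical names invented for this example; a
device reset would correspond to dropping only the dynamic entries,
and quirk exit to dropping everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical list node standing in for one ioeventfd. */
typedef struct Entry {
    struct Entry *next;
    bool dynamic;   /* added by profiling, may be dropped on reset */
} Entry;

/* Insert a new entry at the head of the list; NULL on allocation failure. */
static Entry *entry_add(Entry **head, bool dynamic)
{
    Entry *e = calloc(1, sizeof(*e));

    if (!e) {
        return NULL;
    }
    e->dynamic = dynamic;
    e->next = *head;
    *head = e;
    return e;
}

/*
 * Unlink and free entries.  With only_dynamic set this models the
 * reset path, otherwise the exit path.  Returns the number dropped.
 */
static int entries_drop(Entry **head, bool only_dynamic)
{
    Entry **link = head;
    int dropped = 0;

    while (*link) {
        Entry *e = *link;

        if (!only_dynamic || e->dynamic) {
            *link = e->next;   /* unlink before freeing */
            free(e);
            dropped++;
        } else {
            link = &e->next;   /* keep this entry, move on */
        }
    }
    return dropped;
}
```

Walking the list through a pointer to the link plays the same role as
the safe iteration in the patch: removal during traversal never
touches a freed node.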

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/pci-quirks.c |  159 +++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/vfio/pci.h        |   14 ++++
 2 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f0947cbf152f..e01e2f0f69df 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -12,6 +12,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
@@ -202,6 +203,7 @@ typedef struct VFIOConfigMirrorQuirk {
     uint32_t offset;
     uint8_t bar;
     MemoryRegion *mem;
+    uint8_t data[];
 } VFIOConfigMirrorQuirk;
 
 static uint64_t vfio_generic_quirk_mirror_read(void *opaque,
@@ -278,12 +280,84 @@ static const MemoryRegionOps vfio_ati_3c3_quirk = {
 static VFIOQuirk *vfio_quirk_alloc(int nr_mem)
 {
     VFIOQuirk *quirk = g_new0(VFIOQuirk, 1);
+    QLIST_INIT(&quirk->ioeventfds);
     quirk->mem = g_new0(MemoryRegion, nr_mem);
     quirk->nr_mem = nr_mem;
 
     return quirk;
 }
 
+static void vfio_ioeventfd_exit(VFIOIOEventFD *ioeventfd)
+{
+    QLIST_REMOVE(ioeventfd, next);
+    memory_region_del_eventfd(ioeventfd->mr, ioeventfd->addr, ioeventfd->size,
+                              ioeventfd->match_data, ioeventfd->data,
+                              &ioeventfd->e);
+    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e), NULL, NULL, NULL);
+    event_notifier_cleanup(&ioeventfd->e);
+    g_free(ioeventfd);
+}
+
+static void vfio_drop_dynamic_eventfds(VFIOPCIDevice *vdev, VFIOQuirk *quirk)
+{
+    VFIOIOEventFD *ioeventfd, *tmp;
+
+    QLIST_FOREACH_SAFE(ioeventfd, &quirk->ioeventfds, next, tmp) {
+        if (ioeventfd->dynamic) {
+            vfio_ioeventfd_exit(ioeventfd);
+        }
+    }
+}
+
+static void vfio_ioeventfd_handler(void *opaque)
+{
+    VFIOIOEventFD *ioeventfd = opaque;
+
+    if (event_notifier_test_and_clear(&ioeventfd->e)) {
+        vfio_region_write(ioeventfd->region, ioeventfd->region_addr,
+                          ioeventfd->data, ioeventfd->size);
+    }
+}
+
+static VFIOIOEventFD *vfio_ioeventfd_init(VFIOPCIDevice *vdev,
+                                          MemoryRegion *mr, hwaddr addr,
+                                          unsigned size, uint64_t data,
+                                          VFIORegion *region,
+                                          hwaddr region_addr, bool dynamic)
+{
+    VFIOIOEventFD *ioeventfd = g_malloc0(sizeof(*ioeventfd));
+
+    if (event_notifier_init(&ioeventfd->e, 0)) {
+        g_free(ioeventfd);
+        return NULL;
+    }
+
+    /*
+     * MemoryRegion and relative offset, plus additional ioeventfd setup
+     * parameters for configuring and later tearing down KVM ioeventfd.
+     */
+    ioeventfd->mr = mr;
+    ioeventfd->addr = addr;
+    ioeventfd->size = size;
+    ioeventfd->data = data;
+    ioeventfd->match_data = true;
+    ioeventfd->dynamic = dynamic;
+    /*
+     * VFIORegion and relative offset for implementing the userspace
+     * handler.  data & size fields shared for both uses.
+     */
+    ioeventfd->region = region;
+    ioeventfd->region_addr = region_addr;
+
+    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
+                        vfio_ioeventfd_handler, NULL, ioeventfd);
+    memory_region_add_eventfd(ioeventfd->mr, ioeventfd->addr,
+                              ioeventfd->size, ioeventfd->match_data,
+                              ioeventfd->data, &ioeventfd->e);
+
+    return ioeventfd;
+}
+
 static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice *vdev)
 {
     VFIOQuirk *quirk;
@@ -719,6 +793,17 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
     trace_vfio_quirk_nvidia_bar5_probe(vdev->vbasedev.name);
 }
 
+typedef struct LastDataSet {
+    hwaddr addr;
+    uint64_t data;
+    unsigned size;
+    int hits;
+    int added;
+} LastDataSet;
+
+#define MAX_DYN_IOEVENTFD 10
+#define HITS_FOR_IOEVENTFD 10
+
 /*
  * Finally, BAR0 itself.  We want to redirect any accesses to either
  * 0x1800 or 0x88000 through the PCI config space access functions.
@@ -729,6 +814,7 @@ static void vfio_nvidia_quirk_mirror_write(void *opaque, hwaddr addr,
     VFIOConfigMirrorQuirk *mirror = opaque;
     VFIOPCIDevice *vdev = mirror->vdev;
     PCIDevice *pdev = &vdev->pdev;
+    LastDataSet *last = (LastDataSet *)&mirror->data;
 
     vfio_generic_quirk_mirror_write(opaque, addr, data, size);
 
@@ -743,6 +829,59 @@ static void vfio_nvidia_quirk_mirror_write(void *opaque, hwaddr addr,
                           addr + mirror->offset, data, size);
         trace_vfio_quirk_nvidia_bar0_msi_ack(vdev->vbasedev.name);
     }
+
+    /*
+     * Automatically add an ioeventfd to handle any repeated write with the
+     * same data and size above the standard PCI config space header.  This is
+     * primarily expected to accelerate the MSI-ACK behavior, such as noted
+     * above.  Current hardware/drivers should trigger an ioeventfd at config
+     * offset 0x704 (region offset 0x88704), with data 0x0, size 4.
+     *
+     * The criteria of 10 successive hits is arbitrary but reliably adds the
+     * MSI-ACK region.  Note that as some writes are bypassed via the ioeventfd,
+     * the remaining ones have a greater chance of being seen successively.
+     * To avoid the pathological case of burning up all of QEMU's open file
+     * handles, arbitrarily limit this algorithm to adding no more than 10
+     * ioeventfds, print an error if we would have added an 11th, and then
+     * stop counting.
+     */
+    if (addr >= PCI_STD_HEADER_SIZEOF && last->added < MAX_DYN_IOEVENTFD + 1) {
+        if (addr != last->addr || data != last->data || size != last->size) {
+            last->addr = addr;
+            last->data = data;
+            last->size = size;
+            last->hits = 1;
+        } else if (++last->hits >= HITS_FOR_IOEVENTFD) {
+            if (last->added < MAX_DYN_IOEVENTFD) {
+                VFIOIOEventFD *ioeventfd;
+                ioeventfd = vfio_ioeventfd_init(vdev, mirror->mem, addr, size,
+                                        data, &vdev->bars[mirror->bar].region,
+                                        mirror->offset + addr, true);
+                if (ioeventfd) {
+                    VFIOQuirk *quirk;
+
+                    QLIST_FOREACH(quirk,
+                                  &vdev->bars[mirror->bar].quirks, next) {
+                        if (quirk->data == mirror) {
+                            QLIST_INSERT_HEAD(&quirk->ioeventfds,
+                                              ioeventfd, next);
+                            break;
+                        }
+                    }
+
+                    assert(quirk != NULL); /* Check not found */
+
+                    last->added++;
+                }
+            } else {
+                last->added++;
+
+                error_report("NVIDIA ioeventfd queue full for %s, unable to "
+                             "accelerate 0x%"HWADDR_PRIx", data 0x%"PRIx64", "
+                             "size %u", vdev->vbasedev.name, addr, data, size);
+            }
+        }
+    }
 }
 
 static const MemoryRegionOps vfio_nvidia_mirror_quirk = {
@@ -751,6 +890,16 @@ static const MemoryRegionOps vfio_nvidia_mirror_quirk = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static void vfio_nvidia_bar0_quirk_reset(VFIOPCIDevice *vdev, VFIOQuirk *quirk)
+{
+    VFIOConfigMirrorQuirk *mirror = quirk->data;
+    LastDataSet *last = (LastDataSet *)&mirror->data;
+
+    memset(last, 0, sizeof(*last));
+
+    vfio_drop_dynamic_eventfds(vdev, quirk);
+}
+
 static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
 {
     VFIOQuirk *quirk;
@@ -763,7 +912,8 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     }
 
     quirk = vfio_quirk_alloc(1);
-    mirror = quirk->data = g_malloc0(sizeof(*mirror));
+    quirk->reset = vfio_nvidia_bar0_quirk_reset;
+    mirror = quirk->data = g_malloc0(sizeof(*mirror) + sizeof(LastDataSet));
     mirror->mem = quirk->mem;
     mirror->vdev = vdev;
     mirror->offset = 0x88000;
@@ -781,7 +931,8 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     /* The 0x1800 offset mirror only seems to get used by legacy VGA */
     if (vdev->vga) {
         quirk = vfio_quirk_alloc(1);
-        mirror = quirk->data = g_malloc0(sizeof(*mirror));
+        quirk->reset = vfio_nvidia_bar0_quirk_reset;
+        mirror = quirk->data = g_malloc0(sizeof(*mirror) + sizeof(LastDataSet));
         mirror->mem = quirk->mem;
         mirror->vdev = vdev;
         mirror->offset = 0x1800;
@@ -1668,6 +1819,10 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr)
     int i;
 
     QLIST_FOREACH(quirk, &bar->quirks, next) {
+        while (!QLIST_EMPTY(&quirk->ioeventfds)) {
+            vfio_ioeventfd_exit(QLIST_FIRST(&quirk->ioeventfds));
+        }
+
         for (i = 0; i < quirk->nr_mem; i++) {
             memory_region_del_subregion(bar->region.mem, &quirk->mem[i]);
         }
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 0be41a70be1d..de651993b57a 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -24,9 +24,23 @@
 
 struct VFIOPCIDevice;
 
+typedef struct VFIOIOEventFD {
+    QLIST_ENTRY(VFIOIOEventFD) next;
+    MemoryRegion *mr;
+    hwaddr addr;
+    unsigned size;
+    uint64_t data;
+    EventNotifier e;
+    VFIORegion *region;
+    hwaddr region_addr;
+    bool match_data;
+    bool dynamic;
+} VFIOIOEventFD;
+
 typedef struct VFIOQuirk {
     QLIST_ENTRY(VFIOQuirk) next;
     void *data;
+    QLIST_HEAD(, VFIOIOEventFD) ioeventfds;
     int nr_mem;
     MemoryRegion *mem;
     void (*reset)(struct VFIOPCIDevice *vdev, struct VFIOQuirk *quirk);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/5] vfio: Update linux header
  2018-02-28 20:45 ` [Qemu-devel] " Alex Williamson
@ 2018-02-28 20:46   ` Alex Williamson
  -1 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2018-02-28 20:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: eric.auger, alex.williamson, peterx, kvm

Update with proposed ioeventfd API.
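
For orientation, a request under the proposed API might be filled in
roughly as below.  This sketch deliberately does not include the real
header: the struct and flag definitions are local copies of the
proposed layout under hypothetical sketch_ names, and sketch_fill is
an invented helper.  Since the kernel API is not yet upstream, treat
every detail here as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed struct layout, names are hypothetical. */
struct sketch_ioeventfd {
    uint32_t argsz;
    uint32_t flags;
    uint64_t offset;   /* device fd offset of write */
    uint64_t data;     /* data to be written */
    int32_t  fd;       /* -1 for de-assignment */
};

#define SKETCH_IOEVENTFD_8   (1u << 0)   /* 1-byte write */
#define SKETCH_IOEVENTFD_16  (1u << 1)   /* 2-byte write */
#define SKETCH_IOEVENTFD_32  (1u << 2)   /* 4-byte write */
#define SKETCH_IOEVENTFD_64  (1u << 3)   /* 8-byte write */

/*
 * Fill a request for a write of 'size' bytes.  Returns 0 on success,
 * -1 if the size has no matching flag; req is untouched on failure.
 */
static int sketch_fill(struct sketch_ioeventfd *req, unsigned size,
                       uint64_t offset, uint64_t data, int32_t fd)
{
    uint32_t flag;

    switch (size) {
    case 1: flag = SKETCH_IOEVENTFD_8;  break;
    case 2: flag = SKETCH_IOEVENTFD_16; break;
    case 4: flag = SKETCH_IOEVENTFD_32; break;
    case 8: flag = SKETCH_IOEVENTFD_64; break;
    default:
        return -1;
    }

    memset(req, 0, sizeof(*req));
    req->argsz = (uint32_t)sizeof(*req);
    req->flags = flag;
    req->offset = offset;
    req->data = data;
    req->fd = fd;
    return 0;
}
```

Note that with this encoding the flag value for each width happens to
equal the width in bytes (1, 2, 4, 8), so a byte count in that set can
be used as the flag directly.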

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 linux-headers/linux/vfio.h |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 4312e961ffd3..c9d7e2db132e 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -503,6 +503,33 @@ struct vfio_pci_hot_reset {
 
 #define VFIO_DEVICE_PCI_HOT_RESET	_IO(VFIO_TYPE, VFIO_BASE + 13)
 
+/**
+ * VFIO_DEVICE_IOEVENTFD - _IOW(VFIO_TYPE, VFIO_BASE + 16,
+ *                              struct vfio_device_ioeventfd)
+ *
+ * Perform a write to the device at the specified device fd offset, with
+ * the specified data and width when the provided eventfd is triggered.
+ * vfio bus drivers may not support this for all regions, or at all.
+ * vfio-pci currently only enables support for BAR regions and excludes
+ * the MSI-X vector table.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_ioeventfd {
+	__u32	argsz;
+	__u32	flags;
+#define VFIO_DEVICE_IOEVENTFD_8		(1 << 0) /* 1-byte write */
+#define VFIO_DEVICE_IOEVENTFD_16	(1 << 1) /* 2-byte write */
+#define VFIO_DEVICE_IOEVENTFD_32	(1 << 2) /* 4-byte write */
+#define VFIO_DEVICE_IOEVENTFD_64	(1 << 3) /* 8-byte write */
+#define VFIO_DEVICE_IOEVENTFD_SIZE_MASK	(0xf)
+	__u64	offset;			/* device fd offset of write */
+	__u64	data;			/* data to be written */
+	__s32	fd;			/* -1 for de-assignment */
+};
+
+#define VFIO_DEVICE_IOEVENTFD		_IO(VFIO_TYPE, VFIO_BASE + 16)
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/5] vfio/quirks: Enable ioeventfd quirks to be handled by vfio directly
  2018-02-28 20:45 ` [Qemu-devel] " Alex Williamson
@ 2018-02-28 20:46   ` Alex Williamson
  -1 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2018-02-28 20:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: eric.auger, alex.williamson, peterx, kvm

With vfio ioeventfd support, we can program vfio-pci to perform a
specified BAR write when an eventfd is triggered.  This allows the
KVM ioeventfd to be wired directly to vfio-pci, entirely avoiding
userspace handling for these events.  On the same micro-benchmark
where the ioeventfd got us to almost 90% of performance versus
disabling the GeForce quirks, this gets us to within 95%.
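
The resulting control flow (try the in-kernel path first, fall back to
a userspace handler if the kernel declines) can be sketched with a
stand-in for the ioctl.  Notifier, notifier_wire and the two stub
functions are hypothetical names for this illustration only; the stubs
merely simulate a kernel that does or does not support the feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ioctl: returns 0 on success. */
typedef int (*kernel_setup_fn)(int fd);

typedef struct Notifier {
    int fd;
    bool in_kernel;     /* kernel performs the device write itself */
    bool has_handler;   /* userspace handler performs the write */
} Notifier;

/* Prefer the in-kernel path, otherwise register a userspace handler. */
static void notifier_wire(Notifier *n, int fd, kernel_setup_fn try_kernel)
{
    n->fd = fd;
    n->in_kernel = (try_kernel != NULL) && (try_kernel(fd) == 0);
    n->has_handler = !n->in_kernel;
}

/* Simulated kernel that supports the request. */
static int kernel_ok(int fd)
{
    (void)fd;
    return 0;
}

/* Simulated kernel that rejects the request. */
static int kernel_unsupported(int fd)
{
    (void)fd;
    return -1;
}
```

Exactly one of the two flags ends up set, which is what lets teardown
pick the matching cleanup path later.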

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/pci-quirks.c |   45 +++++++++++++++++++++++++++++++++++++++------
 hw/vfio/pci.h        |    1 +
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index e01e2f0f69df..561fa6ea321d 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -16,6 +16,7 @@
 #include "qemu/range.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include <sys/ioctl.h>
 #include "hw/nvram/fw_cfg.h"
 #include "pci.h"
 #include "trace.h"
@@ -287,13 +288,31 @@ static VFIOQuirk *vfio_quirk_alloc(int nr_mem)
     return quirk;
 }
 
-static void vfio_ioeventfd_exit(VFIOIOEventFD *ioeventfd)
+static void vfio_ioeventfd_exit(VFIOPCIDevice *vdev, VFIOIOEventFD *ioeventfd)
 {
     QLIST_REMOVE(ioeventfd, next);
+
     memory_region_del_eventfd(ioeventfd->mr, ioeventfd->addr, ioeventfd->size,
                               ioeventfd->match_data, ioeventfd->data,
                               &ioeventfd->e);
-    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e), NULL, NULL, NULL);
+
+    if (ioeventfd->vfio) {
+        struct vfio_device_ioeventfd vfio_ioeventfd;
+
+        vfio_ioeventfd.argsz = sizeof(vfio_ioeventfd);
+        vfio_ioeventfd.flags = ioeventfd->size;
+        vfio_ioeventfd.data = ioeventfd->data;
+        vfio_ioeventfd.offset = ioeventfd->region->fd_offset +
+                                ioeventfd->region_addr;
+        vfio_ioeventfd.fd = -1;
+
+        ioctl(vdev->vbasedev.fd, VFIO_DEVICE_IOEVENTFD, &vfio_ioeventfd);
+
+    } else {
+        qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
+                            NULL, NULL, NULL);
+    }
+
     event_notifier_cleanup(&ioeventfd->e);
     g_free(ioeventfd);
 }
@@ -304,7 +323,7 @@ static void vfio_drop_dynamic_eventfds(VFIOPCIDevice *vdev, VFIOQuirk *quirk)
 
     QLIST_FOREACH_SAFE(ioeventfd, &quirk->ioeventfds, next, tmp) {
         if (ioeventfd->dynamic) {
-            vfio_ioeventfd_exit(ioeventfd);
+            vfio_ioeventfd_exit(vdev, ioeventfd);
         }
     }
 }
@@ -326,6 +345,7 @@ static VFIOIOEventFD *vfio_ioeventfd_init(VFIOPCIDevice *vdev,
                                           hwaddr region_addr, bool dynamic)
 {
     VFIOIOEventFD *ioeventfd = g_malloc0(sizeof(*ioeventfd));
+    struct vfio_device_ioeventfd vfio_ioeventfd;
 
     if (event_notifier_init(&ioeventfd->e, 0)) {
         g_free(ioeventfd);
@@ -349,8 +369,21 @@ static VFIOIOEventFD *vfio_ioeventfd_init(VFIOPCIDevice *vdev,
     ioeventfd->region = region;
     ioeventfd->region_addr = region_addr;
 
-    qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
-                        vfio_ioeventfd_handler, NULL, ioeventfd);
+    vfio_ioeventfd.argsz = sizeof(vfio_ioeventfd);
+    vfio_ioeventfd.flags = ioeventfd->size;
+    vfio_ioeventfd.data = ioeventfd->data;
+    vfio_ioeventfd.offset = ioeventfd->region->fd_offset +
+                            ioeventfd->region_addr;
+    vfio_ioeventfd.fd = event_notifier_get_fd(&ioeventfd->e);
+
+    ioeventfd->vfio = !ioctl(vdev->vbasedev.fd,
+                             VFIO_DEVICE_IOEVENTFD, &vfio_ioeventfd);
+
+    if (!ioeventfd->vfio) {
+        qemu_set_fd_handler(event_notifier_get_fd(&ioeventfd->e),
+                            vfio_ioeventfd_handler, NULL, ioeventfd);
+    }
+
     memory_region_add_eventfd(ioeventfd->mr, ioeventfd->addr,
                               ioeventfd->size, ioeventfd->match_data,
                               ioeventfd->data, &ioeventfd->e);
@@ -1820,7 +1853,7 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr)
 
     QLIST_FOREACH(quirk, &bar->quirks, next) {
         while (!QLIST_EMPTY(&quirk->ioeventfds)) {
-            vfio_ioeventfd_exit(QLIST_FIRST(&quirk->ioeventfds));
+            vfio_ioeventfd_exit(vdev, QLIST_FIRST(&quirk->ioeventfds));
         }
 
         for (i = 0; i < quirk->nr_mem; i++) {
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index de651993b57a..26c06e92ec26 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -35,6 +35,7 @@ typedef struct VFIOIOEventFD {
     hwaddr region_addr;
     bool match_data;
     bool dynamic;
+    bool vfio;
 } VFIOIOEventFD;
 
 typedef struct VFIOQuirk {

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] vfio/quirks: Add common quirk alloc helper
  2018-02-28 20:45   ` [Qemu-devel] " Alex Williamson
@ 2018-03-07  7:04     ` Peter Xu
  -1 siblings, 0 replies; 20+ messages in thread
From: Peter Xu @ 2018-03-07  7:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: eric.auger, qemu-devel, kvm

On Wed, Feb 28, 2018 at 01:45:23PM -0700, Alex Williamson wrote:
> This will later be used to include list initialization.
> 
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] vfio/quirks: Add quirk reset callback
  2018-02-28 20:45   ` [Qemu-devel] " Alex Williamson
@ 2018-03-07  7:04     ` Peter Xu
  -1 siblings, 0 replies; 20+ messages in thread
From: Peter Xu @ 2018-03-07  7:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: eric.auger, qemu-devel, kvm

On Wed, Feb 28, 2018 at 01:45:37PM -0700, Alex Williamson wrote:
> Quirks can be self modifying, provide a hook to allow them to cleanup
> on device reset if desired.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/5] vfio/quirks: ioeventfd quirk acceleration
  2018-02-28 20:45   ` [Qemu-devel] " Alex Williamson
@ 2018-03-07  7:06     ` Peter Xu
  -1 siblings, 0 replies; 20+ messages in thread
From: Peter Xu @ 2018-03-07  7:06 UTC (permalink / raw)
  To: Alex Williamson; +Cc: eric.auger, qemu-devel, kvm

On Wed, Feb 28, 2018 at 01:45:54PM -0700, Alex Williamson wrote:
> The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found
> in device MMIO space.  Normally PCI config space is considered a slow
> path and further optimization is unnecessary, however NVIDIA uses a
> register here to enable the MSI interrupt to re-trigger.  Exiting to
> QEMU for this MSI-ACK handling can therefore rate limit our interrupt
> handling.  Fortunately the MSI-ACK write is easily detected since the
> quirk MemoryRegion otherwise has very few accesses, so simply looking
> for consecutive writes with the same data is sufficient, in this case
> 10 consecutive writes with the same data and size is arbitrarily
> chosen.  We configure the KVM ioeventfd with data match, so there's
> no risk of triggering for the wrong data or size, but we do risk that
> pathological driver behavior might consume all of QEMU's file
> descriptors, so we cap ourselves to 10 ioeventfds for this purpose.
> 
> In support of the above, generic ioeventfd infrastructure is added
> for vfio quirks.  This automatically initializes an ioeventfd list
> per quirk, disables and frees ioeventfds on exit, and allows
> ioeventfds marked as dynamic to be dropped on device reset.  The
> rationale for this latter feature is that useful ioeventfds may
> depend on specific driver behavior and since we necessarily place a
> cap on our use of ioeventfds, a machine reset is a reasonable point
> at which to assume a new driver and re-profile.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

I don't know when there will be non-dynamic vfio-ioeventfds, but it
looks fine to me even if all of them are dynamic for now:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] vfio/quirks: ioeventfd support
  2018-02-28 20:45 ` [Qemu-devel] " Alex Williamson
@ 2018-03-13 14:38   ` Auger Eric
  -1 siblings, 0 replies; 20+ messages in thread
From: Auger Eric @ 2018-03-13 14:38 UTC (permalink / raw)
  To: Alex Williamson, qemu-devel; +Cc: peterx, kvm

Hi Alex,
On 28/02/18 21:45, Alex Williamson wrote:
> This is the QEMU counterpart to https://lkml.org/lkml/2018/2/28/1222
> 
> As described in the third patch, we have a use case for taking
> advantage of existing KVM ioeventfd support for accelerating the
> MSI-ACK behavior of NVIDIA GPUs.  This series adds generic
> infrastructure within vfio quirks for making use of ioeventfds and
> specifically enables it for this purpose.  The first three patches
> provide a performance improvement on their own and do not depend on
> the additional acceleration added by the remainder of the patches to
> be worthwhile.  The Linux header update in patch 4 is not intended
> to be a full refresh, the kernel API is not yet upstream, this is for
> testing and review purposes.  The intention would be to commit the
> series in separate chunks, 1-3 once we have review consensus, 4-5 as
> RFC until the kernel API is upstream.
> 
> RFC->v1:
>  * Cap the number of dynamically added ioeventfds to 10 such that
>    pathological driver behavior cannot consume too many file handles.
>  * Added a reset hook and cleanup mechanism to drop dynamically added
>    ioeventfds on device reset.
>  * Additional comments and removed info_report.
>  * Folded ioeventfd infrastructure patch into usage patch, fail to
>    stand on its own without setup, which requires consumers.
> 
> Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (5):
>       vfio/quirks: Add common quirk alloc helper
>       vfio/quirks: Add quirk reset callback
>       vfio/quirks: ioeventfd quirk acceleration
>       vfio: Update linux header
>       vfio/quirks: Enable ioeventfd quirks to be handled by vfio directly
> 
> 
>  hw/vfio/pci-quirks.c       |  255 +++++++++++++++++++++++++++++++++++++++-----
>  hw/vfio/pci.c              |    2 
>  hw/vfio/pci.h              |   17 +++
>  linux-headers/linux/vfio.h |   27 +++++
>  4 files changed, 272 insertions(+), 29 deletions(-)
> 
For all patches except the still-partial Linux header update,

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric


Thread overview: 20+ messages
2018-02-28 20:45 [PATCH 0/5] vfio/quirks: ioeventfd support Alex Williamson
2018-02-28 20:45 ` [Qemu-devel] " Alex Williamson
2018-02-28 20:45 ` [PATCH 1/5] vfio/quirks: Add common quirk alloc helper Alex Williamson
2018-02-28 20:45   ` [Qemu-devel] " Alex Williamson
2018-03-07  7:04   ` Peter Xu
2018-03-07  7:04     ` [Qemu-devel] " Peter Xu
2018-02-28 20:45 ` [PATCH 2/5] vfio/quirks: Add quirk reset callback Alex Williamson
2018-02-28 20:45   ` [Qemu-devel] " Alex Williamson
2018-03-07  7:04   ` Peter Xu
2018-03-07  7:04     ` [Qemu-devel] " Peter Xu
2018-02-28 20:45 ` [PATCH 3/5] vfio/quirks: ioeventfd quirk acceleration Alex Williamson
2018-02-28 20:45   ` [Qemu-devel] " Alex Williamson
2018-03-07  7:06   ` Peter Xu
2018-03-07  7:06     ` [Qemu-devel] " Peter Xu
2018-02-28 20:46 ` [PATCH 4/5] vfio: Update linux header Alex Williamson
2018-02-28 20:46   ` [Qemu-devel] " Alex Williamson
2018-02-28 20:46 ` [PATCH 5/5] vfio/quirks: Enable ioeventfd quirks to be handled by vfio directly Alex Williamson
2018-02-28 20:46   ` [Qemu-devel] " Alex Williamson
2018-03-13 14:38 ` [PATCH 0/5] vfio/quirks: ioeventfd support Auger Eric
2018-03-13 14:38   ` [Qemu-devel] " Auger Eric
