All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19
@ 2016-01-19 19:17 Alex Williamson
  2016-01-19 19:17 ` [Qemu-devel] [PULL 1/2] vfio/pci-quirks: Only quirk to size of PCI config space Alex Williamson
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Alex Williamson @ 2016-01-19 19:17 UTC (permalink / raw)
  To: qemu-devel

The following changes since commit 3db34bf64ab4f8797565dd8750003156c32b301d:

  Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-peter' into staging (2016-01-18 17:40:50 +0000)

are available in the git repository at:


  git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20160119.0

for you to fetch changes up to 95239e162518dc6577164be3d9a789aba7f591a3:

  vfio/pci: Lazy PBA emulation (2016-01-19 11:33:42 -0700)

----------------------------------------------------------------
VFIO updates 2016-01-19

 - Performance fix for devices with poorly placed MSI-X PBA regions
 - Quirk fix for hosts with broken MMCONFIG access

----------------------------------------------------------------
Alex Williamson (2):
      vfio/pci-quirks: Only quirk to size of PCI config space
      vfio/pci: Lazy PBA emulation

 hw/vfio/pci-quirks.c |  6 +++---
 hw/vfio/pci.c        | 39 +++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.h        |  1 +
 trace-events         |  2 ++
 4 files changed, 45 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Qemu-devel] [PULL 1/2] vfio/pci-quirks: Only quirk to size of PCI config space
  2016-01-19 19:17 [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19 Alex Williamson
@ 2016-01-19 19:17 ` Alex Williamson
  2016-01-19 19:18 ` [Qemu-devel] [PULL 2/2] vfio/pci: Lazy PBA emulation Alex Williamson
  2016-01-21 12:09 ` [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19 Peter Maydell
  2 siblings, 0 replies; 4+ messages in thread
From: Alex Williamson @ 2016-01-19 19:17 UTC (permalink / raw)
  To: qemu-devel

For quirks that support the full PCIe extended config space, limit the
quirk to only the size of config space available through vfio.  This
allows host systems with broken MMCONFIG regions to still make use of
these quirks without generating bad address faults trying to access
beyond the end of config space exposed through vfio.  This may expose
direct access to the mirror of extended config space, only trapping
the sub-range of standard config space, but allowing this makes the
quirk, and thus the device, functional.  We expect that only device
specific accesses make use of the mirror, not general extended PCI
capability accesses, so any virtualization in this space is likely
unnecessary anyway, and the device is still IOMMU isolated, so it
should only be able to hurt itself through any bogus configurations
enabled by this space.

Link: https://www.redhat.com/archives/vfio-users/2015-November/msg00192.html
Reported-by: Ronnie Swanink <ronnie@ronnieswanink.nl>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/pci-quirks.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 30c68a1..e117c41 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -328,7 +328,7 @@ static void vfio_probe_ati_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     window->data_offset = 4;
     window->nr_matches = 1;
     window->matches[0].match = 0x4000;
-    window->matches[0].mask = PCIE_CONFIG_SPACE_SIZE - 1;
+    window->matches[0].mask = vdev->config_size - 1;
     window->bar = nr;
     window->addr_mem = &quirk->mem[0];
     window->data_mem = &quirk->mem[1];
@@ -674,7 +674,7 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice *vdev, int nr)
     window->matches[0].match = 0x1800;
     window->matches[0].mask = PCI_CONFIG_SPACE_SIZE - 1;
     window->matches[1].match = 0x88000;
-    window->matches[1].mask = PCIE_CONFIG_SPACE_SIZE - 1;
+    window->matches[1].mask = vdev->config_size - 1;
     window->bar = nr;
     window->addr_mem = bar5->addr_mem = &quirk->mem[0];
     window->data_mem = bar5->data_mem = &quirk->mem[1];
@@ -765,7 +765,7 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice *vdev, int nr)
     memory_region_init_io(mirror->mem, OBJECT(vdev),
                           &vfio_nvidia_mirror_quirk, mirror,
                           "vfio-nvidia-bar0-88000-mirror-quirk",
-                          PCIE_CONFIG_SPACE_SIZE);
+                          vdev->config_size);
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
                                         mirror->offset, mirror->mem, 1);
 

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [Qemu-devel] [PULL 2/2] vfio/pci: Lazy PBA emulation
  2016-01-19 19:17 [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19 Alex Williamson
  2016-01-19 19:17 ` [Qemu-devel] [PULL 1/2] vfio/pci-quirks: Only quirk to size of PCI config space Alex Williamson
@ 2016-01-19 19:18 ` Alex Williamson
  2016-01-21 12:09 ` [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19 Peter Maydell
  2 siblings, 0 replies; 4+ messages in thread
From: Alex Williamson @ 2016-01-19 19:18 UTC (permalink / raw)
  To: qemu-devel

The PCI spec recommends devices use additional alignment for MSI-X
data structures to allow software to map them to separate processor
pages.  One advantage of doing this is that we can emulate those data
structures without a significant performance impact to the operation
of the device.  Some devices fail to implement that suggestion and
assigned device performance suffers.

One such case of this is a Mellanox MT27500 series, ConnectX-3 VF,
where the MSI-X vector table and PBA are aligned on separate 4K
pages.  If PBA emulation is enabled, performance suffers.  It's not
clear how much value we get from PBA emulation, but the solution here
is to only lazily enable the emulated PBA when a masked MSI-X vector
fires.  We then attempt to more aggresively disable the PBA memory
region any time a vector is unmasked.  The expectation is then that
a typical VM will run entirely with PBA emulation disabled, and only
when used is that emulation re-enabled.

Reported-by: Shyam Kaushik <shyam.kaushik@gmail.com>
Tested-by: Shyam Kaushik <shyam.kaushik@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/pci.c |   39 +++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.h |    1 +
 trace-events  |    2 ++
 3 files changed, 42 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1fb868c..e66c47f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -356,6 +356,13 @@ static void vfio_msi_interrupt(void *opaque)
     if (vdev->interrupt == VFIO_INT_MSIX) {
         get_msg = msix_get_message;
         notify = msix_notify;
+
+        /* A masked vector firing needs to use the PBA, enable it */
+        if (msix_is_masked(&vdev->pdev, nr)) {
+            set_bit(nr, vdev->msix->pending);
+            memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, true);
+            trace_vfio_msix_pba_enable(vdev->vbasedev.name);
+        }
     } else if (vdev->interrupt == VFIO_INT_MSI) {
         get_msg = msi_get_message;
         notify = msi_notify;
@@ -535,6 +542,14 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
         }
     }
 
+    /* Disable PBA emulation when nothing more is pending. */
+    clear_bit(nr, vdev->msix->pending);
+    if (find_first_bit(vdev->msix->pending,
+                       vdev->nr_vectors) == vdev->nr_vectors) {
+        memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
+        trace_vfio_msix_pba_disable(vdev->vbasedev.name);
+    }
+
     return 0;
 }
 
@@ -738,6 +753,9 @@ static void vfio_msix_disable(VFIOPCIDevice *vdev)
 
     vfio_msi_disable_common(vdev);
 
+    memset(vdev->msix->pending, 0,
+           BITS_TO_LONGS(vdev->msix->entries) * sizeof(unsigned long));
+
     trace_vfio_msix_disable(vdev->vbasedev.name);
 }
 
@@ -1251,6 +1269,8 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
 {
     int ret;
 
+    vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
+                                    sizeof(unsigned long));
     ret = msix_init(&vdev->pdev, vdev->msix->entries,
                     &vdev->bars[vdev->msix->table_bar].region.mem,
                     vdev->msix->table_bar, vdev->msix->table_offset,
@@ -1264,6 +1284,24 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
         return ret;
     }
 
+    /*
+     * The PCI spec suggests that devices provide additional alignment for
+     * MSI-X structures and avoid overlapping non-MSI-X related registers.
+     * For an assigned device, this hopefully means that emulation of MSI-X
+     * structures does not affect the performance of the device.  If devices
+     * fail to provide that alignment, a significant performance penalty may
+     * result, for instance Mellanox MT27500 VFs:
+     * http://www.spinics.net/lists/kvm/msg125881.html
+     *
+     * The PBA is simply not that important for such a serious regression and
+     * most drivers do not appear to look at it.  The solution for this is to
+     * disable the PBA MemoryRegion unless it's being used.  We disable it
+     * here and only enable it if a masked vector fires through QEMU.  As the
+     * vector-use notifier is called, which occurs on unmask, we test whether
+     * PBA emulation is needed and again disable if not.
+     */
+    memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
+
     return 0;
 }
 
@@ -1275,6 +1313,7 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
         msix_uninit(&vdev->pdev,
                     &vdev->bars[vdev->msix->table_bar].region.mem,
                     &vdev->bars[vdev->msix->pba_bar].region.mem);
+        g_free(vdev->msix->pending);
     }
 }
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f004d52..6256587 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -95,6 +95,7 @@ typedef struct VFIOMSIXInfo {
     uint32_t pba_offset;
     MemoryRegion mmap_mem;
     void *mmap;
+    unsigned long *pending;
 } VFIOMSIXInfo;
 
 typedef struct VFIOPCIDevice {
diff --git a/trace-events b/trace-events
index 934a7b6..c9ac144 100644
--- a/trace-events
+++ b/trace-events
@@ -1631,6 +1631,8 @@ vfio_msi_interrupt(const char *name, int index, uint64_t addr, int data) " (%s)
 vfio_msix_vector_do_use(const char *name, int index) " (%s) vector %d used"
 vfio_msix_vector_release(const char *name, int index) " (%s) vector %d released"
 vfio_msix_enable(const char *name) " (%s)"
+vfio_msix_pba_disable(const char *name) " (%s)"
+vfio_msix_pba_enable(const char *name) " (%s)"
 vfio_msix_disable(const char *name) " (%s)"
 vfio_msi_enable(const char *name, int nr_vectors) " (%s) Enabled %d MSI vectors"
 vfio_msi_disable(const char *name) " (%s)"

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19
  2016-01-19 19:17 [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19 Alex Williamson
  2016-01-19 19:17 ` [Qemu-devel] [PULL 1/2] vfio/pci-quirks: Only quirk to size of PCI config space Alex Williamson
  2016-01-19 19:18 ` [Qemu-devel] [PULL 2/2] vfio/pci: Lazy PBA emulation Alex Williamson
@ 2016-01-21 12:09 ` Peter Maydell
  2 siblings, 0 replies; 4+ messages in thread
From: Peter Maydell @ 2016-01-21 12:09 UTC (permalink / raw)
  To: Alex Williamson; +Cc: QEMU Developers

On 19 January 2016 at 19:17, Alex Williamson <alex.williamson@redhat.com> wrote:
> The following changes since commit 3db34bf64ab4f8797565dd8750003156c32b301d:
>
>   Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-peter' into staging (2016-01-18 17:40:50 +0000)
>
> are available in the git repository at:
>
>
>   git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20160119.0
>
> for you to fetch changes up to 95239e162518dc6577164be3d9a789aba7f591a3:
>
>   vfio/pci: Lazy PBA emulation (2016-01-19 11:33:42 -0700)
>
> ----------------------------------------------------------------
> VFIO updates 2016-01-19
>
>  - Performance fix for devices with poorly placed MSI-X PBA regions
>  - Quirk fix for hosts with broken MMCONFIG access
>
> ----------------------------------------------------------------

Applied, thanks.

-- PMM

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-01-21 12:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-19 19:17 [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19 Alex Williamson
2016-01-19 19:17 ` [Qemu-devel] [PULL 1/2] vfio/pci-quirks: Only quirk to size of PCI config space Alex Williamson
2016-01-19 19:18 ` [Qemu-devel] [PULL 2/2] vfio/pci: Lazy PBA emulation Alex Williamson
2016-01-21 12:09 ` [Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19 Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.