All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17
@ 2018-08-17 17:08 Alex Williamson
  2018-08-17 17:08 ` [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users Alex Williamson
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Alex Williamson @ 2018-08-17 17:08 UTC (permalink / raw)
  To: qemu-devel

The following changes since commit 0abaa41d936becd914a16ee1fe2a981d96d19428:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-next-pull-request' into staging (2018-08-17 09:46:00 +0100)

are available in the Git repository at:

  git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20180817.0

for you to fetch changes up to 238e91728503d400e1c4e644e3a9b80f9e621682:

  vfio/ccw/pci: Allow devices to opt-in for ballooning (2018-08-17 09:27:16 -0600)

----------------------------------------------------------------
VFIO update 2018-08-17

 - Enhance balloon inhibitor for multiple users and use around vfio
   device assignment (Alex Williamson)

----------------------------------------------------------------
Alex Williamson (4):
      balloon: Allow multiple inhibit users
      kvm: Use inhibit to prevent ballooning without synchronous mmu
      vfio: Inhibit ballooning based on group attachment to a container
      vfio/ccw/pci: Allow devices to opt-in for ballooning

 accel/kvm/kvm-all.c           |  4 ++++
 balloon.c                     | 13 ++++++++---
 hw/vfio/ccw.c                 |  9 ++++++++
 hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c                 | 26 +++++++++++++++++++++-
 hw/vfio/trace-events          |  1 +
 hw/virtio/virtio-balloon.c    |  4 +---
 include/hw/vfio/vfio-common.h |  2 ++
 8 files changed, 103 insertions(+), 7 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users
  2018-08-17 17:08 [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Alex Williamson
@ 2018-08-17 17:08 ` Alex Williamson
  2018-08-22 15:56   ` Christian Borntraeger
  2018-08-17 17:08 ` [Qemu-devel] [PULL 2/4] kvm: Use inhibit to prevent ballooning without synchronous mmu Alex Williamson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2018-08-17 17:08 UTC (permalink / raw)
  To: qemu-devel

A simple true/false internal state does not allow multiple users.  Fix
this within the existing interface by converting to a counter, so long
as the counter is elevated, ballooning is inhibited.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 balloon.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/balloon.c b/balloon.c
index 6bf0a9681377..931987983858 100644
--- a/balloon.c
+++ b/balloon.c
@@ -26,6 +26,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/atomic.h"
 #include "exec/cpu-common.h"
 #include "sysemu/kvm.h"
 #include "sysemu/balloon.h"
@@ -37,16 +38,22 @@
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
 static void *balloon_opaque;
-static bool balloon_inhibited;
+static int balloon_inhibit_count;
 
 bool qemu_balloon_is_inhibited(void)
 {
-    return balloon_inhibited;
+    return atomic_read(&balloon_inhibit_count) > 0;
 }
 
 void qemu_balloon_inhibit(bool state)
 {
-    balloon_inhibited = state;
+    if (state) {
+        atomic_inc(&balloon_inhibit_count);
+    } else {
+        atomic_dec(&balloon_inhibit_count);
+    }
+
+    assert(atomic_read(&balloon_inhibit_count) >= 0);
 }
 
 static bool have_balloon(Error **errp)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PULL 2/4] kvm: Use inhibit to prevent ballooning without synchronous mmu
  2018-08-17 17:08 [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Alex Williamson
  2018-08-17 17:08 ` [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users Alex Williamson
@ 2018-08-17 17:08 ` Alex Williamson
  2018-08-17 17:08 ` [Qemu-devel] [PULL 3/4] vfio: Inhibit ballooning based on group attachment to a container Alex Williamson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2018-08-17 17:08 UTC (permalink / raw)
  To: qemu-devel

Remove KVM specific tests in balloon_page(), instead marking
ballooning as inhibited without KVM_CAP_SYNC_MMU support.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 accel/kvm/kvm-all.c        |    4 ++++
 hw/virtio/virtio-balloon.c |    4 +---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eb7db92a5e3b..38f468d8e2b1 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -39,6 +39,7 @@
 #include "trace.h"
 #include "hw/irq.h"
 #include "sysemu/sev.h"
+#include "sysemu/balloon.h"
 
 #include "hw/boards.h"
 
@@ -1698,6 +1699,9 @@ static int kvm_init(MachineState *ms)
     s->many_ioeventfds = kvm_check_many_ioeventfds();
 
     s->sync_mmu = !!kvm_vm_check_extension(kvm_state, KVM_CAP_SYNC_MMU);
+    if (!s->sync_mmu) {
+        qemu_balloon_inhibit(true);
+    }
 
     return 0;
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1f7a87f09429..b5425080c5fb 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -21,7 +21,6 @@
 #include "hw/mem/pc-dimm.h"
 #include "sysemu/balloon.h"
 #include "hw/virtio/virtio-balloon.h"
-#include "sysemu/kvm.h"
 #include "exec/address-spaces.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-misc.h"
@@ -36,8 +35,7 @@
 
 static void balloon_page(void *addr, int deflate)
 {
-    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
-                                         kvm_has_sync_mmu())) {
+    if (!qemu_balloon_is_inhibited()) {
         qemu_madvise(addr, BALLOON_PAGE_SIZE,
                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
     }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PULL 3/4] vfio: Inhibit ballooning based on group attachment to a container
  2018-08-17 17:08 [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Alex Williamson
  2018-08-17 17:08 ` [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users Alex Williamson
  2018-08-17 17:08 ` [Qemu-devel] [PULL 2/4] kvm: Use inhibit to prevent ballooning without synchronous mmu Alex Williamson
@ 2018-08-17 17:08 ` Alex Williamson
  2018-08-17 17:08 ` [Qemu-devel] [PULL 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning Alex Williamson
  2018-08-20  8:47 ` [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Peter Maydell
  4 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2018-08-17 17:08 UTC (permalink / raw)
  To: qemu-devel

We use a VFIOContainer to associate an AddressSpace to one or more
VFIOGroups.  The VFIOContainer represents the DMA context for that
AdressSpace for those VFIOGroups and is synchronized to changes in
that AddressSpace via a MemoryListener.  For IOMMU backed devices,
maintaining the DMA context for a VFIOGroup generally involves
pinning a host virtual address in order to create a stable host
physical address and then mapping a translation from the associated
guest physical address to that host physical address into the IOMMU.

While the above maintains the VFIOContainer synchronized to the QEMU
memory API of the VM, memory ballooning occurs outside of that API.
Inflating the memory balloon (ie. cooperatively capturing pages from
the guest for use by the host) simply uses MADV_DONTNEED to "zap"
pages from QEMU's host virtual address space.  The page pinning and
IOMMU mapping above remains in place, negating the host's ability to
reuse the page, but the host virtual to host physical mapping of the
page is invalidated outside of QEMU's memory API.

When the balloon is later deflated, attempting to cooperatively
return pages to the guest, the page is simply freed by the guest
balloon driver, allowing it to be used in the guest and incurring a
page fault when that occurs.  The page fault maps a new host physical
page backing the existing host virtual address, meanwhile the
VFIOContainer still maintains the translation to the original host
physical address.  At this point the guest vCPU and any assigned
devices will map different host physical addresses to the same guest
physical address.  Badness.

The IOMMU typically does not have page level granularity with which
it can track this mapping without also incurring inefficiencies in
using page size mappings throughout.  MMU notifiers in the host
kernel also provide indicators for invalidating the mapping on
balloon inflation, not for updating the mapping when the balloon is
deflated.  For these reasons we assume a default behavior that the
mapping of each VFIOGroup into the VFIOContainer is incompatible
with memory ballooning and increment the balloon inhibitor to match
the attached VFIOGroups.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/common.c |   30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fb396cf00ac4..a3d758af8d8f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -32,6 +32,7 @@
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
+#include "sysemu/balloon.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "qapi/error.h"
@@ -1044,6 +1045,33 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     space = vfio_get_address_space(as);
 
+    /*
+     * VFIO is currently incompatible with memory ballooning insofar as the
+     * madvise to purge (zap) the page from QEMU's address space does not
+     * interact with the memory API and therefore leaves stale virtual to
+     * physical mappings in the IOMMU if the page was previously pinned.  We
+     * therefore add a balloon inhibit for each group added to a container,
+     * whether the container is used individually or shared.  This provides
+     * us with options to allow devices within a group to opt-in and allow
+     * ballooning, so long as it is done consistently for a group (for instance
+     * if the device is an mdev device where it is known that the host vendor
+     * driver will never pin pages outside of the working set of the guest
+     * driver, which would thus not be ballooning candidates).
+     *
+     * The first opportunity to induce pinning occurs here where we attempt to
+     * attach the group to existing containers within the AddressSpace.  If any
+     * pages are already zapped from the virtual address space, such as from a
+     * previous ballooning opt-in, new pinning will cause valid mappings to be
+     * re-established.  Likewise, when the overall MemoryListener for a new
+     * container is registered, a replay of mappings within the AddressSpace
+     * will occur, re-establishing any previously zapped pages as well.
+     *
+     * NB. Balloon inhibiting does not currently block operation of the
+     * balloon driver or revoke previously pinned pages, it only prevents
+     * calling madvise to modify the virtual mapping of ballooned pages.
+     */
+    qemu_balloon_inhibit(true);
+
     QLIST_FOREACH(container, &space->containers, next) {
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
             group->container = container;
@@ -1232,6 +1260,7 @@ close_fd_exit:
     close(fd);
 
 put_space_exit:
+    qemu_balloon_inhibit(false);
     vfio_put_address_space(space);
 
     return ret;
@@ -1352,6 +1381,7 @@ void vfio_put_group(VFIOGroup *group)
         return;
     }
 
+    qemu_balloon_inhibit(false);
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PULL 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning
  2018-08-17 17:08 [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Alex Williamson
                   ` (2 preceding siblings ...)
  2018-08-17 17:08 ` [Qemu-devel] [PULL 3/4] vfio: Inhibit ballooning based on group attachment to a container Alex Williamson
@ 2018-08-17 17:08 ` Alex Williamson
  2018-08-20 14:23   ` Peter Maydell
  2018-08-20  8:47 ` [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Peter Maydell
  4 siblings, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2018-08-17 17:08 UTC (permalink / raw)
  To: qemu-devel

If a vfio assigned device makes use of a physical IOMMU, then memory
ballooning is necessarily inhibited due to the page pinning, lack of
page level granularity at the IOMMU, and sufficient notifiers to both
remove the page on balloon inflation and add it back on deflation.
However, not all devices are backed by a physical IOMMU.  In the case
of mediated devices, if a vendor driver is well synchronized with the
guest driver, such that only pages actively used by the guest driver
are pinned by the host mdev vendor driver, then there should be no
overlap between pages available for the balloon driver and pages
actively in use by the device.  Under these conditions, ballooning
should be safe.

vfio-ccw devices are always mediated devices and always operate under
the constraints above.  Therefore we can consider all vfio-ccw devices
as balloon compatible.

The situation is far from straightforward with vfio-pci.  These
devices can be physical devices with physical IOMMU backing or
mediated devices where it is unknown whether a physical IOMMU is in
use or whether the vendor driver is well synchronized to the working
set of the guest driver.  The safest approach is therefore to assume
all vfio-pci devices are incompatible with ballooning, but allow user
opt-in should they have further insight into mediated devices.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/ccw.c                 |    9 +++++++++
 hw/vfio/common.c              |   23 ++++++++++++++++++++++-
 hw/vfio/pci.c                 |   26 +++++++++++++++++++++++++-
 hw/vfio/trace-events          |    1 +
 include/hw/vfio/vfio-common.h |    2 ++
 5 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 351b305e1ae7..e96bbdc78b48 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -349,6 +349,15 @@ static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev,
         }
     }
 
+    /*
+     * All vfio-ccw devices are believed to operate in a way compatible with
+     * memory ballooning, ie. pages pinned in the host are in the current
+     * working set of the guest driver and therefore never overlap with pages
+     * available to the guest balloon driver.  This needs to be set before
+     * vfio_get_device() for vfio common to handle the balloon inhibitor.
+     */
+    vcdev->vdev.balloon_allowed = true;
+
     if (vfio_get_device(group, vcdev->cdev.mdevid, &vcdev->vdev, errp)) {
         goto out_err;
     }
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a3d758af8d8f..cd1f4af18abb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1381,7 +1381,9 @@ void vfio_put_group(VFIOGroup *group)
         return;
     }
 
-    qemu_balloon_inhibit(false);
+    if (!group->balloon_allowed) {
+        qemu_balloon_inhibit(false);
+    }
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
@@ -1417,6 +1419,25 @@ int vfio_get_device(VFIOGroup *group, const char *name,
         return ret;
     }
 
+    /*
+     * Clear the balloon inhibitor for this group if the driver knows the
+     * device operates compatibly with ballooning.  Setting must be consistent
+     * per group, but since compatibility is really only possible with mdev
+     * currently, we expect singleton groups.
+     */
+    if (vbasedev->balloon_allowed != group->balloon_allowed) {
+        if (!QLIST_EMPTY(&group->device_list)) {
+            error_setg(errp,
+                       "Inconsistent device balloon setting within group");
+            return -1;
+        }
+
+        if (!group->balloon_allowed) {
+            group->balloon_allowed = true;
+            qemu_balloon_inhibit(false);
+        }
+    }
+
     vbasedev->fd = fd;
     vbasedev->group = group;
     QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6cbb8fa0549d..056f3a887a8f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2804,12 +2804,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
     VFIODevice *vbasedev_iter;
     VFIOGroup *group;
-    char *tmp, group_path[PATH_MAX], *group_name;
+    char *tmp, *subsys, group_path[PATH_MAX], *group_name;
     Error *err = NULL;
     ssize_t len;
     struct stat st;
     int groupid;
     int i, ret;
+    bool is_mdev;
 
     if (!vdev->vbasedev.sysfsdev) {
         if (!(~vdev->host.domain || ~vdev->host.bus ||
@@ -2869,6 +2870,27 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         }
     }
 
+    /*
+     * Mediated devices *might* operate compatibly with memory ballooning, but
+     * we cannot know for certain, it depends on whether the mdev vendor driver
+     * stays in sync with the active working set of the guest driver.  Prevent
+     * the x-balloon-allowed option unless this is minimally an mdev device.
+     */
+    tmp = g_strdup_printf("%s/subsystem", vdev->vbasedev.sysfsdev);
+    subsys = realpath(tmp, NULL);
+    g_free(tmp);
+    is_mdev = (strcmp(subsys, "/sys/bus/mdev") == 0);
+    free(subsys);
+
+    trace_vfio_mdev(vdev->vbasedev.name, is_mdev);
+
+    if (vdev->vbasedev.balloon_allowed && !is_mdev) {
+        error_setg(errp, "x-balloon-allowed only potentially compatible "
+                   "with mdev devices");
+        vfio_put_group(group);
+        goto error;
+    }
+
     ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev, errp);
     if (ret) {
         vfio_put_group(group);
@@ -3170,6 +3192,8 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
                     VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
     DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
+    DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
+                     vbasedev.balloon_allowed, false),
     DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false),
     DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false),
     DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false),
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index d2a74952e389..a85e8662eadb 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -39,6 +39,7 @@ vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %
 vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m"
 vfio_realize(const char *name, int group_id) " (%s) group %d"
+vfio_mdev(const char *name, bool is_mdev) " (%s) is_mdev %d"
 vfio_add_ext_cap_dropped(const char *name, uint16_t cap, uint16_t offset) "%s 0x%x@0x%x"
 vfio_pci_reset(const char *name) " (%s)"
 vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a9036929b220..15ea6c26fdbc 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -112,6 +112,7 @@ typedef struct VFIODevice {
     bool reset_works;
     bool needs_reset;
     bool no_mmap;
+    bool balloon_allowed;
     VFIODeviceOps *ops;
     unsigned int num_irqs;
     unsigned int num_regions;
@@ -131,6 +132,7 @@ typedef struct VFIOGroup {
     QLIST_HEAD(, VFIODevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
     QLIST_ENTRY(VFIOGroup) container_next;
+    bool balloon_allowed;
 } VFIOGroup;
 
 typedef struct VFIODMABuf {

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17
  2018-08-17 17:08 [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Alex Williamson
                   ` (3 preceding siblings ...)
  2018-08-17 17:08 ` [Qemu-devel] [PULL 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning Alex Williamson
@ 2018-08-20  8:47 ` Peter Maydell
  4 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2018-08-20  8:47 UTC (permalink / raw)
  To: Alex Williamson; +Cc: QEMU Developers

On 17 August 2018 at 18:08, Alex Williamson <alex.williamson@redhat.com> wrote:
> The following changes since commit 0abaa41d936becd914a16ee1fe2a981d96d19428:
>
>   Merge remote-tracking branch 'remotes/ehabkost/tags/x86-next-pull-request' into staging (2018-08-17 09:46:00 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20180817.0
>
> for you to fetch changes up to 238e91728503d400e1c4e644e3a9b80f9e621682:
>
>   vfio/ccw/pci: Allow devices to opt-in for ballooning (2018-08-17 09:27:16 -0600)
>
> ----------------------------------------------------------------
> VFIO update 2018-08-17
>
>  - Enhance balloon inhibitor for multiple users and use around vfio
>    device assignment (Alex Williamson)
>

Applied, thanks.

-- PMM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PULL 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning
  2018-08-17 17:08 ` [Qemu-devel] [PULL 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning Alex Williamson
@ 2018-08-20 14:23   ` Peter Maydell
  2018-08-20 14:55     ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Maydell @ 2018-08-20 14:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: QEMU Developers

On 17 August 2018 at 18:08, Alex Williamson <alex.williamson@redhat.com> wrote:
> If a vfio assigned device makes use of a physical IOMMU, then memory
> ballooning is necessarily inhibited due to the page pinning, lack of
> page level granularity at the IOMMU, and sufficient notifiers to both
> remove the page on balloon inflation and add it back on deflation.
> However, not all devices are backed by a physical IOMMU.  In the case
> of mediated devices, if a vendor driver is well synchronized with the
> guest driver, such that only pages actively used by the guest driver
> are pinned by the host mdev vendor driver, then there should be no
> overlap between pages available for the balloon driver and pages
> actively in use by the device.  Under these conditions, ballooning
> should be safe.
>
> vfio-ccw devices are always mediated devices and always operate under
> the constraints above.  Therefore we can consider all vfio-ccw devices
> as balloon compatible.
>
> The situation is far from straightforward with vfio-pci.  These
> devices can be physical devices with physical IOMMU backing or
> mediated devices where it is unknown whether a physical IOMMU is in
> use or whether the vendor driver is well synchronized to the working
> set of the guest driver.  The safest approach is therefore to assume
> all vfio-pci devices are incompatible with ballooning, but allow user
> opt-in should they have further insight into mediated devices.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

> @@ -2869,6 +2870,27 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          }
>      }
>
> +    /*
> +     * Mediated devices *might* operate compatibly with memory ballooning, but
> +     * we cannot know for certain, it depends on whether the mdev vendor driver
> +     * stays in sync with the active working set of the guest driver.  Prevent
> +     * the x-balloon-allowed option unless this is minimally an mdev device.
> +     */
> +    tmp = g_strdup_printf("%s/subsystem", vdev->vbasedev.sysfsdev);
> +    subsys = realpath(tmp, NULL);
> +    g_free(tmp);
> +    is_mdev = (strcmp(subsys, "/sys/bus/mdev") == 0);
> +    free(subsys);

Hi -- Coverity points out that we aren't checking the return
value from realpath() here. It returns NULL on failure, in
which case we'll pass NULL into strcmp() and segfault.
(CID 1395101).

thanks
-- PMM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PULL 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning
  2018-08-20 14:23   ` Peter Maydell
@ 2018-08-20 14:55     ` Alex Williamson
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Williamson @ 2018-08-20 14:55 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On Mon, 20 Aug 2018 15:23:35 +0100
Peter Maydell <peter.maydell@linaro.org> wrote:

> On 17 August 2018 at 18:08, Alex Williamson <alex.williamson@redhat.com> wrote:
> > If a vfio assigned device makes use of a physical IOMMU, then memory
> > ballooning is necessarily inhibited due to the page pinning, lack of
> > page level granularity at the IOMMU, and sufficient notifiers to both
> > remove the page on balloon inflation and add it back on deflation.
> > However, not all devices are backed by a physical IOMMU.  In the case
> > of mediated devices, if a vendor driver is well synchronized with the
> > guest driver, such that only pages actively used by the guest driver
> > are pinned by the host mdev vendor driver, then there should be no
> > overlap between pages available for the balloon driver and pages
> > actively in use by the device.  Under these conditions, ballooning
> > should be safe.
> >
> > vfio-ccw devices are always mediated devices and always operate under
> > the constraints above.  Therefore we can consider all vfio-ccw devices
> > as balloon compatible.
> >
> > The situation is far from straightforward with vfio-pci.  These
> > devices can be physical devices with physical IOMMU backing or
> > mediated devices where it is unknown whether a physical IOMMU is in
> > use or whether the vendor driver is well synchronized to the working
> > set of the guest driver.  The safest approach is therefore to assume
> > all vfio-pci devices are incompatible with ballooning, but allow user
> > opt-in should they have further insight into mediated devices.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>  
> 
> > @@ -2869,6 +2870,27 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> >          }
> >      }
> >
> > +    /*
> > +     * Mediated devices *might* operate compatibly with memory ballooning, but
> > +     * we cannot know for certain, it depends on whether the mdev vendor driver
> > +     * stays in sync with the active working set of the guest driver.  Prevent
> > +     * the x-balloon-allowed option unless this is minimally an mdev device.
> > +     */
> > +    tmp = g_strdup_printf("%s/subsystem", vdev->vbasedev.sysfsdev);
> > +    subsys = realpath(tmp, NULL);
> > +    g_free(tmp);
> > +    is_mdev = (strcmp(subsys, "/sys/bus/mdev") == 0);
> > +    free(subsys);  
> 
> Hi -- Coverity points out that we aren't checking the return
> value from realpath() here. It returns NULL on failure, in
> which case we'll pass NULL into strcmp() and segfault.
> (CID 1395101).

Thanks, patch posted.  The fix is trivial, you're welcome to pull it in
directly, otherwise I'll send a pull request if qemu-trivial doesn't
get to it first.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users
  2018-08-17 17:08 ` [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users Alex Williamson
@ 2018-08-22 15:56   ` Christian Borntraeger
  2018-08-22 17:47     ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Borntraeger @ 2018-08-22 15:56 UTC (permalink / raw)
  To: Alex Williamson, qemu-devel, qemu-s390x, Peter Maydell

This breaks qemu-io test for me.

#0  0x000003ff98f3e2d4 in raise () at /lib64/libc.so.6
#1  0x000003ff98f239a8 in abort () at /lib64/libc.so.6
#2  0x000003ff98f3632e in __assert_fail_base () at /lib64/libc.so.6
#3  0x000003ff98f363ac in  () at /lib64/libc.so.6
#4  0x000000000108301a in qemu_balloon_inhibit (state=state@entry=false) at /home/cborntra/REPOS/qemu/balloon.c:56
#5  0x00000000012026f2 in postcopy_ram_incoming_cleanup (mis=mis@entry=0x163bc5c0) at /home/cborntra/REPOS/qemu/migration/postcopy-ram.c:542
#6  0x00000000011eed5e in process_incoming_migration_co (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/migration/migration.c:407
#7  0x0000000001374000 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:116
#8  0x000003ff98f53c02 in __makecontext_ret () at /lib64/libc.so.6

Cant see right now whats wrong.

On 08/17/2018 07:08 PM, Alex Williamson wrote:
> A simple true/false internal state does not allow multiple users.  Fix
> this within the existing interface by converting to a counter, so long
> as the counter is elevated, ballooning is inhibited.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  balloon.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index 6bf0a9681377..931987983858 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -26,6 +26,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
> +#include "qemu/atomic.h"
>  #include "exec/cpu-common.h"
>  #include "sysemu/kvm.h"
>  #include "sysemu/balloon.h"
> @@ -37,16 +38,22 @@
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
>  static void *balloon_opaque;
> -static bool balloon_inhibited;
> +static int balloon_inhibit_count;
>  
>  bool qemu_balloon_is_inhibited(void)
>  {
> -    return balloon_inhibited;
> +    return atomic_read(&balloon_inhibit_count) > 0;
>  }
>  
>  void qemu_balloon_inhibit(bool state)
>  {
> -    balloon_inhibited = state;
> +    if (state) {
> +        atomic_inc(&balloon_inhibit_count);
> +    } else {
> +        atomic_dec(&balloon_inhibit_count);
> +    }
> +
> +    assert(atomic_read(&balloon_inhibit_count) >= 0);
>  }
>  
>  static bool have_balloon(Error **errp)
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users
  2018-08-22 15:56   ` Christian Borntraeger
@ 2018-08-22 17:47     ` Alex Williamson
  2018-08-22 19:01       ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2018-08-22 17:47 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: qemu-devel, qemu-s390x, Peter Maydell

On Wed, 22 Aug 2018 17:56:12 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> This breaks qemu-io test for me.
> 
> #0  0x000003ff98f3e2d4 in raise () at /lib64/libc.so.6
> #1  0x000003ff98f239a8 in abort () at /lib64/libc.so.6
> #2  0x000003ff98f3632e in __assert_fail_base () at /lib64/libc.so.6
> #3  0x000003ff98f363ac in  () at /lib64/libc.so.6
> #4  0x000000000108301a in qemu_balloon_inhibit (state=state@entry=false) at /home/cborntra/REPOS/qemu/balloon.c:56
> #5  0x00000000012026f2 in postcopy_ram_incoming_cleanup (mis=mis@entry=0x163bc5c0) at /home/cborntra/REPOS/qemu/migration/postcopy-ram.c:542
> #6  0x00000000011eed5e in process_incoming_migration_co (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/migration/migration.c:407
> #7  0x0000000001374000 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:116
> #8  0x000003ff98f53c02 in __makecontext_ret () at /lib64/libc.so.6
> 
> Cant see right now whats wrong.

I'd have to guess it's the assert added to verify increments and
decrements are balanced, which suggests postcopy is de-inhibiting more
than it's inhibiting :-\  Thanks,

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users
  2018-08-22 17:47     ` Alex Williamson
@ 2018-08-22 19:01       ` Alex Williamson
  2018-08-23  8:10         ` Cornelia Huck
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2018-08-22 19:01 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: qemu-devel, qemu-s390x, Peter Maydell, Dr. David Alan Gilbert

On Wed, 22 Aug 2018 11:47:09 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Wed, 22 Aug 2018 17:56:12 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
> > This breaks qemu-io test for me.
> > 
> > #0  0x000003ff98f3e2d4 in raise () at /lib64/libc.so.6
> > #1  0x000003ff98f239a8 in abort () at /lib64/libc.so.6
> > #2  0x000003ff98f3632e in __assert_fail_base () at /lib64/libc.so.6
> > #3  0x000003ff98f363ac in  () at /lib64/libc.so.6
> > #4  0x000000000108301a in qemu_balloon_inhibit (state=state@entry=false) at /home/cborntra/REPOS/qemu/balloon.c:56
> > #5  0x00000000012026f2 in postcopy_ram_incoming_cleanup (mis=mis@entry=0x163bc5c0) at /home/cborntra/REPOS/qemu/migration/postcopy-ram.c:542
> > #6  0x00000000011eed5e in process_incoming_migration_co (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/migration/migration.c:407
> > #7  0x0000000001374000 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:116
> > #8  0x000003ff98f53c02 in __makecontext_ret () at /lib64/libc.so.6
> > 
> > Cant see right now whats wrong.  
> 
> I'd have to guess it's the assert added to verify increments and
> decrements are balanced, which suggests postcopy is de-inhibiting more
> than it's inhibiting :-\  Thanks,

When postcopy inhibits ballooning, it does so from the migration stream
at loaddvm_process_command().  If the command is
MIG_CMD_POSTCOPY_LISTEN, this calls loadvm_postcopy_handle_listen().
If postcopy RAM is enabled, this calls postcopy_ram_enable_notify(),
which is where the balloon inhibitor is triggered.  I think the
postcopy_ram_listen_thread, which is created at the end of
loadvm_postcopy_handle_listen(), is intended to be where the inhibit
would be lifted, but I note a couple error paths before that where the
inhibit is not released.

postcopy_ram_listen_thread() calls postcopy_ram_incoming_cleanup(),
which always releases the balloon inhibit, regardless of the postcopy
RAM capability, and of course there are some error cases in advance of
this that would again leave the inhibit in place.

We also have the coroutine process_incoming_migration_co, launched from
migration_incoming_process() which seems to do a gratuitous call to
postcopy_ram_incoming_cleanup() and will release the inhibit regardless
of whether it was activated previous.

So apparently I assumed too much about the existing balloon inhibitor
use case, it's clearly not as general as the interface would lead one
to believe.  The immediate solution is probably to make a postcopy
specific wrapper around qemu_balloon_inhibit() that keeps a static
internal state to avoid over incrementing or decrementing.  That should
resolve this and then we can move on to whether we want to change the
overall internal interface or fixup some of the other cases noted
above.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users
  2018-08-22 19:01       ` Alex Williamson
@ 2018-08-23  8:10         ` Cornelia Huck
  0 siblings, 0 replies; 12+ messages in thread
From: Cornelia Huck @ 2018-08-23  8:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Christian Borntraeger, Peter Maydell, qemu-s390x, qemu-devel,
	Dr. David Alan Gilbert

On Wed, 22 Aug 2018 13:01:39 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> So apparently I assumed too much about the existing balloon inhibitor
> use case, it's clearly not as general as the interface would lead one
> to believe.  The immediate solution is probably to make a postcopy
> specific wrapper around qemu_balloon_inhibit() that keeps a static
> internal state to avoid over incrementing or decrementing.  That should
> resolve this and then we can move on to whether we want to change the
> overall internal interface or fixup some of the other cases noted
> above.  Thanks,
> 
> Alex
> 

My gut feeling is that an interface that allows code to add
yet-another-inhibitor and to remove it again (IOW, what this patch
implemented) is the way to go and that we should push out any checking
whether a inhibitor actually can be removed to the callers.

So, I think it actually makes a lot of sense for the postcopy code to
keep track whether it actually submitted 'its' inhibit and only call
into the generic inhibitor code to release it if that's the case.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-08-23  8:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-17 17:08 [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Alex Williamson
2018-08-17 17:08 ` [Qemu-devel] [PULL 1/4] balloon: Allow multiple inhibit users Alex Williamson
2018-08-22 15:56   ` Christian Borntraeger
2018-08-22 17:47     ` Alex Williamson
2018-08-22 19:01       ` Alex Williamson
2018-08-23  8:10         ` Cornelia Huck
2018-08-17 17:08 ` [Qemu-devel] [PULL 2/4] kvm: Use inhibit to prevent ballooning without synchronous mmu Alex Williamson
2018-08-17 17:08 ` [Qemu-devel] [PULL 3/4] vfio: Inhibit ballooning based on group attachment to a container Alex Williamson
2018-08-17 17:08 ` [Qemu-devel] [PULL 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning Alex Williamson
2018-08-20 14:23   ` Peter Maydell
2018-08-20 14:55     ` Alex Williamson
2018-08-20  8:47 ` [Qemu-devel] [PULL 0/4] VFIO update 2018-08-17 Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.