Re: [RFC PATCH for-QEMU-5.2] vfio: Make migration support experimental

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Neo Jia" <cjia@nvidia.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	qemu-devel@nongnu.org, "Kirti Wankhede" <kwankhede@nvidia.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>
Subject: Re: [RFC PATCH for-QEMU-5.2] vfio: Make migration support experimental
Date: Mon, 9 Nov 2020 19:44:17 +0000	[thread overview]
Message-ID: <20201109194417.GR3024@work-vm> (raw)
In-Reply-To: <160494787833.1473.10514376876696596117.stgit@gimli.home>

* Alex Williamson (alex.williamson@redhat.com) wrote:
> Per the proposed documentation for vfio device migration:
> 
>   Dirty pages are tracked when device is in stop-and-copy phase
>   because if pages are marked dirty during pre-copy phase and
>   content is transfered from source to destination, there is no
>   way to know newly dirtied pages from the point they were copied
>   earlier until device stops. To avoid repeated copy of same
>   content, pinned pages are marked dirty only during
>   stop-and-copy phase.
> 
> Essentially, since we don't have hardware dirty page tracking for
> assigned devices at this point, we consider any page that is pinned
> by an mdev vendor driver or pinned and mapped through the IOMMU to
> be perpetually dirty.  In the worst case, this may result in all of
> guest memory being considered dirty during every iteration of live
> migration.  The current vfio implementation of migration has chosen
> to mask device dirtied pages until the final stages of migration in
> order to avoid this worst case scenario.
> 
> Allowing the device to implement a policy decision to prioritize
> reduced migration data like this jeopardizes QEMU's overall ability
> to implement any degree of service level guarantees during migration.
> For example, any estimates towards achieving acceptable downtime
> margins cannot be trusted when such a device is present.  The vfio
> device should participate in dirty page tracking to the best of its
> ability throughout migration, even if that means the dirty footprint
> of the device impedes migration progress, allowing both QEMU and
> higher level management tools to decide whether to continue the
> migration or abort due to failure to achieve the desired behavior.

I don't feel particularly badly about the decision to squash it in
during the stop-and-copy phase; for devices where the pinned memory
is large, I don't think doing it during the main phase makes much sense;
especially if you then have to deal with tracking changes in pinning.

Having said that, I agree with marking it as experimental, because
I'm dubious how useful it will be for the same reason, I worry
about whether the downtime will be so large to make it pointless.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Link: https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg00807.html
> Cc: Kirti Wankhede <kwankhede@nvidia.com>
> Cc: Neo Jia <cjia@nvidia.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Philippe Mathieu-DaudÃ© <philmd@redhat.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> Given that our discussion in the link above seems to be going in
> circles, I'm afraid it seems necessary to both have a contigency
> plan and to raise the visibility of the current behavior to
> determine whether others agree that this is a sufficiently
> troubling behavior to consider migration support experimental
> at this stage.  Please voice your opinion or contribute patches
> to resolve this before QEMU 5.2.  Thanks,
> 
> Alex
> 
>  hw/vfio/migration.c           |    2 +-
>  hw/vfio/pci.c                 |    2 ++
>  include/hw/vfio/vfio-common.h |    1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3ce285ea395d..cd44d465a50b 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -882,7 +882,7 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>      Error *local_err = NULL;
>      int ret = -ENOTSUP;
>  
> -    if (!container->dirty_pages_supported) {
> +    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
>          goto add_blocker;
>      }
>  
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 58c0ce8971e3..1349b900e513 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3194,6 +3194,8 @@ static Property vfio_pci_dev_properties[] = {
>                      VFIO_FEATURE_ENABLE_REQ_BIT, true),
>      DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
>                      VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
> +    DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
> +                     vbasedev.enable_migration, false),
>      DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
>      DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
>                       vbasedev.ram_block_discard_allowed, false),
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index baeb4dcff102..2119872c8af1 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -123,6 +123,7 @@ typedef struct VFIODevice {
>      bool needs_reset;
>      bool no_mmap;
>      bool ram_block_discard_allowed;
> +    bool enable_migration;
>      VFIODeviceOps *ops;
>      unsigned int num_irqs;
>      unsigned int num_regions;
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK