VFIO Migration

From: Stefan Hajnoczi @ 2020-11-02 11:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: John G Johnson, mtsirkin, Daniel P. Berrangé,
	quintela, Alex Williamson, qemu-devel, Kirti Wankhede,
	Thanos Makatos, Felipe Franciosi, Paolo Bonzini,
	Dr. David Alan Gilbert

There is discussion about VFIO migration in the "Re: Out-of-Process
Device Emulation session at KVM Forum 2020" thread. The current status
is that Kirti proposed a VFIO device region type for saving and loading
device state. There is currently no guidance on migrating between
different device versions or device implementations from different
vendors. This is known to be non-trivial and raised discussion about
whether it should really be handled by VFIO or centralized in QEMU.

Below is a document that describes how to ensure migration compatibility
in VFIO. It does not require changes to the VFIO migration interface. It
can be used for both VFIO/mdev kernel devices and vfio-user devices.

The idea is that the device state blob is opaque to the VMM but the same
level of migration compatibility that exists today is still available.

I hope this will help us reach consensus and let us discuss specifics.

If you followed the previous discussion, I changed the approach from
sending a magic constant in the device state blob to identifying device
models by URIs. Therefore the device state structure does not need to be
defined here - the critical information for ensuring device migration
compatibility is the device model and configuration defined below.

Stefan
---
VFIO Migration
==============
This document describes how to save and load VFIO device states. Saving a
device state produces a snapshot of a VFIO device's state that can be loaded
again at a later point in time to resume the device from the snapshot.

The data representation of the device state is outside the scope of this
document.

Overview
--------
The purpose of device states is to save the device at a point in time and then
restore the device back to the saved state later. This is more challenging than
it first appears.

The process of saving a device state and loading it later is called
*migration*. The state may be loaded by the same device that saved it or by a
new instance of the device, possibly running on a different computer.

It must be possible to migrate to a newer implementation of the device
as well as to an older implementation of the device. This allows users
to upgrade and roll back their systems.

Migration can fail if loading the device state is not possible. It should fail
early with a clear error message. It must not appear to complete but leave the
device inoperable due to a migration problem.

The rest of this document describes how these requirements can be met.

Device Models
-------------
Devices have a *hardware interface* consisting of hardware registers,
interrupts, and so on.

The hardware interface together with the device state representation is called
a *device model*. Device models can be assigned URIs such as
https://qemu.org/devices/e1000e to uniquely identify them.

Multiple implementations of a device model may exist. They are
interchangeable if they follow the same hardware interface and device
state representation.

Multiple implementations of the same hardware interface may exist with
different device state representations, in which case the device models are not
interchangeable and must be assigned different URIs.

Migration is only possible when the same device model is supported by the
*source* and the *destination* devices.

Device Configuration
--------------------
Device models may have parameters that affect the hardware interface or device
state representation. For example, a network card may have a configurable
address filtering table size parameter called ``rx-filter-size``. A
device state saved with ``rx-filter-size=32`` cannot be safely loaded
into a device with ``rx-filter-size=0``, because changing the size from
32 to 0 may disrupt device operation.

A list of configuration parameters is called the *device configuration*.
Migration is expected to succeed when the same device model and configuration
that was used for saving the device state is used again to load it.

Note that not all parameters used to instantiate a device need to be part of
the device configuration. For example, assigning a network card to a specific
physical port is not part of the device configuration since it is not part of
the device's hardware interface or the device state representation. The device
state can be loaded and run on a different physical port without affecting the
operation of the device. Therefore the physical port is not part of the device
configuration.

However, secondary aspects related to the physical port may affect the device's
hardware interface and need to be reflected in the device configuration. The
link speed may depend on the physical port and be reported through the device's
hardware interface. In that case a ``link-speed`` configuration parameter is
required to prevent unexpected changes to the link speed after migration.

Note that the device configuration is a conservative bound on the device
states that can be migrated successfully, since not every configuration
parameter strictly needs to match on the source and destination devices.
For example, if the device's hardware interface has not yet been
initialized then a change to the link speed may go unnoticed. However,
accurately representing such runtime constraints is complex and risks
introducing migration bugs, so no attempt is made to model them in order
to relax the bounds on successful migrations.
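
The distinction between instantiation parameters and the device
configuration can be sketched as follows. This is only an illustration: the
``physical-port`` name and the parameter values are assumptions made for the
example, and ``CONFIG_PARAMS`` is a hypothetical per-model list, not part of
any VFIO interface.

```python
# Hypothetical list of parameters that affect the hardware interface or the
# device state representation of one device model.
CONFIG_PARAMS = {"rx-filter-size", "link-speed"}


def device_configuration(instantiation_params: dict) -> dict:
    """Drop parameters that are not part of the device configuration."""
    return {name: value
            for name, value in instantiation_params.items()
            if name in CONFIG_PARAMS}


# The two devices sit on different physical ports (an assumed parameter name)
# but expose the same hardware interface.
source = {"rx-filter-size": 32, "link-speed": "10G", "physical-port": "p0"}
destination = {"rx-filter-size": 32, "link-speed": "10G", "physical-port": "p3"}

compatible = device_configuration(source) == device_configuration(destination)
```

Here ``compatible`` is true because only ``rx-filter-size`` and ``link-speed``
are compared; the physical port is ignored.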

Device Versions
---------------
As a device evolves, the number of configuration parameters required may become
inconvenient for users to express in full. A device configuration can be
aliased by a *device version*, which is a shorthand for the full device
configuration. This makes it easy to apply a standard device configuration
without listing every configuration parameter explicitly.

For example, if address filtering support was added to a network card then
device versions and the corresponding configurations may look like this:

* ``version=1`` - Behaves as if ``rx-filter-size=0``
* ``version=2`` - ``rx-filter-size=32``
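
A minimal sketch of how a version alias could expand into a full device
configuration, using the two versions listed above. The ``DEFAULTS`` and
``VERSIONS`` tables and the ``expand_version`` helper are hypothetical names
chosen for this example.

```python
# Default values apply when a version does not mention a parameter.
DEFAULTS = {"rx-filter-size": 0}

# Each version is shorthand for a set of configuration parameters.
VERSIONS = {
    1: {},                        # behaves as if rx-filter-size=0
    2: {"rx-filter-size": 32},
}


def expand_version(version: int) -> dict:
    """Return the full device configuration aliased by a version."""
    if version not in VERSIONS:
        raise ValueError(f"unsupported version: {version}")
    return {**DEFAULTS, **VERSIONS[version]}
```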

Device States
-------------
The details of the device state representation are not covered in this document
but the general requirements are discussed here.

The device state consists of data accessible through the device's hardware
interface and internal state that is needed to restore device operation.
State in the hardware interface includes the values of hardware registers.
An example of internal state is an index value needed to avoid processing
queued requests more than once.

Changes can be made to the device state representation as follows. Each change
to device state must have a corresponding device configuration parameter that
allows the change to be toggled:

* When the parameter is disabled the hardware interface and device state
  representation are unchanged. This allows old device states to be loaded.

* When the parameter is enabled the change comes into effect.

* The parameter's default value disables the change. Therefore old versions do
  not have to explicitly specify the parameter.
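
The rules above can be illustrated with a toy save routine. The device state
representation is outside the scope of this document, so the dictionary layout
below is purely an assumption for the sake of the example; only the
``rx-filter-size`` parameter comes from the text.

```python
def save_state(config: dict, registers: dict, rx_filters: list) -> dict:
    """Produce a toy device state; the layout is illustrative only."""
    state = {"registers": dict(registers)}
    # The default value (0) disables the change, so a configuration that
    # never mentions rx-filter-size yields the old representation.
    size = config.get("rx-filter-size", 0)
    if size > 0:
        # The change only takes effect when the parameter enables it.
        state["rx-filters"] = list(rx_filters)[:size]
    return state


old_style = save_state({}, {"ctrl": 1}, ["aa:bb"])
new_style = save_state({"rx-filter-size": 32}, {"ctrl": 1}, ["aa:bb"])
```

With the parameter left at its default, ``old_style`` contains only the
registers, so an implementation that predates the change can still load it.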

As an example of migration from an old device implementation to a new
one, consider a ``version=1`` network card that is migrated to a new
device implementation that is also capable of ``version=2``, which adds
the ``rx-filter-size=32`` parameter. The new device is instantiated with
``version=1``, which disables ``rx-filter-size`` and makes the device
capable of loading the ``version=1`` device state. The migration
completes successfully, but note that the device still operates at the
``version=1`` level on the new implementation.

As an example of migration from a new device implementation back to an
older one, suppose the new device implementation supports ``version=1``
and ``version=2`` while the old device implementation supports
``version=1`` only. The device can then only be migrated when it was
instantiated with ``version=1`` or the equivalent full configuration
parameters.
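
Both examples reduce to asking which versions the two implementations have in
common. A small sketch, assuming each implementation is described simply by
the set of versions it supports (the names below are hypothetical):

```python
OLD_IMPLEMENTATION = frozenset({1})
NEW_IMPLEMENTATION = frozenset({1, 2})


def migratable_versions(source_versions, destination_versions) -> list:
    """Versions a device may be instantiated with and still be migrated."""
    return sorted(set(source_versions) & set(destination_versions))


upgrade = migratable_versions(OLD_IMPLEMENTATION, NEW_IMPLEMENTATION)
rollback = migratable_versions(NEW_IMPLEMENTATION, OLD_IMPLEMENTATION)
```

In both directions only version 1 is usable, which is why a device that may
need to be rolled back has to be instantiated with ``version=1``.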

Orchestrating Migrations
------------------------
The following steps must be followed to migrate devices:

1. Check that the source and destination devices support the same device model.

2. Check that the destination device supports the source device's
   configuration. Each configuration parameter must be accepted by the
   destination in order to ensure that it will be possible to load the device
   state.

3. Save the device state on the source and load it on the destination.

4. If migration succeeds then the destination resumes operation and the source
   must not resume operation. If the migration fails then the source resumes
   operation and the destination must not resume operation.
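
The four steps can be sketched as a single function. This is a simplified
model under stated assumptions, not the VFIO migration interface: the
``Device`` class, its ``accepted`` table and the ``migrate`` function are
hypothetical, and the device state is treated as an opaque byte string.

```python
from dataclasses import dataclass


class MigrationError(Exception):
    """Raised when a migration cannot start or fails part-way."""


@dataclass
class Device:
    model_uri: str
    accepted: dict            # parameter name -> set of accepted values
    state: bytes = b""
    running: bool = False

    def save(self) -> bytes:
        return self.state

    def load(self, blob: bytes) -> None:
        self.state = blob


def migrate(source: Device, config: dict, destination: Device) -> None:
    # Step 1: both sides must support the same device model.
    if source.model_uri != destination.model_uri:
        raise MigrationError(
            f"device model mismatch: {source.model_uri} != "
            f"{destination.model_uri}")
    # Step 2: the destination must accept every source parameter.
    for name, value in config.items():
        if value not in destination.accepted.get(name, set()):
            raise MigrationError(f"destination rejects {name}={value}")
    # Step 3: save on the source and load on the destination.
    source.running = False
    try:
        destination.load(source.save())
    except Exception as exc:
        # Step 4, failure: only the source resumes.
        source.running = True
        raise MigrationError("loading the device state failed") from exc
    # Step 4, success: only the destination resumes.
    destination.running = True
```

Performing the checks in steps 1 and 2 before the source is stopped lets the
migration fail early with a clear error while the source keeps running.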

VFIO Implementation
-------------------
The following applies both to kernel VFIO/mdev drivers and vfio-user device
backends.

Devices are instantiated based on a version and/or configuration parameters:

* ``version=1`` - use the device configuration aliased by version 1
* ``version=2,rx-filter-size=64`` - use version 2 and override ``rx-filter-size``
* ``rx-filter-size=0`` - directly set configuration parameters without using a version

Device creation fails if the version and/or configuration parameters are not
supported.
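
A sketch of how such an instantiation string could be resolved into a device
configuration. The ``instantiate`` helper, the ``ACCEPTED`` table and the
assumption that every parameter is an integer are all hypothetical; only the
string format and the parameter names come from the examples above.

```python
DEFAULTS = {"rx-filter-size": 0}
VERSIONS = {1: {}, 2: {"rx-filter-size": 32}}
# Hypothetical set of values this implementation accepts per parameter.
ACCEPTED = {"rx-filter-size": {0, 32, 64}}


def instantiate(spec: str) -> dict:
    """Resolve e.g. 'version=2,rx-filter-size=64' into a configuration."""
    config = dict(DEFAULTS)
    overrides = {}
    for item in spec.split(","):
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        if name == "version":
            version = int(value)
            if version not in VERSIONS:
                raise ValueError(f"unsupported version: {version}")
            config.update(VERSIONS[version])
        else:
            overrides[name] = int(value)  # integer parameters only
    # Explicit parameters override whatever the version alias selected.
    for name, value in overrides.items():
        if value not in ACCEPTED.get(name, set()):
            raise ValueError(f"unsupported parameter: {name}={value}")
        config[name] = value
    return config
```

Raising an error for an unknown version or parameter corresponds to device
creation failing, so an incompatible destination is detected before any
device state is transferred.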

There must be a mechanism to query the "latest" configuration for a device
model. It may simply report ``version=5``, where 5 is the latest version, but
it could also report all configuration parameters instead of using a version
alias.
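
Either form of the answer could be produced from a version table. A minimal
sketch, in which the ``VERSIONS`` table and the ``latest_configuration``
helper are hypothetical and do not correspond to an existing query interface:

```python
# Full configuration aliased by each version of a hypothetical device model.
VERSIONS = {
    1: {"rx-filter-size": 0},
    2: {"rx-filter-size": 32},
}


def latest_configuration(as_alias: bool = True) -> str:
    """Report the latest configuration as an alias or as full parameters."""
    latest = max(VERSIONS)
    if as_alias:
        return f"version={latest}"
    params = sorted(VERSIONS[latest].items())
    return ",".join(f"{name}={value}" for name, value in params)
```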
