All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v3] VFIO Migration
@ 2020-11-10  9:53 Stefan Hajnoczi
  2020-11-10 11:12 ` Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-10  9:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, Dr. David Alan Gilbert,
	Yan Zhao, Kirti Wankhede, Paolo Bonzini, Alex Williamson,
	Gerd Hoffmann, Felipe Franciosi, Christophe de Dinechin,
	Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 20183 bytes --]

v3:
 * Introduce migration info JSON to describe migration parameters
 * Rework mdev sysfs interface
 * Propose standard interface for vfio-user device emulation programs

VFIO Migration
==============
This document describes how to ensure migration compatibility for VFIO devices,
including mdev and vfio-user devices.

Overview
--------
VFIO devices can save and load a *device state*. Saving a device state produces
a snapshot of a VFIO device that can be loaded again at a later point in time
to resume the device from the snapshot.

The process of saving a device state and loading it later is called
*migration*. The device state may be loaded by the same device instance that
saved it or by a new instance, possibly running on a different machine.

A VFIO/mdev driver together with the physical device provides the functionality
of a device. Alternatively, a vfio-user device emulation program can provide
the functionality of a device. These are called *device implementations*.

The device implementation where a migration originates is called the *source*
and the device implementation that a migration targets is called the
*destination*.

Although it is possible to migrate device state without regard to migration
compatibility, this is prone to failure except in trivial cases. Device
implementations vary in feature availability and resource capacity so that it
is difficult to be confident that a migration of a complex device will succeed.
Furthermore, without migration compatibility checking it is possible that
migration appears to succeed but leaves the device in an inoperable state,
leading to data loss or corruption.

This document describes how to establish migration compatibility between the
source and destination. A check can be performed before migrating and can
therefore be used to select a suitable migration destination. When
compatibility has been established, the probability of migrating successfully
is high and a successful migration does not leave the device inoperable due to
silent migration problems.

Migration Parameters
--------------------
*Migration parameters* are used to describe characteristics that must match
between source and destination to achieve migration compatibility.

The first implementation of a simple device may not require migration
parameters if the source and destination are always compatible. As the device
evolves, the source and destination may differ and migration parameters are
required to express these differences. More complex devices may require
migration parameters from the start due to optional functionality that is not
guaranteed to be present in both source and destination.

A migration parameter consists of a name and a value. The name is a UTF-8
string that does not contain equals ('='), backslash ('/'), or whitespace
characters. The value is a UTF-8 string that does not contain newline
characters ('\n').

The meaning of the migration parameter and its possible values are specific to
the device, but values are based on one of the following types:
* bool - booleans (on/off)
* int - integers (0, 1, 2, ...)
* str - character strings

Migration parameters are represented as <name>=<value> in this document.
Examples include my-feature=on and num-queues=4.

When a new migration parameter is introduced, its absence must have the same
effect as before the migration parameter was introduced. For example, if
my-feature=on|off is added to control the availability of a new device feature,
then my-feature=off is equivalent to omitting the migration parameter.

Hardware Interface Compatibility
--------------------------------
VFIO devices have a *hardware interface* consisting of device regions and
interrupts. Aspects of the hardware interface can vary between device
implementations and require migration parameters to express migration
compatibility requirements.

Examples of migration parameters include:
* Feature availability - feature bitmasks, hardware revision numbers, etc. If
  the destination may lack support for optional features or hardware interface
  revisions, then migration parameters are required.
* Functionality - hardware register blocks that are only present on certain
  device instances. If there are multiple devices sub-models that have
  different hardware interfaces then migration parameters are required.
* Resource capacity - size of display framebuffers, number of queues, queue
  size, etc.

These examples demonstrate aspects of the hardware interface that must not
change unexpectedly. Were they to differ between source and destination, the
chance of device driver malfunction would be high because the layout of the
hardware interface would change or assumptions the device driver makes about
available functionality would be violated. Migration parameters are used to
preserve the hardware interface across migration and explicitly represent
variations between device implementations.

Hardware interfaces sometimes support reporting an event when a change occurs.
In those cases it may be possible to support visible changes in the hardware
interface across migration. In most other cases migration must not result in a
visible change in the hardware interface.

Migration parameters are not necessary for read-only values exposed through the
hardware interface, such as MAC address EEPROMs or serial numbers, so long as
all device implementations can be configured with the same range of input
values for these read-only values. This is possible because migration
parameters do not capture the full configuration of the device, only aspects
that affect migration compatibility.

Device configuration that is not visible through the hardware interface, such
as a host file system path of a disk image file or the physical network port
assigned to a network card, usually does not require migration parameters
because they can be changed without breaking migration compatibility.

The disk image file may indirectly affect the hardware interface, for example
by constraining the device's block size to a specific value. In this case a
block-size=N migration parameter is required to ensure migration compatibility,
but the host file system path of the disk image file still does not require a
migration parameter.

Device State Representation
---------------------------
Device state contains both data accessible through the device's hardware
interface and device-internal state needed to restore device operation.

The contents of hardware registers are usually included in the device state if
they can change at runtime. Hardware registers with constant or computed data
may not need to be part of the device state provided that device
implementations can produce the necessary data.

Device-internal state includes the portion of the device's state that cannot be
reconstructed from the hardware interface alone. Defining device-internal state
in the most general way instead of exposing device implementation details
allows for flexibility in the future. For example, device implementations often
maintain a ring index, which is not available through the hardware interface,
to keep track of which ring elements have already been consumed. The ring index
must be included in the device state so that the destination can resume
processing from the correct point in the ring. Representing this as an index
into the ring in the hardware interface is more general than adding device
implementation-specific request tracking data structures into the device state.

The *device state representation* defines the binary data layout of the device
state. The device state representation is specific to each device and is beyond
the scope of this document, but aspects pertaining to migration compatibility
are discussed here.

Each change to the device state representation that affects migration
compatibility requires a migration parameter. When a new field is added to the
device state representation then a new migration parameter must be added to
reflect this change. Often a single migration parameter expresses both a change
to the hardware interface and the device state representation. It is also
possible to change the device state representation without changing the
hardware interface, for example when some state was forgotten while designing
the previous device state representation.

The device state representation may support adding extra data that can be
safely ignored by old device implementations. In this case migration
compatibility is unaffected and a migration parameter is not required to
indicate such extra data has been added.

Device Models
-------------
The combination of the hardware interface, device state representation, and
migration parameter definitions is called a *device model*. Device models are
identified by a unique UTF-8 string starting with a domain name and followed by
path components separated with backslashes ('/'). Examples include
vendor-a.com/my-nic, gitlab.com/user/my-device, virtio-spec.org/pci/virtio-net,
and qemu.org/pci/10ec/8139.

The unique device model string is not changed as the device evolves. Instead,
migration parameters are added to express variations in a device.

The device model is not tied to a specific device implementation. The same
device model could be implemented as a VFIO/dev driver or as a vfio-user device
emulation program.

Multiple device implementations can support the same device model. Doing so
means that the device implementations can offer migration compatiblity because
they support the same hardware interface, device state representation, and
migration parameters.

Multiple device models can exist for the same hardware interface, each with a
different device state representation and migration parameters. This makes it
possible to fork and independently develop device models.

Orchestrating Migrations
------------------------
In order to migrate a device a *migration parameter list* must first be built
on the source. Each migration parameter is added to the list if it is in
effect. For example, the migration parameter list for a device with
new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
parameter was introduced with the off value disabling its effect.

The following conditions must be met to establish migration compatibility:

1. The source and destination device model strings match.

2. Each migration parameter name from the migration parameter list is supported
   by the destination. For example, the destination supports the num-queues
   migration parameter.

3. Each migration parameter value from the migration parameter list is
   supported by the destination. For example, the destination supports
   num-queues=4.

The migration compatibility check can be performed without initiating a
migration. Therefore, this process can be used to select the migration
destination.

The following steps perform the migration:

1. Configure the destination so it is prepared to load the device state,
   including applying the migration parameter list. This may involve
   instantiating a new device instance or resetting an existing device instance
   to a configuration that is compatible with the source.

   The details of how to do this for VFIO/mdev drivers and vfio-user device
   backend programs is described below.

2. Save the device state on the source and load it on the destination.

3. If migration succeeds then the destination resumes operation and the source
   must not resume operation. If the migration fails then the source resumes
   operation and the destination must not resume operation.

Note that these steps impose a conservative bound on device states that can be
migrated successfully. Not all configuration parameters may be strictly
required to match on the source and destination devices. For example, if the
device's hardware interface has not yet been initialized then changes to the
advertised features may not yet affect the device driver. However, accurately
representing runtime constraints is complex and risks introducing migration
bugs, so no attempt is made to support them.

Migration Information JSON
--------------------------
Device implementations describe supported device models in the following JSON
format:

.. code:: json

  {
    "models": {
      "<model>": {
        "params": {
          "<param>": {
            "allowed_values": [<value1>, <value2>, ...]
            "description": ...
            "init_value": ...
            "off_value": ...
            "type": ...
          }
        }
      }
    }
  }

The "models" object contains one or more device model objects describing
available device models. Each member name is a unique device model string, for
example "vendor-a.com/my-nic".

The device model object contains a "params" object describes available
migration parameters. Each migration parameter object contains the following
members:

"allowed_values"
  The list all values that the device implementation accepts for this migration
  parameter. Integer ranges can be described using "<min>-<max>" strings.

  Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]

  This member is optional. When absent, any value suitable for the type may be
  given but the device implementation may refuse certain values.

"description"
  A human-readable description of the migration parameter. This is not intended
  for user interfaces but rather as a troubleshooting aid for developers. The
  description is typically written in English. This member is optional.

"init_value"
  The initial parameter value when a device instance is created. This member is
  required.

"off_value"
  The parameter value that disables the effect of this parameter. This member
  is absent if the migration parameter cannot be disabled.

"type"
  The data type ("bool", "int", "str"). This member is required.

An example of a simple device model that only one feature and a fixed resource
capacity:

.. code:: json

  {
    "models": {
      "vendor-a.com/my-nic": {
        "params": {
          "new-feature": {
            "description": "New feature that old devices lack",
            "init_value": true,
            "off_value": false,
            "type": 'bool'
          },
          "num-resources": {
            "allowed_values": [64],
            "description": "Number of resources",
            "init_value": 64,
            "type": 'int'
          }
        }
      }
    }
  }

Newly created instances of this device will enable "new-feature", but it can be
disabled for migration compatibility with old device instances.

The number of device resources are fixed at 64, so only device instances that
also have exactly 64 resources can be migrated to this device implementation.

VFIO mdev Drivers
-----------------
The following mdev type sysfs attrs are available for managing device
instances::

  /sys/.../<parent-device>/mdev_supported_types/<type-id>/
    create - writing a UUID to this file instantiates a device
    migration_info.json - read-only migration information JSON

TODO The JSON can be represented as a file system hierarchy but sysfs seems
limited to <kobject>/<group>/<attr> and <kobject>/<attr> so it is not possible
to express deeper attr groups like <kobject>/migration/params/<param>/<attr>?

Device models supported by an mdev driver and their details can be read from
the migration_info.json attr. Each mdev type supports one device model. If a
parent device supports multiple device models then each device model has an
mdev type. There may be multiple mdev types for a single device model when they
offer different migration parameters such as resource capacity or feature
availability.

For example, a graphics card that supports 4 GB and 8 GB device instances would
provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
migration parameters, respectively.

The following mdev device sysfs attrs relate to a specific device instance::

  /sys/.../<parent-device>/<uuid>/
    mdev_type/ - symlink to mdev type sysfs attrs, e.g. to fetch migration/model
    migration/ - migration related files
      <param> - read/write migration parameter "param"
      ...

When the device is created all migration/<param> attrs take their
migration_info.json "init_value".

When preparing for migration on the source, each migration parameter from
migration/<param> is read and added to the migration parameter list if its
value differs from "off_value" in migration_info.json. If a migration parameter
in the list is not available on the destination, then migration is not
possible. If a migration parameter value is not in the destination
"allowed_values" migration_info.json then migration is not possible.

In order to prepare an mdev device instance for an incoming migration on the
destination, the "off_value" from migration_info.json is written to each
migration parameter in migration/<param>. Then the migration parameter list
from the source is written to migration/<param> one migration parameter at a
time. If an error occurs while writing a migration parameter on the destination
then migration is not possible. Once the migration parameter list has been
written the mdev can be opened and migration can proceed.

An open mdev device typically does not allow migration parameters to be changed
at runtime. However, certain migration/params attrs may allow writes at
runtime. Usually these migration parameters only affect the device state
representation and not the hardware interface. This makes it possible to
upgrade or downgrade the device state representation at runtime so that
migration is possible to newer or older device implementations.

vfio-user Device Emulation Programs
-----------------------------------
Device emulation programs often support a simple invocation model where running
the program creates a single device instance. The lifecycle of the device
instance is tied to the lifecycle of the process. Such device emulation
programs are described below.

More complex device emulation programs may host multiple devices. The interface
for configuring these device emulation programs is not standardized. Therefore,
migrating these devices is beyond the scope of this document.

The migration information JSON is printed to standard output by a vfio-user
device emulation program as follows:

.. code:: bash

  $ my-device --print-migration-info-json

The device is instantiated by launching the destination process with the
migration parameter list from the source:

.. code:: bash

  $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]

This example shows how to instantiate the device with migration parameters
``param1`` and ``param2``. Both ``--m-<param>=<value>`` and ``--m-<param>
<value>`` option formats are accepted.

The ``--m-`` prefix is used to allow the device emulation program to implement
device implementation-specific command-line options without conflicting with
the migration parameter namespace.

When preparing for migration on the source, each migration parameter from the
migration info JSON is added to the migration parameter list if its value
differs from "off_value". If a migration parameter in the list is not available
on the destination, then migration is not possible. If a migration parameter
value is not in the destination "allowed_values" migration_info.json then
migration is not possible.

On the destination, a command-line is generated from the migration parameter
list. For each destination migration parameter missing from the migration
parameter list a command-line option is added with the destination "off_value".
The device emulation program prints an error message to standard error and
terminates with exit status 1 if the device could not be instantiated.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-10  9:53 [RFC v3] VFIO Migration Stefan Hajnoczi
@ 2020-11-10 11:12 ` Paolo Bonzini
  2020-11-11 14:36   ` Stefan Hajnoczi
  2020-11-10 20:14 ` Alex Williamson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Paolo Bonzini @ 2020-11-10 11:12 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, Dr. David Alan Gilbert,
	Yan Zhao, Kirti Wankhede, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

On 10/11/20 10:53, Stefan Hajnoczi wrote:
> "allowed_values"
>    The list all values that the device implementation accepts for this migration
>    parameter. Integer ranges can be described using "<min>-<max>" strings.
> 
>    Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> 
>    This member is optional. When absent, any value suitable for the type may be
>    given but the device implementation may refuse certain values.

I'd rather make this simpler:

- remove allowed_values for strings.  Effect: discourages using strings 
as enums, leaving them only for free-form values such as vendor name or 
model name.

- remove allowed_values for bools.  If off_value is absent the only 
allowed value is init_value.  If off_value is present, both true and 
false are allowed (and !off_value is the "on_value", so to speak).

- change allowed_values into allowed_min and allowed_max for int values. 
  Advantage: avoids having to parse strings as ranges.  Disadvantage: 
removes expressiveness (cannot say "x must be a power of two"), but I'm 
not sure it's worth the extra complication.

Thanks,

Paolo

> "description"
>    A human-readable description of the migration parameter. This is not intended
>    for user interfaces but rather as a troubleshooting aid for developers. The
>    description is typically written in English. This member is optional.
> 
> "init_value"
>    The initial parameter value when a device instance is created. This member is
>    required.
> 
> "off_value"
>    The parameter value that disables the effect of this parameter. This member
>    is absent if the migration parameter cannot be disabled.
> 
> "type"
>    The data type ("bool", "int", "str"). This member is required.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-10  9:53 [RFC v3] VFIO Migration Stefan Hajnoczi
  2020-11-10 11:12 ` Paolo Bonzini
@ 2020-11-10 20:14 ` Alex Williamson
  2020-11-11 11:48   ` Cornelia Huck
  2020-11-11 15:10   ` Stefan Hajnoczi
  2020-11-11 11:19 ` Cornelia Huck
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 38+ messages in thread
From: Alex Williamson @ 2020-11-10 20:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng,  Xin, qemu-devel,
	Dr. David Alan Gilbert, Yan Zhao, Kirti Wankhede, Paolo Bonzini,
	Gerd Hoffmann, Felipe Franciosi, Christophe de Dinechin,
	Thanos Makatos

On Tue, 10 Nov 2020 09:53:49 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:
> VFIO mdev Drivers
> -----------------
> The following mdev type sysfs attrs are available for managing device
> instances::
> 
>   /sys/.../<parent-device>/mdev_supported_types/<type-id>/
>     create - writing a UUID to this file instantiates a device
>     migration_info.json - read-only migration information JSON
> 
> TODO The JSON can be represented as a file system hierarchy but sysfs seems
> limited to <kobject>/<group>/<attr> and <kobject>/<attr> so it is not possible
> to express deeper attr groups like <kobject>/migration/params/<param>/<attr>?


Complex structured formats have been proposed in other threads related
to migration compatibility and generally been dismissed as not adhering
to the standards of sysfs per:

Documentation/filesystems/sysfs.rst:
---
Attributes
~~~~~~~~~~

Attributes can be exported for kobjects in the form of regular files in
the filesystem. Sysfs forwards file I/O operations to methods defined
for the attributes, providing a means to read and write kernel
attributes.

Attributes should be ASCII text files, preferably with only one value
per file. It is noted that it may not be efficient to contain only one
value per file, so it is socially acceptable to express an array of
values of the same type.

Mixing types, expressing multiple lines of data, and doing fancy
formatting of data is heavily frowned upon. Doing these things may get
you publicly humiliated and your code rewritten without notice.
---

We'd either need to address your TODO and create a hierarchical
representation or find another means to exchange this format.


> Device models supported by an mdev driver and their details can be read from
> the migration_info.json attr. Each mdev type supports one device model. If a
> parent device supports multiple device models then each device model has an
> mdev type. There may be multiple mdev types for a single device model when they
> offer different migration parameters such as resource capacity or feature
> availability.
> 
> For example, a graphics card that supports 4 GB and 8 GB device instances would
> provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> migration parameters, respectively.


I think this example could be expanded for clarity.  I think this is
suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
implement some common device model, ie. com.gfx/GPU, where the
migration parameter 'memory' for each defaults to a value matching the
type name.  But it seems like this can also lead to some combinatorial
challenges for management tools if these parameters are writable.  For
example, should a management tool create a gfx-4GB device and change to
memory parameter to 8192 or a gfx-8GB device with the default parameter?


> The following mdev device sysfs attrs relate to a specific device instance::
> 
>   /sys/.../<parent-device>/<uuid>/
>     mdev_type/ - symlink to mdev type sysfs attrs, e.g. to fetch migration/model


We need a mechanism that translates to non-mdev vfio devices as well,
the device "model" creates a clean separation from an mdev-type, we
shouldn't reintroduce that dependency here.


>     migration/ - migration related files
>       <param> - read/write migration parameter "param"
>       ...
> 
> When the device is created all migration/<param> attrs take their
> migration_info.json "init_value".
> 
> When preparing for migration on the source, each migration parameter from
> migration/<param> is read and added to the migration parameter list if its
> value differs from "off_value" in migration_info.json. If a migration parameter
> in the list is not available on the destination, then migration is not
> possible. If a migration parameter value is not in the destination
> "allowed_values" migration_info.json then migration is not possible.
> 
> In order to prepare an mdev device instance for an incoming migration on the
> destination, the "off_value" from migration_info.json is written to each
> migration parameter in migration/<param>. Then the migration parameter list
> from the source is written to migration/<param> one migration parameter at a
> time. If an error occurs while writing a migration parameter on the destination
> then migration is not possible. Once the migration parameter list has been
> written the mdev can be opened and migration can proceed.


What's the logic behind setting the value twice?  If we have a
preconfigured pool of devices where the off_value might use less
resources, we risk that resources might be consumed elsewhere if we
release them and try to get them back.  It also seems rather
inefficient.

 
> An open mdev device typically does not allow migration parameters to be changed
> at runtime. However, certain migration/params attrs may allow writes at
> runtime. Usually these migration parameters only affect the device state
> representation and not the hardware interface. This makes it possible to
> upgrade or downgrade the device state representation at runtime so that
> migration is possible to newer or older device implementations.


Which begs the question of how we'd determine which can be modified
runtime...  Thanks,

Alex



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-10  9:53 [RFC v3] VFIO Migration Stefan Hajnoczi
  2020-11-10 11:12 ` Paolo Bonzini
  2020-11-10 20:14 ` Alex Williamson
@ 2020-11-11 11:19 ` Cornelia Huck
  2020-11-11 15:35   ` Stefan Hajnoczi
  2020-11-11 12:56 ` Dr. David Alan Gilbert
  2020-11-11 16:18 ` Thanos Makatos
  4 siblings, 1 reply; 38+ messages in thread
From: Cornelia Huck @ 2020-11-11 11:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Kirti Wankhede, Thanos Makatos,
	Alex Williamson, Gerd Hoffmann, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

On Tue, 10 Nov 2020 09:53:49 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

(...)

> The meaning of the migration parameter and its possible values are specific to
> the device, but values are based on one of the following types:
> * bool - booleans (on/off)
> * int - integers (0, 1, 2, ...)
> * str - character strings
> 
> Migration parameters are represented as <name>=<value> in this document.
> Examples include my-feature=on and num-queues=4.
> 
> When a new migration parameter is introduced, its absence must have the same
> effect as before the migration parameter was introduced. For example, if
> my-feature=on|off is added to control the availability of a new device feature,
> then my-feature=off is equivalent to omitting the migration parameter.

Maybe this could be made more clear by using a non-bool parameter as
an example?

For the num-queues parameter used as an example above, if num-queues=2
would lead to the same effect as before, omitting the num-queues
parameter must be treated as if num-queues had been specified as 2.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-10 20:14 ` Alex Williamson
@ 2020-11-11 11:48   ` Cornelia Huck
  2020-11-11 15:14     ` Stefan Hajnoczi
  2020-11-11 15:10   ` Stefan Hajnoczi
  1 sibling, 1 reply; 38+ messages in thread
From: Cornelia Huck @ 2020-11-11 11:48 UTC (permalink / raw)
  To: Alex Williamson
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Kirti Wankhede, Thanos Makatos,
	Gerd Hoffmann, Stefan Hajnoczi, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

On Tue, 10 Nov 2020 13:14:04 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 10 Nov 2020 09:53:49 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:

> > Device models supported by an mdev driver and their details can be read from
> > the migration_info.json attr. Each mdev type supports one device model. If a
> > parent device supports multiple device models then each device model has an
> > mdev type. There may be multiple mdev types for a single device model when they
> > offer different migration parameters such as resource capacity or feature
> > availability.
> > 
> > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > migration parameters, respectively.  
> 
> 
> I think this example could be expanded for clarity.  I think this is
> suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> implement some common device model, ie. com.gfx/GPU, where the
> migration parameter 'memory' for each defaults to a value matching the
> type name.  But it seems like this can also lead to some combinatorial
> challenges for management tools if these parameters are writable.  For
> example, should a management tool create a gfx-4GB device and change to
> memory parameter to 8192 or a gfx-8GB device with the default parameter?

I would expect that the mdev types need to match in the first place.
What role would the memory= parameter play, then? Allowing gfx-4GB to
have memory=8192 feels wrong to me.

(...)

> > An open mdev device typically does not allow migration parameters to be changed
> > at runtime. However, certain migration/params attrs may allow writes at
> > runtime. Usually these migration parameters only affect the device state
> > representation and not the hardware interface. This makes it possible to
> > upgrade or downgrade the device state representation at runtime so that
> > migration is possible to newer or older device implementations.  

This refers to generation of device implementations, but not to dynamic
configuration changes. Maybe I'm just confused by this sentence, but
how are we supposed to get changes while the mdev is live across?

> 
> 
> Which begs the question of how we'd determine which can be modified
> runtime...  Thanks,
> 
> Alex
> 
> 
And this as well. Do we need different categories?



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-10  9:53 [RFC v3] VFIO Migration Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2020-11-11 11:19 ` Cornelia Huck
@ 2020-11-11 12:56 ` Dr. David Alan Gilbert
  2020-11-11 15:34   ` Stefan Hajnoczi
  2020-11-11 16:18 ` Thanos Makatos
  4 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-11-11 12:56 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, qemu-devel, Yan Zhao,
	Kirti Wankhede, Paolo Bonzini, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> v3:
>  * Introduce migration info JSON to describe migration parameters
>  * Rework mdev sysfs interface
>  * Propose standard interface for vfio-user device emulation programs
> 
> VFIO Migration
> ==============
> This document describes how to ensure migration compatibility for VFIO devices,
> including mdev and vfio-user devices.
> 
> Overview
> --------
> VFIO devices can save and load a *device state*. Saving a device state produces
> a snapshot of a VFIO device that can be loaded again at a later point in time
> to resume the device from the snapshot.
> 
> The process of saving a device state and loading it later is called
> *migration*. The device state may be loaded by the same device instance that
> saved it or by a new instance, possibly running on a different machine.
> 
> A VFIO/mdev driver together with the physical device provides the functionality
> of a device. Alternatively, a vfio-user device emulation program can provide
> the functionality of a device. These are called *device implementations*.
> 
> The device implementation where a migration originates is called the *source*
> and the device implementation that a migration targets is called the
> *destination*.
> 
> Although it is possible to migrate device state without regard to migration
> compatibility, this is prone to failure except in trivial cases. Device
> implementations vary in feature availability and resource capacity so that it
> is difficult to be confident that a migration of a complex device will succeed.
> Furthermore, without migration compatibility checking it is possible that
> migration appears to succeed but leaves the device in an inoperable state,
> leading to data loss or corruption.
> 
> This document describes how to establish migration compatibility between the
> source and destination. A check can be performed before migrating and can
> therefore be used to select a suitable migration destination. When
> compatibility has been established, the probability of migrating successfully
> is high and a successful migration does not leave the device inoperable due to
> silent migration problems.
> 
> Migration Parameters
> --------------------
> *Migration parameters* are used to describe characteristics that must match
> between source and destination to achieve migration compatibility.
> 
> The first implementation of a simple device may not require migration
> parameters if the source and destination are always compatible. As the device
> evolves, the source and destination may differ and migration parameters are
> required to express these differences. More complex devices may require
> migration parameters from the start due to optional functionality that is not
> guaranteed to be present in both source and destination.
> 
> A migration parameter consists of a name and a value. The name is a UTF-8
> string that does not contain equals ('='), backslash ('/'), or whitespace
> characters. The value is a UTF-8 string that does not contain newline
> characters ('\n').
> 
> The meaning of the migration parameter and its possible values are specific to
> the device, but values are based on one of the following types:
> * bool - booleans (on/off)
> * int - integers (0, 1, 2, ...)
> * str - character strings
> 
> Migration parameters are represented as <name>=<value> in this document.
> Examples include my-feature=on and num-queues=4.
> 
> When a new migration parameter is introduced, its absence must have the same
> effect as before the migration parameter was introduced. For example, if
> my-feature=on|off is added to control the availability of a new device feature,
> then my-feature=off is equivalent to omitting the migration parameter.
> 
> Hardware Interface Compatibility
> --------------------------------
> VFIO devices have a *hardware interface* consisting of device regions and
> interrupts. Aspects of the hardware interface can vary between device
> implementations and require migration parameters to express migration
> compatibility requirements.
> 
> Examples of migration parameters include:
> * Feature availability - feature bitmasks, hardware revision numbers, etc. If
>   the destination may lack support for optional features or hardware interface
>   revisions, then migration parameters are required.
> * Functionality - hardware register blocks that are only present on certain
>   device instances. If there are multiple devices sub-models that have
>   different hardware interfaces then migration parameters are required.
> * Resource capacity - size of display framebuffers, number of queues, queue
>   size, etc.
> 
> These examples demonstrate aspects of the hardware interface that must not
> change unexpectedly. Were they to differ between source and destination, the
> chance of device driver malfunction would be high because the layout of the
> hardware interface would change or assumptions the device driver makes about
> available functionality would be violated. Migration parameters are used to
> preserve the hardware interface across migration and explicitly represent
> variations between device implementations.
> 
> Hardware interfaces sometimes support reporting an event when a change occurs.
> In those cases it may be possible to support visible changes in the hardware
> interface across migration. In most other cases migration must not result in a
> visible change in the hardware interface.
> 
> Migration parameters are not necessary for read-only values exposed through the
> hardware interface, such as MAC address EEPROMs or serial numbers, so long as
> all device implementations can be configured with the same range of input
> values for these read-only values. This is possible because migration
> parameters do not capture the full configuration of the device, only aspects
> that affect migration compatibility.
> 
> Device configuration that is not visible through the hardware interface, such
> as a host file system path of a disk image file or the physical network port
> assigned to a network card, usually does not require migration parameters
> because they can be changed without breaking migration compatibility.
> 
> The disk image file may indirectly affect the hardware interface, for example
> by constraining the device's block size to a specific value. In this case a
> block-size=N migration parameter is required to ensure migration compatibility,
> but the host file system path of the disk image file still does not require a
> migration parameter.
> 
> Device State Representation
> ---------------------------
> Device state contains both data accessible through the device's hardware
> interface and device-internal state needed to restore device operation.
> 
> The contents of hardware registers are usually included in the device state if
> they can change at runtime. Hardware registers with constant or computed data
> may not need to be part of the device state provided that device
> implementations can produce the necessary data.
> 
> Device-internal state includes the portion of the device's state that cannot be
> reconstructed from the hardware interface alone. Defining device-internal state
> in the most general way instead of exposing device implementation details
> allows for flexibility in the future. For example, device implementations often
> maintain a ring index, which is not available through the hardware interface,
> to keep track of which ring elements have already been consumed. The ring index
> must be included in the device state so that the destination can resume
> processing from the correct point in the ring. Representing this as an index
> into the ring in the hardware interface is more general than adding device
> implementation-specific request tracking data structures into the device state.
> 
> The *device state representation* defines the binary data layout of the device
> state. The device state representation is specific to each device and is beyond
> the scope of this document, but aspects pertaining to migration compatibility
> are discussed here.
> 
> Each change to the device state representation that affects migration
> compatibility requires a migration parameter. When a new field is added to the
> device state representation then a new migration parameter must be added to
> reflect this change. Often a single migration parameter expresses both a change
> to the hardware interface and the device state representation. It is also
> possible to change the device state representation without changing the
> hardware interface, for example when some state was forgotten while designing
> the previous device state representation.
> 
> The device state representation may support adding extra data that can be
> safely ignored by old device implementations. In this case migration
> compatibility is unaffected and a migration parameter is not required to
> indicate such extra data has been added.
> 
> Device Models
> -------------
> The combination of the hardware interface, device state representation, and
> migration parameter definitions is called a *device model*. Device models are
> identified by a unique UTF-8 string starting with a domain name and followed by
> path components separated with backslashes ('/'). Examples include
> vendor-a.com/my-nic, gitlab.com/user/my-device, virtio-spec.org/pci/virtio-net,
> and qemu.org/pci/10ec/8139.
> 
> The unique device model string is not changed as the device evolves. Instead,
> migration parameters are added to express variations in a device.
> 
> The device model is not tied to a specific device implementation. The same
> device model could be implemented as a VFIO/dev driver or as a vfio-user device
> emulation program.
> 
> Multiple device implementations can support the same device model. Doing so
> means that the device implementations can offer migration compatiblity because
> they support the same hardware interface, device state representation, and
> migration parameters.
> 
> Multiple device models can exist for the same hardware interface, each with a
> different device state representation and migration parameters. This makes it
> possible to fork and independently develop device models.
> 
> Orchestrating Migrations
> ------------------------
> In order to migrate a device a *migration parameter list* must first be built
> on the source. Each migration parameter is added to the list if it is in
> effect. For example, the migration parameter list for a device with
> new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
> parameter was introduced with the off value disabling its effect.

What component builds that list (i.e. what component needs to know the
history that new-feature=off was the default - ah I think you answer
that below).

> The following conditions must be met to establish migration compatibility:
> 
> 1. The source and destination device model strings match.
> 
> 2. Each migration parameter name from the migration parameter list is supported
>    by the destination. For example, the destination supports the num-queues
>    migration parameter.
> 
> 3. Each migration parameter value from the migration parameter list is
>    supported by the destination. For example, the destination supports
>    num-queues=4.

Hmm, are combinations of parameter checks needed - i.e. is it possible
that a destination supports    num-queues=4 and  new-feature=on/off -
but only supports new-feature=on when num-queues>2 ?

> The migration compatibility check can be performed without initiating a
> migration. Therefore, this process can be used to select the migration
> destination.
> 
> The following steps perform the migration:
> 
> 1. Configure the destination so it is prepared to load the device state,
>    including applying the migration parameter list. This may involve
>    instantiating a new device instance or resetting an existing device instance
>    to a configuration that is compatible with the source.
> 
>    The details of how to do this for VFIO/mdev drivers and vfio-user device
>    backend programs is described below.
> 
> 2. Save the device state on the source and load it on the destination.

Which is true for almost everything, unles sit turned out to have
significant amounts of RAM on board;  do we have a way to deal with that
for vfio/vhost-user - where it needs to be iterative? (Lets just ignore
this for now)

> 3. If migration succeeds then the destination resumes operation and the source
>    must not resume operation. If the migration fails then the source resumes
>    operation and the destination must not resume operation.
> 
> Note that these steps impose a conservative bound on device states that can be
> migrated successfully. Not all configuration parameters may be strictly
> required to match on the source and destination devices. For example, if the
> device's hardware interface has not yet been initialized then changes to the
> advertised features may not yet affect the device driver. However, accurately
> representing runtime constraints is complex and risks introducing migration
> bugs, so no attempt is made to support them.
> 
> Migration Information JSON
> --------------------------
> Device implementations describe supported device models in the following JSON
> format:
> 
> .. code:: json
> 
>   {
>     "models": {
>       "<model>": {
>         "params": {
>           "<param>": {
>             "allowed_values": [<value1>, <value2>, ...]
>             "description": ...
>             "init_value": ...
>             "off_value": ...
>             "type": ...
>           }
>         }
>       }
>     }
>   }
> 
> The "models" object contains one or more device model objects describing
> available device models. Each member name is a unique device model string, for
> example "vendor-a.com/my-nic".
> 
> The device model object contains a "params" object describes available
> migration parameters. Each migration parameter object contains the following
> members:
> 
> "allowed_values"
>   The list all values that the device implementation accepts for this migration
>   parameter. Integer ranges can be described using "<min>-<max>" strings.
> 
>   Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> 
>   This member is optional. When absent, any value suitable for the type may be
>   given but the device implementation may refuse certain values.

JSON isn't a great choice for specifying ranges of integers

> "description"
>   A human-readable description of the migration parameter. This is not intended
>   for user interfaces but rather as a troubleshooting aid for developers. The
>   description is typically written in English. This member is optional.

> "init_value"
>   The initial parameter value when a device instance is created. This member is
>   required.
> 
> "off_value"
>   The parameter value that disables the effect of this parameter. This member
>   is absent if the migration parameter cannot be disabled.
> 
> "type"
>   The data type ("bool", "int", "str"). This member is required.
> 
> An example of a simple device model that only one feature and a fixed resource
> capacity:
> 
> .. code:: json
> 
>   {
>     "models": {
>       "vendor-a.com/my-nic": {
>         "params": {
>           "new-feature": {
>             "description": "New feature that old devices lack",
>             "init_value": true,
>             "off_value": false,
>             "type": 'bool'
>           },
>           "num-resources": {
>             "allowed_values": [64],
>             "description": "Number of resources",
>             "init_value": 64,
>             "type": 'int'
>           }
>         }
>       }
>     }
>   }
> 
> Newly created instances of this device will enable "new-feature", but it can be
> disabled for migration compatibility with old device instances.
> 
> The number of device resources are fixed at 64, so only device instances that
> also have exactly 64 resources can be migrated to this device implementation.
> 
> VFIO mdev Drivers
> -----------------
> The following mdev type sysfs attrs are available for managing device
> instances::
> 
>   /sys/.../<parent-device>/mdev_supported_types/<type-id>/
>     create - writing a UUID to this file instantiates a device
>     migration_info.json - read-only migration information JSON
> 
> TODO The JSON can be represented as a file system hierarchy but sysfs seems
> limited to <kobject>/<group>/<attr> and <kobject>/<attr> so it is not possible
> to express deeper attr groups like <kobject>/migration/params/<param>/<attr>?
> 
> Device models supported by an mdev driver and their details can be read from
> the migration_info.json attr. Each mdev type supports one device model. If a
> parent device supports multiple device models then each device model has an
> mdev type. There may be multiple mdev types for a single device model when they
> offer different migration parameters such as resource capacity or feature
> availability.
> 
> For example, a graphics card that supports 4 GB and 8 GB device instances would
> provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> migration parameters, respectively.
> 
> The following mdev device sysfs attrs relate to a specific device instance::
> 
>   /sys/.../<parent-device>/<uuid>/
>     mdev_type/ - symlink to mdev type sysfs attrs, e.g. to fetch migration/model
>     migration/ - migration related files
>       <param> - read/write migration parameter "param"
>       ...
> 
> When the device is created all migration/<param> attrs take their
> migration_info.json "init_value".
> 
> When preparing for migration on the source, each migration parameter from
> migration/<param> is read and added to the migration parameter list if its
> value differs from "off_value" in migration_info.json. If a migration parameter
> in the list is not available on the destination, then migration is not
> possible. If a migration parameter value is not in the destination
> "allowed_values" migration_info.json then migration is not possible.
> 
> In order to prepare an mdev device instance for an incoming migration on the
> destination, the "off_value" from migration_info.json is written to each
> migration parameter in migration/<param>. Then the migration parameter list
> from the source is written to migration/<param> one migration parameter at a
> time. If an error occurs while writing a migration parameter on the destination
> then migration is not possible. Once the migration parameter list has been
> written the mdev can be opened and migration can proceed.
> 
> An open mdev device typically does not allow migration parameters to be changed
> at runtime. However, certain migration/params attrs may allow writes at
> runtime. Usually these migration parameters only affect the device state
> representation and not the hardware interface. This makes it possible to
> upgrade or downgrade the device state representation at runtime so that
> migration is possible to newer or older device implementations.
> 
> vfio-user Device Emulation Programs
> -----------------------------------
> Device emulation programs often support a simple invocation model where running
> the program creates a single device instance. The lifecycle of the device
> instance is tied to the lifecycle of the process. Such device emulation
> programs are described below.
> 
> More complex device emulation programs may host multiple devices. The interface
> for configuring these device emulation programs is not standardized. Therefore,
> migrating these devices is beyond the scope of this document.
> 
> The migration information JSON is printed to standard output by a vfio-user
> device emulation program as follows:
> 
> .. code:: bash
> 
>   $ my-device --print-migration-info-json
> 
> The device is instantiated by launching the destination process with the
> migration parameter list from the source:
> 
> .. code:: bash
> 
>   $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]
> 
> This example shows how to instantiate the device with migration parameters
> ``param1`` and ``param2``. Both ``--m-<param>=<value>`` and ``--m-<param>
> <value>`` option formats are accepted.
> 
> The ``--m-`` prefix is used to allow the device emulation program to implement
> device implementation-specific command-line options without conflicting with
> the migration parameter namespace.

That feels like an odd syntax to me.

> When preparing for migration on the source, each migration parameter from the
> migration info JSON is added to the migration parameter list if its value
> differs from "off_value". If a migration parameter in the list is not available
> on the destination, then migration is not possible. If a migration parameter
> value is not in the destination "allowed_values" migration_info.json then
> migration is not possible.
> 
> On the destination, a command-line is generated from the migration parameter
> list. For each destination migration parameter missing from the migration
> parameter list a command-line option is added with the destination "off_value".
> The device emulation program prints an error message to standard error and
> terminates with exit status 1 if the device could not be instantiated.

I still don't think this revision answers the question of how a VM
management program picks a sane set of parameter values for a new VM
it's creating, especially if it wants it to be migratable.  That's
something your version stuff in V1 seemed nice for.

Dave


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-10 11:12 ` Paolo Bonzini
@ 2020-11-11 14:36   ` Stefan Hajnoczi
  2020-11-11 15:48     ` Daniel P. Berrangé
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-11 14:36 UTC (permalink / raw)
  To: Daniel Berrange
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Yan Zhao, Kirti Wankhede, Paolo Bonzini,
	Alex Williamson, Gerd Hoffmann, Felipe Franciosi,
	Christophe de Dinechin, Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 1749 bytes --]

On Tue, Nov 10, 2020 at 12:12:31PM +0100, Paolo Bonzini wrote:
> On 10/11/20 10:53, Stefan Hajnoczi wrote:
> > "allowed_values"
> >    The list all values that the device implementation accepts for this migration
> >    parameter. Integer ranges can be described using "<min>-<max>" strings.
> > 
> >    Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> > 
> >    This member is optional. When absent, any value suitable for the type may be
> >    given but the device implementation may refuse certain values.
> 
> I'd rather make this simpler:
> 
> - remove allowed_values for strings.  Effect: discourages using strings as
> enums, leaving them only for free-form values such as vendor name or model
> name.

And introduce an enum type?

> - remove allowed_values for bools.  If off_value is absent the only allowed
> value is init_value.  If off_value is present, both true and false are
> allowed (and !off_value is the "on_value", so to speak).

Makes sense.

> - change allowed_values into allowed_min and allowed_max for int values.
> Advantage: avoids having to parse strings as ranges.  Disadvantage: removes
> expressiveness (cannot say "x must be a power of two"), but I'm not sure
> it's worth the extra complication.

Yes, the current syntax supports sparse ranges and multiple ranges.

The trade-off is that a tool cannot validate inputs beforehand. You need
to instantiate the device to see if it accepts your inputs. This is not
great for management tools because they cannot select a destination
device if they don't know which exact values are supported.

Daniel Berrange raised this requirement in a previous revision, so I
wonder what his thoughts are?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-10 20:14 ` Alex Williamson
  2020-11-11 11:48   ` Cornelia Huck
@ 2020-11-11 15:10   ` Stefan Hajnoczi
  2020-11-11 15:28     ` Cornelia Huck
  1 sibling, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-11 15:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Yan Zhao, Kirti Wankhede, Paolo Bonzini,
	Gerd Hoffmann, Felipe Franciosi, Christophe de Dinechin,
	Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 7056 bytes --]

On Tue, Nov 10, 2020 at 01:14:04PM -0700, Alex Williamson wrote:
> On Tue, 10 Nov 2020 09:53:49 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Documentation/filesystems/sysfs.rst:
> ---
> Attributes
> ~~~~~~~~~~
> 
> Attributes can be exported for kobjects in the form of regular files in
> the filesystem. Sysfs forwards file I/O operations to methods defined
> for the attributes, providing a means to read and write kernel
> attributes.
> 
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only one
> value per file, so it is socially acceptable to express an array of
> values of the same type.
> 
> Mixing types, expressing multiple lines of data, and doing fancy
> formatting of data is heavily frowned upon. Doing these things may get
> you publicly humiliated and your code rewritten without notice.
> ---
> 
> We'd either need to address your TODO and create a hierarchical
> representation or find another means to exchange this format.

Okay, thanks for pointing this out. If the limitations on sysfs
directory structure are really what I think they are, then we can work
around the lack of sub-directories by flattening the hierarchical
information in an attribute name prefix, but it's ugly:

  <parent-device>/<mdev_supported_types>/<type-id>/
    migration_param_FOO_off_value
    migration_param_FOO_init_value
    migration_param_FOO_description
    migration_param_FOO_type

It makes enumerating migration parameters more awkward for userspace
because they need to skip many of the files when scanning for parameter
names.

Or we could create a kobject for each migration parameter, but that
seems wrong too.

Or we could investigate other file systems like configfs. Maybe this is
why tracefs and other specific file systems exist - sysfs is too
limited?

> > Device models supported by an mdev driver and their details can be read from
> > the migration_info.json attr. Each mdev type supports one device model. If a
> > parent device supports multiple device models then each device model has an
> > mdev type. There may be multiple mdev types for a single device model when they
> > offer different migration parameters such as resource capacity or feature
> > availability.
> > 
> > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > migration parameters, respectively.
> 
> 
> I think this example could be expanded for clarity.  I think this is
> suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> implement some common device model, ie. com.gfx/GPU, where the
> migration parameter 'memory' for each defaults to a value matching the
> type name.  But it seems like this can also lead to some combinatorial
> challenges for management tools if these parameters are writable.  For
> example, should a management tool create a gfx-4GB device and change to
> memory parameter to 8192 or a gfx-8GB device with the default parameter?

Right, if gfx-4GB and gfx-8GB both offer variable "memory" migration
parameters. Userspace will eliminate mdevs whose device model string and
allowed parameter values are incompatible, and then it will choose a
remaining mdev type. If creating the device fails then it can try
another remaining mdev type.

> > The following mdev device sysfs attrs relate to a specific device instance::
> > 
> >   /sys/.../<parent-device>/<uuid>/
> >     mdev_type/ - symlink to mdev type sysfs attrs, e.g. to fetch migration/model
> 
> 
> We need a mechanism that translates to non-mdev vfio devices as well,
> the device "model" creates a clean separation from an mdev-type, we
> shouldn't reintroduce that dependency here.

Okay. The user will need the device model string and the migration
parameter info.

Is there an example of a non-mdev VFIO device that has software
functionality (e.g. device-specific sysfs attrs)?

> >     migration/ - migration related files
> >       <param> - read/write migration parameter "param"
> >       ...
> > 
> > When the device is created all migration/<param> attrs take their
> > migration_info.json "init_value".
> > 
> > When preparing for migration on the source, each migration parameter from
> > migration/<param> is read and added to the migration parameter list if its
> > value differs from "off_value" in migration_info.json. If a migration parameter
> > in the list is not available on the destination, then migration is not
> > possible. If a migration parameter value is not in the destination
> > "allowed_values" migration_info.json then migration is not possible.
> > 
> > In order to prepare an mdev device instance for an incoming migration on the
> > destination, the "off_value" from migration_info.json is written to each
> > migration parameter in migration/<param>. Then the migration parameter list
> > from the source is written to migration/<param> one migration parameter at a
> > time. If an error occurs while writing a migration parameter on the destination
> > then migration is not possible. Once the migration parameter list has been
> > written the mdev can be opened and migration can proceed.
> 
> 
> What's the logic behind setting the value twice?  If we have a
> preconfigured pool of devices where the off_value might use less
> resources, we risk that resources might be consumed elsewhere if we
> release them and try to get them back.  It also seems rather
> inefficient.

The description above was sub-optimal. Each parameter only needs to be
written once:

  for param in dest_params:
      if param in source_params:
          val = source_params[param]
      else:
          val = param_json['off_value']

      sysfs_write(f'migration/{param}', val)

We either write the value from the source or the off_value from the
destination.

> > An open mdev device typically does not allow migration parameters to be changed
> > at runtime. However, certain migration/params attrs may allow writes at
> > runtime. Usually these migration parameters only affect the device state
> > representation and not the hardware interface. This makes it possible to
> > upgrade or downgrade the device state representation at runtime so that
> > migration is possible to newer or older device implementations.
> 
> 
> Which begs the question of how we'd determine which can be modified
> runtime...  Thanks,

Deciding to modify a parameter at runtime requires knowledge of what
that parameter does. (Unlike the migration compatibility algorithm,
which blindly processes all migration parameters.)

Therefore, I'm not sure it's necessary to add metadata for this. The
user must know what they are doing when modifying parameters at runtime.
If the device implementation doesn't support modifying the parameter at
runtime then -EBUSY can be returned from write(2).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 11:48   ` Cornelia Huck
@ 2020-11-11 15:14     ` Stefan Hajnoczi
  2020-11-11 15:35       ` Cornelia Huck
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-11 15:14 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Kirti Wankhede, Dr. David Alan Gilbert, Alex Williamson,
	Thanos Makatos, Gerd Hoffmann, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

[-- Attachment #1: Type: text/plain, Size: 3245 bytes --]

On Wed, Nov 11, 2020 at 12:48:53PM +0100, Cornelia Huck wrote:
> On Tue, 10 Nov 2020 13:14:04 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
> > On Tue, 10 Nov 2020 09:53:49 +0000
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > > Device models supported by an mdev driver and their details can be read from
> > > the migration_info.json attr. Each mdev type supports one device model. If a
> > > parent device supports multiple device models then each device model has an
> > > mdev type. There may be multiple mdev types for a single device model when they
> > > offer different migration parameters such as resource capacity or feature
> > > availability.
> > > 
> > > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > > migration parameters, respectively.  
> > 
> > 
> > I think this example could be expanded for clarity.  I think this is
> > suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> > implement some common device model, ie. com.gfx/GPU, where the
> > migration parameter 'memory' for each defaults to a value matching the
> > type name.  But it seems like this can also lead to some combinatorial
> > challenges for management tools if these parameters are writable.  For
> > example, should a management tool create a gfx-4GB device and change to
> > memory parameter to 8192 or a gfx-8GB device with the default parameter?
> 
> I would expect that the mdev types need to match in the first place.
> What role would the memory= parameter play, then? Allowing gfx-4GB to
> have memory=8192 feels wrong to me.

Yes, I expected these mdev types to only accept a fixed "memory" value,
but there's nothing stopping a driver author from making "memory" accept
any value.

> > > An open mdev device typically does not allow migration parameters to be changed
> > > at runtime. However, certain migration/params attrs may allow writes at
> > > runtime. Usually these migration parameters only affect the device state
> > > representation and not the hardware interface. This makes it possible to
> > > upgrade or downgrade the device state representation at runtime so that
> > > migration is possible to newer or older device implementations.  
> 
> This refers to generation of device implementations, but not to dynamic
> configuration changes. Maybe I'm just confused by this sentence, but
> how are we supposed to get changes while the mdev is live across?

This is about dynamic configuration changes. For example, if a field was
forgotten in the device state representation then a migration parameter
can be added to enable the fix. When the parameter is off the device
state is incomplete but migration to old device implementations still
works. An old device can be migrated to a new device implementation with
the parameter turned off. And then you can safely enable the migration
parameter at runtime without powering off the guest because it's purely
a device state representation change, not a hardware interface change
that would disturb the guest.

This is kind of similar to QEMU migration subsections.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:10   ` Stefan Hajnoczi
@ 2020-11-11 15:28     ` Cornelia Huck
  2020-11-16 11:36       ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Cornelia Huck @ 2020-11-11 15:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Kirti Wankhede, Dr. David Alan Gilbert, Alex Williamson,
	Thanos Makatos, Gerd Hoffmann, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

On Wed, 11 Nov 2020 15:10:14 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Tue, Nov 10, 2020 at 01:14:04PM -0700, Alex Williamson wrote:
> > On Tue, 10 Nov 2020 09:53:49 +0000
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Documentation/filesystems/sysfs.rst:
> > ---
> > Attributes
> > ~~~~~~~~~~
> > 
> > Attributes can be exported for kobjects in the form of regular files in
> > the filesystem. Sysfs forwards file I/O operations to methods defined
> > for the attributes, providing a means to read and write kernel
> > attributes.
> > 
> > Attributes should be ASCII text files, preferably with only one value
> > per file. It is noted that it may not be efficient to contain only one
> > value per file, so it is socially acceptable to express an array of
> > values of the same type.
> > 
> > Mixing types, expressing multiple lines of data, and doing fancy
> > formatting of data is heavily frowned upon. Doing these things may get
> > you publicly humiliated and your code rewritten without notice.
> > ---
> > 
> > We'd either need to address your TODO and create a hierarchical
> > representation or find another means to exchange this format.  
> 
> Okay, thanks for pointing this out. If the limitations on sysfs
> directory structure are really what I think they are, then we can work
> around the lack of sub-directories by flattening the hierarchical
> information in an attribute name prefix, but it's ugly:
> 
>   <parent-device>/<mdev_supported_types>/<type-id>/
>     migration_param_FOO_off_value
>     migration_param_FOO_init_value
>     migration_param_FOO_description
>     migration_param_FOO_type
> 
> It makes enumerating migration parameters more awkward for userspace
> because they need to skip many of the files when scanning for parameter
> names.
> 
> Or we could create a kobject for each migration parameter, but that
> seems wrong too.

Hm, ISTR that you can do something with ksets.

> 
> Or we could investigate other file systems like configfs. Maybe this is
> why tracefs and other specific file systems exist - sysfs is too
> limited?

If you want to express complex data, sysfs is quickly hitting its
limits. The benefits of using sysfs are basically that sysfs is always
present (and therefore readily consumed by existing tooling), and that
you have device properties properly grouped with the device.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 12:56 ` Dr. David Alan Gilbert
@ 2020-11-11 15:34   ` Stefan Hajnoczi
  2020-11-11 15:41     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-11 15:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, qemu-devel, Yan Zhao,
	Kirti Wankhede, Paolo Bonzini, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 7252 bytes --]

On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > Orchestrating Migrations
> > ------------------------
> > In order to migrate a device a *migration parameter list* must first be built
> > on the source. Each migration parameter is added to the list if it is in
> > effect. For example, the migration parameter list for a device with
> > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
> > parameter was introduced with the off value disabling its effect.
> 
> What component builds that list (i.e. what component needs to know the
> history that new-feature=off was the default - ah I think you answer
> that below).

Yep. Thanks for noting this. I'll need to reorder things so it is clear.

> > The following conditions must be met to establish migration compatibility:
> > 
> > 1. The source and destination device model strings match.
> > 
> > 2. Each migration parameter name from the migration parameter list is supported
> >    by the destination. For example, the destination supports the num-queues
> >    migration parameter.
> > 
> > 3. Each migration parameter value from the migration parameter list is
> >    supported by the destination. For example, the destination supports
> >    num-queues=4.
> 
> Hmm, are combinations of parameter checks needed - i.e. is it possible
> that a destination supports    num-queues=4 and  new-feature=on/off -
> but only supports new-feature=on when num-queues>2 ?

Yes, it's possible but cannot be expressed in the migration info JSON.

We need to choose a level of expressiveness that will be useful enough
without being complex. In the extreme the migration info would contain
Turing complete validation expressions (e.g. JavaScript) so that any
relationship can be expressed, but I doubt that complexity is needed.
The other extreme is just booleans and (opaque) strings for maximum
simplicity.

If the syntax is not expressive enough then it's impossible to check
migration compatibility without actually creating a new device instance
on the destination. Daniel Berrange raised the requirement of checking
migration compatibility without creating the device since this helps
with selecting a migration destination.

> > The migration compatibility check can be performed without initiating a
> > migration. Therefore, this process can be used to select the migration
> > destination.
> > 
> > The following steps perform the migration:
> > 
> > 1. Configure the destination so it is prepared to load the device state,
> >    including applying the migration parameter list. This may involve
> >    instantiating a new device instance or resetting an existing device instance
> >    to a configuration that is compatible with the source.
> > 
> >    The details of how to do this for VFIO/mdev drivers and vfio-user device
> >    backend programs is described below.
> > 
> > 2. Save the device state on the source and load it on the destination.
> 
> Which is true for almost everything, unles sit turned out to have
> significant amounts of RAM on board;  do we have a way to deal with that
> for vfio/vhost-user - where it needs to be iterative? (Lets just ignore
> this for now)

Step 2 includes iterative migration. I should have mentioned that in the
document.

> > "allowed_values"
> >   The list all values that the device implementation accepts for this migration
> >   parameter. Integer ranges can be described using "<min>-<max>" strings.
> > 
> >   Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> > 
> >   This member is optional. When absent, any value suitable for the type may be
> >   given but the device implementation may refuse certain values.
> 
> JSON isn't a great choice for specifying ranges of integers

Agreed :)

> > The device is instantiated by launching the destination process with the
> > migration parameter list from the source:
> > 
> > .. code:: bash
> > 
> >   $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]
> > 
> > This example shows how to instantiate the device with migration parameters
> > ``param1`` and ``param2``. Both ``--m-<param>=<value>`` and ``--m-<param>
> > <value>`` option formats are accepted.
> > 
> > The ``--m-`` prefix is used to allow the device emulation program to implement
> > device implementation-specific command-line options without conflicting with
> > the migration parameter namespace.
> 
> That feels like an odd syntax to me.

Unfortunately we cannot use --<param>. I also considered using a JSON
input file but that makes it harder to invoke the device emulation
program manually for testing/development. I bet I'd have to look up the
JSON syntax every time whereas it's easy to remember how to format a
command-line parameter.

The other one I considered was using '--' or another marker to separate
device implementation-specific command-line arguments from migration
parameters. However, doing so places requirements on the device
emulation program's command-line parsing library and I think people will
be unhappy if their favorite Go, Rust, Python, etc library cannot handle
the command-line options due to our weird syntax.

Any ideas for a better syntax?

> > When preparing for migration on the source, each migration parameter from the
> > migration info JSON is added to the migration parameter list if its value
> > differs from "off_value". If a migration parameter in the list is not available
> > on the destination, then migration is not possible. If a migration parameter
> > value is not in the destination "allowed_values" migration_info.json then
> > migration is not possible.
> > 
> > On the destination, a command-line is generated from the migration parameter
> > list. For each destination migration parameter missing from the migration
> > parameter list a command-line option is added with the destination "off_value".
> > The device emulation program prints an error message to standard error and
> > terminates with exit status 1 if the device could not be instantiated.
> 
> I still don't think this revision answers the question of how a VM
> management program picks a sane set of parameter values for a new VM
> it's creating, especially if it wants it to be migratable.  That's
> something your version stuff in V1 seemed nice for.

Good point. If we're creating a VM and expect to migrate between two
device implementations, how do we choose the migration parameters?

I can see a solution for that: grab the set of "init_values" from both
device implementations and use the one that both accept. This is O(N^2)
so it's not great when there are many device implementations involved.
It's O(N) with version numbers because you can keep an intersection set
of supported version numbers.

This point definitely needs to be included in the document. Is my answer
acceptable or do you think versions are really needed?

It's also hard to answer "which of these two migration parameter lists
is better/more modern?" without versions when non-bool migration
parameters are involved.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 11:19 ` Cornelia Huck
@ 2020-11-11 15:35   ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-11 15:35 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Kirti Wankhede, Thanos Makatos,
	Alex Williamson, Gerd Hoffmann, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

[-- Attachment #1: Type: text/plain, Size: 1224 bytes --]

On Wed, Nov 11, 2020 at 12:19:18PM +0100, Cornelia Huck wrote:
> On Tue, 10 Nov 2020 09:53:49 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> (...)
> 
> > The meaning of the migration parameter and its possible values are specific to
> > the device, but values are based on one of the following types:
> > * bool - booleans (on/off)
> > * int - integers (0, 1, 2, ...)
> > * str - character strings
> > 
> > Migration parameters are represented as <name>=<value> in this document.
> > Examples include my-feature=on and num-queues=4.
> > 
> > When a new migration parameter is introduced, its absence must have the same
> > effect as before the migration parameter was introduced. For example, if
> > my-feature=on|off is added to control the availability of a new device feature,
> > then my-feature=off is equivalent to omitting the migration parameter.
> 
> Maybe this could be made more clear by using a non-bool parameter as
> an example?
> 
> For the num-queues parameter used as an example above, if num-queues=2
> would lead to the same effect as before, omitting the num-queues
> parameter must be treated as if num-queues had been specified as 2.

Will fix, thanks!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:14     ` Stefan Hajnoczi
@ 2020-11-11 15:35       ` Cornelia Huck
  2020-11-16 11:02         ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Cornelia Huck @ 2020-11-11 15:35 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Kirti Wankhede, Dr. David Alan Gilbert, Alex Williamson,
	Thanos Makatos, Gerd Hoffmann, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

[-- Attachment #1: Type: text/plain, Size: 3923 bytes --]

On Wed, 11 Nov 2020 15:14:49 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Wed, Nov 11, 2020 at 12:48:53PM +0100, Cornelia Huck wrote:
> > On Tue, 10 Nov 2020 13:14:04 -0700
> > Alex Williamson <alex.williamson@redhat.com> wrote:  
> > > On Tue, 10 Nov 2020 09:53:49 +0000
> > > Stefan Hajnoczi <stefanha@redhat.com> wrote:  
> >   
> > > > Device models supported by an mdev driver and their details can be read from
> > > > the migration_info.json attr. Each mdev type supports one device model. If a
> > > > parent device supports multiple device models then each device model has an
> > > > mdev type. There may be multiple mdev types for a single device model when they
> > > > offer different migration parameters such as resource capacity or feature
> > > > availability.
> > > > 
> > > > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > > > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > > > migration parameters, respectively.    
> > > 
> > > 
> > > I think this example could be expanded for clarity.  I think this is
> > > suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> > > implement some common device model, ie. com.gfx/GPU, where the
> > > migration parameter 'memory' for each defaults to a value matching the
> > > type name.  But it seems like this can also lead to some combinatorial
> > > challenges for management tools if these parameters are writable.  For
> > > example, should a management tool create a gfx-4GB device and change to
> > > memory parameter to 8192 or a gfx-8GB device with the default parameter?  
> > 
> > I would expect that the mdev types need to match in the first place.
> > What role would the memory= parameter play, then? Allowing gfx-4GB to
> > have memory=8192 feels wrong to me.  
> 
> Yes, I expected these mdev types to only accept a fixed "memory" value,
> but there's nothing stopping a driver author from making "memory" accept
> any value.

I'm wondering how useful the memory parameter is, then. The layer
checking for compatibility can filter out inconsistent settings, but
why would we need to express something that is already implied in the
mdev type separately?

> 
> > > > An open mdev device typically does not allow migration parameters to be changed
> > > > at runtime. However, certain migration/params attrs may allow writes at
> > > > runtime. Usually these migration parameters only affect the device state
> > > > representation and not the hardware interface. This makes it possible to
> > > > upgrade or downgrade the device state representation at runtime so that
> > > > migration is possible to newer or older device implementations.    
> > 
> > This refers to generation of device implementations, but not to dynamic
> > configuration changes. Maybe I'm just confused by this sentence, but
> > how are we supposed to get changes while the mdev is live across?  
> 
> This is about dynamic configuration changes. For example, if a field was
> forgotten in the device state representation then a migration parameter
> can be added to enable the fix. When the parameter is off the device
> state is incomplete but migration to old device implementations still
> works. An old device can be migrated to a new device implementation with
> the parameter turned off. And then you can safely enable the migration
> parameter at runtime without powering off the guest because it's purely
> a device state representation change, not a hardware interface change
> that would disturb the guest.
> 
> This is kind of similar to QEMU migration subsections.

Ok, I was a bit confused here.

So, we build the stream with the then-current parameters? How is the
compat-checking layer supposed to deal with parameters changing after
the check -- is it a "you get to keep the pieces" situation?

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:34   ` Stefan Hajnoczi
@ 2020-11-11 15:41     ` Dr. David Alan Gilbert
  2020-11-16 14:38       ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-11-11 15:41 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, qemu-devel, Yan Zhao,
	Kirti Wankhede, Paolo Bonzini, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > Orchestrating Migrations
> > > ------------------------
> > > In order to migrate a device a *migration parameter list* must first be built
> > > on the source. Each migration parameter is added to the list if it is in
> > > effect. For example, the migration parameter list for a device with
> > > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
> > > parameter was introduced with the off value disabling its effect.
> > 
> > What component builds that list (i.e. what component needs to know the
> > history that new-feature=off was the default - ah I think you answer
> > that below).
> 
> Yep. Thanks for noting this. I'll need to reorder things so it is clear.
> 
> > > The following conditions must be met to establish migration compatibility:
> > > 
> > > 1. The source and destination device model strings match.
> > > 
> > > 2. Each migration parameter name from the migration parameter list is supported
> > >    by the destination. For example, the destination supports the num-queues
> > >    migration parameter.
> > > 
> > > 3. Each migration parameter value from the migration parameter list is
> > >    supported by the destination. For example, the destination supports
> > >    num-queues=4.
> > 
> > Hmm, are combinations of parameter checks needed - i.e. is it possible
> > that a destination supports    num-queues=4 and  new-feature=on/off -
> > but only supports new-feature=on when num-queues>2 ?
> 
> Yes, it's possible but cannot be expressed in the migration info JSON.
> 
> We need to choose a level of expressiveness that will be useful enough
> without being complex. In the extreme the migration info would contain
> Turing complete validation expressions (e.g. JavaScript) so that any
> relationship can be expressed, but I doubt that complexity is needed.
> The other extreme is just booleans and (opaque) strings for maximum
> simplicity.
> 
> If the syntax is not expressive enough then it's impossible to check
> migration compatibility without actually creating a new device instance
> on the destination. Daniel Berrange raised the requirement of checking
> migration compatibility without creating the device since this helps
> with selecting a migration destination.

Right, but my worry isn't the JSON description, it's the set of 3
conditions above; they need to state that only some combinations need to
be valid.

> 
> > > The migration compatibility check can be performed without initiating a
> > > migration. Therefore, this process can be used to select the migration
> > > destination.
> > > 
> > > The following steps perform the migration:
> > > 
> > > 1. Configure the destination so it is prepared to load the device state,
> > >    including applying the migration parameter list. This may involve
> > >    instantiating a new device instance or resetting an existing device instance
> > >    to a configuration that is compatible with the source.
> > > 
> > >    The details of how to do this for VFIO/mdev drivers and vfio-user device
> > >    backend programs is described below.
> > > 
> > > 2. Save the device state on the source and load it on the destination.
> > 
> > Which is true for almost everything, unles sit turned out to have
> > significant amounts of RAM on board;  do we have a way to deal with that
> > for vfio/vhost-user - where it needs to be iterative? (Lets just ignore
> > this for now)
> 
> Step 2 includes iterative migration. I should have mentioned that in the
> document.

OK.

> > > "allowed_values"
> > >   The list all values that the device implementation accepts for this migration
> > >   parameter. Integer ranges can be described using "<min>-<max>" strings.
> > > 
> > >   Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> > > 
> > >   This member is optional. When absent, any value suitable for the type may be
> > >   given but the device implementation may refuse certain values.
> > 
> > JSON isn't a great choice for specifying ranges of integers
> 
> Agreed :)
> 
> > > The device is instantiated by launching the destination process with the
> > > migration parameter list from the source:
> > > 
> > > .. code:: bash
> > > 
> > >   $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]
> > > 
> > > This example shows how to instantiate the device with migration parameters
> > > ``param1`` and ``param2``. Both ``--m-<param>=<value>`` and ``--m-<param>
> > > <value>`` option formats are accepted.
> > > 
> > > The ``--m-`` prefix is used to allow the device emulation program to implement
> > > device implementation-specific command-line options without conflicting with
> > > the migration parameter namespace.
> > 
> > That feels like an odd syntax to me.
> 
> Unfortunately we cannot use --<param>. I also considered using a JSON
> input file but that makes it harder to invoke the device emulation
> program manually for testing/development. I bet I'd have to look up the
> JSON syntax every time whereas it's easy to remember how to format a
> command-line parameter.
> 
> The other one I considered was using '--' or another marker to separate
> device implementation-specific command-line arguments from migration
> parameters. However, doing so places requirements on the device
> emulation program's command-line parsing library and I think people will
> be unhappy if their favorite Go, Rust, Python, etc library cannot handle
> the command-line options due to our weird syntax.
> 
> Any ideas for a better syntax?

I'd be happy with a --param name=value   repeatedly, but also know that
some option parsers don't like that.

> > > When preparing for migration on the source, each migration parameter from the
> > > migration info JSON is added to the migration parameter list if its value
> > > differs from "off_value". If a migration parameter in the list is not available
> > > on the destination, then migration is not possible. If a migration parameter
> > > value is not in the destination "allowed_values" migration_info.json then
> > > migration is not possible.
> > > 
> > > On the destination, a command-line is generated from the migration parameter
> > > list. For each destination migration parameter missing from the migration
> > > parameter list a command-line option is added with the destination "off_value".
> > > The device emulation program prints an error message to standard error and
> > > terminates with exit status 1 if the device could not be instantiated.
> > 
> > I still don't think this revision answers the question of how a VM
> > management program picks a sane set of parameter values for a new VM
> > it's creating, especially if it wants it to be migratable.  That's
> > something your version stuff in V1 seemed nice for.
> 
> Good point. If we're creating a VM and expect to migrate between two
> device implementations, how do we choose the migration parameters?
> 
> I can see a solution for that: grab the set of "init_values" from both
> device implementations and use the one that both accept. This is O(N^2)
> so it's not great when there are many device implementations involved.
> It's O(N) with version numbers because you can keep an intersection set
> of supported version numbers.

Which is actually more complex if there's only some combinations that
work.

> This point definitely needs to be included in the document. Is my answer
> acceptable or do you think versions are really needed?
> 
> It's also hard to answer "which of these two migration parameter lists
> is better/more modern?" without versions when non-bool migration
> parameters are involved.

Dave

> Stefan


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 14:36   ` Stefan Hajnoczi
@ 2020-11-11 15:48     ` Daniel P. Berrangé
  2020-11-12 15:26       ` Cornelia Huck
                         ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Daniel P. Berrangé @ 2020-11-11 15:48 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Yan Zhao, quintela,
	Jason Wang, Zeng, Xin, qemu-devel, Dr. David Alan Gilbert,
	Kirti Wankhede, Paolo Bonzini, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

On Wed, Nov 11, 2020 at 02:36:15PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 10, 2020 at 12:12:31PM +0100, Paolo Bonzini wrote:
> > On 10/11/20 10:53, Stefan Hajnoczi wrote:
> > > "allowed_values"
> > >    The list all values that the device implementation accepts for this migration
> > >    parameter. Integer ranges can be described using "<min>-<max>" strings.
> > > 
> > >    Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> > > 
> > >    This member is optional. When absent, any value suitable for the type may be
> > >    given but the device implementation may refuse certain values.
> > 
> > I'd rather make this simpler:
> > 
> > - remove allowed_values for strings.  Effect: discourages using strings as
> > enums, leaving them only for free-form values such as vendor name or model
> > name.
> 
> And introduce an enum type?
> 
> > - remove allowed_values for bools.  If off_value is absent the only allowed
> > value is init_value.  If off_value is present, both true and false are
> > allowed (and !off_value is the "on_value", so to speak).
> 
> Makes sense.
> 
> > - change allowed_values into allowed_min and allowed_max for int values.
> > Advantage: avoids having to parse strings as ranges.  Disadvantage: removes
> > expressiveness (cannot say "x must be a power of two"), but I'm not sure
> > it's worth the extra complication.
> 
> Yes, the current syntax supports sparse ranges and multiple ranges.
> 
> The trade-off is that a tool cannot validate inputs beforehand. You need
> to instantiate the device to see if it accepts your inputs. This is not
> great for management tools because they cannot select a destination
> device if they don't know which exact values are supported.
> 
> Daniel Berrange raised this requirement in a previous revision, so I
> wonder what his thoughts are?

In terms of validation I can't help but feel the whole proposal is
really very complicated.

In validating QEMU migration compatibility we merely compare the
versioned machine type.

IIUC, in this proposal, it would be more like exploding the machine
type into all its 100's of properties and then comparing each one
individually.

I really prefer the simpler model of QEMU versioned machine types
where compatibility is a simple string comparison, hiding the
100's of individual config parameters.  

Of course there are scenarios where this will lead a mgmt app to
refuse a migration, when it could in fact have permitted it.

eg  consider   pc-i440fx-4.0  and pc-i440fx-5.0 machine types,
which only differ in the value  "foo=7" and "foo=8" respectively.

Now if the target only supported machine type pc-i440fx-5.0, then
with a basic string comparison of machine type versin, we can't
migrate from a host uing pc-i440fx-4.0

If we exploded the machine type into its params, we could see that
we can migrate from pc-i440fx-4.0 to pc-i440fx-5.0, simply by
overriding the value of "foo".

So, yes, dealing with individual params is more flexible, but it
comes at an enourmous cost in complexity to process all the
parameters. I'm not convinced this is a good tradeoff. 


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [RFC v3] VFIO Migration
  2020-11-10  9:53 [RFC v3] VFIO Migration Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2020-11-11 12:56 ` Dr. David Alan Gilbert
@ 2020-11-11 16:18 ` Thanos Makatos
  2020-11-16 15:24   ` Stefan Hajnoczi
  4 siblings, 1 reply; 38+ messages in thread
From: Thanos Makatos @ 2020-11-11 16:18 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	Swapnil Ingle, quintela, Jason Wang, Zeng, Xin,
	Dr. David Alan Gilbert, John Levon, Yan Zhao, Kirti Wankhede,
	Alex Williamson, Gerd Hoffmann, Felipe Franciosi,
	Christophe de Dinechin, Paolo Bonzini, changpeng.liu


> VFIO Migration
> ==============
> This document describes how to ensure migration compatibility for VFIO
> devices,
> including mdev and vfio-user devices.

Is this something all VFIO/user devices will have to support? If it's not
mandatory, how can a device advertise support?

> Multiple device implementations can support the same device model. Doing
> so
> means that the device implementations can offer migration compatiblity
> because
> they support the same hardware interface, device state representation, and
> migration parameters.

Does the above mean that a passthrough function can be migrated to a vfio-user
program and vice versa? If so, then it's worth mentioning.

> More complex device emulation programs may host multiple devices. The
> interface
> for configuring these device emulation programs is not standardized.
> Therefore,
> migrating these devices is beyond the scope of this document.

Most likely a device emulation program hosting multile devices would allow
some form of communication for control purposes (e.g. SPDK implements a JSON-RPC
server). So maybe it's possible to define interacting with such programs in
this document?

> 
> The migration information JSON is printed to standard output by a vfio-user
> device emulation program as follows:
> 
> .. code:: bash
> 
>   $ my-device --print-migration-info-json
> 
> The device is instantiated by launching the destination process with the
> migration parameter list from the source:

Must 'my-device --print-migration-info-json' always generate the same migration
information JSON? If so, then what if the output generated by
'my-device --print-migration-info-json' depends on additional arguments passed
to 'my-device' when it was originally started?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:48     ` Daniel P. Berrangé
@ 2020-11-12 15:26       ` Cornelia Huck
  2020-11-16 10:48       ` Stefan Hajnoczi
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 38+ messages in thread
From: Cornelia Huck @ 2020-11-12 15:26 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Yan Zhao, quintela,
	Jason Wang, Zeng, Xin, qemu-devel, Dr. David Alan Gilbert,
	Kirti Wankhede, Thanos Makatos, Alex Williamson, Gerd Hoffmann,
	Stefan Hajnoczi, Felipe Franciosi, Christophe de Dinechin,
	Paolo Bonzini

On Wed, 11 Nov 2020 15:48:50 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> In terms of validation I can't help but feel the whole proposal is
> really very complicated.
> 
> In validating QEMU migration compatibility we merely compare the
> versioned machine type.
> 
> IIUC, in this proposal, it would be more like exploding the machine
> type into all its 100's of properties and then comparing each one
> individually.
> 
> I really prefer the simpler model of QEMU versioned machine types
> where compatibility is a simple string comparison, hiding the
> 100's of individual config parameters.  
> 
> Of course there are scenarios where this will lead a mgmt app to
> refuse a migration, when it could in fact have permitted it.
> 
> eg  consider   pc-i440fx-4.0  and pc-i440fx-5.0 machine types,
> which only differ in the value  "foo=7" and "foo=8" respectively.
> 
> Now if the target only supported machine type pc-i440fx-5.0, then
> with a basic string comparison of machine type versin, we can't
> migrate from a host uing pc-i440fx-4.0
> 
> If we exploded the machine type into its params, we could see that
> we can migrate from pc-i440fx-4.0 to pc-i440fx-5.0, simply by
> overriding the value of "foo".
> 
> So, yes, dealing with individual params is more flexible, but it
> comes at an enourmous cost in complexity to process all the
> parameters. I'm not convinced this is a good tradeoff. 

For mdev devices, we could have something similar to versioned machine
types by introducing versioned mdev types. (Which would fit well with
mdev types having to match strictly for migration to be possible.)

For other use cases, we would need to introduce a new construct.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:48     ` Daniel P. Berrangé
  2020-11-12 15:26       ` Cornelia Huck
@ 2020-11-16 10:48       ` Stefan Hajnoczi
  2020-11-16 11:15       ` Stefan Hajnoczi
  2020-11-16 12:06       ` Michael S. Tsirkin
  3 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-16 10:48 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Yan Zhao, quintela,
	Jason Wang, Zeng, Xin, qemu-devel, Dr. David Alan Gilbert,
	Kirti Wankhede, Paolo Bonzini, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 2246 bytes --]

On Wed, Nov 11, 2020 at 03:48:50PM +0000, Daniel P. Berrangé wrote:
> On Wed, Nov 11, 2020 at 02:36:15PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Nov 10, 2020 at 12:12:31PM +0100, Paolo Bonzini wrote:
> > > On 10/11/20 10:53, Stefan Hajnoczi wrote:
> In terms of validation I can't help but feel the whole proposal is
> really very complicated.
> 
> In validating QEMU migration compatibility we merely compare the
> versioned machine type.
> 
> IIUC, in this proposal, it would be more like exploding the machine
> type into all its 100's of properties and then comparing each one
> individually.
> 
> I really prefer the simpler model of QEMU versioned machine types
> where compatibility is a simple string comparison, hiding the
> 100's of individual config parameters.  
> 
> Of course there are scenarios where this will lead a mgmt app to
> refuse a migration, when it could in fact have permitted it.
> 
> eg  consider   pc-i440fx-4.0  and pc-i440fx-5.0 machine types,
> which only differ in the value  "foo=7" and "foo=8" respectively.
> 
> Now if the target only supported machine type pc-i440fx-5.0, then
> with a basic string comparison of machine type versin, we can't
> migrate from a host uing pc-i440fx-4.0
> 
> If we exploded the machine type into its params, we could see that
> we can migrate from pc-i440fx-4.0 to pc-i440fx-5.0, simply by
> overriding the value of "foo".
> 
> So, yes, dealing with individual params is more flexible, but it
> comes at an enourmous cost in complexity to process all the
> parameters. I'm not convinced this is a good tradeoff. 

A single standard version number is not enough since there are optional
features and resource capacity (number of queues, memory sizes, etc)
varies between implementations.

At best, a version number can summarize multiple migration parameters,
but it cannot eliminate all of them.

If we don't care about checking compatiblity ahead of time then we can
use just a device model and version, but then migration fails when the
source and destination end up being incompatible.

Since you raised the requirement of checking migration compatibility
ahead of time, I don't see a way to avoid the complexity.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:35       ` Cornelia Huck
@ 2020-11-16 11:02         ` Stefan Hajnoczi
  2020-11-16 13:52           ` Cornelia Huck
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-16 11:02 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Kirti Wankhede, Dr. David Alan Gilbert, Alex Williamson,
	Thanos Makatos, Gerd Hoffmann, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

[-- Attachment #1: Type: text/plain, Size: 5535 bytes --]

On Wed, Nov 11, 2020 at 04:35:43PM +0100, Cornelia Huck wrote:
> On Wed, 11 Nov 2020 15:14:49 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Wed, Nov 11, 2020 at 12:48:53PM +0100, Cornelia Huck wrote:
> > > On Tue, 10 Nov 2020 13:14:04 -0700
> > > Alex Williamson <alex.williamson@redhat.com> wrote:  
> > > > On Tue, 10 Nov 2020 09:53:49 +0000
> > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:  
> > >   
> > > > > Device models supported by an mdev driver and their details can be read from
> > > > > the migration_info.json attr. Each mdev type supports one device model. If a
> > > > > parent device supports multiple device models then each device model has an
> > > > > mdev type. There may be multiple mdev types for a single device model when they
> > > > > offer different migration parameters such as resource capacity or feature
> > > > > availability.
> > > > > 
> > > > > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > > > > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > > > > migration parameters, respectively.    
> > > > 
> > > > 
> > > > I think this example could be expanded for clarity.  I think this is
> > > > suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> > > > implement some common device model, ie. com.gfx/GPU, where the
> > > > migration parameter 'memory' for each defaults to a value matching the
> > > > type name.  But it seems like this can also lead to some combinatorial
> > > > challenges for management tools if these parameters are writable.  For
> > > > example, should a management tool create a gfx-4GB device and change to
> > > > memory parameter to 8192 or a gfx-8GB device with the default parameter?  
> > > 
> > > I would expect that the mdev types need to match in the first place.
> > > What role would the memory= parameter play, then? Allowing gfx-4GB to
> > > have memory=8192 feels wrong to me.  
> > 
> > Yes, I expected these mdev types to only accept a fixed "memory" value,
> > but there's nothing stopping a driver author from making "memory" accept
> > any value.
> 
> I'm wondering how useful the memory parameter is, then. The layer
> checking for compatibility can filter out inconsistent settings, but
> why would we need to express something that is already implied in the
> mdev type separately?

To avoid tying device instances to specific mdev types. An mdev type is
a device implementation, but the goal is to enable migration between
device implementations (new/old or completely different
implementations).

Imagine a new physical device that now offers variable memory because
users found the static mdev types too constraining.  How do you migrate
back and forth between new and old physical devices if the migration
parameters don't describe the memory size? Migration parameters make it
possible. Without them the management tool needs to hard-code knowledge
of specific mdev types that support migration.

> > > > > An open mdev device typically does not allow migration parameters to be changed
> > > > > at runtime. However, certain migration/params attrs may allow writes at
> > > > > runtime. Usually these migration parameters only affect the device state
> > > > > representation and not the hardware interface. This makes it possible to
> > > > > upgrade or downgrade the device state representation at runtime so that
> > > > > migration is possible to newer or older device implementations.    
> > > 
> > > This refers to generation of device implementations, but not to dynamic
> > > configuration changes. Maybe I'm just confused by this sentence, but
> > > how are we supposed to get changes while the mdev is live across?  
> > 
> > This is about dynamic configuration changes. For example, if a field was
> > forgotten in the device state representation then a migration parameter
> > can be added to enable the fix. When the parameter is off the device
> > state is incomplete but migration to old device implementations still
> > works. An old device can be migrated to a new device implementation with
> > the parameter turned off. And then you can safely enable the migration
> > parameter at runtime without powering off the guest because it's purely
> > a device state representation change, not a hardware interface change
> > that would disturb the guest.
> > 
> > This is kind of similar to QEMU migration subsections.
> 
> Ok, I was a bit confused here.
> 
> So, we build the stream with the then-current parameters? How is the
> compat-checking layer supposed to deal with parameters changing after
> the check -- is it a "you get to keep the pieces" situation?

Migration compatibility checking is part of orchestrating the migration.
The migration parameters are assumed to be immutable during the
migration process (i.e. the management tool won't let you change them).
But you are free to change them while there is no ongoing migration.

Changing parameters at runtime is something that requires knowledge from
the user or management tool. "I want to upgrade the device to fix a bug
and I know it affects migration compatibility." However, the migration
compatibility check still does its job: if you changed a parameter you
might find the old source is no longer compatible because it lacks
support for the new parameter you set. In that case you could revert the
parameter before migrating back to the old source.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:48     ` Daniel P. Berrangé
  2020-11-12 15:26       ` Cornelia Huck
  2020-11-16 10:48       ` Stefan Hajnoczi
@ 2020-11-16 11:15       ` Stefan Hajnoczi
  2020-11-16 11:41         ` Daniel P. Berrangé
  2020-11-16 12:48         ` Gerd Hoffmann
  2020-11-16 12:06       ` Michael S. Tsirkin
  3 siblings, 2 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-16 11:15 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Yan Zhao, quintela,
	Jason Wang, Zeng, Xin, qemu-devel, Dr. David Alan Gilbert,
	Kirti Wankhede, Paolo Bonzini, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 2305 bytes --]

On Wed, Nov 11, 2020 at 03:48:50PM +0000, Daniel P. Berrangé wrote:
> On Wed, Nov 11, 2020 at 02:36:15PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Nov 10, 2020 at 12:12:31PM +0100, Paolo Bonzini wrote:
> > > On 10/11/20 10:53, Stefan Hajnoczi wrote:
> > Yes, the current syntax supports sparse ranges and multiple ranges.
> > 
> > The trade-off is that a tool cannot validate inputs beforehand. You need
> > to instantiate the device to see if it accepts your inputs. This is not
> > great for management tools because they cannot select a destination
> > device if they don't know which exact values are supported.
> > 
> > Daniel Berrange raised this requirement in a previous revision, so I
> > wonder what his thoughts are?
> 
> In terms of validation I can't help but feel the whole proposal is
> really very complicated.
> 
> In validating QEMU migration compatibility we merely compare the
> versioned machine type.

Thinking more about this, maybe the big picture is:

Today the managment tool controls the variables in the migration (the
device configuration). It has knowledge of the VMM, can set a machine
type, apply a device configuration on top, and then migrate safely.

VFIO changes this model because VMMs and management tools do not have
knowledge of specific device implementations. The device implementation
is a new source of variables in the migration and the management tool no
longer has the full picture.

I'm trying to define a standard interface for exposing migration
compatibility information from device implementations to management
tools, and a general algorithm that management tools can use without
knowledge of specific device implementations.

It is possible to simplify the problem, but we'll lose freedom. For
example, hard coding knowledge of the device implementation into the
management tool eliminates the need for a general migration checking
algorithm. Or we might be able to simplify it by explicitly not
supporting cross-device implementation migration (although that would
place stricter rules on what a new version of an existing device can
change in order to preserve migration compatibility).

I have doubts that these trade-offs can be made without losing support
for use cases that are necessary.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:28     ` Cornelia Huck
@ 2020-11-16 11:36       ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-16 11:36 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Kirti Wankhede, Dr. David Alan Gilbert, Alex Williamson,
	Thanos Makatos, Gerd Hoffmann, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

On Wed, Nov 11, 2020 at 04:28:10PM +0100, Cornelia Huck wrote:
> On Wed, 11 Nov 2020 15:10:14 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Tue, Nov 10, 2020 at 01:14:04PM -0700, Alex Williamson wrote:
> > > On Tue, 10 Nov 2020 09:53:49 +0000
> > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Or we could create a kobject for each migration parameter, but that
> > seems wrong too.
> 
> Hm, ISTR that you can do something with ksets.

Thanks for the idea! Researching it turned up kobject_create_and_add(),
which seems to solve the problem without the heavyweight stuff I was
concerned about (defining dummy ktypes, sending uevents, etc).

I'll bring back the sysfs hierarchy in the next revision.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 11:15       ` Stefan Hajnoczi
@ 2020-11-16 11:41         ` Daniel P. Berrangé
  2020-11-16 12:03           ` Michael S. Tsirkin
  2020-11-16 12:48         ` Gerd Hoffmann
  1 sibling, 1 reply; 38+ messages in thread
From: Daniel P. Berrangé @ 2020-11-16 11:41 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Yan Zhao, quintela,
	Jason Wang, Zeng, Xin, qemu-devel, Dr. David Alan Gilbert,
	Kirti Wankhede, Thanos Makatos, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Paolo Bonzini

On Mon, Nov 16, 2020 at 11:15:24AM +0000, Stefan Hajnoczi wrote:
> On Wed, Nov 11, 2020 at 03:48:50PM +0000, Daniel P. Berrangé wrote:
> > On Wed, Nov 11, 2020 at 02:36:15PM +0000, Stefan Hajnoczi wrote:
> > > On Tue, Nov 10, 2020 at 12:12:31PM +0100, Paolo Bonzini wrote:
> > > > On 10/11/20 10:53, Stefan Hajnoczi wrote:
> > > Yes, the current syntax supports sparse ranges and multiple ranges.
> > > 
> > > The trade-off is that a tool cannot validate inputs beforehand. You need
> > > to instantiate the device to see if it accepts your inputs. This is not
> > > great for management tools because they cannot select a destination
> > > device if they don't know which exact values are supported.
> > > 
> > > Daniel Berrange raised this requirement in a previous revision, so I
> > > wonder what his thoughts are?
> > 
> > In terms of validation I can't help but feel the whole proposal is
> > really very complicated.
> > 
> > In validating QEMU migration compatibility we merely compare the
> > versioned machine type.
> 
> Thinking more about this, maybe the big picture is:
> 
> Today the managment tool controls the variables in the migration (the
> device configuration). It has knowledge of the VMM, can set a machine
> type, apply a device configuration on top, and then migrate safely.
> 
> VFIO changes this model because VMMs and management tools do not have
> knowledge of specific device implementations. The device implementation
> is a new source of variables in the migration and the management tool no
> longer has the full picture.

This is not all that different from what we have today. eg QEMU exposes
several 100 devices impls, each with countless properties. Mgmt tools
like libvirt, or OpenStack/oVirt above don't support all these device
impls, nor do they support all the properties.

IOW, in many cases no configuration is exposed for many of the device
tunables, mgmt tools just rely on the machine type defaults for the
majority of them, and only do tuning for a relatively small subset.

So the machine type acts as a simplifying layer for the mgmt app,
enabling them to safely ignore majority of tunables, and only focus
on the small number of tunables they actually care about changing
or setting.

> I'm trying to define a standard interface for exposing migration
> compatibility information from device implementations to management
> tools, and a general algorithm that management tools can use without
> knowledge of specific device implementations.

For a given type of device I expect there would be some core set of
config parameters that would have to be common to any impl, plus
some set of config params that are specific to just one impl.

If the mgmt app only cares about the core set of config params, then
we should ensure that they can do migration compatibility checks without
needing to care about all the extra irrelevant config params.

If apps want to use some parameters that are custom to specific dev
impls, then they'll have to have logic to expose those params, and
also logic to validate them on migration - if they are frontend ABI
sensitive config parameters, as opposed to backend only.

> It is possible to simplify the problem, but we'll lose freedom. For
> example, hard coding knowledge of the device implementation into the
> management tool eliminates the need for a general migration checking
> algorithm. Or we might be able to simplify it by explicitly not
> supporting cross-device implementation migration (although that would
> place stricter rules on what a new version of an existing device can
> change in order to preserve migration compatibility).

Is migrating between 2 different vendors' impls of the same core
device spec really a thing that's needed ? 

> I have doubts that these trade-offs can be made without losing support
> for use cases that are necessary.

From my POV, the key goal is that it should be possible to migrate
between two hosts without needing to check every single possible
config parameter that the device supports. It should only be neccessary
to check the parameters that are actually changed from their default
values. Then there just needs to be some simple string parameter that
encodes a particular set of devices, akin to the versioned machine
type.

Applications that want to migration between cross-vendor device impls
could opt-in to checking every single little parameter, but most can
just stick with a much simplified view where they only have to check
the parameters that they've actually overriden/exposed.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 11:41         ` Daniel P. Berrangé
@ 2020-11-16 12:03           ` Michael S. Tsirkin
  2020-11-16 12:05             ` Daniel P. Berrangé
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2020-11-16 12:03 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: John G Johnson, Tian, Kevin, Yan Zhao, quintela, Jason Wang,
	Zeng, Xin, qemu-devel, Dr. David Alan Gilbert, Kirti Wankhede,
	Thanos Makatos, Alex Williamson, Gerd Hoffmann, Stefan Hajnoczi,
	Felipe Franciosi, Christophe de Dinechin, Paolo Bonzini

On Mon, Nov 16, 2020 at 11:41:25AM +0000, Daniel P. Berrangé wrote:
> > It is possible to simplify the problem, but we'll lose freedom. For
> > example, hard coding knowledge of the device implementation into the
> > management tool eliminates the need for a general migration checking
> > algorithm. Or we might be able to simplify it by explicitly not
> > supporting cross-device implementation migration (although that would
> > place stricter rules on what a new version of an existing device can
> > change in order to preserve migration compatibility).
> 
> Is migrating between 2 different vendors' impls of the same core
> device spec really a thing that's needed ? 

If there's intent to have this supercede vhost-user then certainly.
Same I'm guessing for NVMe.


> > I have doubts that these trade-offs can be made without losing support
> > for use cases that are necessary.
> 
> >From my POV, the key goal is that it should be possible to migrate
> between two hosts without needing to check every single possible
> config parameter that the device supports. It should only be neccessary
> to check the parameters that are actually changed from their default
> values. Then there just needs to be some simple string parameter that
> encodes a particular set of devices, akin to the versioned machine
> type.
> 
> Applications that want to migration between cross-vendor device impls
> could opt-in to checking every single little parameter, but most can
> just stick with a much simplified view where they only have to check
> the parameters that they've actually overriden/exposed.
> 
> Regards,
> Daniel

It's a problem even for a single vendor. And we have lots of experience
telling us it's a messy, difficult one. Just punting and saying
vendors will do the right thing will not lead to quality
implementations.


> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 12:03           ` Michael S. Tsirkin
@ 2020-11-16 12:05             ` Daniel P. Berrangé
  2020-11-16 12:34               ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel P. Berrangé @ 2020-11-16 12:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: John G Johnson, Tian, Kevin, Yan Zhao, quintela, Jason Wang,
	Zeng, Xin, qemu-devel, Dr. David Alan Gilbert, Kirti Wankhede,
	Thanos Makatos, Alex Williamson, Gerd Hoffmann, Stefan Hajnoczi,
	Felipe Franciosi, Christophe de Dinechin, Paolo Bonzini

On Mon, Nov 16, 2020 at 07:03:03AM -0500, Michael S. Tsirkin wrote:
> On Mon, Nov 16, 2020 at 11:41:25AM +0000, Daniel P. Berrangé wrote:
> > > It is possible to simplify the problem, but we'll lose freedom. For
> > > example, hard coding knowledge of the device implementation into the
> > > management tool eliminates the need for a general migration checking
> > > algorithm. Or we might be able to simplify it by explicitly not
> > > supporting cross-device implementation migration (although that would
> > > place stricter rules on what a new version of an existing device can
> > > change in order to preserve migration compatibility).
> > 
> > Is migrating between 2 different vendors' impls of the same core
> > device spec really a thing that's needed ? 
> 
> If there's intent to have this supercede vhost-user then certainly.
> Same I'm guessing for NVMe.
> 
> 
> > > I have doubts that these trade-offs can be made without losing support
> > > for use cases that are necessary.
> > 
> > >From my POV, the key goal is that it should be possible to migrate
> > between two hosts without needing to check every single possible
> > config parameter that the device supports. It should only be neccessary
> > to check the parameters that are actually changed from their default
> > values. Then there just needs to be some simple string parameter that
> > encodes a particular set of devices, akin to the versioned machine
> > type.
> > 
> > Applications that want to migration between cross-vendor device impls
> > could opt-in to checking every single little parameter, but most can
> > just stick with a much simplified view where they only have to check
> > the parameters that they've actually overriden/exposed.
> 
> It's a problem even for a single vendor. And we have lots of experience
> telling us it's a messy, difficult one. Just punting and saying
> vendors will do the right thing will not lead to quality
> implementations.

I'm not suggesting we punt on the problem. I'm saying that checking for
migration compatibility should not need to be made more complex than what
we already do for QEMU. The core problem being tackled is essentially the
same in both cases.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:48     ` Daniel P. Berrangé
                         ` (2 preceding siblings ...)
  2020-11-16 11:15       ` Stefan Hajnoczi
@ 2020-11-16 12:06       ` Michael S. Tsirkin
  3 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2020-11-16 12:06 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: John G Johnson, Tian, Kevin, Yan Zhao, quintela, Jason Wang,
	Zeng, Xin, qemu-devel, Dr. David Alan Gilbert, Kirti Wankhede,
	Paolo Bonzini, Alex Williamson, Gerd Hoffmann, Stefan Hajnoczi,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

On Wed, Nov 11, 2020 at 03:48:50PM +0000, Daniel P. Berrangé wrote:
> In terms of validation I can't help but feel the whole proposal is
> really very complicated.
> 
> In validating QEMU migration compatibility we merely compare the
> versioned machine type.
> 
> IIUC, in this proposal, it would be more like exploding the machine
> type into all its 100's of properties and then comparing each one
> individually.
> 
> I really prefer the simpler model of QEMU versioned machine types
> where compatibility is a simple string comparison, hiding the
> 100's of individual config parameters.  

I think we need to ship a tool with QEMU that handles this complexity.
If the tool spits out a very long string with all the needed parameters,
management won't need to care that it's not just a short
machine type.

-- 
MST



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 12:05             ` Daniel P. Berrangé
@ 2020-11-16 12:34               ` Michael S. Tsirkin
  2020-11-16 12:45                 ` Daniel P. Berrangé
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2020-11-16 12:34 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: John G Johnson, Tian, Kevin, Yan Zhao, quintela, Jason Wang,
	Zeng, Xin, qemu-devel, Dr. David Alan Gilbert, Kirti Wankhede,
	Thanos Makatos, Alex Williamson, Gerd Hoffmann, Stefan Hajnoczi,
	Felipe Franciosi, Christophe de Dinechin, Paolo Bonzini

On Mon, Nov 16, 2020 at 12:05:18PM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 16, 2020 at 07:03:03AM -0500, Michael S. Tsirkin wrote:
> > On Mon, Nov 16, 2020 at 11:41:25AM +0000, Daniel P. Berrangé wrote:
> > > > It is possible to simplify the problem, but we'll lose freedom. For
> > > > example, hard coding knowledge of the device implementation into the
> > > > management tool eliminates the need for a general migration checking
> > > > algorithm. Or we might be able to simplify it by explicitly not
> > > > supporting cross-device implementation migration (although that would
> > > > place stricter rules on what a new version of an existing device can
> > > > change in order to preserve migration compatibility).
> > > 
> > > Is migrating between 2 different vendors' impls of the same core
> > > device spec really a thing that's needed ? 
> > 
> > If there's intent to have this supercede vhost-user then certainly.
> > Same I'm guessing for NVMe.
> > 
> > 
> > > > I have doubts that these trade-offs can be made without losing support
> > > > for use cases that are necessary.
> > > 
> > > >From my POV, the key goal is that it should be possible to migrate
> > > between two hosts without needing to check every single possible
> > > config parameter that the device supports. It should only be neccessary
> > > to check the parameters that are actually changed from their default
> > > values. Then there just needs to be some simple string parameter that
> > > encodes a particular set of devices, akin to the versioned machine
> > > type.
> > > 
> > > Applications that want to migration between cross-vendor device impls
> > > could opt-in to checking every single little parameter, but most can
> > > just stick with a much simplified view where they only have to check
> > > the parameters that they've actually overriden/exposed.
> > 
> > It's a problem even for a single vendor. And we have lots of experience
> > telling us it's a messy, difficult one. Just punting and saying
> > vendors will do the right thing will not lead to quality
> > implementations.
> 
> I'm not suggesting we punt on the problem. I'm saying that checking for
> migration compatibility should not need to be made more complex than what
> we already do for QEMU. The core problem being tackled is essentially the
> same in both cases.
> 
> Regards,
> Daniel

There's a difference: in case of QEMU versions are release based.  At
release time a new version is generated.  So QEMU upstream ships version
X and Red Hat ships Y at a different time and they are not compatible.

This won't work for devices: same device needs to work with
both upstream and Red Hat and migrate upstream-upstream and Red Hat-Red Hat
(though not upstream-Red Hat).


> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 12:34               ` Michael S. Tsirkin
@ 2020-11-16 12:45                 ` Daniel P. Berrangé
  2020-11-16 12:51                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel P. Berrangé @ 2020-11-16 12:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: John G Johnson, Tian, Kevin, Yan Zhao, quintela, Jason Wang,
	Zeng, Xin, qemu-devel, Dr. David Alan Gilbert, Kirti Wankhede,
	Thanos Makatos, Alex Williamson, Gerd Hoffmann, Stefan Hajnoczi,
	Felipe Franciosi, Christophe de Dinechin, Paolo Bonzini

On Mon, Nov 16, 2020 at 07:34:25AM -0500, Michael S. Tsirkin wrote:
> On Mon, Nov 16, 2020 at 12:05:18PM +0000, Daniel P. Berrangé wrote:
> > On Mon, Nov 16, 2020 at 07:03:03AM -0500, Michael S. Tsirkin wrote:
> > > On Mon, Nov 16, 2020 at 11:41:25AM +0000, Daniel P. Berrangé wrote:
> > > > > It is possible to simplify the problem, but we'll lose freedom. For
> > > > > example, hard coding knowledge of the device implementation into the
> > > > > management tool eliminates the need for a general migration checking
> > > > > algorithm. Or we might be able to simplify it by explicitly not
> > > > > supporting cross-device implementation migration (although that would
> > > > > place stricter rules on what a new version of an existing device can
> > > > > change in order to preserve migration compatibility).
> > > > 
> > > > Is migrating between 2 different vendors' impls of the same core
> > > > device spec really a thing that's needed ? 
> > > 
> > > If there's intent to have this supercede vhost-user then certainly.
> > > Same I'm guessing for NVMe.
> > > 
> > > 
> > > > > I have doubts that these trade-offs can be made without losing support
> > > > > for use cases that are necessary.
> > > > 
> > > > >From my POV, the key goal is that it should be possible to migrate
> > > > between two hosts without needing to check every single possible
> > > > config parameter that the device supports. It should only be neccessary
> > > > to check the parameters that are actually changed from their default
> > > > values. Then there just needs to be some simple string parameter that
> > > > encodes a particular set of devices, akin to the versioned machine
> > > > type.
> > > > 
> > > > Applications that want to migration between cross-vendor device impls
> > > > could opt-in to checking every single little parameter, but most can
> > > > just stick with a much simplified view where they only have to check
> > > > the parameters that they've actually overriden/exposed.
> > > 
> > > It's a problem even for a single vendor. And we have lots of experience
> > > telling us it's a messy, difficult one. Just punting and saying
> > > vendors will do the right thing will not lead to quality
> > > implementations.
> > 
> > I'm not suggesting we punt on the problem. I'm saying that checking for
> > migration compatibility should not need to be made more complex than what
> > we already do for QEMU. The core problem being tackled is essentially the
> > same in both cases.
> 
> There's a difference: in case of QEMU versions are release based.  At
> release time a new version is generated.  So QEMU upstream ships version
> X and Red Hat ships Y at a different time and they are not compatible.

That's a difference that Red Hat maintainers chose to introduce. RHEL
could have stuck with upstream QEMU machine types if it wished to, but
it chose to ship different machine types, because it made life easier
to backport features that impacted machine types, and also to some extent
to let us fix migration compat screw ups. We could have stuck to upstream
machine types though and remained compatible.  Many other distros do just
that.

> This won't work for devices: same device needs to work with
> both upstream and Red Hat and migrate upstream-upstream and Red Hat-Red Hat
> (though not upstream-Red Hat).

That's fine, we can cope with that. It simply means whomever owns
responsibility for maintaining the code has to be more careful about
changes they make in their downstream.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 11:15       ` Stefan Hajnoczi
  2020-11-16 11:41         ` Daniel P. Berrangé
@ 2020-11-16 12:48         ` Gerd Hoffmann
  2020-11-16 12:54           ` Michael S. Tsirkin
  1 sibling, 1 reply; 38+ messages in thread
From: Gerd Hoffmann @ 2020-11-16 12:48 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Yan Zhao, Kirti Wankhede, Paolo Bonzini,
	Alex Williamson, Felipe Franciosi, Christophe de Dinechin,
	Thanos Makatos

> > In validating QEMU migration compatibility we merely compare the
> > versioned machine type.
> 
> Thinking more about this, maybe the big picture is:
> 
> Today the managment tool controls the variables in the migration (the
> device configuration). It has knowledge of the VMM, can set a machine
> type, apply a device configuration on top, and then migrate safely.
> 
> VFIO changes this model because VMMs and management tools do not have
> knowledge of specific device implementations. The device implementation
> is a new source of variables in the migration and the management tool no
> longer has the full picture.

Well.  We actually have the variables.  They are device properties.
Then the qemu compat properties basically map a machine type to a
set of device properties.  That way we hide the complexity inside
qemu.  The management apps have to deal with the (versioned) machine
type only.

I guess now we have to decide whenever we want expose the individual
variables or whenever we want something like "profiles", i.e. basically
a set of variables with a name attached.

At the end of the day it is a complexity vs. flexibility tradeoff ...

take care,
  Gerd



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 12:45                 ` Daniel P. Berrangé
@ 2020-11-16 12:51                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2020-11-16 12:51 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: John G Johnson, Tian, Kevin, Yan Zhao, quintela, Jason Wang,
	Zeng, Xin, qemu-devel, Dr. David Alan Gilbert, Kirti Wankhede,
	Thanos Makatos, Alex Williamson, Gerd Hoffmann, Stefan Hajnoczi,
	Felipe Franciosi, Christophe de Dinechin, Paolo Bonzini

On Mon, Nov 16, 2020 at 12:45:49PM +0000, Daniel P. Berrangé wrote:
> > This won't work for devices: same device needs to work with
> > both upstream and Red Hat and migrate upstream-upstream and Red Hat-Red Hat
> > (though not upstream-Red Hat).
> 
> That's fine, we can cope with that. It simply means whomever owns
> responsibility for maintaining the code has to be more careful about
> changes they make in their downstream.

When we are talking about device vendors, "has to be more careful"
equals broken code. We need to make things super easy
for vendors doing the right thing, like publishing the migration
format, including a standard place to publish it and a tool
that uses that info to make migration just work.

-- 
MST



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 12:48         ` Gerd Hoffmann
@ 2020-11-16 12:54           ` Michael S. Tsirkin
  0 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2020-11-16 12:54 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: John G Johnson, Tian, Kevin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Yan Zhao, Kirti Wankhede, Paolo Bonzini,
	Alex Williamson, Stefan Hajnoczi, Felipe Franciosi,
	Christophe de Dinechin, Thanos Makatos

On Mon, Nov 16, 2020 at 01:48:58PM +0100, Gerd Hoffmann wrote:
> > > In validating QEMU migration compatibility we merely compare the
> > > versioned machine type.
> > 
> > Thinking more about this, maybe the big picture is:
> > 
> > Today the managment tool controls the variables in the migration (the
> > device configuration). It has knowledge of the VMM, can set a machine
> > type, apply a device configuration on top, and then migrate safely.
> > 
> > VFIO changes this model because VMMs and management tools do not have
> > knowledge of specific device implementations. The device implementation
> > is a new source of variables in the migration and the management tool no
> > longer has the full picture.
> 
> Well.  We actually have the variables.  They are device properties.
> Then the qemu compat properties basically map a machine type to a
> set of device properties.  That way we hide the complexity inside
> qemu.  The management apps have to deal with the (versioned) machine
> type only.
> 
> I guess now we have to decide whenever we want expose the individual
> variables or whenever we want something like "profiles", i.e. basically
> a set of variables with a name attached.
> 
> At the end of the day it is a complexity vs. flexibility tradeoff ...
> 
> take care,
>   Gerd

BTW it's not too bad if, for starters, we are doing exactly what we are
used to be doing, just ask vendors to supply the list all variables and
include it in the QEMU code. Has a benefit of limiting our support
matrix so we can make initial protocol changes with reasonable certainty
they do not break devices.  Once number of supported devices grows we
can think of sane ways to relax that...

-- 
MST



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 11:02         ` Stefan Hajnoczi
@ 2020-11-16 13:52           ` Cornelia Huck
  2020-11-16 17:30             ` Alex Williamson
  0 siblings, 1 reply; 38+ messages in thread
From: Cornelia Huck @ 2020-11-16 13:52 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Kirti Wankhede, Dr. David Alan Gilbert, Alex Williamson,
	Thanos Makatos, Gerd Hoffmann, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

[-- Attachment #1: Type: text/plain, Size: 3854 bytes --]

On Mon, 16 Nov 2020 11:02:51 +0000
Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Wed, Nov 11, 2020 at 04:35:43PM +0100, Cornelia Huck wrote:
> > On Wed, 11 Nov 2020 15:14:49 +0000
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >   
> > > On Wed, Nov 11, 2020 at 12:48:53PM +0100, Cornelia Huck wrote:  
> > > > On Tue, 10 Nov 2020 13:14:04 -0700
> > > > Alex Williamson <alex.williamson@redhat.com> wrote:    
> > > > > On Tue, 10 Nov 2020 09:53:49 +0000
> > > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:    
> > > >     
> > > > > > Device models supported by an mdev driver and their details can be read from
> > > > > > the migration_info.json attr. Each mdev type supports one device model. If a
> > > > > > parent device supports multiple device models then each device model has an
> > > > > > mdev type. There may be multiple mdev types for a single device model when they
> > > > > > offer different migration parameters such as resource capacity or feature
> > > > > > availability.
> > > > > > 
> > > > > > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > > > > > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > > > > > migration parameters, respectively.      
> > > > > 
> > > > > 
> > > > > I think this example could be expanded for clarity.  I think this is
> > > > > suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> > > > > implement some common device model, ie. com.gfx/GPU, where the
> > > > > migration parameter 'memory' for each defaults to a value matching the
> > > > > type name.  But it seems like this can also lead to some combinatorial
> > > > > challenges for management tools if these parameters are writable.  For
> > > > > example, should a management tool create a gfx-4GB device and change to
> > > > > memory parameter to 8192 or a gfx-8GB device with the default parameter?    
> > > > 
> > > > I would expect that the mdev types need to match in the first place.
> > > > What role would the memory= parameter play, then? Allowing gfx-4GB to
> > > > have memory=8192 feels wrong to me.    
> > > 
> > > Yes, I expected these mdev types to only accept a fixed "memory" value,
> > > but there's nothing stopping a driver author from making "memory" accept
> > > any value.  
> > 
> > I'm wondering how useful the memory parameter is, then. The layer
> > checking for compatibility can filter out inconsistent settings, but
> > why would we need to express something that is already implied in the
> > mdev type separately?  
> 
> To avoid tying device instances to specific mdev types. An mdev type is
> a device implementation, but the goal is to enable migration between
> device implementations (new/old or completely different
> implementations).
> 
> Imagine a new physical device that now offers variable memory because
> users found the static mdev types too constraining.  How do you migrate
> back and forth between new and old physical devices if the migration
> parameters don't describe the memory size? Migration parameters make it
> possible. Without them the management tool needs to hard-code knowledge
> of specific mdev types that support migration.

But doesn't the management tool *still* need to keep hardcoded
information about what the value of that memory parameter was for an
existing mdev type? If we have gfx-variable with a memory parameter,
fine; but if the target is supposed to accept a gfx-4GB device, it
should simply instantiate a gfx-4GB device.

I'm getting a bit worried about the complexity of the checking that
management software is supposed to perform. Is it really that bad to
restrict the models to a few, well-defined ones? Especially in the mdev
case, where we have control about what is getting instantiated?

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 15:41     ` Dr. David Alan Gilbert
@ 2020-11-16 14:38       ` Stefan Hajnoczi
  2020-11-17  9:44         ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-16 14:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, qemu-devel, Yan Zhao,
	Kirti Wankhede, Paolo Bonzini, Alex Williamson, Gerd Hoffmann,
	Felipe Franciosi, Christophe de Dinechin, Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 3722 bytes --]

On Wed, Nov 11, 2020 at 03:41:59PM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > Orchestrating Migrations
> > > > ------------------------
> > > > In order to migrate a device a *migration parameter list* must first be built
> > > > on the source. Each migration parameter is added to the list if it is in
> > > > effect. For example, the migration parameter list for a device with
> > > > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
> > > > parameter was introduced with the off value disabling its effect.
> > > 
> > > What component builds that list (i.e. what component needs to know the
> > > history that new-feature=off was the default - ah I think you answer
> > > that below).
> > 
> > Yep. Thanks for noting this. I'll need to reorder things so it is clear.
> > 
> > > > The following conditions must be met to establish migration compatibility:
> > > > 
> > > > 1. The source and destination device model strings match.
> > > > 
> > > > 2. Each migration parameter name from the migration parameter list is supported
> > > >    by the destination. For example, the destination supports the num-queues
> > > >    migration parameter.
> > > > 
> > > > 3. Each migration parameter value from the migration parameter list is
> > > >    supported by the destination. For example, the destination supports
> > > >    num-queues=4.
> > > 
> > > Hmm, are combinations of parameter checks needed - i.e. is it possible
> > > that a destination supports    num-queues=4 and  new-feature=on/off -
> > > but only supports new-feature=on when num-queues>2 ?
> > 
> > Yes, it's possible but cannot be expressed in the migration info JSON.
> > 
> > We need to choose a level of expressiveness that will be useful enough
> > without being complex. In the extreme the migration info would contain
> > Turing complete validation expressions (e.g. JavaScript) so that any
> > relationship can be expressed, but I doubt that complexity is needed.
> > The other extreme is just booleans and (opaque) strings for maximum
> > simplicity.
> > 
> > If the syntax is not expressive enough then it's impossible to check
> > migration compatibility without actually creating a new device instance
> > on the destination. Daniel Berrange raised the requirement of checking
> > migration compatibility without creating the device since this helps
> > with selecting a migration destination.
> 
> Right, but my worry isn't the JSON description, it's the set of 3
> conditions above; they need to state that only some combinations need to
> be valid.

Yes, the proposed syntax is simply not expressive enough. The migration
compatibility check will pass and then the destination will refuse to
set up the device (before the device state is transferred).

Any suggestions for a syntax without full-blown arithmetic and logic
expressions?

> > Any ideas for a better syntax?
> 
> I'd be happy with a --param name=value   repeatedly, but also know that
> some option parsers don't like that.

Another wart, Sphinx considers repeated options an error so you cannot
document options using rST option syntax. I remember having this problem
when documenting virtiofsd's command-line options :).

If something comes to mind please let me know. I'm not set on a
particular syntax, but I'd like to choose the one that is both
human-friendly and compatible with option parsers while avoiding
namespace collisions with the device implementation's own options.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-11 16:18 ` Thanos Makatos
@ 2020-11-16 15:24   ` Stefan Hajnoczi
  2020-11-24 17:29     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-11-16 15:24 UTC (permalink / raw)
  To: Thanos Makatos
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	Swapnil Ingle, quintela, Jason Wang, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Yan Zhao, Kirti Wankhede,
	Alex Williamson, Gerd Hoffmann, Felipe Franciosi,
	Christophe de Dinechin, Paolo Bonzini, John Levon, changpeng.liu

[-- Attachment #1: Type: text/plain, Size: 4701 bytes --]

On Wed, Nov 11, 2020 at 04:18:34PM +0000, Thanos Makatos wrote:
> 
> > VFIO Migration
> > ==============
> > This document describes how to ensure migration compatibility for VFIO
> > devices,
> > including mdev and vfio-user devices.
> 
> Is this something all VFIO/user devices will have to support? If it's not
> mandatory, how can a device advertise support?

The --print-migration-info-json command-line option described below must
be implemented by the vfio-user device emulation program. Similarly,
VFIO/mdev devices must provide the migration/ sysfs group.

If the device implementation does not expose these standard interfaces
then management tools can still attempt to migrate them, but there is no
migration compatibility check or algorithm for setting up the
destination device. In other words, it will only succeed with some luck
or by hardcoding knowledge of the specific device implementation into
the management tool.

> 
> > Multiple device implementations can support the same device model. Doing
> > so
> > means that the device implementations can offer migration compatiblity
> > because
> > they support the same hardware interface, device state representation, and
> > migration parameters.
> 
> Does the above mean that a passthrough function can be migrated to a vfio-user
> program and vice versa? If so, then it's worth mentioning.

Yes, if they are migration compatible (they support the same device
model and migration parameters) then migration is possible. I'll make
this clear in the next revision.

Note VFIO migration is currently only working for mdev devices. Alex
Williamson mentioned that it could be extended to core VFIO PCI devices
(without mdev) in the future.

> > More complex device emulation programs may host multiple devices. The
> > interface
> > for configuring these device emulation programs is not standardized.
> > Therefore,
> > migrating these devices is beyond the scope of this document.
> 
> Most likely a device emulation program hosting multile devices would allow
> some form of communication for control purposes (e.g. SPDK implements a JSON-RPC
> server). So maybe it's possible to define interacting with such programs in
> this document?

Yes, it's definitely possible. There needs to be agreement on the RPC
mechanism. QEMU implements QMP, SPDK has something similar but
different, gRPC/Protobuf is popular, and D-Bus is another alternative. I
asked about RPC mechanisms on the muser Slack instance to see if there
was consensus but it seems to be a bit early for that.

Perhaps the most realistic option will be to define bindings to several
RPC mechanisms. That way everyone can use their preferred RPC mechanism,
at the cost of requiring management tools to support more than one
(which some already do, e.g. libvirt uses XDR itself but also implements
QEMU's QMP).

> > 
> > The migration information JSON is printed to standard output by a vfio-user
> > device emulation program as follows:
> > 
> > .. code:: bash
> > 
> >   $ my-device --print-migration-info-json
> > 
> > The device is instantiated by launching the destination process with the
> > migration parameter list from the source:
> 
> Must 'my-device --print-migration-info-json' always generate the same migration
> information JSON? If so, then what if the output generated by
> 'my-device --print-migration-info-json' depends on additional arguments passed
> to 'my-device' when it was originally started?

Yes, it needs to be stable in the sense that you can invoke the program
with --print-migration-info-json and then expect launching the program
to succeed with migration parameters that are valid according to the
JSON.

Running the same device emulation binary on different hosts can produce
different JSON. This is because the binary may rely on host hardware
resources or features (e.g. does this host have GPUs available?).

It gets trickier when considering host reboots. I think the JSON can
change between reboots. However, the management tools may cache the JSON
so there needs to be a rule about when to refresh it.

Regarding additional command-line arguments, they can affect the JSON
output. For example, they could include the connection details to an
iSCSI LUN and affect the block size migration parameter. This leads to
the same issue - can they be cached by the management tool? The answer
is the same - stability is needed in the short-term to avoid unexpected
failures when launching the program, but over the longer term we should
allow JSON changes.

Thanks for raising these points. I'll add details to the next revision.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 13:52           ` Cornelia Huck
@ 2020-11-16 17:30             ` Alex Williamson
  2020-11-24 17:24               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2020-11-16 17:30 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	quintela, Jason Wang, Felipe Franciosi, Zeng, Xin, qemu-devel,
	Dr. David Alan Gilbert, Kirti Wankhede, Thanos Makatos,
	Gerd Hoffmann, Stefan Hajnoczi, Paolo Bonzini,
	Christophe de Dinechin, Yan Zhao

On Mon, 16 Nov 2020 14:52:26 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 16 Nov 2020 11:02:51 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Wed, Nov 11, 2020 at 04:35:43PM +0100, Cornelia Huck wrote:  
> > > On Wed, 11 Nov 2020 15:14:49 +0000
> > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >     
> > > > On Wed, Nov 11, 2020 at 12:48:53PM +0100, Cornelia Huck wrote:    
> > > > > On Tue, 10 Nov 2020 13:14:04 -0700
> > > > > Alex Williamson <alex.williamson@redhat.com> wrote:      
> > > > > > On Tue, 10 Nov 2020 09:53:49 +0000
> > > > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:      
> > > > >       
> > > > > > > Device models supported by an mdev driver and their details can be read from
> > > > > > > the migration_info.json attr. Each mdev type supports one device model. If a
> > > > > > > parent device supports multiple device models then each device model has an
> > > > > > > mdev type. There may be multiple mdev types for a single device model when they
> > > > > > > offer different migration parameters such as resource capacity or feature
> > > > > > > availability.
> > > > > > > 
> > > > > > > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > > > > > > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > > > > > > migration parameters, respectively.        
> > > > > > 
> > > > > > 
> > > > > > I think this example could be expanded for clarity.  I think this is
> > > > > > suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> > > > > > implement some common device model, ie. com.gfx/GPU, where the
> > > > > > migration parameter 'memory' for each defaults to a value matching the
> > > > > > type name.  But it seems like this can also lead to some combinatorial
> > > > > > challenges for management tools if these parameters are writable.  For
> > > > > > example, should a management tool create a gfx-4GB device and change to
> > > > > > memory parameter to 8192 or a gfx-8GB device with the default parameter?      
> > > > > 
> > > > > I would expect that the mdev types need to match in the first place.
> > > > > What role would the memory= parameter play, then? Allowing gfx-4GB to
> > > > > have memory=8192 feels wrong to me.      
> > > > 
> > > > Yes, I expected these mdev types to only accept a fixed "memory" value,
> > > > but there's nothing stopping a driver author from making "memory" accept
> > > > any value.    
> > > 
> > > I'm wondering how useful the memory parameter is, then. The layer
> > > checking for compatibility can filter out inconsistent settings, but
> > > why would we need to express something that is already implied in the
> > > mdev type separately?    
> > 
> > To avoid tying device instances to specific mdev types. An mdev type is
> > a device implementation, but the goal is to enable migration between
> > device implementations (new/old or completely different
> > implementations).
> > 
> > Imagine a new physical device that now offers variable memory because
> > users found the static mdev types too constraining.  How do you migrate
> > back and forth between new and old physical devices if the migration
> > parameters don't describe the memory size? Migration parameters make it
> > possible. Without them the management tool needs to hard-code knowledge
> > of specific mdev types that support migration.  
> 
> But doesn't the management tool *still* need to keep hardcoded
> information about what the value of that memory parameter was for an
> existing mdev type? If we have gfx-variable with a memory parameter,
> fine; but if the target is supposed to accept a gfx-4GB device, it
> should simply instantiate a gfx-4GB device.
> 
> I'm getting a bit worried about the complexity of the checking that
> management software is supposed to perform. Is it really that bad to
> restrict the models to a few, well-defined ones? Especially in the mdev
> case, where we have control about what is getting instantiated?

This is exactly what I was noting with the combinatorial challenges of
the management tool.  If a vendor chooses to use a generic base device
model which they modify with parameters to match an assortment of mdev
types, then management tools will need to match every mdev type
implementing that device model to determine if compatible parameters
exist.  OTOH, the vendor could choose to create a device model that
specifically describes a single configuration of known parameters.

For example, mdev type gfx-4GB might be a device model com.gfx/GPU with
a fixed memory parameter of 4GB or it could be a device model
com.gfx/GPU-4G with no additional parameter.  The hard part is when the
vendor offers an mdev type gfx-varGB with device model com.gfx/GPU and
available memory options of 1GB, 2GB, 4GB, 8GB.  At that point a
management tool might decide to create a gfx-varGB device instance and
tune the memory parameter or create a gfx-4GB instance, either would be
correct and we've expressed no preference for one or the other.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 14:38       ` Stefan Hajnoczi
@ 2020-11-17  9:44         ` Michael S. Tsirkin
  2020-12-01 13:17           ` Stefan Hajnoczi
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2020-11-17  9:44 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, Dr. David Alan Gilbert,
	qemu-devel, Yan Zhao, Kirti Wankhede, Paolo Bonzini,
	Alex Williamson, Gerd Hoffmann, Felipe Franciosi,
	Christophe de Dinechin, Thanos Makatos

On Mon, Nov 16, 2020 at 02:38:12PM +0000, Stefan Hajnoczi wrote:
> On Wed, Nov 11, 2020 at 03:41:59PM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > Orchestrating Migrations
> > > > > ------------------------
> > > > > In order to migrate a device a *migration parameter list* must first be built
> > > > > on the source. Each migration parameter is added to the list if it is in
> > > > > effect. For example, the migration parameter list for a device with
> > > > > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
> > > > > parameter was introduced with the off value disabling its effect.
> > > > 
> > > > What component builds that list (i.e. what component needs to know the
> > > > history that new-feature=off was the default - ah I think you answer
> > > > that below).
> > > 
> > > Yep. Thanks for noting this. I'll need to reorder things so it is clear.
> > > 
> > > > > The following conditions must be met to establish migration compatibility:
> > > > > 
> > > > > 1. The source and destination device model strings match.
> > > > > 
> > > > > 2. Each migration parameter name from the migration parameter list is supported
> > > > >    by the destination. For example, the destination supports the num-queues
> > > > >    migration parameter.
> > > > > 
> > > > > 3. Each migration parameter value from the migration parameter list is
> > > > >    supported by the destination. For example, the destination supports
> > > > >    num-queues=4.
> > > > 
> > > > Hmm, are combinations of parameter checks needed - i.e. is it possible
> > > > that a destination supports    num-queues=4 and  new-feature=on/off -
> > > > but only supports new-feature=on when num-queues>2 ?
> > > 
> > > Yes, it's possible but cannot be expressed in the migration info JSON.
> > > 
> > > We need to choose a level of expressiveness that will be useful enough
> > > without being complex. In the extreme the migration info would contain
> > > Turing complete validation expressions (e.g. JavaScript) so that any
> > > relationship can be expressed, but I doubt that complexity is needed.
> > > The other extreme is just booleans and (opaque) strings for maximum
> > > simplicity.
> > > 
> > > If the syntax is not expressive enough then it's impossible to check
> > > migration compatibility without actually creating a new device instance
> > > on the destination. Daniel Berrange raised the requirement of checking
> > > migration compatibility without creating the device since this helps
> > > with selecting a migration destination.
> > 
> > Right, but my worry isn't the JSON description, it's the set of 3
> > conditions above; they need to state that only some combinations need to
> > be valid.
> 
> Yes, the proposed syntax is simply not expressive enough. The migration
> compatibility check will pass and then the destination will refuse to
> set up the device (before the device state is transferred).
> 
> Any suggestions for a syntax without full-blown arithmetic and logic
> expressions?
> 
> > > Any ideas for a better syntax?
> > 
> > I'd be happy with a --param name=value   repeatedly, but also know that
> > some option parsers don't like that.
> 
> Another wart, Sphinx considers repeated options an error so you cannot
> document options using rST option syntax. I remember having this problem
> when documenting virtiofsd's command-line options :).
> 
> If something comes to mind please let me know. I'm not set on a
> particular syntax, but I'd like to choose the one that is both
> human-friendly and compatible with option parsers while avoiding
> namespace collisions with the device implementation's own options.
> 
> Stefan


I think the simplest way is just to include and open-source tool
for figuring all this out together with qemu.
Any vendor interested in supporting migration with qemu
will then just submit a patch for that tool.

This will also help make sure this interface
is not just a way to bypass GPL, we can ask that the
supporting server is opensource.

And it will help us guide vendors towards supporting migration
correctly.

-- 
MST



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 17:30             ` Alex Williamson
@ 2020-11-24 17:24               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-11-24 17:24 UTC (permalink / raw)
  To: Alex Williamson
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	Felipe Franciosi, quintela, Jason Wang, Cornelia Huck, Zeng, Xin,
	qemu-devel, Kirti Wankhede, Thanos Makatos, Gerd Hoffmann,
	Stefan Hajnoczi, Paolo Bonzini, Christophe de Dinechin, Yan Zhao

* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Mon, 16 Nov 2020 14:52:26 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Mon, 16 Nov 2020 11:02:51 +0000
> > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > > On Wed, Nov 11, 2020 at 04:35:43PM +0100, Cornelia Huck wrote:  
> > > > On Wed, 11 Nov 2020 15:14:49 +0000
> > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >     
> > > > > On Wed, Nov 11, 2020 at 12:48:53PM +0100, Cornelia Huck wrote:    
> > > > > > On Tue, 10 Nov 2020 13:14:04 -0700
> > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:      
> > > > > > > On Tue, 10 Nov 2020 09:53:49 +0000
> > > > > > > Stefan Hajnoczi <stefanha@redhat.com> wrote:      
> > > > > >       
> > > > > > > > Device models supported by an mdev driver and their details can be read from
> > > > > > > > the migration_info.json attr. Each mdev type supports one device model. If a
> > > > > > > > parent device supports multiple device models then each device model has an
> > > > > > > > mdev type. There may be multiple mdev types for a single device model when they
> > > > > > > > offer different migration parameters such as resource capacity or feature
> > > > > > > > availability.
> > > > > > > > 
> > > > > > > > For example, a graphics card that supports 4 GB and 8 GB device instances would
> > > > > > > > provide gfx-4GB and gfx-8GB mdev types with memory=4096 and memory=8192
> > > > > > > > migration parameters, respectively.        
> > > > > > > 
> > > > > > > 
> > > > > > > I think this example could be expanded for clarity.  I think this is
> > > > > > > suggesting we have mdev_types of gfx-4GB and gfx-8GB, which each
> > > > > > > implement some common device model, ie. com.gfx/GPU, where the
> > > > > > > migration parameter 'memory' for each defaults to a value matching the
> > > > > > > type name.  But it seems like this can also lead to some combinatorial
> > > > > > > challenges for management tools if these parameters are writable.  For
> > > > > > > example, should a management tool create a gfx-4GB device and change to
> > > > > > > memory parameter to 8192 or a gfx-8GB device with the default parameter?      
> > > > > > 
> > > > > > I would expect that the mdev types need to match in the first place.
> > > > > > What role would the memory= parameter play, then? Allowing gfx-4GB to
> > > > > > have memory=8192 feels wrong to me.      
> > > > > 
> > > > > Yes, I expected these mdev types to only accept a fixed "memory" value,
> > > > > but there's nothing stopping a driver author from making "memory" accept
> > > > > any value.    
> > > > 
> > > > I'm wondering how useful the memory parameter is, then. The layer
> > > > checking for compatibility can filter out inconsistent settings, but
> > > > why would we need to express something that is already implied in the
> > > > mdev type separately?    
> > > 
> > > To avoid tying device instances to specific mdev types. An mdev type is
> > > a device implementation, but the goal is to enable migration between
> > > device implementations (new/old or completely different
> > > implementations).
> > > 
> > > Imagine a new physical device that now offers variable memory because
> > > users found the static mdev types too constraining.  How do you migrate
> > > back and forth between new and old physical devices if the migration
> > > parameters don't describe the memory size? Migration parameters make it
> > > possible. Without them the management tool needs to hard-code knowledge
> > > of specific mdev types that support migration.  
> > 
> > But doesn't the management tool *still* need to keep hardcoded
> > information about what the value of that memory parameter was for an
> > existing mdev type? If we have gfx-variable with a memory parameter,
> > fine; but if the target is supposed to accept a gfx-4GB device, it
> > should simply instantiate a gfx-4GB device.
> > 
> > I'm getting a bit worried about the complexity of the checking that
> > management software is supposed to perform. Is it really that bad to
> > restrict the models to a few, well-defined ones? Especially in the mdev
> > case, where we have control about what is getting instantiated?
> 
> This is exactly what I was noting with the combinatorial challenges of
> the management tool.  If a vendor chooses to use a generic base device
> model which they modify with parameters to match an assortment of mdev
> types, then management tools will need to match every mdev type
> implementing that device model to determine if compatible parameters
> exist.  OTOH, the vendor could choose to create a device model that
> specifically describes a single configuration of known parameters.
> 
> For example, mdev type gfx-4GB might be a device model com.gfx/GPU with
> a fixed memory parameter of 4GB or it could be a device model
> com.gfx/GPU-4G with no additional parameter.  The hard part is when the
> vendor offers an mdev type gfx-varGB with device model com.gfx/GPU and
> available memory options of 1GB, 2GB, 4GB, 8GB.  At that point a
> management tool might decide to create a gfx-varGB device instance and
> tune the memory parameter or create a gfx-4GB instance, either would be
> correct and we've expressed no preference for one or the other.  Thanks,

What you've described here is exactly what happens with QEMU/libvirts
confusion of CPU models.  Both QEMU and Libvirt have their idea of what
a named CPU model means and then add/subtract flags to get what they
want.
When libvirt wants a CPU model that doesn't quite match what it has
(e.g. a host-compatibility thing where the host is a CPU it didn't know)
it's heuristics to either start from above and remove things or start
from below and add them.

Dave

> Alex
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-16 15:24   ` Stefan Hajnoczi
@ 2020-11-24 17:29     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-11-24 17:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John G Johnson, Tian, Kevin, mtsirkin, Daniel P. Berrangé,
	Swapnil Ingle, quintela, Jason Wang, Zeng, Xin, qemu-devel,
	John Levon, Yan Zhao, Kirti Wankhede, Paolo Bonzini,
	Alex Williamson, Gerd Hoffmann, Felipe Franciosi,
	Christophe de Dinechin, Thanos Makatos, changpeng.liu

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Nov 11, 2020 at 04:18:34PM +0000, Thanos Makatos wrote:
> > 
> > > VFIO Migration
> > > ==============
> > > This document describes how to ensure migration compatibility for VFIO
> > > devices,
> > > including mdev and vfio-user devices.
> > 
> > Is this something all VFIO/user devices will have to support? If it's not
> > mandatory, how can a device advertise support?
> 
> The --print-migration-info-json command-line option described below must
> be implemented by the vfio-user device emulation program. Similarly,
> VFIO/mdev devices must provide the migration/ sysfs group.
> 
> If the device implementation does not expose these standard interfaces
> then management tools can still attempt to migrate them, but there is no
> migration compatibility check or algorithm for setting up the
> destination device. In other words, it will only succeed with some luck
> or by hardcoding knowledge of the specific device implementation into
> the management tool.
> 
> > 
> > > Multiple device implementations can support the same device model. Doing
> > > so
> > > means that the device implementations can offer migration compatiblity
> > > because
> > > they support the same hardware interface, device state representation, and
> > > migration parameters.
> > 
> > Does the above mean that a passthrough function can be migrated to a vfio-user
> > program and vice versa? If so, then it's worth mentioning.
> 
> Yes, if they are migration compatible (they support the same device
> model and migration parameters) then migration is possible. I'll make
> this clear in the next revision.
> 
> Note VFIO migration is currently only working for mdev devices. Alex
> Williamson mentioned that it could be extended to core VFIO PCI devices
> (without mdev) in the future.
> 
> > > More complex device emulation programs may host multiple devices. The
> > > interface
> > > for configuring these device emulation programs is not standardized.
> > > Therefore,
> > > migrating these devices is beyond the scope of this document.
> > 
> > Most likely a device emulation program hosting multile devices would allow
> > some form of communication for control purposes (e.g. SPDK implements a JSON-RPC
> > server). So maybe it's possible to define interacting with such programs in
> > this document?
> 
> Yes, it's definitely possible. There needs to be agreement on the RPC
> mechanism. QEMU implements QMP, SPDK has something similar but
> different, gRPC/Protobuf is popular, and D-Bus is another alternative. I
> asked about RPC mechanisms on the muser Slack instance to see if there
> was consensus but it seems to be a bit early for that.
> 
> Perhaps the most realistic option will be to define bindings to several
> RPC mechanisms. That way everyone can use their preferred RPC mechanism,
> at the cost of requiring management tools to support more than one
> (which some already do, e.g. libvirt uses XDR itself but also implements
> QEMU's QMP).
> 
> > > 
> > > The migration information JSON is printed to standard output by a vfio-user
> > > device emulation program as follows:
> > > 
> > > .. code:: bash
> > > 
> > >   $ my-device --print-migration-info-json
> > > 
> > > The device is instantiated by launching the destination process with the
> > > migration parameter list from the source:
> > 
> > Must 'my-device --print-migration-info-json' always generate the same migration
> > information JSON? If so, then what if the output generated by
> > 'my-device --print-migration-info-json' depends on additional arguments passed
> > to 'my-device' when it was originally started?
> 
> Yes, it needs to be stable in the sense that you can invoke the program
> with --print-migration-info-json and then expect launching the program
> to succeed with migration parameters that are valid according to the
> JSON.
> 
> Running the same device emulation binary on different hosts can produce
> different JSON. This is because the binary may rely on host hardware
> resources or features (e.g. does this host have GPUs available?).
> 
> It gets trickier when considering host reboots. I think the JSON can
> change between reboots. However, the management tools may cache the JSON
> so there needs to be a rule about when to refresh it.

libvirt does something similar for QEMU's current capabilities; it
normally works fine; very occasionally you have to flush the cache
though if you do something surprising which causes it to change
capabilities.

Dave

> Regarding additional command-line arguments, they can affect the JSON
> output. For example, they could include the connection details to an
> iSCSI LUN and affect the block size migration parameter. This leads to
> the same issue - can they be cached by the management tool? The answer
> is the same - stability is needed in the short-term to avoid unexpected
> failures when launching the program, but over the longer term we should
> allow JSON changes.
> 
> Thanks for raising these points. I'll add details to the next revision.
> 
> Stefan


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC v3] VFIO Migration
  2020-11-17  9:44         ` Michael S. Tsirkin
@ 2020-12-01 13:17           ` Stefan Hajnoczi
  0 siblings, 0 replies; 38+ messages in thread
From: Stefan Hajnoczi @ 2020-12-01 13:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: John G Johnson, Tian, Kevin, Daniel P. Berrangé,
	quintela, Jason Wang, Zeng, Xin, Dr. David Alan Gilbert,
	qemu-devel, Yan Zhao, Kirti Wankhede, Paolo Bonzini,
	Alex Williamson, Gerd Hoffmann, Felipe Franciosi,
	Christophe de Dinechin, Thanos Makatos

[-- Attachment #1: Type: text/plain, Size: 5170 bytes --]

On Tue, Nov 17, 2020 at 04:44:52AM -0500, Michael S. Tsirkin wrote:
> On Mon, Nov 16, 2020 at 02:38:12PM +0000, Stefan Hajnoczi wrote:
> > On Wed, Nov 11, 2020 at 03:41:59PM +0000, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> > > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > > Orchestrating Migrations
> > > > > > ------------------------
> > > > > > In order to migrate a device a *migration parameter list* must first be built
> > > > > > on the source. Each migration parameter is added to the list if it is in
> > > > > > effect. For example, the migration parameter list for a device with
> > > > > > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature migration
> > > > > > parameter was introduced with the off value disabling its effect.
> > > > > 
> > > > > What component builds that list (i.e. what component needs to know the
> > > > > history that new-feature=off was the default - ah I think you answer
> > > > > that below).
> > > > 
> > > > Yep. Thanks for noting this. I'll need to reorder things so it is clear.
> > > > 
> > > > > > The following conditions must be met to establish migration compatibility:
> > > > > > 
> > > > > > 1. The source and destination device model strings match.
> > > > > > 
> > > > > > 2. Each migration parameter name from the migration parameter list is supported
> > > > > >    by the destination. For example, the destination supports the num-queues
> > > > > >    migration parameter.
> > > > > > 
> > > > > > 3. Each migration parameter value from the migration parameter list is
> > > > > >    supported by the destination. For example, the destination supports
> > > > > >    num-queues=4.
> > > > > 
> > > > > Hmm, are combinations of parameter checks needed - i.e. is it possible
> > > > > that a destination supports    num-queues=4 and  new-feature=on/off -
> > > > > but only supports new-feature=on when num-queues>2 ?
> > > > 
> > > > Yes, it's possible but cannot be expressed in the migration info JSON.
> > > > 
> > > > We need to choose a level of expressiveness that will be useful enough
> > > > without being complex. In the extreme the migration info would contain
> > > > Turing complete validation expressions (e.g. JavaScript) so that any
> > > > relationship can be expressed, but I doubt that complexity is needed.
> > > > The other extreme is just booleans and (opaque) strings for maximum
> > > > simplicity.
> > > > 
> > > > If the syntax is not expressive enough then it's impossible to check
> > > > migration compatibility without actually creating a new device instance
> > > > on the destination. Daniel Berrange raised the requirement of checking
> > > > migration compatibility without creating the device since this helps
> > > > with selecting a migration destination.
> > > 
> > > Right, but my worry isn't the JSON description, it's the set of 3
> > > conditions above; they need to state that only some combinations need to
> > > be valid.
> > 
> > Yes, the proposed syntax is simply not expressive enough. The migration
> > compatibility check will pass and then the destination will refuse to
> > set up the device (before the device state is transferred).
> > 
> > Any suggestions for a syntax without full-blown arithmetic and logic
> > expressions?
> > 
> > > > Any ideas for a better syntax?
> > > 
> > > I'd be happy with a --param name=value   repeatedly, but also know that
> > > some option parsers don't like that.
> > 
> > Another wart, Sphinx considers repeated options an error so you cannot
> > document options using rST option syntax. I remember having this problem
> > when documenting virtiofsd's command-line options :).
> > 
> > If something comes to mind please let me know. I'm not set on a
> > particular syntax, but I'd like to choose the one that is both
> > human-friendly and compatible with option parsers while avoiding
> > namespace collisions with the device implementation's own options.
> > 
> > Stefan
> 
> 
> I think the simplest way is just to include and open-source tool
> for figuring all this out together with qemu.
> Any vendor interested in supporting migration with qemu
> will then just submit a patch for that tool.
> 
> This will also help make sure this interface
> is not just a way to bypass GPL, we can ask that the
> supporting server is opensource.
> 
> And it will help us guide vendors towards supporting migration
> correctly.

Can you describe the tool's command-line interface in more detail? Does
this tool completely replace the VFIO/mdev sysfs and vfio-user
command-line interfaces?

Is it just for vfio-user devices or also for VFIO/mdev devices?

Regarding GPL, I guess you mean the tool maintainers in QEMU would only
accept patches if the corresponding device backend implementation is
open source (GPL, MIT, BSD, Apache, etc)? I'm not sure if that helps
since proprietary vendors ship their own QEMU or can replace this tool
with another binary.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-12-01 13:19 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-10  9:53 [RFC v3] VFIO Migration Stefan Hajnoczi
2020-11-10 11:12 ` Paolo Bonzini
2020-11-11 14:36   ` Stefan Hajnoczi
2020-11-11 15:48     ` Daniel P. Berrangé
2020-11-12 15:26       ` Cornelia Huck
2020-11-16 10:48       ` Stefan Hajnoczi
2020-11-16 11:15       ` Stefan Hajnoczi
2020-11-16 11:41         ` Daniel P. Berrangé
2020-11-16 12:03           ` Michael S. Tsirkin
2020-11-16 12:05             ` Daniel P. Berrangé
2020-11-16 12:34               ` Michael S. Tsirkin
2020-11-16 12:45                 ` Daniel P. Berrangé
2020-11-16 12:51                   ` Michael S. Tsirkin
2020-11-16 12:48         ` Gerd Hoffmann
2020-11-16 12:54           ` Michael S. Tsirkin
2020-11-16 12:06       ` Michael S. Tsirkin
2020-11-10 20:14 ` Alex Williamson
2020-11-11 11:48   ` Cornelia Huck
2020-11-11 15:14     ` Stefan Hajnoczi
2020-11-11 15:35       ` Cornelia Huck
2020-11-16 11:02         ` Stefan Hajnoczi
2020-11-16 13:52           ` Cornelia Huck
2020-11-16 17:30             ` Alex Williamson
2020-11-24 17:24               ` Dr. David Alan Gilbert
2020-11-11 15:10   ` Stefan Hajnoczi
2020-11-11 15:28     ` Cornelia Huck
2020-11-16 11:36       ` Stefan Hajnoczi
2020-11-11 11:19 ` Cornelia Huck
2020-11-11 15:35   ` Stefan Hajnoczi
2020-11-11 12:56 ` Dr. David Alan Gilbert
2020-11-11 15:34   ` Stefan Hajnoczi
2020-11-11 15:41     ` Dr. David Alan Gilbert
2020-11-16 14:38       ` Stefan Hajnoczi
2020-11-17  9:44         ` Michael S. Tsirkin
2020-12-01 13:17           ` Stefan Hajnoczi
2020-11-11 16:18 ` Thanos Makatos
2020-11-16 15:24   ` Stefan Hajnoczi
2020-11-24 17:29     ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.