linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
@ 2020-04-13  5:52 Yan Zhao
  2020-04-13  5:54 ` [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node) Yan Zhao
                   ` (4 more replies)
  0 siblings, 5 replies; 40+ messages in thread
From: Yan Zhao @ 2020-04-13  5:52 UTC (permalink / raw)
  To: intel-gvt-dev
  Cc: libvir-list, kvm, linux-doc, linux-kernel, aik, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, eauger, yi.l.liu, xin.zeng, ziye.yang,
	mlevitsk, pasic, felipe, changpeng.liu, Ken.Xue, jonathan.davies,
	shaopeng.he, alex.williamson, eskultet, dgilbert, cohuck,
	kevin.tian, zhenyuw, zhi.a.wang, cjia, kwankhede, berrange,
	dinechin, corbet, Yan Zhao

This patchset introduces a migration_version attribute under sysfs of VFIO
Mediated devices.

This migration_version attribute is used to check migration compatibility
between two mdev devices.

Currently, it has two locations:
(1) under mdev_type node,
    which can be used even before device creation, but only for mdev
    devices of the same mdev type.
(2) under mdev device node,
    which can only be used after the mdev devices are created, but the src
    and target mdev devices are not necessarily of the same mdev type.
(The second location is newly added in v5, in order to stay consistent
with the migration_version node for migratable pass-through devices.)

Patch 1 defines migration_version attribute for the first location in
Documentation/driver-api/vfio-mediated-device.rst

Patch 2 uses GVT as an example for patch 1 to show how to expose
migration_version attribute and check migration compatibility in vendor
driver.

Patch 3 defines migration_version attribute for the second location in
Documentation/driver-api/vfio-mediated-device.rst

Patch 4 uses GVT as an example for patch 3 to show how to expose
migration_version attribute and check migration compatibility in vendor
driver.

(The previous "Reviewed-by" and "Acked-by" for patch 1 and patch 2 are
kept in v5, as there are only small changes to commit messages of the two
patches.)

v5:
added patch 3 and 4 for the mdev device part of the migration_version attribute.

v4:
1. fixed indentation/spelling errors, reworded several error messages
2. added a missing memory free for error handling in patch 2

v3:
1. renamed version to migration_version
2. let errno be freely defined by vendor driver
3. let checking mdev_type be a prerequisite of migration compatibility check
4. reworded most of patch 1
5. print detailed error log in patch 2 and generate migration_version
string at init time

v2:
1. renamed patch 1
2. made definition of device version string completely private to vendor
driver
3. reverted changes to sample mdev drivers
4. described intent and usage of version attribute more clearly.


Yan Zhao (4):
  vfio/mdev: add migration_version attribute for mdev (under mdev_type
    node)
  drm/i915/gvt: export migration_version to mdev sysfs (under mdev_type
    node)
  vfio/mdev: add migration_version attribute for mdev (under mdev device
    node)
  drm/i915/gvt: export migration_version to mdev sysfs (under mdev
    device node)

 .../driver-api/vfio-mediated-device.rst       | 183 ++++++++++++++++++
 drivers/gpu/drm/i915/gvt/Makefile             |   2 +-
 drivers/gpu/drm/i915/gvt/gvt.c                |  39 ++++
 drivers/gpu/drm/i915/gvt/gvt.h                |   7 +
 drivers/gpu/drm/i915/gvt/kvmgt.c              |  55 ++++++
 drivers/gpu/drm/i915/gvt/migration_version.c  | 170 ++++++++++++++++
 drivers/gpu/drm/i915/gvt/vgpu.c               |  13 +-
 7 files changed, 466 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/migration_version.c

-- 
2.17.1



* [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node)
  2020-04-13  5:52 [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Yan Zhao
@ 2020-04-13  5:54 ` Yan Zhao
  2020-04-15  7:28   ` Erik Skultety
  2020-04-13  5:54 ` [PATCH v5 2/4] drm/i915/gvt: export migration_version to mdev sysfs " Yan Zhao
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-13  5:54 UTC (permalink / raw)
  To: intel-gvt-dev
  Cc: libvir-list, kvm, linux-doc, linux-kernel, aik, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, eauger, yi.l.liu, xin.zeng, ziye.yang,
	mlevitsk, pasic, felipe, changpeng.liu, Ken.Xue, jonathan.davies,
	shaopeng.he, alex.williamson, eskultet, dgilbert, cohuck,
	kevin.tian, zhenyuw, zhi.a.wang, cjia, kwankhede, berrange,
	dinechin, corbet, Yan Zhao

migration_version attribute is used to check migration compatibility
between two mdev devices of the same mdev type.
The key is that it's rw and its data is opaque to userspace.

Userspace reads migration_version of mdev device at source side and
writes the value to migration_version attribute of mdev device at target
side. It judges migration compatibility according to whether the read
and write operations succeed or fail.
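
The read-then-write handshake described above can be sketched from
userspace roughly as follows. This is only an illustration: the function
name mdev_migration_compatible and its two path arguments are made up
here, and the caller is assumed to pass the full paths of the source and
target migration_version attributes.

```shell
#!/bin/sh
# Hypothetical helper (not part of this series): returns 0 when the target
# accepts the source's migration_version string, non-zero otherwise.
#   $1: path of the source migration_version attribute
#   $2: path of the target migration_version attribute
mdev_migration_compatible() {
	src_attr=$1
	dst_attr=$2

	# A failed read means the source does not support migration.
	version=$(cat "$src_attr" 2>/dev/null) || return 1

	# The string is opaque: pass it on unmodified and let the target's
	# vendor driver decide. A failed write means the devices are
	# incompatible or the target does not support migration.
	printf '%s\n' "$version" > "$dst_attr" || return 1
	return 0
}
```

With the sysfs paths used in the documentation of this patch, a call
could look like:

  # mdev_migration_compatible \
    /sys/bus/mdev/devices/$UUID1/mdev_type/migration_version \
    /sys/bus/mdev/devices/$UUID2/mdev_type/migration_version && echo ok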

Currently, the migration_version attribute can be read/written in two
places:

(1) under mdev_type node
userspace is able to know whether two mdev devices are compatible before
an mdev device is created.

userspace also needs to check whether the two mdev devices are of the same
mdev type before checking the migration_version attribute. It also needs
to check device creation parameters if aggregation is supported in future.

(2) under mdev device node
userspace is able to know whether two mdev devices are compatible after
they are both created. It does not need to check the mdev type or the
device creation parameters for aggregation, as the vendor driver will have
incorporated that information into the migration_version attribute.
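
For location (1), the mdev type comparison that userspace has to do first
could look like the sketch below. The function name same_mdev_type is
hypothetical; the readlink/basename idiom is the one used in the
documentation added by this patch, and the arguments are assumed to be
mdev device directories such as /sys/bus/mdev/devices/$UUID.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if both mdev device directories
# link to an mdev type with the same name. It compares type names only;
# it does not check that the parent devices match.
#   $1, $2: mdev device directories, e.g. /sys/bus/mdev/devices/<UUID>
same_mdev_type() {
	# mdev_type must exist as a symlink on both devices.
	[ -L "$1/mdev_type" ] && [ -L "$2/mdev_type" ] || return 1

	# The basename of the link target is the mdev type name.
	type_a=$(basename "$(readlink -f "$1/mdev_type")")
	type_b=$(basename "$(readlink -f "$2/mdev_type")")
	[ "$type_a" = "$type_b" ]
}
```

Only when this check succeeds is it meaningful to go on and test the
migration_version attribute under the mdev_type node.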

             __    userspace
              /\              \
             /                 \write
            / read              \
   ________/__________       ___\|/_____________
  | migration_version |     | migration_version |-->check migration
  ---------------------     ---------------------   compatibility
    mdev device A               mdev device B

This patch is for mdev documentation about the first place (under
mdev_type node)

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Erik Skultety <eskultet@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Christophe de Dinechin <dinechin@redhat.com>

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>

---
v5:
updated commit message a little to indicate this patch is for
migration_version attribute under mdev_type node

v4:
fixed a typo. (Cornelia Huck)

v3:
1. renamed version to migration_version
(Christophe de Dinechin, Cornelia Huck, Alex Williamson)
2. let errno be freely defined by vendor driver
(Alex Williamson, Erik Skultety, Cornelia Huck, Dr. David Alan Gilbert)
3. let checking mdev_type be a prerequisite of migration compatibility
check. (Alex Williamson)
4. reworded example usage section.
(most of this section came from Alex Williamson)
5. reworded attribute intention section (Cornelia Huck)

v2:
1. added detailed intent and usage
2. made definition of version string completely private to vendor driver
   (Alex Williamson)
3. abandoned changes to sample mdev drivers (Alex Williamson)
4. mandatory --> optional (Cornelia Huck)
5. added description for errno (Cornelia Huck)
---
 .../driver-api/vfio-mediated-device.rst       | 113 ++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 25eb7d5b834b..2d1f3c0f3c8f 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
   |     |   |--- available_instances
   |     |   |--- device_api
   |     |   |--- description
+  |     |   |--- migration_version
   |     |   |--- [devices]
   |     |--- [<type-id>]
   |     |   |--- create
@@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
   |     |   |--- available_instances
   |     |   |--- device_api
   |     |   |--- description
+  |     |   |--- migration_version
   |     |   |--- [devices]
   |     |--- [<type-id>]
   |          |--- create
@@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
   |          |--- available_instances
   |          |--- device_api
   |          |--- description
+  |          |--- migration_version
   |          |--- [devices]
 
 * [mdev_supported_types]
@@ -246,6 +249,116 @@ Directories and files under the sysfs for Each Physical Device
   This attribute should show the number of devices of type <type-id> that can be
   created.
 
+* migration_version
+
+  This attribute is rw, and is optional.
+  It is used to check migration compatibility between two mdev devices of the
+  same mdev type. Absence of this attribute means the device of type <type-id>
+  does not support migration.
+  This attribute provides a way to check migration compatibility between two
+  mdev devices from userspace even before device creation. The intended usage is
+  for userspace to read the migration_version attribute from one mdev device and
+  then write that value to the migration_version attribute of the other mdev
+  device. The second mdev device indicates compatibility via the return code of
+  the write operation. This makes compatibility between mdev devices completely
+  vendor-defined and opaque to userspace. Userspace should do nothing more
+  than verify the mdev types match and then use the migration_version attribute
+  to confirm source to target compatibility.
+
+  Reading/Writing Attribute Data:
+  read(2) will fail if device of type <type-id> does not support migration and
+          otherwise succeed and return migration_version string of the device of
+          type <type-id>.
+
+          This migration_version string is vendor defined and opaque to the
+          userspace. Vendor is free to include whatever they feel is relevant.
+          e.g. <pciid of parent device>-<software version>.
+
+          Restrictions on this migration_version string:
+            1. It should only contain ascii characters
+            2. MAX Length is PATH_MAX (4096)
+
+  write(2) expects migration_version string of source mdev device, and will
+          succeed if it is determined to be compatible and otherwise fail with
+          vendor specific errno.
+
+  Errno:
+  -An errno on read(2) indicates the device of type <type-id> does not support
+  migration;
+  -An errno on write(2) indicates the devices are incompatible or the target
+  doesn't support migration.
+  The vendor driver is free to define a specific errno and is encouraged to
+  print a detailed error in syslog for diagnostic purposes.
+
+  Userspace should treat ANY of the conditions below as meaning that the two
+  mdev devices are not compatible:
+  (0) The mdev devices are not of the same type
+  (1) any one of the two mdev devices does not have a migration_version
+  attribute
+  (2) error when reading from migration_version attribute of one mdev device
+  (3) error when writing migration_version string of one mdev device to
+  migration_version attribute of the other mdev device
+
+  Userspace should regard two mdev devices as compatible when ALL of the
+  conditions below are met:
+  (0) The mdev devices are of the same type
+  (1) success when reading from migration_version attribute of one mdev device.
+  (2) success when writing migration_version string of one mdev device to
+  migration_version attribute of the other mdev device.
+
+  Example Usage:
+  (1) Compare mdev types:
+
+  The mdev type of an instantiated device can be read from the mdev_type link
+  within the device instance in sysfs, for example:
+
+  # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)
+
+  The mdev types available on a given host system can also be found through
+  /sys/class/mdev_bus, for example:
+
+  # ls /sys/class/mdev_bus/*/mdev_supported_types/
+
+  Migration is only possible between devices of the same mdev type.
+
+  (2) Retrieve the mdev source migration_version:
+
+  The migration_version information can either be read from the mdev_type link
+  on an instantiated device:
+
+  # cat /sys/bus/mdev/devices/$UUID1/mdev_type/migration_version
+
+  Or it can be read from the mdev type definition, for example:
+
+  # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/migration_version
+
+  If reading the source migration_version generates an error, migration is not
+  possible.
+  NB, there might be several parent devices for a given mdev type on a host
+  system, each may support or expose different migration_versions.
+  Matching the specific mdev type to a parent may become important in such
+  configurations.
+
+  (3) Test source migration_version at target:
+
+  Given a migration_version as outlined above, its compatibility to an
+  instantiated device of the same mdev type can be tested as:
+  # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/migration_version
+
+  If this write fails, the source and target migration versions are not
+  compatible or the target does not support migration.
+
+  Compatibility can also be tested prior to target device creation using the
+  mdev type definition for a parent device with a previously found matching mdev
+  type, for example:
+
+  # echo $VERSION > \
+  /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/migration_version
+
+  Again, an error writing the migration_version indicates that an instance of
+  this mdev type would not support a migration from the provided migration
+  version.
+
 * [device]
 
   This directory contains links to the devices of type <type-id> that have been
-- 
2.17.1



* [PATCH v5 2/4] drm/i915/gvt: export migration_version to mdev sysfs (under mdev_type node)
  2020-04-13  5:52 [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Yan Zhao
  2020-04-13  5:54 ` [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node) Yan Zhao
@ 2020-04-13  5:54 ` Yan Zhao
  2020-04-13  5:55 ` [PATCH v5 3/4] vfio/mdev: add migration_version attribute for mdev (under mdev device node) Yan Zhao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2020-04-13  5:54 UTC (permalink / raw)
  To: intel-gvt-dev
  Cc: libvir-list, kvm, linux-doc, linux-kernel, aik, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, eauger, yi.l.liu, xin.zeng, ziye.yang,
	mlevitsk, pasic, felipe, changpeng.liu, Ken.Xue, jonathan.davies,
	shaopeng.he, alex.williamson, eskultet, dgilbert, cohuck,
	kevin.tian, zhenyuw, zhi.a.wang, cjia, kwankhede, berrange,
	dinechin, corbet, Yan Zhao

This patch implements the mdev_type part of migration_version attribute
for Intel's vGPU mdev devices.

migration_version attribute under mdev_type node is rw.
It is located at
/sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/$MDEV_TYPE/
or
/sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/$MDEV_TYPE/

It's used to check migration compatibility for two mdev devices of the
same mdev type.
migration_version string is defined by vendor driver and opaque to
userspace.

For Intel vGPU of gen8 and gen9, the format of migration_version string
is:
  <vendor id>-<device id>-<vgpu type>-<software version>.

For future platforms, the format of migration_version string is to be
expanded to include more meta data to identify Intel vGPUs for live
migration compatibility check.

For old platforms, and for GVT not supporting vGPU live migration
feature, -ENODEV is returned on read(2)/write(2) of migration_version
attribute.
For vGPUs running an old GVT that does not expose the migration_version
attribute, live migration is regarded as not supported.
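
As an illustration of the gen8/gen9 string layout described above, the
following sketch builds a string of that shape with printf and picks a
single field back out of it. The helper names and the example values in
the usage note (device id 0x591d, software version 1) are assumptions
made here, not values taken from this patch. Userspace is still expected
to treat the string as opaque, so this is only meant as an aid for
reading logs.

```shell
#!/bin/sh
# Hypothetical helper: print a string laid out like the driver's
# "%04x-%04x-%s-%08x" format.
#   $1: vendor id, $2: device id, $3: vgpu type name, $4: software version
example_version_string() {
	printf '%04x-%04x-%s-%08x\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: print field $2 (1..4) of version string $1.
# Assumes the vgpu type name itself contains no '-'.
example_version_field() {
	printf '%s\n' "$1" | cut -d- -f"$2"
}
```

For example:

  # example_version_string 0x8086 0x591d GVTg_V5_4 1
  8086-591d-GVTg_V5_4-00000001
  # example_version_field 8086-591d-GVTg_V5_4-00000001 3
  GVTg_V5_4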

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Erik Skultety <eskultet@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>

Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>

---
v5:
updated commit message to indicate this patch introduces migration_version
attributes under mdev_type sysfs directory

v4:
1. fixed indentation/spelling issues and reworded several error messages
(Cornelia Huck)
2. added kfree(version) in snprintf failure case (Zhenyu Wang)

v3:
1. renamed version to migration_version
(Christophe de Dinechin, Cornelia Huck, Alex Williamson)
2. instead of generating migration version strings each time, storing
them in vgpu types generated during initialization.
(Zhenyu Wang, Cornelia Huck)
3. replaced multiple snprintf with one big snprintf in
intel_gvt_get_vfio_migration_version()
(Dr. David Alan Gilbert)
4. printed detailed error log
(Alex Williamson, Erik Skultety, Cornelia Huck, Dr. David Alan Gilbert)
5. incorporated <software version> into migration_version string
(Alex Williamson)
6. do not use ifndef macro to switch off migration_version attribute
(Zhenyu Wang)

v2:
1. removed 32 common part of version string
(Alex Williamson)
2. do not register version attribute for GVT not supporting live
migration.(Cornelia Huck)
3. for platforms out of gen8, gen9, return -EINVAL --> -ENODEV for
incompatible. (Cornelia Huck)
---
 drivers/gpu/drm/i915/gvt/Makefile            |   2 +-
 drivers/gpu/drm/i915/gvt/gvt.c               |  39 +++++
 drivers/gpu/drm/i915/gvt/gvt.h               |   5 +
 drivers/gpu/drm/i915/gvt/migration_version.c | 170 +++++++++++++++++++
 drivers/gpu/drm/i915/gvt/vgpu.c              |  13 +-
 5 files changed, 226 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/migration_version.c

diff --git a/drivers/gpu/drm/i915/gvt/Makefile b/drivers/gpu/drm/i915/gvt/Makefile
index 9c5bc39a2095..11c6aba0bf0a 100644
--- a/drivers/gpu/drm/i915/gvt/Makefile
+++ b/drivers/gpu/drm/i915/gvt/Makefile
@@ -3,7 +3,7 @@ GVT_DIR := gvt
 GVT_SOURCE := gvt.o aperture_gm.o handlers.o vgpu.o trace_points.o firmware.o \
 	interrupt.o gtt.o cfg_space.o opregion.o mmio.o display.o edid.o \
 	execlist.o scheduler.o sched_policy.o mmio_context.o cmd_parser.o debugfs.o \
-	fb_decoder.o dmabuf.o page_track.o migrate.o
+	fb_decoder.o dmabuf.o page_track.o migrate.o migration_version.o
 
 ccflags-y				+= -I $(srctree)/$(src) -I $(srctree)/$(src)/$(GVT_DIR)/
 i915-y					+= $(addprefix $(GVT_DIR)/, $(GVT_SOURCE))
diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
index d89dbc29bb96..fb464e3b2a57 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.c
+++ b/drivers/gpu/drm/i915/gvt/gvt.c
@@ -106,14 +106,53 @@ static ssize_t description_show(struct kobject *kobj, struct device *dev,
 		       type->weight);
 }
 
+static ssize_t migration_version_show(struct kobject *kobj, struct device *dev,
+					char *buf)
+{
+	struct intel_vgpu_type *type;
+	void *gvt = kdev_to_i915(dev)->gvt;
+
+	type = intel_gvt_find_vgpu_type(gvt, kobject_name(kobj));
+	if (!type || !type->migration_version) {
+		gvt_err("Migration not supported on type %s. Please search previous detailed log\n",
+				kobject_name(kobj));
+		return -ENODEV;
+	}
+
+	return snprintf(buf, strlen(type->migration_version) + 2,
+			"%s\n", type->migration_version);
+}
+
+static ssize_t migration_version_store(struct kobject *kobj, struct device *dev,
+					const char *buf, size_t count)
+{
+	int ret = 0;
+	struct intel_vgpu_type *type;
+	void *gvt = kdev_to_i915(dev)->gvt;
+
+	type = intel_gvt_find_vgpu_type(gvt, kobject_name(kobj));
+	if (!type || !type->migration_version) {
+		gvt_err("Migration not supported on type %s. Please search previous detailed log\n",
+				kobject_name(kobj));
+		return -ENODEV;
+	}
+
+	ret = intel_gvt_check_vfio_migration_version(gvt,
+			type->migration_version, buf);
+
+	return (ret < 0 ? ret : count);
+}
+
 static MDEV_TYPE_ATTR_RO(available_instances);
 static MDEV_TYPE_ATTR_RO(device_api);
 static MDEV_TYPE_ATTR_RO(description);
+static MDEV_TYPE_ATTR_RW(migration_version);
 
 static struct attribute *gvt_type_attrs[] = {
 	&mdev_type_attr_available_instances.attr,
 	&mdev_type_attr_device_api.attr,
 	&mdev_type_attr_description.attr,
+	&mdev_type_attr_migration_version.attr,
 	NULL,
 };
 
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index c60df1e1d613..b26e42596565 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -277,6 +277,7 @@ struct intel_vgpu_type {
 	unsigned int fence;
 	unsigned int weight;
 	enum intel_vgpu_edid resolution;
+	char *migration_version;
 };
 
 struct intel_gvt {
@@ -709,6 +710,10 @@ int submit_context(struct intel_vgpu *vgpu,
 		struct execlist_ctx_descriptor_format *desc,
 		bool emulate_schedule_in);
 
+ssize_t intel_gvt_check_vfio_migration_version(struct intel_gvt *gvt,
+		const char *self, const char *remote);
+char *intel_gvt_get_vfio_migration_version(struct intel_gvt *gvt,
+		const char *vgpu_type);
 
 #include "trace.h"
 #include "mpt.h"
diff --git a/drivers/gpu/drm/i915/gvt/migration_version.c b/drivers/gpu/drm/i915/gvt/migration_version.c
new file mode 100644
index 000000000000..ded43b7d9e95
--- /dev/null
+++ b/drivers/gpu/drm/i915/gvt/migration_version.c
@@ -0,0 +1,170 @@
+/*
+ * Copyright(c) 2011-2017 Intel Corporation. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *    Yan Zhao <yan.y.zhao@intel.com>
+ */
+#include <linux/vfio.h>
+#include "i915_drv.h"
+#include "gvt.h"
+
+#define INV_SOFTWARE_VERSION (-1U)
+#define VENDOR_ID_LEN (4)
+#define DEVICE_ID_LEN (4)
+#define VGPU_TYPE_LEN (16)
+#define SOFTWARE_VER_LEN (8)
+
+/* total length of vfio migration version string.
+ * never exceed limit of PATH_MAX (4096)
+ */
+#define MIGRATION_VERSION_TOTAL_LEN (VENDOR_ID_LEN + DEVICE_ID_LEN + \
+					VGPU_TYPE_LEN + SOFTWARE_VER_LEN + 4)
+
+#define GVT_VFIO_MIGRATION_SOFTWARE_VERSION INV_SOFTWARE_VERSION
+
+
+#define PRINTF_FORMAT "%04x-%04x-%s-%08x"
+#define SCANF_FORMAT "%x-%x-%16[^-]-%x"
+
+enum incompatible_reason {
+	IREASON_WRONG_REMOTE_FORMAT = 0,
+	IREASON_HARDWARE_MISMATCH,
+	IREASON_SOFTWARE_VERSION_MISMATCH,
+	IREASON_VGPU_TYPE_MISMATCH,
+};
+
+static const char *const incompatible_reason_str[] = {
+	[IREASON_WRONG_REMOTE_FORMAT] =
+		"wrong string format. probably wrong GVT version",
+	[IREASON_HARDWARE_MISMATCH] =
+		"physical device not matched",
+	[IREASON_SOFTWARE_VERSION_MISMATCH] =
+		"migration software version not matched",
+	[IREASON_VGPU_TYPE_MISMATCH] =
+		"vgpu type not matched"
+};
+
+static bool is_compatible(const char *local, const char *remote)
+{
+	bool ret;
+
+	ret = sysfs_streq(local, remote);
+
+	if (!ret) {
+		unsigned int vid_l = 0, did_l = 0, vid_r = 0, did_r = 0;
+		char type_l[VGPU_TYPE_LEN + 1], type_r[VGPU_TYPE_LEN + 1];
+		u32 sv_l = 0, sv_r = 0;
+		int rl = 0, rr = 0;
+		enum incompatible_reason reason = IREASON_WRONG_REMOTE_FORMAT;
+
+		memset(type_l, 0, sizeof(type_l));
+		memset(type_r, 0, sizeof(type_r));
+
+		rl = sscanf(local, SCANF_FORMAT,
+				&vid_l, &did_l, type_l, &sv_l);
+		rr = sscanf(remote, SCANF_FORMAT,
+				&vid_r, &did_r, type_r, &sv_r);
+
+		if (rl == rr) {
+			if (vid_l != vid_r || did_l != did_r)
+				reason = IREASON_HARDWARE_MISMATCH;
+			else if (sv_l != sv_r)
+				reason = IREASON_SOFTWARE_VERSION_MISMATCH;
+			else if (strncmp(type_l, type_r, VGPU_TYPE_LEN))
+				reason = IREASON_VGPU_TYPE_MISMATCH;
+		}
+
+		gvt_err("Migration version mismatched. Possible reason: %s. Local migration version:%s, Remote migration version:%s\n",
+				incompatible_reason_str[reason], local,	remote);
+
+	}
+	return ret;
+
+}
+
+
+char *
+intel_gvt_get_vfio_migration_version(struct intel_gvt *gvt,
+		const char *vgpu_type)
+{
+	int cnt = 0;
+	struct drm_i915_private *dev_priv = gvt->gt->i915;
+	char *version = NULL;
+
+	/* currently only gen8 & gen9 are supported */
+	if (!IS_GEN(dev_priv, 8) && !IS_GEN(dev_priv, 9)) {
+		gvt_err("Local hardware does not support migration on %d\n",
+				INTEL_INFO(dev_priv)->gen);
+		return NULL;
+	}
+
+	if (GVT_VFIO_MIGRATION_SOFTWARE_VERSION == INV_SOFTWARE_VERSION) {
+		gvt_err("Local GVT does not support migration\n");
+		return NULL;
+	}
+
+	version = kzalloc(MIGRATION_VERSION_TOTAL_LEN, GFP_KERNEL);
+
+	if (unlikely(!version)) {
+		gvt_err("cannot allocate memory for local migration version %s\n",
+				vgpu_type);
+		return NULL;
+	}
+
+	/* vendor id + device id + vgpu type + software version */
+	cnt = snprintf(version, MIGRATION_VERSION_TOTAL_LEN, PRINTF_FORMAT,
+			PCI_VENDOR_ID_INTEL,
+			INTEL_DEVID(dev_priv),
+			vgpu_type,
+			GVT_VFIO_MIGRATION_SOFTWARE_VERSION);
+
+	if (cnt > 0 && cnt < MIGRATION_VERSION_TOTAL_LEN)
+		return version;
+
+	gvt_err("cannot generate local migration version for type %s\n",
+			vgpu_type);
+	kfree(version);
+	return NULL;
+}
+
+ssize_t intel_gvt_check_vfio_migration_version(struct intel_gvt *gvt,
+		const char *self, const char *remote)
+{
+	struct drm_i915_private *dev_priv = gvt->gt->i915;
+
+	/* currently only gen8 & gen9 are supported */
+	if (!IS_GEN(dev_priv, 8) && !IS_GEN(dev_priv, 9)) {
+		gvt_err("Local hardware does not support migration on %d\n",
+				INTEL_INFO(dev_priv)->gen);
+		return -ENODEV;
+	}
+
+	if (GVT_VFIO_MIGRATION_SOFTWARE_VERSION == INV_SOFTWARE_VERSION) {
+		gvt_err("Local GVT does not support migration\n");
+		return -ENODEV;
+	}
+
+	if (!is_compatible(self, remote))
+		return -EINVAL;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index 72d22d97bc37..38b2575b39b7 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -155,13 +155,18 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 			sprintf(gvt->types[i].name, "GVTg_V5_%s",
 				vgpu_types[i].name);
 
-		gvt_dbg_core("type[%d]: %s avail %u low %u high %u fence %u weight %u res %s\n",
+		gvt->types[i].migration_version =
+			intel_gvt_get_vfio_migration_version(gvt,
+					gvt->types[i].name);
+		gvt_dbg_core("type[%d]: %s avail %u low %u high %u fence %u weight %u res %s, migration_version:%s\n",
 			     i, gvt->types[i].name,
 			     gvt->types[i].avail_instance,
 			     gvt->types[i].low_gm_size,
 			     gvt->types[i].high_gm_size, gvt->types[i].fence,
 			     gvt->types[i].weight,
-			     vgpu_edid_str(gvt->types[i].resolution));
+			     vgpu_edid_str(gvt->types[i].resolution),
+			     (gvt->types[i].migration_version ?
+			     gvt->types[i].migration_version : "null"));
 	}
 
 	gvt->num_types = i;
@@ -170,6 +175,10 @@ int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
 
 void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt)
 {
+	int i;
+
+	for (i = 0; i < gvt->num_types; i++)
+		kfree(gvt->types[i].migration_version);
 	kfree(gvt->types);
 }
 
-- 
2.17.1



* [PATCH v5 3/4] vfio/mdev: add migration_version attribute for mdev (under mdev device node)
  2020-04-13  5:52 [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Yan Zhao
  2020-04-13  5:54 ` [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node) Yan Zhao
  2020-04-13  5:54 ` [PATCH v5 2/4] drm/i915/gvt: export migration_version to mdev sysfs " Yan Zhao
@ 2020-04-13  5:55 ` Yan Zhao
  2020-04-15  7:42   ` Erik Skultety
  2020-04-13  5:55 ` [PATCH v5 4/4] drm/i915/gvt: export migration_version to mdev sysfs " Yan Zhao
  2020-04-17  8:44 ` [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Cornelia Huck
  4 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-13  5:55 UTC (permalink / raw)
  To: intel-gvt-dev
  Cc: libvir-list, kvm, linux-doc, linux-kernel, aik, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, eauger, yi.l.liu, xin.zeng, ziye.yang,
	mlevitsk, pasic, felipe, changpeng.liu, Ken.Xue, jonathan.davies,
	shaopeng.he, alex.williamson, eskultet, dgilbert, cohuck,
	kevin.tian, zhenyuw, zhi.a.wang, cjia, kwankhede, berrange,
	dinechin, corbet, Yan Zhao

migration_version attribute is used to check migration compatibility
between two mdev devices of the same mdev type.
The key is that it's rw and its data is opaque to userspace.

Userspace reads migration_version of mdev device at source side and
writes the value to migration_version attribute of mdev device at target
side. It judges migration compatibility according to whether the read
and write operations succeed or fail.

Currently, the migration_version attribute can be read/written in two
places:

(1) under mdev_type node
userspace is able to know whether two mdev devices are compatible before
an mdev device is created.

userspace also needs to check whether the two mdev devices are of the same
mdev type before checking the migration_version attribute. It also needs
to check device creation parameters if aggregation is supported in future.

(2) under mdev device node
userspace is able to know whether two mdev devices are compatible after
they are both created. It does not need to check the mdev type or the
device creation parameters for aggregation, as the vendor driver will have
incorporated that information into the migration_version attribute.

             __    userspace
              /\              \
             /                 \write
            / read              \
   ________/__________       ___\|/_____________
  | migration_version |     | migration_version |-->check migration
  ---------------------     ---------------------   compatibility
    mdev device A               mdev device B

This patch is for mdev documentation about the second place (under
mdev device node)

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Erik Skultety <eskultet@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Christophe de Dinechin <dinechin@redhat.com>

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 .../driver-api/vfio-mediated-device.rst       | 70 +++++++++++++++++++
 1 file changed, 70 insertions(+)
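
Note (not part of the patch): the two restrictions this documentation places
on the migration_version string (ASCII only, maximum length PATH_MAX) could
be sanity-checked in a test script along these lines. The function name
valid_migration_version is made up for illustration, and treating the limit
as 4096 characters excluding any trailing newline is an assumption, since
the text does not say whether the terminator counts.

```shell
# Hypothetical helper, not part of this patch.
# $1: a migration_version string (without trailing newline)
# Returns 0 if the string meets both documented restrictions.
valid_migration_version() {
	version=$1

	# Restriction 1: ASCII only. Delete every byte in 0x00-0x7f and
	# check whether anything is left over.
	rest=$(printf '%s' "$version" | LC_ALL=C tr -d '\000-\177')
	[ -z "$rest" ] || return 1

	# Restriction 2: at most 4096 characters (equal to bytes here,
	# because the string is known to be ASCII at this point).
	[ "${#version}" -le 4096 ]
}
```

This is meant for checking what a vendor driver emits, e.g. in a test
harness. Management software performing the actual compatibility check
should still treat the string as opaque.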

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 2d1f3c0f3c8f..efbadfd51b7e 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -383,6 +383,7 @@ Directories and Files Under the sysfs for Each mdev Device
          |--- remove
          |--- mdev_type {link to its type}
          |--- vendor-specific-attributes [optional]
+         |--- migration_version [optional]
 
 * remove (write only)
 
@@ -394,6 +395,75 @@ Example::
 
 	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
 
+* migration_version (rw, optional)
+  It is used to check migration compatibility between two mdev devices.
+  Absence of this attribute means the mdev device does not support migration.
+
+  This attribute provides a way to check migration compatibility between two
+  mdev devices from userspace after the devices are created. The intended usage
+  is for userspace to read the migration_version attribute from one mdev device
+  and then write that value to the migration_version attribute of the other mdev
+  device. The second mdev device indicates compatibility via the return code of
+  the write operation. This makes compatibility between mdev devices completely
+  vendor-defined and opaque to userspace. Userspace should do nothing more
+  than use the migration_version attribute to confirm source to target
+  compatibility.
+
+  Reading/Writing Attribute Data:
+  read(2) will fail if a mdev device does not support migration and otherwise
+        succeed and return migration_version string of the mdev device.
+
+        This migration_version string is vendor defined and opaque to the
+        userspace. Vendor is free to include whatever they feel is relevant.
+        e.g. <pciid of parent device>-<software version>.
+
+        Restrictions on this migration_version string:
+            1. It should only contain ascii characters
+            2. MAX Length is PATH_MAX (4096)
+
+  write(2) expects migration_version string of source mdev device, and will
+         succeed if it is determined to be compatible and otherwise fail with
+         vendor specific errno.
+
+  Errno:
+  -An errno on read(2) indicates the mdev device does not support migration;
+  -An errno on write(2) indicates the mdev devices are incompatible or the
+   target doesn't support migration.
+  The vendor driver is free to define specific errno values and is encouraged
+  to print a detailed error to syslog for diagnostic purposes.
+
+  Userspace should treat ANY of below conditions as two mdev devices not
+  compatible:
+  (1) any one of the two mdev devices does not have a migration_version
+  attribute
+  (2) error when reading from migration_version attribute of one mdev device
+  (3) error when writing migration_version string of one mdev device to
+  migration_version attribute of the other mdev device
+
+  Userspace should regard two mdev devices compatible when ALL of below
+  conditions are met:
+  (1) success when reading from migration_version attribute of one mdev device.
+  (2) success when writing migration_version string of one mdev device to
+  migration_version attribute of the other mdev device.
+
+  Example Usage:
+  (1) Retrieve the mdev source migration_version:
+
+  # cat /sys/bus/mdev/devices/$mdev_UUID1/migration_version
+
+  If reading the source migration_version generates an error, migration is not
+  possible.
+
+  (2) Test source migration_version at target:
+
+  Given a migration_version as outlined above, its compatibility to an
+  instantiated target device can be tested as:
+  # echo $VERSION > /sys/bus/mdev/devices/$mdev_UUID2/migration_version
+
+  If this write fails, the source and target migration versions are not
+  compatible or the target does not support migration.
+
+
 Mediated device Hot plug
 ------------------------
 
-- 
2.17.1



* [PATCH v5 4/4] drm/i915/gvt: export migration_version to mdev sysfs (under mdev device node)
  2020-04-13  5:52 [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Yan Zhao
                   ` (2 preceding siblings ...)
  2020-04-13  5:55 ` [PATCH v5 3/4] vfio/mdev: add migration_version attribute for mdev (under mdev device node) Yan Zhao
@ 2020-04-13  5:55 ` Yan Zhao
  2020-04-17  8:44 ` [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Cornelia Huck
  4 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2020-04-13  5:55 UTC (permalink / raw)
  To: intel-gvt-dev
  Cc: libvir-list, kvm, linux-doc, linux-kernel, aik, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, eauger, yi.l.liu, xin.zeng, ziye.yang,
	mlevitsk, pasic, felipe, changpeng.liu, Ken.Xue, jonathan.davies,
	shaopeng.he, alex.williamson, eskultet, dgilbert, cohuck,
	kevin.tian, zhenyuw, zhi.a.wang, cjia, kwankhede, berrange,
	dinechin, corbet, Yan Zhao

The mdev device part of the migration_version attribute for Intel vGPU is rw.
It is located at
/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version,
or /sys/bus/mdev/devices/$mdev_UUID/migration_version

It's used to check migration compatibility for two vGPUs.
migration_version string is defined by vendor driver and opaque to
userspace.

For Intel vGPU of gen8 and gen9, the format of migration_version string
is:
  <vendor id>-<device id>-<vgpu type>-<software version>.

For future software versions, e.g. when vGPUs have aggregations, it may
also include the aggregation count in the migration_version string of a vGPU.

For future platforms, the format of the migration_version string is to be
expanded to include more metadata to identify Intel vGPUs for the live
migration compatibility check.

For old platforms, and for GVT versions that do not support the vGPU live
migration feature, -ENODEV is returned on read(2)/write(2) of the
migration_version attribute.
For vGPUs running on an old GVT that does not expose the migration_version
attribute, live migration is regarded as not supported.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Erik Skultety <eskultet@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
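Note (not part of the commit message): to make the gen8/gen9 field order
above concrete, the string could be assembled as in the sketch below. The
function name gvt_version_string and every field value are placeholders
for illustration only. The real encoding of each field (widths, case,
numeric base) is decided by the vendor driver and is not specified in this
message.

```shell
# Hypothetical helper, not part of this patch.
# Joins the four fields in the order given in the commit message:
#   <vendor id>-<device id>-<vgpu type>-<software version>
gvt_version_string() {
	printf '%s-%s-%s-%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder values only, not real device data.
gvt_version_string VENDOR DEVICE VGPUTYPE SWVER
```

Userspace should not rely on this layout or try to split the string on
'-'. The format may grow (aggregation count, more metadata on future
platforms), which is why the string is defined as opaque.
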
 drivers/gpu/drm/i915/gvt/gvt.h   |  2 ++
 drivers/gpu/drm/i915/gvt/kvmgt.c | 55 ++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index b26e42596565..664efc83f82e 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -205,6 +205,8 @@ struct intel_vgpu {
 	struct idr object_idr;
 
 	u32 scan_nonprivbb;
+
+	char *migration_version;
 };
 
 static inline void *intel_vgpu_vdev(struct intel_vgpu *vgpu)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 2f2d4c40f966..4903599cb0ef 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -728,8 +728,13 @@ static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
 	kvmgt_vdev(vgpu)->mdev = mdev;
 	mdev_set_drvdata(mdev, vgpu);
 
+	vgpu->migration_version =
+		intel_gvt_get_vfio_migration_version(gvt, type->name);
+
 	gvt_dbg_core("intel_vgpu_create succeeded for mdev: %s\n",
 		     dev_name(mdev_dev(mdev)));
+
+
 	ret = 0;
 
 out:
@@ -744,6 +749,7 @@ static int intel_vgpu_remove(struct mdev_device *mdev)
 		return -EBUSY;
 
 	intel_gvt_ops->vgpu_destroy(vgpu);
+	kfree(vgpu->migration_version);
 	return 0;
 }
 
@@ -1964,8 +1970,57 @@ static const struct attribute_group intel_vgpu_group = {
 	.attrs = intel_vgpu_attrs,
 };
 
+static ssize_t migration_version_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
+
+	if (!vgpu->migration_version) {
+		gvt_vgpu_err("Migration not supported on this vgpu. Please search previous detailed log\n");
+		return -ENODEV;
+	}
+
+	/* Bound the copy by the sysfs buffer size, not by the source length. */
+	return scnprintf(buf, PAGE_SIZE, "%s\n", vgpu->migration_version);
+
+}
+
+static ssize_t migration_version_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t count)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
+	struct intel_gvt *gvt = vgpu->gvt;
+	int ret = 0;
+
+	if (!vgpu->migration_version) {
+		gvt_vgpu_err("Migration not supported on this vgpu. Please search previous detailed log\n");
+		return -ENODEV;
+	}
+
+	ret = intel_gvt_check_vfio_migration_version(gvt,
+			vgpu->migration_version, buf);
+	return (ret < 0 ? ret : count);
+}
+
+static DEVICE_ATTR_RW(migration_version);
+
+static struct attribute *intel_vgpu_migration_attrs[] = {
+	&dev_attr_migration_version.attr,
+	NULL,
+};
+/* this group has no name, so will be displayed
+ * immediately under sysfs node of the mdev device
+ */
+static const struct attribute_group intel_vgpu_group_empty_name = {
+	.attrs = intel_vgpu_migration_attrs,
+};
+
 static const struct attribute_group *intel_vgpu_groups[] = {
 	&intel_vgpu_group,
+	&intel_vgpu_group_empty_name,
 	NULL,
 };
 
-- 
2.17.1



* Re: [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node)
  2020-04-13  5:54 ` [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node) Yan Zhao
@ 2020-04-15  7:28   ` Erik Skultety
  2020-04-15  8:58     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Erik Skultety @ 2020-04-15  7:28 UTC (permalink / raw)
  To: Yan Zhao
  Cc: intel-gvt-dev, cjia, kvm, linux-doc, libvir-list, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, kwankhede, eauger, corbet, yi.l.liu,
	ziye.yang, mlevitsk, pasic, aik, felipe, Ken.Xue, kevin.tian,
	xin.zeng, dgilbert, zhenyuw, dinechin, changpeng.liu, cohuck,
	linux-kernel, zhi.a.wang, jonathan.davies, shaopeng.he

On Mon, Apr 13, 2020 at 01:54:03AM -0400, Yan Zhao wrote:
> migration_version attribute is used to check migration compatibility
> between two mdev devices of the same mdev type.
> The key is that it's rw and its data is opaque to userspace.
>
> Userspace reads migration_version of mdev device at source side and
> writes the value to migration_version attribute of mdev device at target
> side. It judges migration compatibility according to whether the read
> and write operations succeed or fail.
>
> Currently, it is able to read/write migration_version attribute under two
> places:
>
> (1) under mdev_type node
> userspace is able to know whether two mdev devices are compatible before
> a mdev device is created.
>
> userspace also needs to check whether the two mdev devices are of the same
> mdev type before checking the migration_version attribute. It also needs
> to check device creation parameters if aggregation is supported in future.
>
> (2) under mdev device node
> userspace is able to know whether two mdev devices are compatible after
> they are all created. But it does not need to check mdev type and device
> creation parameter for aggregation as device vendor driver would have
> incorporated those information into the migration_version attribute.
>
>              __    userspace
>               /\              \
>              /                 \write
>             / read              \
>    ________/__________       ___\|/_____________
>   | migration_version |     | migration_version |-->check migration
>   ---------------------     ---------------------   compatibility
>     mdev device A               mdev device B
>
> This patch is for mdev documentation about the first place (under
> mdev_type node)
>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Erik Skultety <eskultet@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: "Tian, Kevin" <kevin.tian@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> Cc: Neo Jia <cjia@nvidia.com>
> Cc: Kirti Wankhede <kwankhede@nvidia.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Christophe de Dinechin <dinechin@redhat.com>
>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>
> ---
> v5:
> updated commit message a little to indicate this patch is for
> migration_version attribute under mdev_type node
>
> v4:
> fixed a typo. (Cornelia Huck)
>
> v3:
> 1. renamed version to migration_version
> (Christophe de Dinechin, Cornelia Huck, Alex Williamson)
> 2. let errno to be freely defined by vendor driver
> (Alex Williamson, Erik Skultety, Cornelia Huck, Dr. David Alan Gilbert)
> 3. let checking mdev_type be prerequisite of migration compatibility
> check. (Alex Williamson)
> 4. reworded example usage section.
> (most of this section came from Alex Williamson)
> 5. reworded attribute intention section (Cornelia Huck)
>
> v2:
> 1. added detailed intent and usage
> 2. made definition of version string completely private to vendor driver
>    (Alex Williamson)
> 3. abandoned changes to sample mdev drivers (Alex Williamson)
> 4. mandatory --> optional (Cornelia Huck)
> 5. added description for errno (Cornelia Huck)
> ---
>  .../driver-api/vfio-mediated-device.rst       | 113 ++++++++++++++++++
>  1 file changed, 113 insertions(+)
>
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
> index 25eb7d5b834b..2d1f3c0f3c8f 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
>    |     |   |--- available_instances
>    |     |   |--- device_api
>    |     |   |--- description
> +  |     |   |--- migration_version
>    |     |   |--- [devices]
>    |     |--- [<type-id>]
>    |     |   |--- create
> @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
>    |     |   |--- available_instances
>    |     |   |--- device_api
>    |     |   |--- description
> +  |     |   |--- migration_version
>    |     |   |--- [devices]
>    |     |--- [<type-id>]
>    |          |--- create
> @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
>    |          |--- available_instances
>    |          |--- device_api
>    |          |--- description
> +  |          |--- migration_version
>    |          |--- [devices]
>
>  * [mdev_supported_types]
> @@ -246,6 +249,116 @@ Directories and files under the sysfs for Each Physical Device
>    This attribute should show the number of devices of type <type-id> that can be
>    created.

I've got only a few suggestions to improve the wording in the documentation
(feel free to disagree):

>
> +* migration_version
> +
> +  This attribute is rw, and is optional.

IMO better wording: "This is an optional, RW attribute."

> +  It is used to check migration compatibility between two mdev devices of the
> +  same mdev type. Absence of this attribute means the device of type <type-id>
> +  does not support migration.
> +  This attribute provides a way to check migration compatibility between two
> +  mdev devices from userspace even before device creation. The intended usage is

^This sentence essentially duplicates the information from the first sentence,
can we condense it into something like:

"It is used to check the migration compatibility between two mdev devices of the
same mdev type. Typically, the target device has not been created yet at the
time of userspace using this attribute to check the migration compatibility."

> +  for userspace to read the migration_version attribute from one mdev device and
> +  then writing that value to the migration_version attribute of the other mdev
> +  device. The second mdev device indicates compatibility via the return code of
> +  the write operation. This makes compatibility between mdev devices completely
> +  vendor-defined and opaque to userspace. Userspace should do nothing more
> +  than verify the mdev types match and then use the migration_version attribute
> +  to confirm source to target compatibility.

I'd rephrase the ^last sentence differently:
"Therefore, userspace is only expected to verify that the mdev types of the
respective devices match and then use the migration_version attribute to
confirm migration compatibility between the source and target mdev devices."

> +
> +  Reading/Writing Attribute Data:
> +  read(2) will fail if device of type <type-id> does not support migration and
> +          otherwise succeed and return migration_version string of the device of

"returns a migration_version string of the device on success, fails with an
errno if the device doesn't support migration"

> +          type <type-id>.
> +
> +          This migration_version string is vendor defined and opaque to the
> +          userspace. Vendor is free to include whatever they feel is relevant.
> +          e.g. <pciid of parent device>-<software version>.
> +
> +          Restrictions on this migration_version string:
> +            1. It should only contain ascii characters
> +            2. MAX Length is PATH_MAX (4096)
> +
> +  write(2) expects migration_version string of source mdev device, and will
> +          succeed if it is determined to be compatible and otherwise fail with
> +          vendor specific errno.

"expects a migration_version string of the source mdev device, succeeds if the
two mdev devices are migration compatible, otherwise fails with an errno"

> +
> +  Errno:
> +  -An errno on read(2) indicates the device of type <type-id> does not support
> +  migration;
> +  -An errno on write(2) indicates the devices are incompatible or the target
> +  doesn't support migration.
> +  Vendor driver is free to define specific errno and is suggested to
> +  print detailed error in syslog for diagnose purpose.
> +
> +  Userspace should treat ANY of below conditions as two mdev devices not

Userspace should treat any of the below conditions as an indication of migration
incompatibility between two mdev devices.

> +  compatible:
> +  (0) The mdev devices are not of the same type
> +  (1) any one of the two mdev devices does not have a migration_version
> +  attribute

any of the two mdev devices is missing the migration_version attribute

> +  (2) error when reading from migration_version attribute of one mdev device

when reading the source mdev's migration_version attribute

> +  (3) error when writing migration_version string of one mdev device to
> +  migration_version attribute of the other mdev device

when writing the source mdev migration_version string to the target mdev
device's migration_version attribute

> +
> +  Userspace should regard two mdev devices compatible when ALL of below
> +  conditions are met:

Userspace can consider the two mdev devices to be compatible when all of the
below conditions are met:

> +  (0) The mdev devices are of the same type
> +  (1) success when reading from migration_version attribute of one mdev device.

reading the migration_version attribute of the source succeeds

> +  (2) success when writing migration_version string of one mdev device to
> +  migration_version attribute of the other mdev device.

writing the migration_version string to the target mdev's migration_version
attribute succeeds

> +
> +  Example Usage:
> +  (1) Compare mdev types:

Comparing two mdev types:

> +
> +  The mdev type of an instantiated device can be read from the mdev_type link
> +  within the device instance in sysfs, for example:
> +
> +  # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)
> +
> +  The mdev types available on a given host system can also be found through
> +  /sys/class/mdev_bus, for example:
> +
> +  # ls /sys/class/mdev_bus/*/mdev_supported_types/
> +
> +  Migration is only possible between devices of the same mdev type.
> +
> +  (2) Retrieve the mdev source migration_version:
> +
> +  The migration_version information can either be read from the mdev_type link
> +  on an instantiated device:

s/information/string

> +
> +  # cat /sys/bus/mdev/devices/$UUID1/mdev_type/migration_version
> +
> +  Or it can be read from the mdev type definition, for example:
> +
> +  # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/migration_version
> +
> +  If reading the source migration_version generates an error, migration is not
> +  possible.
> +  NB, there might be several parent devices for a given mdev type on a host
> +  system, each may support or expose different migration_versions.
> +  Matching the specific mdev type to a parent may become important in such
> +  configurations.
> +
> +  (3) Test source migration_version at target:
> +
> +  Given a migration_version as outlined above, its compatibility to an
> +  instantiated device of the same mdev type can be tested as:
> +  # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/migration_version
> +
> +  If this write fails, the source and target migration versions are not
> +  compatible or the target does not support migration.
> +
> +  Compatibility can also be tested prior to target device creation using the

prior to creation of the target device

> +  mdev type definition for a parent device with a previously found matching mdev
> +  type, for example:

using the migration_version attribute present inside a specific mdev type
directory for a given physical parent device.

> +
> +  # echo $VERSION > \
> +  /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/migration_version
> +
> +  Again, an error writing the migration_version indicates that an instance of
> +  this mdev type would not support a migration from the provided migration
> +  version.

would not support migration from the source.

--
Erik Skultety



* Re: [PATCH v5 3/4] vfio/mdev: add migration_version attribute for mdev (under mdev device node)
  2020-04-13  5:55 ` [PATCH v5 3/4] vfio/mdev: add migration_version attribute for mdev (under mdev device node) Yan Zhao
@ 2020-04-15  7:42   ` Erik Skultety
  2020-04-15  9:02     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Erik Skultety @ 2020-04-15  7:42 UTC (permalink / raw)
  To: Yan Zhao
  Cc: intel-gvt-dev, cjia, kvm, linux-doc, libvir-list, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, kwankhede, eauger, corbet, yi.l.liu,
	ziye.yang, mlevitsk, pasic, aik, felipe, Ken.Xue, kevin.tian,
	xin.zeng, dgilbert, zhenyuw, dinechin, changpeng.liu, cohuck,
	linux-kernel, zhi.a.wang, jonathan.davies, shaopeng.he

On Mon, Apr 13, 2020 at 01:55:04AM -0400, Yan Zhao wrote:
> migration_version attribute is used to check migration compatibility
> between two mdev devices of the same mdev type.
> The key is that it's rw and its data is opaque to userspace.
>
> Userspace reads migration_version of mdev device at source side and
> writes the value to migration_version attribute of mdev device at target
> side. It judges migration compatibility according to whether the read
> and write operations succeed or fail.
>
> Currently, it is able to read/write migration_version attribute under two
> places:
>
> (1) under mdev_type node
> userspace is able to know whether two mdev devices are compatible before
> a mdev device is created.
>
> userspace also needs to check whether the two mdev devices are of the same
> mdev type before checking the migration_version attribute. It also needs
> to check device creation parameters if aggregation is supported in future.
>
> (2) under mdev device node
> userspace is able to know whether two mdev devices are compatible after
> they are all created. But it does not need to check mdev type and device
> creation parameter for aggregation as device vendor driver would have
> incorporated those information into the migration_version attribute.
>
>              __    userspace
>               /\              \
>              /                 \write
>             / read              \
>    ________/__________       ___\|/_____________
>   | migration_version |     | migration_version |-->check migration
>   ---------------------     ---------------------   compatibility
>     mdev device A               mdev device B
>
> This patch is for mdev documentation about the second place (under
> mdev device node)
>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Erik Skultety <eskultet@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: "Tian, Kevin" <kevin.tian@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> Cc: Neo Jia <cjia@nvidia.com>
> Cc: Kirti Wankhede <kwankhede@nvidia.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Christophe de Dinechin <dinechin@redhat.com>
>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  .../driver-api/vfio-mediated-device.rst       | 70 +++++++++++++++++++
>  1 file changed, 70 insertions(+)
>
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
> index 2d1f3c0f3c8f..efbadfd51b7e 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -383,6 +383,7 @@ Directories and Files Under the sysfs for Each mdev Device
>           |--- remove
>           |--- mdev_type {link to its type}
>           |--- vendor-specific-attributes [optional]
> +         |--- migration_verion [optional]
>
>  * remove (write only)
>
> @@ -394,6 +395,75 @@ Example::
>
>  	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
>
> +* migration_version (rw, optional)

Hmm, ^this is not consistent with how patch 1/4 reports this information, but
looking at the existing docs we're not doing very well in terms of consistency
there either.

I suggest we go with "(read-write)" in both patch 1/4 and here and then start
the paragraph with "This is an optional attribute."

> +  It is used to check migration compatibility between two mdev devices.
> +  Absence of this attribute means the mdev device does not support migration.
> +
> +  This attribute provides a way to check migration compatibility between two
> +  mdev devices from userspace after device created. The intended usage is

after the target device has been created.

side note: maybe add something like "(see the migration_version attribute of
the device node if the target device already exists)" in the same section in
patch 1/4.

> +  for userspace to read the migration_version attribute from one mdev device and
> +  then writing that value to the migration_version attribute of the other mdev
> +  device. The second mdev device indicates compatibility via the return code of
> +  the write operation. This makes compatibility between mdev devices completely
> +  vendor-defined and opaque to userspace. Userspace should do nothing more
> +  than use the migration_version attribute to confirm source to target
> +  compatibility.

...

> +
> +  Reading/Writing Attribute Data:
> +  read(2) will fail if a mdev device does not support migration and otherwise
> +        succeed and return migration_version string of the mdev device.
> +
> +        This migration_version string is vendor defined and opaque to the
> +        userspace. Vendor is free to include whatever they feel is relevant.
> +        e.g. <pciid of parent device>-<software version>.
> +
> +        Restrictions on this migration_version string:
> +            1. It should only contain ascii characters
> +            2. MAX Length is PATH_MAX (4096)
> +
> +  write(2) expects migration_version string of source mdev device, and will
> +         succeed if it is determined to be compatible and otherwise fail with
> +         vendor specific errno.
> +
> +  Errno:
> +  -An errno on read(2) indicates the mdev devicedoes not support migration;

s/devicedoes/device does/

> +  -An errno on write(2) indicates the mdev devices are incompatible or the
> +   target doesn't support migration.
> +  Vendor driver is free to define specific errno and is suggested to
> +  print detailed error in syslog for diagnose purpose.
> +
> +  Userspace should treat ANY of below conditions as two mdev devices not
> +  compatible:
> +  (1) any one of the two mdev devices does not have a migration_version
> +  attribute
> +  (2) error when reading from migration_version attribute of one mdev device
> +  (3) error when writing migration_version string of one mdev device to
> +  migration_version attribute of the other mdev device
> +
> +  Userspace should regard two mdev devices compatible when ALL of below
> +  conditions are met:
> +  (1) success when reading from migration_version attribute of one mdev device.
> +  (2) success when writing migration_version string of one mdev device to
> +  migration_version attribute of the other mdev device.
> +
> +  Example Usage:
> +  (1) Retrieve the mdev source migration_version:
> +
> +  # cat /sys/bus/mdev/devices/$mdev_UUID1/migration_version
> +
> +  If reading the source migration_version generates an error, migration is not
> +  possible.
> +
> +  (2) Test source migration_version at target:
> +
> +  Given a migration_version as outlined above, its compatibility to an
> +  instantiated device of the same mdev type can be tested as:
> +  # echo $VERSION > /sys/bus/mdev/devices/$mdev_UUID2/migration_version
> +
> +  If this write fails, the source and target migration versions are not
> +  compatible or the target does not support migration.
> +
> +
>  Mediated device Hot plug
>  ------------------------

Overall, the same comments as in 1/4 apply text-wise.

Regards,
--
Erik Skultety



* Re: [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node)
  2020-04-15  7:28   ` Erik Skultety
@ 2020-04-15  8:58     ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2020-04-15  8:58 UTC (permalink / raw)
  To: Erik Skultety
  Cc: intel-gvt-dev, cjia, kvm, linux-doc, libvir-list, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, kwankhede, eauger, corbet, Liu, Yi L,
	Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue, Tian, Kevin,
	Zeng, Xin, dgilbert, zhenyuw, dinechin, Liu, Changpeng, cohuck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Wed, Apr 15, 2020 at 03:28:51PM +0800, Erik Skultety wrote:
> On Mon, Apr 13, 2020 at 01:54:03AM -0400, Yan Zhao wrote:
> > migration_version attribute is used to check migration compatibility
> > between two mdev devices of the same mdev type.
> > The key is that it's rw and its data is opaque to userspace.
> >
> > Userspace reads migration_version of mdev device at source side and
> > writes the value to migration_version attribute of mdev device at target
> > side. It judges migration compatibility according to whether the read
> > and write operations succeed or fail.
> >
> > Currently, it is able to read/write migration_version attribute under two
> > places:
> >
> > (1) under mdev_type node
> > userspace is able to know whether two mdev devices are compatible before
> > a mdev device is created.
> >
> > userspace also needs to check whether the two mdev devices are of the same
> > mdev type before checking the migration_version attribute. It also needs
> > to check device creation parameters if aggregation is supported in future.
> >
> > (2) under mdev device node
> > userspace is able to know whether two mdev devices are compatible after
> > they are all created. But it does not need to check mdev type and device
> > creation parameter for aggregation as device vendor driver would have
> > incorporated those information into the migration_version attribute.
> >
> >              __    userspace
> >               /\              \
> >              /                 \write
> >             / read              \
> >    ________/__________       ___\|/_____________
> >   | migration_version |     | migration_version |-->check migration
> >   ---------------------     ---------------------   compatibility
> >     mdev device A               mdev device B
> >
> > This patch is for mdev documentation about the first place (under
> > mdev_type node)
> >
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Erik Skultety <eskultet@redhat.com>
> > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: "Tian, Kevin" <kevin.tian@intel.com>
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> > Cc: Neo Jia <cjia@nvidia.com>
> > Cc: Kirti Wankhede <kwankhede@nvidia.com>
> > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > Cc: Christophe de Dinechin <dinechin@redhat.com>
> >
> > Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> >
> > ---
> > v5:
> > updated commit message a little to indicate this patch is for
> > migration_version attribute under mdev_type node
> >
> > v4:
> > fixed a typo. (Cornelia Huck)
> >
> > v3:
> > 1. renamed version to migration_version
> > (Christophe de Dinechin, Cornelia Huck, Alex Williamson)
> > 2. let errno to be freely defined by vendor driver
> > (Alex Williamson, Erik Skultety, Cornelia Huck, Dr. David Alan Gilbert)
> > 3. let checking mdev_type be prerequisite of migration compatibility
> > check. (Alex Williamson)
> > 4. reworded example usage section.
> > (most of this section came from Alex Williamson)
> > 5. reworded attribute intention section (Cornelia Huck)
> >
> > v2:
> > 1. added detailed intent and usage
> > 2. made definition of version string completely private to vendor driver
> >    (Alex Williamson)
> > 3. abandoned changes to sample mdev drivers (Alex Williamson)
> > 4. mandatory --> optional (Cornelia Huck)
> > 5. added description for errno (Cornelia Huck)
> > ---
> >  .../driver-api/vfio-mediated-device.rst       | 113 ++++++++++++++++++
> >  1 file changed, 113 insertions(+)
> >
> > diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
> > index 25eb7d5b834b..2d1f3c0f3c8f 100644
> > --- a/Documentation/driver-api/vfio-mediated-device.rst
> > +++ b/Documentation/driver-api/vfio-mediated-device.rst
> > @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |     |   |--- available_instances
> >    |     |   |--- device_api
> >    |     |   |--- description
> > +  |     |   |--- migration_version
> >    |     |   |--- [devices]
> >    |     |--- [<type-id>]
> >    |     |   |--- create
> > @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |     |   |--- available_instances
> >    |     |   |--- device_api
> >    |     |   |--- description
> > +  |     |   |--- migration_version
> >    |     |   |--- [devices]
> >    |     |--- [<type-id>]
> >    |          |--- create
> > @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |          |--- available_instances
> >    |          |--- device_api
> >    |          |--- description
> > +  |          |--- migration_version
> >    |          |--- [devices]
> >
> >  * [mdev_supported_types]
> > @@ -246,6 +249,116 @@ Directories and files under the sysfs for Each Physical Device
> >    This attribute should show the number of devices of type <type-id> that can be
> >    created.
> 
> I've got only a few suggestions to improve the wording in the documentation
> (feel free to disagree):
> 
Hi Erik,
Thanks for the suggestions. They are easier to understand than the
original wording :)
I'll update the doc accordingly, except for the minor one below.
May I just put it like this:
* migration_version (rw, optional)

Thanks
Yan

> >
> > +* migration_version
> > +
> > +  This attribute is rw, and is optional.
> 
> IMO better wording: "This is an optional, RW attribute."
>



> > +  It is used to check migration compatibility between two mdev devices of the
> > +  same mdev type. Absence of this attribute means the device of type <type-id>
> > +  does not support migration.
> > +  This attribute provides a way to check migration compatibility between two
> > +  mdev devices from userspace even before device creation. The intended usage is
> 
> ^This sentence essentially duplicates the information from the first sentence,
> can we condense it into something like:
> 
> "It is used to check the migration compatibility between two mdev devices of the
> same mdev type. Typically, the target device has not been created yet at the
> time of userspace using this attribute to check the migration compatibility."
> 
> > +  for userspace to read the migration_version attribute from one mdev device and
> > +  then writing that value to the migration_version attribute of the other mdev
> > +  device. The second mdev device indicates compatibility via the return code of
> > +  the write operation. This makes compatibility between mdev devices completely
> > +  vendor-defined and opaque to userspace. Userspace should do nothing more
> > +  than verify the mdev types match and then use the migration_version attribute
> > +  to confirm source to target compatibility.
> 
> I'd rephrase the ^last sentence differently:
> "Therefore, userspace is only expected to verify that the mdev types of the
> respective devices match and then use the migration_version attribute to
> confirm migration compatibility between the source and target mdev devices."
> 
> > +
> > +  Reading/Writing Attribute Data:
> > +  read(2) will fail if device of type <type-id> does not support migration and
> > +          otherwise succeed and return migration_version string of the device of
> 
> "returns a migration_version string of the device on success, fails with an
> errno if the device doesn't support migration"
> 
> > +          type <type-id>.
> > +
> > +          This migration_version string is vendor defined and opaque to the
> > +          userspace. Vendor is free to include whatever they feel is relevant.
> > +          e.g. <pciid of parent device>-<software version>.
> > +
> > +          Restrictions on this migration_version string:
> > +            1. It should only contain ascii characters
> > +            2. MAX Length is PATH_MAX (4096)
> > +
> > +  write(2) expects migration_version string of source mdev device, and will
> > +          succeed if it is determined to be compatible and otherwise fail with
> > +          vendor specific errno.
> 
> "expects a migration_version string of the source mdev device, succeeds if the
> two mdev devices are migration compatible, otherwise fails with an errno"
> 
> > +
> > +  Errno:
> > +  -An errno on read(2) indicates the device of type <type-id> does not support
> > +  migration;
> > +  -An errno on write(2) indicates the devices are incompatible or the target
> > +  doesn't support migration.
> > +  Vendor driver is free to define specific errno and is suggested to
> > +  print detailed error in syslog for diagnose purpose.
> > +
> > +  Userspace should treat ANY of below conditions as two mdev devices not
> 
> Userspace should treat any of the below conditions as an indication of migration
> incompatibility between two mdev devices.
> 
> > +  compatible:
> > +  (0) The mdev devices are not of the same type
> > +  (1) any one of the two mdev devices does not have a migration_version
> > +  attribute
> 
> any of the two mdev devices is missing the migration_version attribute
> 
> > +  (2) error when reading from migration_version attribute of one mdev device
> 
> when reading the source mdev's migration_version attribute
> 
> > +  (3) error when writing migration_version string of one mdev device to
> > +  migration_version attribute of the other mdev device
> 
> when writing the source mdev migration_version string to the target mdev
> device's migration_version attribute
> 
> > +
> > +  Userspace should regard two mdev devices compatible when ALL of below
> > +  conditions are met:
> 
> Userspace can consider the two mdev devices to be compatible when all of the
> below conditions are met:
> 
> > +  (0) The mdev devices are of the same type
> > +  (1) success when reading from migration_version attribute of one mdev device.
> 
> reading the migration_version attribute of the source succeeds
> 
> > +  (2) success when writing migration_version string of one mdev device to
> > +  migration_version attribute of the other mdev device.
> 
> writing the migration_version string to the target mdev's migration_version
> attribute succeeds
> 
> > +
> > +  Example Usage:
> > +  (1) Compare mdev types:
> 
> Comparing two mdev types:
> 
> > +
> > +  The mdev type of an instantiated device can be read from the mdev_type link
> > +  within the device instance in sysfs, for example:
> > +
> > +  # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)
> > +
> > +  The mdev types available on a given host system can also be found through
> > +  /sys/class/mdev_bus, for example:
> > +
> > +  # ls /sys/class/mdev_bus/*/mdev_supported_types/
> > +
> > +  Migration is only possible between devices of the same mdev type.
> > +
> > +  (2) Retrieve the mdev source migration_version:
> > +
> > +  The migration_version information can either be read from the mdev_type link
> > +  on an instantiated device:
> 
> s/information/string
> 
> > +
> > +  # cat /sys/bus/mdev/devices/$UUID1/mdev_type/migration_version
> > +
> > +  Or it can be read from the mdev type definition, for example:
> > +
> > +  # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/migration_version
> > +
> > +  If reading the source migration_version generates an error, migration is not
> > +  possible.
> > +  NB, there might be several parent devices for a given mdev type on a host
> > +  system, each may support or expose different migration_versions.
> > +  Matching the specific mdev type to a parent may become important in such
> > +  configurations.
> > +
> > +  (3) Test source migration_version at target:
> > +
> > +  Given a migration_version as outlined above, its compatibility to an
> > +  instantiated device of the same mdev type can be tested as:
> > +  # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/migration_version
> > +
> > +  If this write fails, the source and target migration versions are not
> > +  compatible or the target does not support migration.
> > +
> > +  Compatibility can also be tested prior to target device creation using the
> 
> prior to creation of the target device
> 
> > +  mdev type definition for a parent device with a previously found matching mdev
> > +  type, for example:
> 
> using the migration_version attribute present inside a specific mdev type
> directory for a given physical parent device.
> 
> > +
> > +  # echo $VERSION > \
> > +  /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/migration_version
> > +
> > +  Again, an error writing the migration_version indicates that an instance of
> > +  this mdev type would not support a migration from the provided migration
> > +  version.
> 
> would not support migration from the source.
> 
> --
> Erik Skultety
> 
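
For illustration, the three Example Usage steps quoted above could be
scripted roughly as below. This is only a sketch under assumptions: the
function name mdev_type_compatible and its two arguments are made up here,
the sysfs paths are the ones from the quoted documentation, and the
migration_version attribute itself is still only proposed in this series.

```shell
#!/bin/sh
# Sketch only: check whether an existing source mdev could migrate to a
# not-yet-created target of a given type on a given parent device.
# $1 = source mdev device directory, e.g. /sys/bus/mdev/devices/$UUID1
# $2 = target type directory, e.g.
#      /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE
mdev_type_compatible() {
	src_dev=$1
	dst_type=$2

	# (1) The mdev types must match.
	src_type=$(basename "$(readlink -f "$src_dev/mdev_type")")
	[ "$src_type" = "$(basename "$dst_type")" ] || return 1

	# (2) Read the opaque source string. A missing attribute or a read
	#     error means the source does not support migration.
	version=$(cat "$src_dev/mdev_type/migration_version" 2>/dev/null) || return 1

	# (3) Write it to the target type's attribute. A missing attribute or
	#     a failed write means the two are not compatible.
	[ -e "$dst_type/migration_version" ] || return 1
	printf '%s\n' "$version" > "$dst_type/migration_version" 2>/dev/null
}
```

The function returns 0 only when all three steps succeed, which matches the
"ALL of below conditions" list in the quoted text. The version string is
passed through untouched, since it is meant to be opaque to userspace.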


* Re: [PATCH v5 3/4] vfio/mdev: add migration_version attribute for mdev (under mdev device node)
  2020-04-15  7:42   ` Erik Skultety
@ 2020-04-15  9:02     ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2020-04-15  9:02 UTC (permalink / raw)
  To: Erik Skultety
  Cc: intel-gvt-dev, cjia, kvm, linux-doc, libvir-list, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, kwankhede, eauger, corbet, Liu, Yi L,
	Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue, Tian, Kevin,
	Zeng, Xin, dgilbert, zhenyuw, dinechin, Liu, Changpeng, cohuck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Wed, Apr 15, 2020 at 03:42:58PM +0800, Erik Skultety wrote:
> On Mon, Apr 13, 2020 at 01:55:04AM -0400, Yan Zhao wrote:
> > migration_version attribute is used to check migration compatibility
> > between two mdev devices of the same mdev type.
> > The key is that it's rw and its data is opaque to userspace.
> >
> > Userspace reads migration_version of mdev device at source side and
> > writes the value to migration_version attribute of mdev device at target
> > side. It judges migration compatibility according to whether the read
> > and write operations succeed or fail.
> >
> > Currently, it is able to read/write migration_version attribute under two
> > places:
> >
> > (1) under mdev_type node
> > userspace is able to know whether two mdev devices are compatible before
> > a mdev device is created.
> >
> > userspace also needs to check whether the two mdev devices are of the same
> > mdev type before checking the migration_version attribute. It also needs
> > to check device creation parameters if aggregation is supported in future.
> >
> > (2) under mdev device node
> > userspace is able to know whether two mdev devices are compatible after
> > they are all created. But it does not need to check mdev type and device
> > creation parameter for aggregation as device vendor driver would have
> > incorporated those information into the migration_version attribute.
> >
> >              __    userspace
> >               /\              \
> >              /                 \write
> >             / read              \
> >    ________/__________       ___\|/_____________
> >   | migration_version |     | migration_version |-->check migration
> >   ---------------------     ---------------------   compatibility
> >     mdev device A               mdev device B
> >
> > This patch is for mdev documentation about the second place (under
> > mdev device node)
> >
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Erik Skultety <eskultet@redhat.com>
> > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: "Tian, Kevin" <kevin.tian@intel.com>
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> > Cc: Neo Jia <cjia@nvidia.com>
> > Cc: Kirti Wankhede <kwankhede@nvidia.com>
> > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > Cc: Christophe de Dinechin <dinechin@redhat.com>
> >
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  .../driver-api/vfio-mediated-device.rst       | 70 +++++++++++++++++++
> >  1 file changed, 70 insertions(+)
> >
> > diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
> > index 2d1f3c0f3c8f..efbadfd51b7e 100644
> > --- a/Documentation/driver-api/vfio-mediated-device.rst
> > +++ b/Documentation/driver-api/vfio-mediated-device.rst
> > @@ -383,6 +383,7 @@ Directories and Files Under the sysfs for Each mdev Device
> >           |--- remove
> >           |--- mdev_type {link to its type}
> >           |--- vendor-specific-attributes [optional]
> > +         |--- migration_version [optional]
> >
> >  * remove (write only)
> >
> > @@ -394,6 +395,75 @@ Example::
> >
> >  	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
> >
> > +* migration_version (rw, optional)
> 
> Hmm, ^this is not consistent with how patch 1/4 reports this information, but
> looking at the existing docs we're not doing very well in terms of consistency
> there either.
> 
> I suggest we go with "(read-write)" in both patch 1/4 and here and then start
> the paragraph with "This is an optional attribute."
>
OK, got it.

> > +  It is used to check migration compatibility between two mdev devices.
> > +  Absence of this attribute means the mdev device does not support migration.
> > +
> > +  This attribute provides a way to check migration compatibility between two
> > +  mdev devices from userspace after device created. The intended usage is
> 
> after the target device has been created.
> 
> side note: maybe add something like "(see the migration_version attribute of
> the device node if the target device already exists)" in the same section in
> patch 1/4.

ok. good idea.
> 
> > +  for userspace to read the migration_version attribute from one mdev device and
> > +  then writing that value to the migration_version attribute of the other mdev
> > +  device. The second mdev device indicates compatibility via the return code of
> > +  the write operation. This makes compatibility between mdev devices completely
> > +  vendor-defined and opaque to userspace. Userspace should do nothing more
> > +  than use the migration_version attribute to confirm source to target
> > +  compatibility.
> 
> ...
> 
> > +
> > +  Reading/Writing Attribute Data:
> > +  read(2) will fail if a mdev device does not support migration and otherwise
> > +        succeed and return migration_version string of the mdev device.
> > +
> > +        This migration_version string is vendor defined and opaque to the
> > +        userspace. Vendor is free to include whatever they feel is relevant.
> > +        e.g. <pciid of parent device>-<software version>.
> > +
> > +        Restrictions on this migration_version string:
> > +            1. It should only contain ascii characters
> > +            2. MAX Length is PATH_MAX (4096)
> > +
> > +  write(2) expects migration_version string of source mdev device, and will
> > +         succeed if it is determined to be compatible and otherwise fail with
> > +         vendor specific errno.
> > +
> > +  Errno:
> > +  -An errno on read(2) indicates the mdev devicedoes not support migration;
> 
> s/devicedoes/device does/
> 
Sorry for this kind of error.

> > +  -An errno on write(2) indicates the mdev devices are incompatible or the
> > +   target doesn't support migration.
> > +  Vendor driver is free to define specific errno and is suggested to
> > +  print detailed error in syslog for diagnose purpose.
> > +
> > +  Userspace should treat ANY of below conditions as two mdev devices not
> > +  compatible:
> > +  (1) any one of the two mdev devices does not have a migration_version
> > +  attribute
> > +  (2) error when reading from migration_version attribute of one mdev device
> > +  (3) error when writing migration_version string of one mdev device to
> > +  migration_version attribute of the other mdev device
> > +
> > +  Userspace should regard two mdev devices compatible when ALL of below
> > +  conditions are met:
> > +  (1) success when reading from migration_version attribute of one mdev device.
> > +  (2) success when writing migration_version string of one mdev device to
> > +  migration_version attribute of the other mdev device.
> > +
> > +  Example Usage:
> > +  (1) Retrieve the mdev source migration_version:
> > +
> > +  # cat /sys/bus/mdev/devices/$mdev_UUID1/migration_version
> > +
> > +  If reading the source migration_version generates an error, migration is not
> > +  possible.
> > +
> > +  (2) Test source migration_version at target:
> > +
> > +  Given a migration_version as outlined above, its compatibility to an
> > +  instantiated device of the same mdev type can be tested as:
> > +  # echo $VERSION > /sys/bus/mdev/devices/$mdev_UUID2/migration_version
> > +
> > +  If this write fails, the source and target migration versions are not
> > +  compatible or the target does not support migration.
> > +
> > +
> >  Mediated device Hot plug
> >  ------------------------
> 
> Overall, the same comments as in 1/4 apply text-wise.
> 

Got it. I will align it with the first patch.

Thanks
Yan

> Regards,
> --
> Erik Skultety
> 


* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-13  5:52 [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Yan Zhao
                   ` (3 preceding siblings ...)
  2020-04-13  5:55 ` [PATCH v5 4/4] drm/i915/gvt: export migration_version to mdev sysfs " Yan Zhao
@ 2020-04-17  8:44 ` Cornelia Huck
  2020-04-17  9:52   ` Yan Zhao
  4 siblings, 1 reply; 40+ messages in thread
From: Cornelia Huck @ 2020-04-17  8:44 UTC (permalink / raw)
  To: Yan Zhao
  Cc: intel-gvt-dev, libvir-list, kvm, linux-doc, linux-kernel, aik,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, eauger, yi.l.liu,
	xin.zeng, ziye.yang, mlevitsk, pasic, felipe, changpeng.liu,
	Ken.Xue, jonathan.davies, shaopeng.he, alex.williamson, eskultet,
	dgilbert, kevin.tian, zhenyuw, zhi.a.wang, cjia, kwankhede,
	berrange, dinechin, corbet

On Mon, 13 Apr 2020 01:52:01 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> This patchset introduces a migration_version attribute under sysfs of VFIO
> Mediated devices.
> 
> This migration_version attribute is used to check migration compatibility
> between two mdev devices.
> 
> Currently, it has two locations:
> (1) under mdev_type node,
>     which can be used even before device creation, but only for mdev
>     devices of the same mdev type.
> (2) under mdev device node,
>     which can only be used after the mdev devices are created, but the src
>     and target mdev devices are not necessarily be of the same mdev type
> (The second location is newly added in v5, in order to keep consistent
> with the migration_version node for migratable pass-though devices)

What is the relationship between those two attributes?

Is existence (and compatibility) of (1) a pre-req for possible
existence (and compatibility) of (2)?

Does userspace need to check (1) or can it completely rely on (2), if
it so chooses?

If devices with a different mdev type are indeed compatible, it seems
userspace can only find out after the devices have actually been
created, as (1) does not apply?

One of my worries is that the existence of an attribute with the same
name in two similar locations might lead to confusion. But maybe it
isn't a problem.

> 
> Patch 1 defines migration_version attribute for the first location in
> Documentation/vfio-mediated-device.txt
> 
> Patch 2 uses GVT as an example for patch 1 to show how to expose
> migration_version attribute and check migration compatibility in vendor
> driver.
> 
> Patch 3 defines migration_version attribute for the second location in
> Documentation/vfio-mediated-device.txt
> 
> Patch 4 uses GVT as an example for patch 3 to show how to expose
> migration_version attribute and check migration compatibility in vendor
> driver.
> 
> (The previous "Reviewed-by" and "Acked-by" for patch 1 and patch 2 are
> kept in v5, as there are only small changes to commit messages of the two
> patches.)
> 
> v5:
> added patch 2 and 4 for mdev device part of migration_version attribute.
> 
> v4:
> 1. fixed indentation/spell errors, reworded several error messages
> 2. added a missing memory free for error handling in patch 2
> 
> v3:
> 1. renamed version to migration_version
> 2. let errno to be freely defined by vendor driver
> 3. let checking mdev_type be prerequisite of migration compatibility check
> 4. reworded most part of patch 1
> 5. print detailed error log in patch 2 and generate migration_version
> string at init time
> 
> v2:
> 1. renamed patched 1
> 2. made definition of device version string completely private to vendor
> driver
> 3. reverted changes to sample mdev drivers
> 4. described intent and usage of version attribute more clearly.
> 
> 
> Yan Zhao (4):
>   vfio/mdev: add migration_version attribute for mdev (under mdev_type
>     node)
>   drm/i915/gvt: export migration_version to mdev sysfs (under mdev_type
>     node)
>   vfio/mdev: add migration_version attribute for mdev (under mdev device
>     node)
>   drm/i915/gvt: export migration_version to mdev sysfs (under mdev
>     device node)
> 
>  .../driver-api/vfio-mediated-device.rst       | 183 ++++++++++++++++++
>  drivers/gpu/drm/i915/gvt/Makefile             |   2 +-
>  drivers/gpu/drm/i915/gvt/gvt.c                |  39 ++++
>  drivers/gpu/drm/i915/gvt/gvt.h                |   7 +
>  drivers/gpu/drm/i915/gvt/kvmgt.c              |  55 ++++++
>  drivers/gpu/drm/i915/gvt/migration_version.c  | 170 ++++++++++++++++
>  drivers/gpu/drm/i915/gvt/vgpu.c               |  13 +-
>  7 files changed, 466 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gvt/migration_version.c
> 



* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-17  8:44 ` [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Cornelia Huck
@ 2020-04-17  9:52   ` Yan Zhao
  2020-04-17 11:24     ` Cornelia Huck
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-17  9:52 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: intel-gvt-dev, libvir-list, kvm, linux-doc, linux-kernel, aik,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, eauger, Liu, Yi L, Zeng,
	Xin, Yang, Ziye, mlevitsk, pasic, felipe, Liu, Changpeng,
	Ken.Xue, jonathan.davies, He, Shaopeng, alex.williamson,
	eskultet, dgilbert, Tian, Kevin, zhenyuw, Wang, Zhi A, cjia,
	kwankhede, berrange, dinechin, corbet

On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> On Mon, 13 Apr 2020 01:52:01 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > This patchset introduces a migration_version attribute under sysfs of VFIO
> > Mediated devices.
> > 
> > This migration_version attribute is used to check migration compatibility
> > between two mdev devices.
> > 
> > Currently, it has two locations:
> > (1) under mdev_type node,
> >     which can be used even before device creation, but only for mdev
> >     devices of the same mdev type.
> > (2) under mdev device node,
> >     which can only be used after the mdev devices are created, but the src
> >     and target mdev devices are not necessarily be of the same mdev type
> > (The second location is newly added in v5, in order to keep consistent
> > with the migration_version node for migratable pass-though devices)
> 
> What is the relationship between those two attributes?
> 
(1) is for mdev devices specifically, and (2) is provided to keep the same
sysfs interface as in the non-mdev case, so (2) is for both mdev devices
and non-mdev devices.

In future, if we enable vfio-pci vendor ops (i.e. a non-mdev device is
bound to vfio-pci, but is able to register a migration region and do
migration transactions through a vendor-provided affiliate driver),
the vendor driver would export (2) directly, under the device node.
It cannot provide (1), as there are no mdev devices involved.

> Is existence (and compatibility) of (1) a pre-req for possible
> existence (and compatibility) of (2)?
>
No. (2) does not rely on (1).

> Does userspace need to check (1) or can it completely rely on (2), if
> it so chooses?
>
I think it can completely rely on (2) if a compatibility check before
mdev creation is not required.
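
As a rough sketch of what relying on (2) alone could look like from
userspace (the function name mdev_dev_compatible is invented for
illustration, and the paths are the ones from the patch 3 example):

```shell
#!/bin/sh
# Sketch only: post-creation check using the per-device attribute.
# $1 = source device directory, $2 = target device directory, e.g.
#      /sys/bus/mdev/devices/$mdev_UUID1 and /sys/bus/mdev/devices/$mdev_UUID2
mdev_dev_compatible() {
	src=$1/migration_version
	dst=$2/migration_version

	# Absence of the attribute on either side means no migration support.
	[ -e "$src" ] && [ -e "$dst" ] || return 1

	# No mdev type comparison here: the vendor driver is expected to have
	# folded the type information into the opaque string.
	version=$(cat "$src" 2>/dev/null) || return 1

	# The target reports compatibility through the result of the write.
	printf '%s\n' "$version" > "$dst" 2>/dev/null
}
```

Unlike a check against the mdev_type node, this needs both devices to
exist already, so it cannot be used to decide whether to create the target.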

> If devices with a different mdev type are indeed compatible, it seems
> userspace can only find out after the devices have actually been
> created, as (1) does not apply?
Yes, I think so.

> One of my worries is that the existence of an attribute with the same
> name in two similar locations might lead to confusion. But maybe it
> isn't a problem.
>
Yes, I have the same feeling. But as (2) is for sysfs interface
consistency, to make it transparent to userspace tools like libvirt,
I guess the same name is necessary?

Thanks
Yan
> > 
> > Patch 1 defines migration_version attribute for the first location in
> > Documentation/vfio-mediated-device.txt
> > 
> > Patch 2 uses GVT as an example for patch 1 to show how to expose
> > migration_version attribute and check migration compatibility in vendor
> > driver.
> > 
> > Patch 3 defines migration_version attribute for the second location in
> > Documentation/vfio-mediated-device.txt
> > 
> > Patch 4 uses GVT as an example for patch 3 to show how to expose
> > migration_version attribute and check migration compatibility in vendor
> > driver.
> > 
> > (The previous "Reviewed-by" and "Acked-by" for patch 1 and patch 2 are
> > kept in v5, as there are only small changes to commit messages of the two
> > patches.)
> > 
> > v5:
> > added patch 2 and 4 for mdev device part of migration_version attribute.
> > 
> > v4:
> > 1. fixed indentation/spell errors, reworded several error messages
> > 2. added a missing memory free for error handling in patch 2
> > 
> > v3:
> > 1. renamed version to migration_version
> > 2. let errno to be freely defined by vendor driver
> > 3. let checking mdev_type be prerequisite of migration compatibility check
> > 4. reworded most part of patch 1
> > 5. print detailed error log in patch 2 and generate migration_version
> > string at init time
> > 
> > v2:
> > 1. renamed patched 1
> > 2. made definition of device version string completely private to vendor
> > driver
> > 3. reverted changes to sample mdev drivers
> > 4. described intent and usage of version attribute more clearly.
> > 
> > 
> > Yan Zhao (4):
> >   vfio/mdev: add migration_version attribute for mdev (under mdev_type
> >     node)
> >   drm/i915/gvt: export migration_version to mdev sysfs (under mdev_type
> >     node)
> >   vfio/mdev: add migration_version attribute for mdev (under mdev device
> >     node)
> >   drm/i915/gvt: export migration_version to mdev sysfs (under mdev
> >     device node)
> > 
> >  .../driver-api/vfio-mediated-device.rst       | 183 ++++++++++++++++++
> >  drivers/gpu/drm/i915/gvt/Makefile             |   2 +-
> >  drivers/gpu/drm/i915/gvt/gvt.c                |  39 ++++
> >  drivers/gpu/drm/i915/gvt/gvt.h                |   7 +
> >  drivers/gpu/drm/i915/gvt/kvmgt.c              |  55 ++++++
> >  drivers/gpu/drm/i915/gvt/migration_version.c  | 170 ++++++++++++++++
> >  drivers/gpu/drm/i915/gvt/vgpu.c               |  13 +-
> >  7 files changed, 466 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/gpu/drm/i915/gvt/migration_version.c
> > 
> 


* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-17  9:52   ` Yan Zhao
@ 2020-04-17 11:24     ` Cornelia Huck
  2020-04-20  1:24       ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Cornelia Huck @ 2020-04-17 11:24 UTC (permalink / raw)
  To: Yan Zhao
  Cc: intel-gvt-dev, libvir-list, kvm, linux-doc, linux-kernel, aik,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, eauger, Liu, Yi L, Zeng,
	Xin, Yang, Ziye, mlevitsk, pasic, felipe, Liu, Changpeng,
	Ken.Xue, jonathan.davies, He, Shaopeng, alex.williamson,
	eskultet, dgilbert, Tian, Kevin, zhenyuw, Wang, Zhi A, cjia,
	kwankhede, berrange, dinechin, corbet

On Fri, 17 Apr 2020 05:52:02 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > On Mon, 13 Apr 2020 01:52:01 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > Mediated devices.
> > > 
> > > This migration_version attribute is used to check migration compatibility
> > > between two mdev devices.
> > > 
> > > Currently, it has two locations:
> > > (1) under mdev_type node,
> > >     which can be used even before device creation, but only for mdev
> > >     devices of the same mdev type.
> > > (2) under mdev device node,
> > >     which can only be used after the mdev devices are created, but the src
> > >     and target mdev devices are not necessarily be of the same mdev type
> > > (The second location is newly added in v5, in order to keep consistent
> > > with the migration_version node for migratable pass-though devices)  
> > 
> > What is the relationship between those two attributes?
> >   
> (1) is for mdev devices specifically, and (2) is provided to keep the same
> sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> non-mdev devices.
> 
> in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> is binding to vfio-pci, but is able to register migration region and do
> migration transactions from a vendor provided affiliate driver),
> the vendor driver would export (2) directly, under device node.
> It is not able to provide (1) as there're no mdev devices involved.

Ok, creating an alternate attribute for non-mdev devices makes sense.
However, wouldn't that rather be a case (3)? The change here only
refers to mdev devices.

> 
> > Is existence (and compatibility) of (1) a pre-req for possible
> > existence (and compatibility) of (2)?
> >  
> no. (2) does not reply on (1).

Hm. Non-existence of (1) seems to imply "this type does not support
migration". If an mdev created for such a type suddenly does support
migration, it feels a bit odd.

(It obviously cannot be a prereq for what I called (3) above.)

> 
> > Does userspace need to check (1) or can it completely rely on (2), if
> > it so chooses?
> >  
> I think it can completely reply on (2) if compatibility check before
> mdev creation is not required.
> 
> > If devices with a different mdev type are indeed compatible, it seems
> > userspace can only find out after the devices have actually been
> > created, as (1) does not apply?  
> yes, I think so. 

How useful would it be for userspace to even look at (1) in that case?
It only knows if things have a chance of working if it actually goes
ahead and creates devices.

> 
> > One of my worries is that the existence of an attribute with the same
> > name in two similar locations might lead to confusion. But maybe it
> > isn't a problem.
> >  
> Yes, I have the same feeling. but as (2) is for sysfs interface
> consistency, to make it transparent to userspace tools like libvirt,
> I guess the same name is necessary?

What do we actually need here, I wonder? (1) and (2) seem to serve
slightly different purposes, while (2) and what I called (3) have the
same purpose. Is it important to userspace that (1) and (2) have the
same name?
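
As a rough illustration of how userspace might use either attribute, here is
a minimal Python sketch. It assumes the semantics proposed in this series:
reading migration_version returns the device's version string, and writing
the source's string to the target's attribute succeeds only if the target's
vendor driver considers it compatible. The function names and the example
paths in the comments are hypothetical.

```python
def read_version(attr_path):
    """Return the version string exposed by a migration_version attribute."""
    with open(attr_path, "r") as f:
        return f.read().strip()


def is_compatible(src_attr, dst_attr):
    """Check compatibility by writing the source version to the target.

    Assumption: the write fails with an OSError when the target rejects
    the version string, and succeeds otherwise.
    """
    version = read_version(src_attr)
    try:
        # The error may only surface when the file is flushed on close,
        # so the whole "with" block sits inside the try.
        with open(dst_attr, "w") as f:
            f.write(version)
    except OSError:
        return False
    return True


# Hypothetical locations, matching cases (1) and (2) in the cover letter:
#   (1) <parent device>/mdev_supported_types/<type-id>/migration_version
#   (2) /sys/bus/mdev/devices/<uuid>/migration_version
```

The same two helpers would work for either location, which is one argument
for keeping the attribute semantics identical even if the names differ.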


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-17 11:24     ` Cornelia Huck
@ 2020-04-20  1:24       ` Yan Zhao
  2020-04-20 22:56         ` Alex Williamson
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-20  1:24 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: intel-gvt-dev, libvir-list, kvm, linux-doc, linux-kernel, aik,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, eauger, Liu, Yi L, Zeng,
	Xin, Yang, Ziye, mlevitsk, pasic, felipe, Liu, Changpeng,
	Ken.Xue, jonathan.davies, He, Shaopeng, alex.williamson,
	eskultet, dgilbert, Tian, Kevin, zhenyuw, Wang, Zhi A, cjia,
	kwankhede, berrange, dinechin, corbet

On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> On Fri, 17 Apr 2020 05:52:02 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > Mediated devices.
> > > > 
> > > > This migration_version attribute is used to check migration compatibility
> > > > between two mdev devices.
> > > > 
> > > > Currently, it has two locations:
> > > > (1) under mdev_type node,
> > > >     which can be used even before device creation, but only for mdev
> > > >     devices of the same mdev type.
> > > > (2) under mdev device node,
> > > >     which can only be used after the mdev devices are created, but the src
> > > >     and target mdev devices are not necessarily be of the same mdev type
> > > > (The second location is newly added in v5, in order to keep consistent
> > > > with the migration_version node for migratable pass-though devices)  
> > > 
> > > What is the relationship between those two attributes?
> > >   
> > (1) is for mdev devices specifically, and (2) is provided to keep the same
> > sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> > non-mdev devices.
> > 
> > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > is binding to vfio-pci, but is able to register migration region and do
> > migration transactions from a vendor provided affiliate driver),
> > the vendor driver would export (2) directly, under device node.
> > It is not able to provide (1) as there're no mdev devices involved.
> 
> Ok, creating an alternate attribute for non-mdev devices makes sense.
> However, wouldn't that rather be a case (3)? The change here only
> refers to mdev devices.
>
As you pointed out below, (3) and (2) serve the same purpose,
and I think a possible use case is to migrate between a non-mdev device and
an mdev device. So I think it's better for them both to use (2) rather
than creating (3).
> > 
> > > Is existence (and compatibility) of (1) a pre-req for possible
> > > existence (and compatibility) of (2)?
> > >  
> > no. (2) does not reply on (1).
> 
> Hm. Non-existence of (1) seems to imply "this type does not support
> migration". If an mdev created for such a type suddenly does support
> migration, it feels a bit odd.
> 
Yes, but I think if that condition occurs, it should be reported as a bug
against the vendor driver.
Should I add a line in the doc like "vendor driver should ensure that the
migration compatibility from migration_version under mdev_type should be
consistent with that from migration_version under device node"?

> (It obviously cannot be a prereq for what I called (3) above.)
> 
> > 
> > > Does userspace need to check (1) or can it completely rely on (2), if
> > > it so chooses?
> > >  
> > I think it can completely reply on (2) if compatibility check before
> > mdev creation is not required.
> > 
> > > If devices with a different mdev type are indeed compatible, it seems
> > > userspace can only find out after the devices have actually been
> > > created, as (1) does not apply?  
> > yes, I think so. 
> 
> How useful would it be for userspace to even look at (1) in that case?
> It only knows if things have a chance of working if it actually goes
> ahead and creates devices.
>
Hmm, would it be useful for userspace to test the migration_version under the
mdev type before it knows which mdev device to create?
For example, when userspace wants to migrate an mdev device in the source VM
but has not yet created the target VM and the target mdev device.

> > 
> > > One of my worries is that the existence of an attribute with the same
> > > name in two similar locations might lead to confusion. But maybe it
> > > isn't a problem.
> > >  
> > Yes, I have the same feeling. but as (2) is for sysfs interface
> > consistency, to make it transparent to userspace tools like libvirt,
> > I guess the same name is necessary?
> 
> What do we actually need here, I wonder? (1) and (2) seem to serve
> slightly different purposes, while (2) and what I called (3) have the
> same purpose. Is it important to userspace that (1) and (2) have the
> same name?
So, change (1) to migration_type_version and (2) to
migration_instance_version?
But as they are in different locations, could the location itself convey
enough information?


Thanks
Yan



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-20  1:24       ` Yan Zhao
@ 2020-04-20 22:56         ` Alex Williamson
  2020-04-21  2:37           ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2020-04-20 22:56 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Cornelia Huck, intel-gvt-dev, libvir-list, kvm, linux-doc,
	linux-kernel, aik, Zhengxiao.zx, shuangtai.tst, qemu-devel,
	eauger, Liu, Yi L, Zeng, Xin, Yang, Ziye, mlevitsk, pasic,
	felipe, Liu, Changpeng, Ken.Xue, jonathan.davies, He, Shaopeng,
	eskultet, dgilbert, Tian, Kevin, zhenyuw, Wang, Zhi A, cjia,
	kwankhede, berrange, dinechin, corbet

On Sun, 19 Apr 2020 21:24:57 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > On Fri, 17 Apr 2020 05:52:02 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:  
> > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >     
> > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > Mediated devices.
> > > > > 
> > > > > This migration_version attribute is used to check migration compatibility
> > > > > between two mdev devices.
> > > > > 
> > > > > Currently, it has two locations:
> > > > > (1) under mdev_type node,
> > > > >     which can be used even before device creation, but only for mdev
> > > > >     devices of the same mdev type.
> > > > > (2) under mdev device node,
> > > > >     which can only be used after the mdev devices are created, but the src
> > > > >     and target mdev devices are not necessarily be of the same mdev type
> > > > > (The second location is newly added in v5, in order to keep consistent
> > > > > with the migration_version node for migratable pass-though devices)    
> > > > 
> > > > What is the relationship between those two attributes?
> > > >     
> > > (1) is for mdev devices specifically, and (2) is provided to keep the same
> > > sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> > > non-mdev devices.
> > > 
> > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > is binding to vfio-pci, but is able to register migration region and do
> > > migration transactions from a vendor provided affiliate driver),
> > > the vendor driver would export (2) directly, under device node.
> > > It is not able to provide (1) as there're no mdev devices involved.  
> > 
> > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > However, wouldn't that rather be a case (3)? The change here only
> > refers to mdev devices.
> >  
> as you pointed below, (3) and (2) serve the same purpose. 
> and I think a possible usage is to migrate between a non-mdev device and
> an mdev device. so I think it's better for them both to use (2) rather
> than creating (3).

An mdev type is meant to define a software compatible interface, so in
the case of mdev->mdev migration, doesn't migrating to a different type
fail the most basic of compatibility tests that we expect userspace to
perform?  IOW, if two mdev types are migration compatible, it seems a
prerequisite to that is that they provide the same software interface,
which means they should be the same mdev type.

In the hybrid cases of mdev->phys or phys->mdev, how does a management
tool begin to even guess what might be compatible?  Are we expecting
libvirt to probe every device with this attribute in the system?  Is
there going to be a new class hierarchy created to enumerate all
possible migrate-able devices?

I agree that there was a gap in the previous proposal for non-mdev
devices, but I think this brings a lot of questions that we need to
puzzle through and libvirt will need to re-evaluate how they might
decide to pick a migration target device.  For example, I'm sure
libvirt would reject any policy decisions regarding picking a physical
device versus an mdev device.  Had we previously left it that only a
layer above libvirt would select a target device and libvirt only tests
compatibility to that target device?

We also need to consider that this expands the namespace.  If we no
longer require matching types as the first level of comparison, then
vendor migration strings can theoretically collide.  How do we
coordinate so that can't happen?  Thanks,

Alex
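
To make the namespace concern concrete, here is a small Python sketch. It
simplifies the real check (which is performed by the vendor driver, not by
string equality in userspace), and the vendor names, type names, and version
strings are invented for illustration only.

```python
def naive_match(src, dst):
    """Compare version strings only, with no type check first."""
    return src["version"] == dst["version"]


def scoped_match(src, dst):
    """Require the same vendor and mdev type before comparing versions."""
    same_scope = (src["vendor"], src["type"]) == (dst["vendor"], dst["type"])
    return same_scope and src["version"] == dst["version"]


# Two unrelated (hypothetical) vendors that happen to pick the same string.
dev_a = {"vendor": "vendor_a", "type": "type_x", "version": "v1"}
dev_b = {"vendor": "vendor_b", "type": "type_y", "version": "v1"}

collides = naive_match(dev_a, dev_b)    # True: a false positive
rejected = scoped_match(dev_a, dev_b)   # False: the scope check filters it out
```

Once the type is no longer the first level of comparison, something else has
to play the role of the scope check above.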

> > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > existence (and compatibility) of (2)?
> > > >    
> > > no. (2) does not reply on (1).  
> > 
> > Hm. Non-existence of (1) seems to imply "this type does not support
> > migration". If an mdev created for such a type suddenly does support
> > migration, it feels a bit odd.
> >   
> yes. but I think if the condition happens, it should be reported a bug
> to vendor driver.
> should I add a line in the doc like "vendor driver should ensure that the
> migration compatibility from migration_version under mdev_type should be
> consistent with that from migration_version under device node" ?
> 
> > (It obviously cannot be a prereq for what I called (3) above.)
> >   
> > >   
> > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > it so chooses?
> > > >    
> > > I think it can completely reply on (2) if compatibility check before
> > > mdev creation is not required.
> > >   
> > > > If devices with a different mdev type are indeed compatible, it seems
> > > > userspace can only find out after the devices have actually been
> > > > created, as (1) does not apply?    
> > > yes, I think so.   
> > 
> > How useful would it be for userspace to even look at (1) in that case?
> > It only knows if things have a chance of working if it actually goes
> > ahead and creates devices.
> >  
> hmm, is it useful for userspace to test the migration_version under mdev
> type before it knows what mdev device to generate ?
> like when the userspace wants to migrate an mdev device in src vm,
> but it has not created target vm and the target mdev device.
> 
> > >   
> > > > One of my worries is that the existence of an attribute with the same
> > > > name in two similar locations might lead to confusion. But maybe it
> > > > isn't a problem.
> > > >    
> > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > consistency, to make it transparent to userspace tools like libvirt,
> > > I guess the same name is necessary?  
> > 
> > What do we actually need here, I wonder? (1) and (2) seem to serve
> > slightly different purposes, while (2) and what I called (3) have the
> > same purpose. Is it important to userspace that (1) and (2) have the
> > same name?  
> so change (1) to migration_type_version and (2) to
> migration_instance_version?
> But as they are under different locations, could that location imply
> enough information?
> 
> 
> Thanks
> Yan
> 
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-20 22:56         ` Alex Williamson
@ 2020-04-21  2:37           ` Yan Zhao
  2020-04-21 12:08             ` Tian, Kevin
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-21  2:37 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, intel-gvt-dev, libvir-list, kvm, linux-doc,
	linux-kernel, aik, Zhengxiao.zx, shuangtai.tst, qemu-devel,
	eauger, Liu, Yi L, Zeng, Xin, Yang, Ziye, mlevitsk, pasic,
	felipe, Liu, Changpeng, Ken.Xue, jonathan.davies, He, Shaopeng,
	eskultet, dgilbert, Tian, Kevin, zhenyuw, Wang, Zhi A, cjia,
	kwankhede, berrange, dinechin, corbet

On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> On Sun, 19 Apr 2020 21:24:57 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:  
> > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > >     
> > > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > > Mediated devices.
> > > > > > 
> > > > > > This migration_version attribute is used to check migration compatibility
> > > > > > between two mdev devices.
> > > > > > 
> > > > > > Currently, it has two locations:
> > > > > > (1) under mdev_type node,
> > > > > >     which can be used even before device creation, but only for mdev
> > > > > >     devices of the same mdev type.
> > > > > > (2) under mdev device node,
> > > > > >     which can only be used after the mdev devices are created, but the src
> > > > > >     and target mdev devices are not necessarily be of the same mdev type
> > > > > > (The second location is newly added in v5, in order to keep consistent
> > > > > > with the migration_version node for migratable pass-though devices)    
> > > > > 
> > > > > What is the relationship between those two attributes?
> > > > >     
> > > > (1) is for mdev devices specifically, and (2) is provided to keep the same
> > > > sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> > > > non-mdev devices.
> > > > 
> > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > is binding to vfio-pci, but is able to register migration region and do
> > > > migration transactions from a vendor provided affiliate driver),
> > > > the vendor driver would export (2) directly, under device node.
> > > > It is not able to provide (1) as there're no mdev devices involved.  
> > > 
> > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > However, wouldn't that rather be a case (3)? The change here only
> > > refers to mdev devices.
> > >  
> > as you pointed below, (3) and (2) serve the same purpose. 
> > and I think a possible usage is to migrate between a non-mdev device and
> > an mdev device. so I think it's better for them both to use (2) rather
> > than creating (3).
> 
> An mdev type is meant to define a software compatible interface, so in
> the case of mdev->mdev migration, doesn't migrating to a different type
> fail the most basic of compatibility tests that we expect userspace to
> perform?  IOW, if two mdev types are migration compatible, it seems a
> prerequisite to that is that they provide the same software interface,
> which means they should be the same mdev type.
> 
> In the hybrid cases of mdev->phys or phys->mdev, how does a management
> tool begin to even guess what might be compatible?  Are we expecting
> libvirt to probe ever device with this attribute in the system?  Is
> there going to be a new class hierarchy created to enumerate all
> possible migrate-able devices?
>
Yes, the management tool needs to guess and test migration compatibility
between two devices. But I think this problem is not specific to
mdev->phys or phys->mdev. Even for mdev->mdev, the management tool needs to
first assume that the two mdevs have the same type of parent device
(e.g. their PCI IDs are equal). Otherwise, it is still enumerating
possibilities.

On the other hand, consider two mdevs:
mdev1 from pdev1, whose mdev_type is 1/2 of pdev1;
mdev2 from pdev2, whose mdev_type is 1/4 of pdev2.
If pdev2 is exactly 2 times pdev1, why not allow migration between
mdev1 <-> mdev2?
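
The arithmetic behind the 1/2 of pdev1 versus 1/4 of pdev2 example can be
checked with a short Python sketch. The capacity units are hypothetical; the
only assumption taken from the example is that pdev2 is exactly twice pdev1.

```python
from fractions import Fraction


def slice_size(parent_capacity, share):
    """Resources an mdev receives: a fixed share of its parent device."""
    return parent_capacity * share


pdev1_capacity = 1                   # arbitrary unit
pdev2_capacity = 2 * pdev1_capacity  # pdev2 is exactly 2 times pdev1

mdev1 = slice_size(pdev1_capacity, Fraction(1, 2))
mdev2 = slice_size(pdev2_capacity, Fraction(1, 4))

same_size = (mdev1 == mdev2)  # both slices come out to 1/2 unit
```

Equal slice sizes are only a necessary condition here; whether the two mdevs
are really migration compatible is still up to the vendor driver.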


> I agree that there was a gap in the previous proposal for non-mdev
> devices, but I think this bring a lot of questions that we need to
> puzzle through and libvirt will need to re-evaluate how they might
> decide to pick a migration target device.  For example, I'm sure
> libvirt would reject any policy decisions regarding picking a physical
> device versus an mdev device.  Had we previously left it that only a
> layer above libvirt would select a target device and libvirt only tests
> compatibility to that target device?
I'm not sure whether there's a layer above libvirt that would select a target
device. But if there is such a layer (even if it's a human), we need to
provide an interface for it to know whether its decision is suitable
for migration. The migration_version interface provides the potential to
allow mdev->phys migration, even if libvirt may currently reject it.


> We also need to consider that this expands the namespace.  If we no
> longer require matching types as the first level of comparison, then
> vendor migration strings can theoretically collide.  How do we
> coordinate that can't happen?  Thanks,
Yes, it's indeed a problem.
Could allowing migration only between devices from the same vendor be a good
prerequisite?

Thanks
Yan
> 
> > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > existence (and compatibility) of (2)?
> > > > >    
> > > > no. (2) does not reply on (1).  
> > > 
> > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > migration". If an mdev created for such a type suddenly does support
> > > migration, it feels a bit odd.
> > >   
> > yes. but I think if the condition happens, it should be reported a bug
> > to vendor driver.
> > should I add a line in the doc like "vendor driver should ensure that the
> > migration compatibility from migration_version under mdev_type should be
> > consistent with that from migration_version under device node" ?
> > 
> > > (It obviously cannot be a prereq for what I called (3) above.)
> > >   
> > > >   
> > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > it so chooses?
> > > > >    
> > > > I think it can completely reply on (2) if compatibility check before
> > > > mdev creation is not required.
> > > >   
> > > > > If devices with a different mdev type are indeed compatible, it seems
> > > > > userspace can only find out after the devices have actually been
> > > > > created, as (1) does not apply?    
> > > > yes, I think so.   
> > > 
> > > How useful would it be for userspace to even look at (1) in that case?
> > > It only knows if things have a chance of working if it actually goes
> > > ahead and creates devices.
> > >  
> > hmm, is it useful for userspace to test the migration_version under mdev
> > type before it knows what mdev device to generate ?
> > like when the userspace wants to migrate an mdev device in src vm,
> > but it has not created target vm and the target mdev device.
> > 
> > > >   
> > > > > One of my worries is that the existence of an attribute with the same
> > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > isn't a problem.
> > > > >    
> > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > I guess the same name is necessary?  
> > > 
> > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > slightly different purposes, while (2) and what I called (3) have the
> > > same purpose. Is it important to userspace that (1) and (2) have the
> > > same name?  
> > so change (1) to migration_type_version and (2) to
> > migration_instance_version?
> > But as they are under different locations, could that location imply
> > enough information?
> > 
> > 
> > Thanks
> > Yan
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-21  2:37           ` Yan Zhao
@ 2020-04-21 12:08             ` Tian, Kevin
  2020-04-22  7:36               ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Tian, Kevin @ 2020-04-21 12:08 UTC (permalink / raw)
  To: Zhao, Yan Y, Alex Williamson
  Cc: cjia, kvm, linux-doc, libvir-list, Zhengxiao.zx, shuangtai.tst,
	qemu-devel, kwankhede, eauger, corbet, Liu, Yi L, eskultet, Yang,
	Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue, Zeng, Xin, dgilbert,
	zhenyuw, dinechin, intel-gvt-dev, Liu, Changpeng, berrange,
	Cornelia Huck, linux-kernel, Wang, Zhi A, jonathan.davies, He,
	Shaopeng

> From: Yan Zhao
> Sent: Tuesday, April 21, 2020 10:37 AM
> 
> On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > On Sun, 19 Apr 2020 21:24:57 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >
> > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >
> > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > >
> > > > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > > > Mediated devices.
> > > > > > >
> > > > > > > This migration_version attribute is used to check migration compatibility
> > > > > > > between two mdev devices.
> > > > > > >
> > > > > > > Currently, it has two locations:
> > > > > > > (1) under mdev_type node,
> > > > > > >     which can be used even before device creation, but only for mdev
> > > > > > >     devices of the same mdev type.
> > > > > > > (2) under mdev device node,
> > > > > > >     which can only be used after the mdev devices are created, but the src
> > > > > > >     and target mdev devices are not necessarily be of the same mdev type
> > > > > > > (The second location is newly added in v5, in order to keep consistent
> > > > > > > with the migration_version node for migratable pass-though devices)
> > > > > >
> > > > > > What is the relationship between those two attributes?
> > > > > >
> > > > > (1) is for mdev devices specifically, and (2) is provided to keep the same
> > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> > > > > non-mdev devices.
> > > > >
> > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > migration transactions from a vendor provided affiliate driver),
> > > > > the vendor driver would export (2) directly, under device node.
> > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > >
> > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > However, wouldn't that rather be a case (3)? The change here only
> > > > refers to mdev devices.
> > > >
> > > as you pointed below, (3) and (2) serve the same purpose.
> > > and I think a possible usage is to migrate between a non-mdev device and
> > > an mdev device. so I think it's better for them both to use (2) rather
> > > than creating (3).
> >
> > An mdev type is meant to define a software compatible interface, so in
> > the case of mdev->mdev migration, doesn't migrating to a different type
> > fail the most basic of compatibility tests that we expect userspace to
> > perform?  IOW, if two mdev types are migration compatible, it seems a
> > prerequisite to that is that they provide the same software interface,
> > which means they should be the same mdev type.
> >
> > In the hybrid cases of mdev->phys or phys->mdev, how does a management
> > tool begin to even guess what might be compatible?  Are we expecting
> > libvirt to probe ever device with this attribute in the system?  Is
> > there going to be a new class hierarchy created to enumerate all
> > possible migrate-able devices?
> >
> yes, management tool needs to guess and test migration compatible
> between two devices. But I think it's not the problem only for
> mdev->phys or phys->mdev. even for mdev->mdev, management tool needs to
> first assume that the two mdevs have the same type of parent devices
> (e.g.their pciids are equal). otherwise, it's still enumerating
> possibilities.
> 
> on the other hand, for two mdevs,
> mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> if pdev2 is exactly 2 times of pdev1, why not allow migration between
> mdev1 <-> mdev2.

How could the management tool figure out that 1/2 of pdev1 is equivalent
to 1/4 of pdev2? If we really want to allow such a thing to happen, the best
choice is to report the same mdev type on both pdev1 and pdev2.

btw mdev<->phys just brings trouble to the upper stack, as Alex pointed out.
Can we simplify the requirement by allowing only mdev<->mdev and
phys<->phys migration? If a customer does want to migrate between an
mdev and phys, he could wrap the physical device into a wrapped mdev
instance (with the same type as the source mdev) instead of using vendor
ops. Doing so does add some burden, but if mdev<->phys is not the dominant
usage then such a tradeoff might be worthwhile...

Thanks
Kevin
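
The simplified policy suggested above could be sketched in Python as a
pre-filter that a management tool applies before any migration_version check.
The Device class, its fields, and the type name are hypothetical and exist
only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    kind: str                        # "mdev" or "phys"
    mdev_type: Optional[str] = None  # only meaningful when kind == "mdev"


def is_candidate_pair(src, dst):
    """Allow only mdev<->mdev (same type) and phys<->phys pairs."""
    if src.kind != dst.kind:
        return False                 # mdev<->phys is excluded outright
    if src.kind == "mdev":
        return src.mdev_type == dst.mdev_type
    return True                      # phys<->phys


# A physical device wrapped as an mdev of the source's type becomes
# an ordinary mdev<->mdev candidate.
src = Device("mdev", "type_x")
wrapped_phys = Device("mdev", "type_x")
raw_phys = Device("phys")
```

A pair that passes this filter would still need the actual compatibility
check against the target's migration_version attribute.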

> 
> 
> > I agree that there was a gap in the previous proposal for non-mdev
> > devices, but I think this bring a lot of questions that we need to
> > puzzle through and libvirt will need to re-evaluate how they might
> > decide to pick a migration target device.  For example, I'm sure
> > libvirt would reject any policy decisions regarding picking a physical
> > device versus an mdev device.  Had we previously left it that only a
> > layer above libvirt would select a target device and libvirt only tests
> > compatibility to that target device?
> I'm not sure if there's a layer above libvirt would select a target
> device. but if there is such a layer (even it's human), we need to
> provide an interface for them to know whether their decision is suitable
> for migration. The migration_version interface provides a potential to
> allow mdev->phys migration, even libvirt may currently reject it.
> 
> 
> > We also need to consider that this expands the namespace.  If we no
> > longer require matching types as the first level of comparison, then
> > vendor migration strings can theoretically collide.  How do we
> > coordinate that can't happen?  Thanks,
> yes, it's indeed a problem.
> could only allowing migration between devices from the same vendor be a good
> prerequisite?
> 
> Thanks
> Yan
> >
> > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > existence (and compatibility) of (2)?
> > > > > >
> > > > > no. (2) does not reply on (1).
> > > >
> > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > migration". If an mdev created for such a type suddenly does support
> > > > migration, it feels a bit odd.
> > > >
> > > yes. but I think if the condition happens, it should be reported a bug
> > > to vendor driver.
> > > should I add a line in the doc like "vendor driver should ensure that the
> > > migration compatibility from migration_version under mdev_type should be
> > > consistent with that from migration_version under device node" ?
> > >
> > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > >
> > > > >
> > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > it so chooses?
> > > > > >
> > > > > I think it can completely reply on (2) if compatibility check before
> > > > > mdev creation is not required.
> > > > >
> > > > > > If devices with a different mdev type are indeed compatible, it seems
> > > > > > userspace can only find out after the devices have actually been
> > > > > > created, as (1) does not apply?
> > > > > yes, I think so.
> > > >
> > > > How useful would it be for userspace to even look at (1) in that case?
> > > > It only knows if things have a chance of working if it actually goes
> > > > ahead and creates devices.
> > > >
> > > hmm, is it useful for userspace to test the migration_version under mdev
> > > type before it knows what mdev device to generate ?
> > > like when the userspace wants to migrate an mdev device in src vm,
> > > but it has not created target vm and the target mdev device.
> > >
> > > > >
> > > > > > One of my worries is that the existence of an attribute with the same
> > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > isn't a problem.
> > > > > >
> > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > I guess the same name is necessary?
> > > >
> > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > slightly different purposes, while (2) and what I called (3) have the
> > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > same name?
> > > so change (1) to migration_type_version and (2) to
> > > migration_instance_version?
> > > But as they are under different locations, could that location imply
> > > enough information?
> > >
> > >
> > > Thanks
> > > Yan
> > >
> > >
> >
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-21 12:08             ` Tian, Kevin
@ 2020-04-22  7:36               ` Yan Zhao
  2020-04-24 19:10                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-22  7:36 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alex Williamson, cjia, kvm, linux-doc, libvir-list, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, kwankhede, eauger, corbet, Liu, Yi L,
	eskultet, Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue,
	Zeng, Xin, dgilbert, zhenyuw, dinechin, intel-gvt-dev, Liu,
	Changpeng, berrange, Cornelia Huck, linux-kernel, Wang, Zhi A,
	jonathan.davies, He, Shaopeng

On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > From: Yan Zhao
> > Sent: Tuesday, April 21, 2020 10:37 AM
> > 
> > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >
> > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > >
> > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > >
> > > > > > > > This patchset introduces a migration_version attribute under sysfs
> > of VFIO
> > > > > > > > Mediated devices.
> > > > > > > >
> > > > > > > > This migration_version attribute is used to check migration
> > compatibility
> > > > > > > > between two mdev devices.
> > > > > > > >
> > > > > > > > Currently, it has two locations:
> > > > > > > > (1) under mdev_type node,
> > > > > > > >     which can be used even before device creation, but only for
> > mdev
> > > > > > > >     devices of the same mdev type.
> > > > > > > > (2) under mdev device node,
> > > > > > > >     which can only be used after the mdev devices are created, but
> > the src
> > > > > > > >     and target mdev devices are not necessarily be of the same
> > mdev type
> > > > > > > > (The second location is newly added in v5, in order to keep
> > consistent
> > > > > > > > with the migration_version node for migratable pass-though
> > devices)
> > > > > > >
> > > > > > > What is the relationship between those two attributes?
> > > > > > >
> > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the
> > same
> > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev
> > devices and
> > > > > > non-mdev devices.
> > > > > >
> > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > >
> > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > refers to mdev devices.
> > > > >
> > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > than creating (3).
> > >
> > > An mdev type is meant to define a software compatible interface, so in
> > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > fail the most basic of compatibility tests that we expect userspace to
> > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > prerequisite to that is that they provide the same software interface,
> > > which means they should be the same mdev type.
> > >
> > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > management
> > > tool begin to even guess what might be compatible?  Are we expecting
> > > libvirt to probe ever device with this attribute in the system?  Is
> > > there going to be a new class hierarchy created to enumerate all
> > > possible migrate-able devices?
> > >
> > yes, management tool needs to guess and test migration compatible
> > between two devices. But I think it's not the problem only for
> > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > to
> > first assume that the two mdevs have the same type of parent devices
> > (e.g.their pciids are equal). otherwise, it's still enumerating
> > possibilities.
> > 
> > on the other hand, for two mdevs,
> > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > mdev1 <-> mdev2.
> 
> How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> to 1/4 of pdev2? If we really want to allow such thing happen, the best
> choice is to report the same mdev type on both pdev1 and pdev2.
I think that's exactly the value of this migration_version interface.
The management tool can use this interface to learn whether two
devices are migration compatible, no matter whether they are mdevs,
non-mdevs, or a mix.

As far as I know (please correct me if I'm wrong), current libvirt still
requires mdev devices to be created manually, and it just duplicates the
src vm configuration to the target vm.
So for libvirt, currently it's always phys->phys or mdev->mdev (and of the
same mdev type).
But that does not justify disallowing hybrid cases. Otherwise,
why do we need to introduce this migration_version interface and leave
the judgement of migration compatibility to the vendor driver? Why not
simply set the criteria to something like "pciids of parent devices are
equal, and mdev types are equal"?


> btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
Could you help me understand why it would bring trouble to the upper stack?

I think it just needs to read the src migration_version under the src dev
node, and test it against the target migration_version under the target
dev node.

After all, through this interface we just help the upper layer
learn the available options through reading and testing, and it decides
whether to use them or not.
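
A rough userspace sketch of that read-and-test step, in Python. It assumes
that "testing" means writing the source string into the target's
migration_version node and treating a failed write as "not compatible".
The function name and the device directories passed in are hypothetical;
real sysfs paths depend on the device.

```python
from pathlib import Path


def is_migration_compatible(src_dev: Path, dst_dev: Path) -> bool:
    """Return True if the target node accepts the source's version string.

    src_dev and dst_dev are device directories that each contain a
    migration_version attribute (hypothetical layout, for illustration).
    """
    try:
        # Read the opaque version string exported for the source device.
        version = (src_dev / "migration_version").read_text()
        # Write it to the target node; the vendor driver is assumed to
        # reject the write with an error if the devices are incompatible.
        (dst_dev / "migration_version").write_text(version)
    except OSError:
        return False
    return True
```

The caller never parses the string. It only passes it from one node to the
other and looks at whether the write succeeded.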

> Can we simplify the requirement by allowing only mdev<->mdev and 
> phys<->phys migration? If an customer does want to migrate between a 
> mdev and phys, he could wrap physical device into a wrapped mdev 
> instance (with the same type as the source mdev) instead of using vendor 
> ops. Doing so does add some burden but if mdev<->phys is not dominant 
> usage then such tradeoff might be worthywhile...
>
If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
difference for phys<->mdev, right?
I think the vendor string for an mdev device is something like:
"Parent PCIID + mdev type + software version", and
that for a phys device is something like:
"PCIID + software version".
As long as we don't migrate between devices from different vendors, it's
easy for the vendor driver to tell whether a phys device is migration
compatible with an mdev device, according to whether it supports that or not.


Thanks
Yan
> 
> > 
> > 
> > > I agree that there was a gap in the previous proposal for non-mdev
> > > devices, but I think this bring a lot of questions that we need to
> > > puzzle through and libvirt will need to re-evaluate how they might
> > > decide to pick a migration target device.  For example, I'm sure
> > > libvirt would reject any policy decisions regarding picking a physical
> > > device versus an mdev device.  Had we previously left it that only a
> > > layer above libvirt would select a target device and libvirt only tests
> > > compatibility to that target device?
> > I'm not sure if there's a layer above libvirt would select a target
> > device. but if there is such a layer (even it's human), we need to
> > provide an interface for them to know whether their decision is suitable
> > for migration. The migration_version interface provides a potential to
> > allow mdev->phys migration, even libvirt may currently reject it.
> > 
> > 
> > > We also need to consider that this expands the namespace.  If we no
> > > longer require matching types as the first level of comparison, then
> > > vendor migration strings can theoretically collide.  How do we
> > > coordinate that can't happen?  Thanks,
> > yes, it's indeed a problem.
> > could only allowing migration beteen devices from the same vendor be a
> > good
> > prerequisite?
> > 
> > Thanks
> > Yan
> > >
> > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > existence (and compatibility) of (2)?
> > > > > > >
> > > > > > no. (2) does not reply on (1).
> > > > >
> > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > migration". If an mdev created for such a type suddenly does support
> > > > > migration, it feels a bit odd.
> > > > >
> > > > yes. but I think if the condition happens, it should be reported a bug
> > > > to vendor driver.
> > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > migration compatibility from migration_version under mdev_type should
> > be
> > > > consistent with that from migration_version under device node" ?
> > > >
> > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > >
> > > > > >
> > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > it so chooses?
> > > > > > >
> > > > > > I think it can completely reply on (2) if compatibility check before
> > > > > > mdev creation is not required.
> > > > > >
> > > > > > > If devices with a different mdev type are indeed compatible, it
> > seems
> > > > > > > userspace can only find out after the devices have actually been
> > > > > > > created, as (1) does not apply?
> > > > > > yes, I think so.
> > > > >
> > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > It only knows if things have a chance of working if it actually goes
> > > > > ahead and creates devices.
> > > > >
> > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > type before it knows what mdev device to generate ?
> > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > but it has not created target vm and the target mdev device.
> > > >
> > > > > >
> > > > > > > One of my worries is that the existence of an attribute with the
> > same
> > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > isn't a problem.
> > > > > > >
> > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > I guess the same name is necessary?
> > > > >
> > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > same name?
> > > > so change (1) to migration_type_version and (2) to
> > > > migration_instance_version?
> > > > But as they are under different locations, could that location imply
> > > > enough information?
> > > >
> > > >
> > > > Thanks
> > > > Yan
> > > >
> > > >
> > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-22  7:36               ` Yan Zhao
@ 2020-04-24 19:10                 ` Dr. David Alan Gilbert
  2020-04-26  1:36                   ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Dr. David Alan Gilbert @ 2020-04-24 19:10 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

* Yan Zhao (yan.y.zhao@intel.com) wrote:
> On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > From: Yan Zhao
> > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > 
> > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >
> > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > >
> > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > >
> > > > > > > > > This patchset introduces a migration_version attribute under sysfs
> > > of VFIO
> > > > > > > > > Mediated devices.
> > > > > > > > >
> > > > > > > > > This migration_version attribute is used to check migration
> > > compatibility
> > > > > > > > > between two mdev devices.
> > > > > > > > >
> > > > > > > > > Currently, it has two locations:
> > > > > > > > > (1) under mdev_type node,
> > > > > > > > >     which can be used even before device creation, but only for
> > > mdev
> > > > > > > > >     devices of the same mdev type.
> > > > > > > > > (2) under mdev device node,
> > > > > > > > >     which can only be used after the mdev devices are created, but
> > > the src
> > > > > > > > >     and target mdev devices are not necessarily be of the same
> > > mdev type
> > > > > > > > > (The second location is newly added in v5, in order to keep
> > > consistent
> > > > > > > > > with the migration_version node for migratable pass-though
> > > devices)
> > > > > > > >
> > > > > > > > What is the relationship between those two attributes?
> > > > > > > >
> > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the
> > > same
> > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev
> > > devices and
> > > > > > > non-mdev devices.
> > > > > > >
> > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > >
> > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > refers to mdev devices.
> > > > > >
> > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > than creating (3).
> > > >
> > > > An mdev type is meant to define a software compatible interface, so in
> > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > fail the most basic of compatibility tests that we expect userspace to
> > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > prerequisite to that is that they provide the same software interface,
> > > > which means they should be the same mdev type.
> > > >
> > > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > > management
> > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > there going to be a new class hierarchy created to enumerate all
> > > > possible migrate-able devices?
> > > >
> > > yes, management tool needs to guess and test migration compatible
> > > between two devices. But I think it's not the problem only for
> > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > to
> > > first assume that the two mdevs have the same type of parent devices
> > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > possibilities.
> > > 
> > > on the other hand, for two mdevs,
> > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > mdev1 <-> mdev2.
> > 
> > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > choice is to report the same mdev type on both pdev1 and pdev2.
> I think that's exactly the value of this migration_version interface.
> the management tool can take advantage of this interface to know if two
> devices are migration compatible, no matter they are mdevs, non-mdevs,
> or mix.
> 
> as I know, (please correct me if not right), current libvirt still
> requires manually generating mdev devices, and it just duplicates src vm
> configuration to the target vm.
> for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> same mdev type).
> But it does not justify that hybrid cases should not be allowed. otherwise,
> why do we need to introduce this migration_version interface and leave
> the judgement of migration compatibility to vendor driver? why not simply
> set the criteria to something like "pciids of parent devices are equal,
> and mdev types are equal" ?
> 
> 
> > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> could you help me understand why it will bring trouble to upper stack?
> 
> I think it just needs to read src migration_version under src dev node,
> and test it in target migration version under target dev node. 
> 
> after all, through this interface we just help the upper layer
> knowing available options through reading and testing, and they decide
> to use it or not.
> 
> > Can we simplify the requirement by allowing only mdev<->mdev and 
> > phys<->phys migration? If an customer does want to migrate between a 
> > mdev and phys, he could wrap physical device into a wrapped mdev 
> > instance (with the same type as the source mdev) instead of using vendor 
> > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > usage then such tradeoff might be worthywhile...
> >
> If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> difference to phys<->mdev, right?
> I think the vendor string for a mdev device is something like:
> "Parent PCIID + mdev type + software version", and
> that for a phys device is something like:
> "PCIID + software version".
> as long as we don't migrate between devices from different vendors, it's
> easy for vendor driver to tell if a phys device is migration compatible
> to a mdev device according it supports it or not.

It surprises me that the PCIID matching is a requirement; I'd assumed
with this clever mdev name setup that you could migrate between two
different models in a series, or to a newer model, as long as they
both supported the same mdev view.

Dave

> 
> Thanks
> Yan
> > 
> > > 
> > > 
> > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > devices, but I think this bring a lot of questions that we need to
> > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > decide to pick a migration target device.  For example, I'm sure
> > > > libvirt would reject any policy decisions regarding picking a physical
> > > > device versus an mdev device.  Had we previously left it that only a
> > > > layer above libvirt would select a target device and libvirt only tests
> > > > compatibility to that target device?
> > > I'm not sure if there's a layer above libvirt would select a target
> > > device. but if there is such a layer (even it's human), we need to
> > > provide an interface for them to know whether their decision is suitable
> > > for migration. The migration_version interface provides a potential to
> > > allow mdev->phys migration, even libvirt may currently reject it.
> > > 
> > > 
> > > > We also need to consider that this expands the namespace.  If we no
> > > > longer require matching types as the first level of comparison, then
> > > > vendor migration strings can theoretically collide.  How do we
> > > > coordinate that can't happen?  Thanks,
> > > yes, it's indeed a problem.
> > > could only allowing migration beteen devices from the same vendor be a
> > > good
> > > prerequisite?
> > > 
> > > Thanks
> > > Yan
> > > >
> > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > existence (and compatibility) of (2)?
> > > > > > > >
> > > > > > > no. (2) does not reply on (1).
> > > > > >
> > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > migration, it feels a bit odd.
> > > > > >
> > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > to vendor driver.
> > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > migration compatibility from migration_version under mdev_type should
> > > be
> > > > > consistent with that from migration_version under device node" ?
> > > > >
> > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > >
> > > > > > >
> > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > it so chooses?
> > > > > > > >
> > > > > > > I think it can completely reply on (2) if compatibility check before
> > > > > > > mdev creation is not required.
> > > > > > >
> > > > > > > > If devices with a different mdev type are indeed compatible, it
> > > seems
> > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > created, as (1) does not apply?
> > > > > > > yes, I think so.
> > > > > >
> > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > ahead and creates devices.
> > > > > >
> > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > type before it knows what mdev device to generate ?
> > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > but it has not created target vm and the target mdev device.
> > > > >
> > > > > > >
> > > > > > > > One of my worries is that the existence of an attribute with the
> > > same
> > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > isn't a problem.
> > > > > > > >
> > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > I guess the same name is necessary?
> > > > > >
> > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > same name?
> > > > > so change (1) to migration_type_version and (2) to
> > > > > migration_instance_version?
> > > > > But as they are under different locations, could that location imply
> > > > > enough information?
> > > > >
> > > > >
> > > > > Thanks
> > > > > Yan
> > > > >
> > > > >
> > > >
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-24 19:10                 ` Dr. David Alan Gilbert
@ 2020-04-26  1:36                   ` Yan Zhao
  2020-04-27 15:37                     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-26  1:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > From: Yan Zhao
> > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > 
> > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > >
> > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > >
> > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > >
> > > > > > > > > > This patchset introduces a migration_version attribute under sysfs
> > > > of VFIO
> > > > > > > > > > Mediated devices.
> > > > > > > > > >
> > > > > > > > > > This migration_version attribute is used to check migration
> > > > compatibility
> > > > > > > > > > between two mdev devices.
> > > > > > > > > >
> > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > >     which can be used even before device creation, but only for
> > > > mdev
> > > > > > > > > >     devices of the same mdev type.
> > > > > > > > > > (2) under mdev device node,
> > > > > > > > > >     which can only be used after the mdev devices are created, but
> > > > the src
> > > > > > > > > >     and target mdev devices are not necessarily be of the same
> > > > mdev type
> > > > > > > > > > (The second location is newly added in v5, in order to keep
> > > > consistent
> > > > > > > > > > with the migration_version node for migratable pass-though
> > > > devices)
> > > > > > > > >
> > > > > > > > > What is the relationship between those two attributes?
> > > > > > > > >
> > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the
> > > > same
> > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev
> > > > devices and
> > > > > > > > non-mdev devices.
> > > > > > > >
> > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > > >
> > > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > > refers to mdev devices.
> > > > > > >
> > > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > > than creating (3).
> > > > >
> > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > prerequisite to that is that they provide the same software interface,
> > > > > which means they should be the same mdev type.
> > > > >
> > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > > > management
> > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > there going to be a new class hierarchy created to enumerate all
> > > > > possible migrate-able devices?
> > > > >
> > > > yes, management tool needs to guess and test migration compatible
> > > > between two devices. But I think it's not the problem only for
> > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > to
> > > > first assume that the two mdevs have the same type of parent devices
> > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > possibilities.
> > > > 
> > > > on the other hand, for two mdevs,
> > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > mdev1 <-> mdev2.
> > > 
> > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > choice is to report the same mdev type on both pdev1 and pdev2.
> > I think that's exactly the value of this migration_version interface.
> > the management tool can take advantage of this interface to know if two
> > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > or mix.
> > 
> > as I know, (please correct me if not right), current libvirt still
> > requires manually generating mdev devices, and it just duplicates src vm
> > configuration to the target vm.
> > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > same mdev type).
> > But it does not justify that hybrid cases should not be allowed. otherwise,
> > why do we need to introduce this migration_version interface and leave
> > the judgement of migration compatibility to vendor driver? why not simply
> > set the criteria to something like "pciids of parent devices are equal,
> > and mdev types are equal" ?
> > 
> > 
> > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > could you help me understand why it will bring trouble to upper stack?
> > 
> > I think it just needs to read src migration_version under src dev node,
> > and test it in target migration version under target dev node. 
> > 
> > after all, through this interface we just help the upper layer
> > knowing available options through reading and testing, and they decide
> > to use it or not.
> > 
> > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > phys<->phys migration? If an customer does want to migrate between a 
> > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > instance (with the same type as the source mdev) instead of using vendor 
> > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > usage then such tradeoff might be worthywhile...
> > >
> > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > difference to phys<->mdev, right?
> > I think the vendor string for a mdev device is something like:
> > "Parent PCIID + mdev type + software version", and
> > that for a phys device is something like:
> > "PCIID + software version".
> > as long as we don't migrate between devices from different vendors, it's
> > easy for vendor driver to tell if a phys device is migration compatible
> > to a mdev device according it supports it or not.
> 
> It surprises me that the PCIID matching is a requirement; I'd assumed
> with this clever mdev name setup that you could migrate between two
> different models in a series, or to a newer model, as long as they
> both supported the same mdev view.
> 
hi Dave
the migration_version string is opaque to userspace, and is
completely defined by the vendor driver.
I put it there just as an example of how a vendor driver may implement it.
e.g.
the src migration_version string is "src PCIID + src software version",
then when this string is written to the target migration_version node,
the vendor driver of the target device will compare it with its own
device info and software version.
If different models are allowed, the write just succeeds even if the
PCIIDs of src and target are different.

so, it is up to the vendor driver to define whether two devices are able to
migrate, no matter their PCIIDs, mdev types, software versions..., which
gives the vendor driver full flexibility.
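
On the userspace side this boils down to a read followed by a test write.
A minimal Python sketch, assuming only that reading the attribute returns
the string and that an incompatible write fails with an error; the function
name and the paths passed in are placeholders, not part of the patchset:

```python
def is_migration_compatible(src_attr: str, dst_attr: str) -> bool:
    """Read the source version string and test it against the target.

    src_attr and dst_attr are paths to the two migration_version
    attributes; their exact sysfs locations are left to the caller.
    """
    # Reading the source attribute yields the vendor-defined string.
    with open(src_attr, "r") as f:
        version = f.read()

    # Writing that string to the target attribute asks the target's
    # vendor driver to compare it with its own device and software info.
    # Userspace never parses the string itself.
    try:
        with open(dst_attr, "w") as f:
            f.write(version)
    except OSError:
        # The write was rejected, or the target attribute does not exist.
        return False
    return True
```

Userspace only learns "compatible" or "not compatible" here; why the
string was accepted or rejected stays inside the vendor driver.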

do you think it's good?

Thanks
Yan

> 
> > 
> > Thanks
> > Yan
> > > 
> > > > 
> > > > 
> > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > devices, but I think this bring a lot of questions that we need to
> > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > compatibility to that target device?
> > > > I'm not sure if there's a layer above libvirt would select a target
> > > > device. but if there is such a layer (even it's human), we need to
> > > > provide an interface for them to know whether their decision is suitable
> > > > for migration. The migration_version interface provides a potential to
> > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > 
> > > > 
> > > > > We also need to consider that this expands the namespace.  If we no
> > > > > longer require matching types as the first level of comparison, then
> > > > > vendor migration strings can theoretically collide.  How do we
> > > > > coordinate that can't happen?  Thanks,
> > > > yes, it's indeed a problem.
> > > > could only allowing migration beteen devices from the same vendor be a
> > > > good
> > > > prerequisite?
> > > > 
> > > > Thanks
> > > > Yan
> > > > >
> > > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > >
> > > > > > > > no. (2) does not reply on (1).
> > > > > > >
> > > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > > migration, it feels a bit odd.
> > > > > > >
> > > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > > to vendor driver.
> > > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > > migration compatibility from migration_version under mdev_type should
> > > > be
> > > > > > consistent with that from migration_version under device node" ?
> > > > > >
> > > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > > >
> > > > > > > >
> > > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > > it so chooses?
> > > > > > > > >
> > > > > > > > I think it can completely reply on (2) if compatibility check before
> > > > > > > > mdev creation is not required.
> > > > > > > >
> > > > > > > > > If devices with a different mdev type are indeed compatible, it
> > > > seems
> > > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > > created, as (1) does not apply?
> > > > > > > > yes, I think so.
> > > > > > >
> > > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > > ahead and creates devices.
> > > > > > >
> > > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > > type before it knows what mdev device to generate ?
> > > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > > but it has not created target vm and the target mdev device.
> > > > > >
> > > > > > > >
> > > > > > > > > One of my worries is that the existence of an attribute with the
> > > > same
> > > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > > isn't a problem.
> > > > > > > > >
> > > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > > I guess the same name is necessary?
> > > > > > >
> > > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > > same name?
> > > > > > so change (1) to migration_type_version and (2) to
> > > > > > migration_instance_version?
> > > > > > But as they are under different locations, could that location imply
> > > > > > enough information?
> > > > > >
> > > > > >
> > > > > > Thanks
> > > > > > Yan
> > > > > >
> > > > > >
> > > > >
> > > > _______________________________________________
> > > > intel-gvt-dev mailing list
> > > > intel-gvt-dev@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-26  1:36                   ` Yan Zhao
@ 2020-04-27 15:37                     ` Dr. David Alan Gilbert
  2020-04-28  0:54                       ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Dr. David Alan Gilbert @ 2020-04-27 15:37 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

* Yan Zhao (yan.y.zhao@intel.com) wrote:
> On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > > From: Yan Zhao
> > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > > 
> > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > >
> > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > >
> > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > This patchset introduces a migration_version attribute under sysfs
> > > > > of VFIO
> > > > > > > > > > > Mediated devices.
> > > > > > > > > > >
> > > > > > > > > > > This migration_version attribute is used to check migration
> > > > > compatibility
> > > > > > > > > > > between two mdev devices.
> > > > > > > > > > >
> > > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > > >     which can be used even before device creation, but only for
> > > > > mdev
> > > > > > > > > > >     devices of the same mdev type.
> > > > > > > > > > > (2) under mdev device node,
> > > > > > > > > > >     which can only be used after the mdev devices are created, but
> > > > > the src
> > > > > > > > > > >     and target mdev devices are not necessarily be of the same
> > > > > mdev type
> > > > > > > > > > > (The second location is newly added in v5, in order to keep
> > > > > consistent
> > > > > > > > > > > with the migration_version node for migratable pass-though
> > > > > devices)
> > > > > > > > > >
> > > > > > > > > > What is the relationship between those two attributes?
> > > > > > > > > >
> > > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the
> > > > > same
> > > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev
> > > > > devices and
> > > > > > > > > non-mdev devices.
> > > > > > > > >
> > > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > > > >
> > > > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > > > refers to mdev devices.
> > > > > > > >
> > > > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > > > than creating (3).
> > > > > >
> > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > which means they should be the same mdev type.
> > > > > >
> > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > > > > management
> > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > possible migrate-able devices?
> > > > > >
> > > > > yes, management tool needs to guess and test migration compatible
> > > > > between two devices. But I think it's not the problem only for
> > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > to
> > > > > first assume that the two mdevs have the same type of parent devices
> > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > possibilities.
> > > > > 
> > > > > on the other hand, for two mdevs,
> > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > mdev1 <-> mdev2.
> > > > 
> > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > I think that's exactly the value of this migration_version interface.
> > > the management tool can take advantage of this interface to know if two
> > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > or mix.
> > > 
> > > as I know, (please correct me if not right), current libvirt still
> > > requires manually generating mdev devices, and it just duplicates src vm
> > > configuration to the target vm.
> > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > same mdev type).
> > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > why do we need to introduce this migration_version interface and leave
> > > the judgement of migration compatibility to vendor driver? why not simply
> > > set the criteria to something like "pciids of parent devices are equal,
> > > and mdev types are equal" ?
> > > 
> > > 
> > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > > could you help me understand why it will bring trouble to upper stack?
> > > 
> > > I think it just needs to read src migration_version under src dev node,
> > > and test it in target migration version under target dev node. 
> > > 
> > > after all, through this interface we just help the upper layer
> > > knowing available options through reading and testing, and they decide
> > > to use it or not.
> > > 
> > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > usage then such tradeoff might be worthywhile...
> > > >
> > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > difference to phys<->mdev, right?
> > > I think the vendor string for a mdev device is something like:
> > > "Parent PCIID + mdev type + software version", and
> > > that for a phys device is something like:
> > > "PCIID + software version".
> > > as long as we don't migrate between devices from different vendors, it's
> > > easy for vendor driver to tell if a phys device is migration compatible
> > > to a mdev device according it supports it or not.
> > 
> > It surprises me that the PCIID matching is a requirement; I'd assumed
> > with this clever mdev name setup that you could migrate between two
> > different models in a series, or to a newer model, as long as they
> > both supported the same mdev view.
> > 
> hi Dave
> the migration_version string is transparent to userspace, and is
> completely defined by vendor driver.
> I put it there just as an example of how vendor driver may implement it.
> e.g.
> the src migration_version string is "src PCIID + src software version", 
> then when this string is write to target migration_version node,
> the vendor driver in the target device will compare it with its own
> device info and software version.
> If different models are allowed, the write just succeeds even
> PCIIDs in src and target are different.
> 
> so, it is the vendor driver to define whether two devices are able to
> migrate, no matter their PCIIDs, mdev types, software versions..., which
> provides vendor driver full flexibility.
> 
> do you think it's good?

Yeh that's OK; I guess it's going to need to have a big table in there
with all the PCIIDs in.
The alternative would be to abstract it a little; e.g. to say it's
an Intel-gpu-core-v4 and then it would be less worried about the exact
clock speed etc - but yes you might be right that PCIIDs might be best
for checking for quirks.
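
Very roughly, the table idea might look like the Python model below. The
device IDs and family names are made up for illustration, and a real
vendor driver would do this in the kernel rather than in Python:

```python
# Hypothetical table: PCI ID -> abstract family name.
# None of these IDs or names refer to real hardware.
FAMILY_BY_PCIID = {
    "8086:0001": "example-gpu-core-v4",
    "8086:0002": "example-gpu-core-v4",
    "8086:0003": "example-gpu-core-v5",
}


def same_family(src_pciid: str, dst_pciid: str) -> bool:
    """Return True if both PCI IDs map to the same known family."""
    src = FAMILY_BY_PCIID.get(src_pciid)
    dst = FAMILY_BY_PCIID.get(dst_pciid)
    # Unknown IDs are never treated as compatible.
    return src is not None and src == dst
```

Keeping the PCIID as the key still leaves room for per-model quirks,
while the family name covers the "same series" case.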

Dave

> Thanks
> Yan
> 
> > 
> > > 
> > > Thanks
> > > Yan
> > > > 
> > > > > 
> > > > > 
> > > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > > devices, but I think this bring a lot of questions that we need to
> > > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > > compatibility to that target device?
> > > > > I'm not sure if there's a layer above libvirt would select a target
> > > > > device. but if there is such a layer (even it's human), we need to
> > > > > provide an interface for them to know whether their decision is suitable
> > > > > for migration. The migration_version interface provides a potential to
> > > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > > 
> > > > > 
> > > > > > We also need to consider that this expands the namespace.  If we no
> > > > > > longer require matching types as the first level of comparison, then
> > > > > > vendor migration strings can theoretically collide.  How do we
> > > > > > coordinate that can't happen?  Thanks,
> > > > > yes, it's indeed a problem.
> > > > > could only allowing migration beteen devices from the same vendor be a
> > > > > good
> > > > > prerequisite?
> > > > > 
> > > > > Thanks
> > > > > Yan
> > > > > >
> > > > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > > >
> > > > > > > > > no. (2) does not reply on (1).
> > > > > > > >
> > > > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > > > migration, it feels a bit odd.
> > > > > > > >
> > > > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > > > to vendor driver.
> > > > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > > > migration compatibility from migration_version under mdev_type should
> > > > > be
> > > > > > > consistent with that from migration_version under device node" ?
> > > > > > >
> > > > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > > > it so chooses?
> > > > > > > > > >
> > > > > > > > > I think it can completely reply on (2) if compatibility check before
> > > > > > > > > mdev creation is not required.
> > > > > > > > >
> > > > > > > > > > If devices with a different mdev type are indeed compatible, it
> > > > > seems
> > > > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > > > created, as (1) does not apply?
> > > > > > > > > yes, I think so.
> > > > > > > >
> > > > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > > > ahead and creates devices.
> > > > > > > >
> > > > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > > > type before it knows what mdev device to generate ?
> > > > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > > > but it has not created target vm and the target mdev device.
> > > > > > >
> > > > > > > > >
> > > > > > > > > > One of my worries is that the existence of an attribute with the
> > > > > same
> > > > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > > > isn't a problem.
> > > > > > > > > >
> > > > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > > > I guess the same name is necessary?
> > > > > > > >
> > > > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > > > same name?
> > > > > > > so change (1) to migration_type_version and (2) to
> > > > > > > migration_instance_version?
> > > > > > > But as they are under different locations, could that location imply
> > > > > > > enough information?
> > > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > > Yan
> > > > > > >
> > > > > > >
> > > > > >
> > > > > _______________________________________________
> > > > > intel-gvt-dev mailing list
> > > > > intel-gvt-dev@lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-27 15:37                     ` Dr. David Alan Gilbert
@ 2020-04-28  0:54                       ` Yan Zhao
  2020-04-28 14:14                         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-28  0:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

On Mon, Apr 27, 2020 at 11:37:43PM +0800, Dr. David Alan Gilbert wrote:
> * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > > > From: Yan Zhao
> > > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > > > 
> > > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > >
> > > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > >
> > > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > This patchset introduces a migration_version attribute under sysfs
> > > > > > of VFIO
> > > > > > > > > > > > Mediated devices.
> > > > > > > > > > > >
> > > > > > > > > > > > This migration_version attribute is used to check migration
> > > > > > compatibility
> > > > > > > > > > > > between two mdev devices.
> > > > > > > > > > > >
> > > > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > > > >     which can be used even before device creation, but only for
> > > > > > mdev
> > > > > > > > > > > >     devices of the same mdev type.
> > > > > > > > > > > > (2) under mdev device node,
> > > > > > > > > > > >     which can only be used after the mdev devices are created, but
> > > > > > the src
> > > > > > > > > > > >     and target mdev devices are not necessarily be of the same
> > > > > > mdev type
> > > > > > > > > > > > (The second location is newly added in v5, in order to keep
> > > > > > consistent
> > > > > > > > > > > > with the migration_version node for migratable pass-though
> > > > > > devices)
> > > > > > > > > > >
> > > > > > > > > > > What is the relationship between those two attributes?
> > > > > > > > > > >
> > > > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the
> > > > > > same
> > > > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev
> > > > > > devices and
> > > > > > > > > > non-mdev devices.
> > > > > > > > > >
> > > > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > > > > >
> > > > > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > > > > refers to mdev devices.
> > > > > > > > >
> > > > > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > > > > than creating (3).
> > > > > > >
> > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > which means they should be the same mdev type.
> > > > > > >
> > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > > > > > management
> > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > possible migrate-able devices?
> > > > > > >
> > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > between two devices. But I think it's not the problem only for
> > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > to
> > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > possibilities.
> > > > > > 
> > > > > > on the other hand, for two mdevs,
> > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > mdev1 <-> mdev2.
> > > > > 
> > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > > I think that's exactly the value of this migration_version interface.
> > > > the management tool can take advantage of this interface to know if two
> > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > or mix.
> > > > 
> > > > as I know, (please correct me if not right), current libvirt still
> > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > configuration to the target vm.
> > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > same mdev type).
> > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > why do we need to introduce this migration_version interface and leave
> > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > set the criteria to something like "pciids of parent devices are equal,
> > > > and mdev types are equal" ?
> > > > 
> > > > 
> > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > > > could you help me understand why it will bring trouble to upper stack?
> > > > 
> > > > I think it just needs to read src migration_version under src dev node,
> > > > and test it in target migration version under target dev node. 
> > > > 
> > > > after all, through this interface we just help the upper layer
> > > > knowing available options through reading and testing, and they decide
> > > > to use it or not.
> > > > 
> > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > usage then such tradeoff might be worthywhile...
> > > > >
> > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > difference to phys<->mdev, right?
> > > > I think the vendor string for a mdev device is something like:
> > > > "Parent PCIID + mdev type + software version", and
> > > > that for a phys device is something like:
> > > > "PCIID + software version".
> > > > as long as we don't migrate between devices from different vendors, it's
> > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > to a mdev device according it supports it or not.
> > > 
> > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > with this clever mdev name setup that you could migrate between two
> > > different models in a series, or to a newer model, as long as they
> > > both supported the same mdev view.
> > > 
> > hi Dave
> > the migration_version string is transparent to userspace, and is
> > completely defined by vendor driver.
> > I put it there just as an example of how vendor driver may implement it.
> > e.g.
> > the src migration_version string is "src PCIID + src software version", 
> > then when this string is write to target migration_version node,
> > the vendor driver in the target device will compare it with its own
> > device info and software version.
> > If different models are allowed, the write just succeeds even
> > PCIIDs in src and target are different.
> > 
> > so, it is the vendor driver to define whether two devices are able to
> > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > provides vendor driver full flexibility.
> > 
> > do you think it's good?
> 
> Yeh that's OK; I guess it's going to need to have a big table in their
> with all the PCIIDs in.
> The alternative would be to abstract it a little; e.g. to say it's
> an Intel-gpu-core-v4  and then it would be less worried about the exact
> clock speed etc - but yes you might be right htat PCIIDs might be best
> for checking for quirks.
>
glad that you agree with it :)
I think the vendor driver can still choose to abstract a little
(e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
migration_version string would be something like "Intel-gpu-core-v4 + instance
number + software version".
IOW, they can choose anything they think appropriate to identify migration
compatibility of a device.
But Alex is right, we have to prevent namespace overlapping. So I think
we need to ensure src and target devices are from the same vendors.
or, any other ideas?
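
To make the namespace point concrete, here is a rough model of the check a
vendor driver could run when the source string is written to the target's
migration_version node. It is plain Python for illustration only (a real
vendor driver would do this in C), and the "vendor/family/software version"
layout, the "/" separator and every name below are assumptions made for this
example, not something the patchset defines:

```python
from typing import NamedTuple


class MigrationVersion(NamedTuple):
    """Hypothetical parsed form of a vendor-defined migration_version string."""
    vendor: str        # e.g. PCI vendor ID as hex text ("8086" is Intel)
    family: str        # abstracted device family, e.g. "gpu-core-v4"
    sw_version: int    # version of the vendor driver's migration data format


def parse(version: str) -> MigrationVersion:
    """Split "vendor/family/sw_version"; raises ValueError on malformed input."""
    vendor, family, sw = version.strip().split("/")
    return MigrationVersion(vendor, family, int(sw))


def is_compatible(src: str, target: str) -> bool:
    """Model of the target-side check done when the src string is written."""
    try:
        s, t = parse(src), parse(target)
    except ValueError:
        return False            # unknown layout: refuse rather than guess
    if s.vendor != t.vendor:
        return False            # other vendor's namespace, compare nothing else
    # Assumed policy: same family, and the target understands the source's
    # data format version or any older one.
    return s.family == t.family and s.sw_version <= t.sw_version
```

Checking the vendor field first is what keeps two vendors that happen to pick
the same family name from being treated as compatible.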

Thanks
Yan


> > > > > > 
> > > > > > 
> > > > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > > > devices, but I think this bring a lot of questions that we need to
> > > > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > > > compatibility to that target device?
> > > > > > I'm not sure if there's a layer above libvirt would select a target
> > > > > > device. but if there is such a layer (even it's human), we need to
> > > > > > provide an interface for them to know whether their decision is suitable
> > > > > > for migration. The migration_version interface provides a potential to
> > > > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > > > 
> > > > > > 
> > > > > > > We also need to consider that this expands the namespace.  If we no
> > > > > > > longer require matching types as the first level of comparison, then
> > > > > > > vendor migration strings can theoretically collide.  How do we
> > > > > > > coordinate that can't happen?  Thanks,
> > > > > > yes, it's indeed a problem.
> > > > > > could only allowing migration between devices from the same vendor be a
> > > > > > good prerequisite?
> > > > > > 
> > > > > > Thanks
> > > > > > Yan
> > > > > > >
> > > > > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > > > >
> > > > > > > > > > no. (2) does not rely on (1).
> > > > > > > > >
> > > > > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > > > > migration, it feels a bit odd.
> > > > > > > > >
> > > > > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > > > > to vendor driver.
> > > > > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > > > > migration compatibility from migration_version under mdev_type should be
> > > > > > > > consistent with that from migration_version under device node" ?
> > > > > > > >
> > > > > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > > > > it so chooses?
> > > > > > > > > > >
> > > > > > > > > > I think it can completely rely on (2) if compatibility check before
> > > > > > > > > > mdev creation is not required.
> > > > > > > > > >
> > > > > > > > > > > If devices with a different mdev type are indeed compatible, it seems
> > > > > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > > > > created, as (1) does not apply?
> > > > > > > > > > yes, I think so.
> > > > > > > > >
> > > > > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > > > > ahead and creates devices.
> > > > > > > > >
> > > > > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > > > > type before it knows what mdev device to generate ?
> > > > > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > > > > but it has not created target vm and the target mdev device.
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > One of my worries is that the existence of an attribute with the same
> > > > > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > > > > isn't a problem.
> > > > > > > > > > >
> > > > > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > > > > I guess the same name is necessary?
> > > > > > > > >
> > > > > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > > > > same name?
> > > > > > > > so change (1) to migration_type_version and (2) to
> > > > > > > > migration_instance_version?
> > > > > > > > But as they are under different locations, could that location imply
> > > > > > > > enough information?
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Yan
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-28  0:54                       ` Yan Zhao
@ 2020-04-28 14:14                         ` Dr. David Alan Gilbert
  2020-04-29  7:26                           ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Dr. David Alan Gilbert @ 2020-04-28 14:14 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

* Yan Zhao (yan.y.zhao@intel.com) wrote:
> On Mon, Apr 27, 2020 at 11:37:43PM +0800, Dr. David Alan Gilbert wrote:
> > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > > > > From: Yan Zhao
> > > > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > > > > 
> > > > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > >
> > > > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > > > > > > > > > Mediated devices.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This migration_version attribute is used to check migration compatibility
> > > > > > > > > > > > > between two mdev devices.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > > > > >     which can be used even before device creation, but only for mdev
> > > > > > > > > > > > >     devices of the same mdev type.
> > > > > > > > > > > > > (2) under mdev device node,
> > > > > > > > > > > > >     which can only be used after the mdev devices are created, but the src
> > > > > > > > > > > > >     and target mdev devices are not necessarily of the same mdev type
> > > > > > > > > > > > > (The second location is newly added in v5, in order to keep consistent
> > > > > > > > > > > > > with the migration_version node for migratable pass-through devices)
> > > > > > > > > > > >
> > > > > > > > > > > > What is the relationship between those two attributes?
> > > > > > > > > > > >
> > > > > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the same
> > > > > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> > > > > > > > > > > non-mdev devices.
> > > > > > > > > > >
> > > > > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > > > > > >
> > > > > > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > > > > > refers to mdev devices.
> > > > > > > > > >
> > > > > > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > > > > > than creating (3).
> > > > > > > >
> > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > which means they should be the same mdev type.
> > > > > > > >
> > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a management
> > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > libvirt to probe every device with this attribute in the system?  Is
> > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > possible migrate-able devices?
> > > > > > > >
> > > > > > > yes, management tool needs to guess and test migration compatibility
> > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs to
> > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > (e.g. their pciids are equal). otherwise, it's still enumerating
> > > > > > > possibilities.
> > > > > > > 
> > > > > > > on the other hand, for two mdevs,
> > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > mdev1 <-> mdev2.
> > > > > > 
> > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > > > I think that's exactly the value of this migration_version interface.
> > > > > the management tool can take advantage of this interface to know if two
> > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > or mix.
> > > > > 
> > > > > as I know, (please correct me if not right), current libvirt still
> > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > configuration to the target vm.
> > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > same mdev type).
> > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > why do we need to introduce this migration_version interface and leave
> > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > and mdev types are equal" ?
> > > > > 
> > > > > 
> > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > 
> > > > > I think it just needs to read src migration_version under src dev node,
> > > > > and test it in target migration version under target dev node. 
> > > > > 
> > > > > after all, through this interface we just help the upper layer
> > > > > knowing available options through reading and testing, and they decide
> > > > > to use it or not.
> > > > > 
> > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > usage then such tradeoff might be worthwhile...
> > > > > >
> > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > difference to phys<->mdev, right?
> > > > > I think the vendor string for a mdev device is something like:
> > > > > "Parent PCIID + mdev type + software version", and
> > > > > that for a phys device is something like:
> > > > > "PCIID + software version".
> > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > to a mdev device according it supports it or not.
> > > > 
> > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > with this clever mdev name setup that you could migrate between two
> > > > different models in a series, or to a newer model, as long as they
> > > > both supported the same mdev view.
> > > > 
> > > hi Dave
> > > the migration_version string is transparent to userspace, and is
> > > completely defined by vendor driver.
> > > I put it there just as an example of how vendor driver may implement it.
> > > e.g.
> > > the src migration_version string is "src PCIID + src software version", 
> > > then when this string is written to the target migration_version node,
> > > the vendor driver in the target device will compare it with its own
> > > device info and software version.
> > > If different models are allowed, the write just succeeds even
> > > PCIIDs in src and target are different.
> > > 
> > > so, it is the vendor driver to define whether two devices are able to
> > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > provides vendor driver full flexibility.
> > > 
> > > do you think it's good?
> > 
> > Yeh that's OK; I guess it's going to need to have a big table in there
> > with all the PCIIDs in.
> > The alternative would be to abstract it a little; e.g. to say it's
> > an Intel-gpu-core-v4 and then it would be less worried about the exact
> > clock speed etc - but yes you might be right that PCIIDs might be best
> > for checking for quirks.
> >
> Glad that you agree with it :)
> I think the vendor driver can still choose to abstract it a little
> (e.g. Intel-gpu-core-v4...) if they think that's better. In that case, the
> migration_version string would be something like "Intel-gpu-core-v4 +
> instance number + software version".
> IOW, they can choose anything they think appropriate to identify the
> migration compatibility of a device.
> But Alex is right, we have to prevent namespace overlap. So I think
> we need to ensure the src and target devices are from the same vendor.
> Or are there any other ideas?

That's why I kept the 'Intel' in that example (or it could be the PCI
vendor ID); I was only really trying to say that within one vendor's
range there are often a lot of PCI-IDs that have really minor variations.
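
As a rough illustration of that idea (Python just for brevity, not driver
code), a table could collapse several PCI IDs into one vendor-prefixed family
name. 0x8086 is Intel's PCI vendor ID; the device IDs, the family names and
the helper names are made up for this sketch:

```python
# Hypothetical table: several PCI device IDs of one vendor that differ only in
# minor ways (clock speed etc.) map to the same abstract family name.
# The device IDs and family names are invented for illustration.
FAMILY_BY_PCI_ID = {
    (0x8086, 0x1001): "Intel-gpu-core-v4",
    (0x8086, 0x1002): "Intel-gpu-core-v4",
    (0x8086, 0x2001): "Intel-gpu-core-v5",
}


def migration_family(vendor_id: int, device_id: int) -> str:
    """Return a vendor-prefixed family name, falling back to the raw PCI ID."""
    family = FAMILY_BY_PCI_ID.get((vendor_id, device_id))
    if family is not None:
        return family
    # Unknown model: keep the vendor in the string so namespaces cannot collide.
    return f"{vendor_id:04x}:{device_id:04x}"


def same_family(src: tuple, dst: tuple) -> bool:
    """True when both (vendor_id, device_id) pairs resolve to the same family."""
    return migration_family(*src) == migration_family(*dst)
```

Quirk handling could still key off the exact PCI ID; the family name only
decides which models are candidates for each other.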

Dave

> Thanks
> Yan
> 
> 
> > > > > > > 
> > > > > > > 
> > > > > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > > > > devices, but I think this bring a lot of questions that we need to
> > > > > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > > > > compatibility to that target device?
> > > > > > > I'm not sure if there's a layer above libvirt would select a target
> > > > > > > device. but if there is such a layer (even it's human), we need to
> > > > > > > provide an interface for them to know whether their decision is suitable
> > > > > > > for migration. The migration_version interface provides a potential to
> > > > > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > > > > 
> > > > > > > 
> > > > > > > > We also need to consider that this expands the namespace.  If we no
> > > > > > > > longer require matching types as the first level of comparison, then
> > > > > > > > vendor migration strings can theoretically collide.  How do we
> > > > > > > > coordinate that can't happen?  Thanks,
> > > > > > > yes, it's indeed a problem.
> > > > > > > could only allowing migration between devices from the same vendor be a
> > > > > > > good prerequisite?
> > > > > > > 
> > > > > > > Thanks
> > > > > > > Yan
> > > > > > > >
> > > > > > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > > > > >
> > > > > > > > > > > no. (2) does not rely on (1).
> > > > > > > > > >
> > > > > > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > > > > > migration, it feels a bit odd.
> > > > > > > > > >
> > > > > > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > > > > > to vendor driver.
> > > > > > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > > > > > migration compatibility from migration_version under mdev_type should be
> > > > > > > > > consistent with that from migration_version under device node" ?
> > > > > > > > >
> > > > > > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > > > > > it so chooses?
> > > > > > > > > > > >
> > > > > > > > > > > I think it can completely rely on (2) if compatibility check before
> > > > > > > > > > > mdev creation is not required.
> > > > > > > > > > >
> > > > > > > > > > > > If devices with a different mdev type are indeed compatible, it seems
> > > > > > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > > > > > created, as (1) does not apply?
> > > > > > > > > > > yes, I think so.
> > > > > > > > > >
> > > > > > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > > > > > ahead and creates devices.
> > > > > > > > > >
> > > > > > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > > > > > type before it knows what mdev device to generate ?
> > > > > > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > > > > > but it has not created target vm and the target mdev device.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > One of my worries is that the existence of an attribute with the same
> > > > > > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > > > > > isn't a problem.
> > > > > > > > > > > >
> > > > > > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > > > > > I guess the same name is necessary?
> > > > > > > > > >
> > > > > > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > > > > > same name?
> > > > > > > > > so change (1) to migration_type_version and (2) to
> > > > > > > > > migration_instance_version?
> > > > > > > > > But as they are under different locations, could that location imply
> > > > > > > > > enough information?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Yan
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > 
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-28 14:14                         ` Dr. David Alan Gilbert
@ 2020-04-29  7:26                           ` Yan Zhao
  2020-04-29  8:22                             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-29  7:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

On Tue, Apr 28, 2020 at 10:14:37PM +0800, Dr. David Alan Gilbert wrote:
> * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > On Mon, Apr 27, 2020 at 11:37:43PM +0800, Dr. David Alan Gilbert wrote:
> > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> > > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > > > > > From: Yan Zhao
> > > > > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > > > > > 
> > > > > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > >
> > > > > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > > > > > > > > > > Mediated devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This migration_version attribute is used to check migration compatibility
> > > > > > > > > > > > > > between two mdev devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > > > > > >     which can be used even before device creation, but only for mdev
> > > > > > > > > > > > > >     devices of the same mdev type.
> > > > > > > > > > > > > > (2) under mdev device node,
> > > > > > > > > > > > > >     which can only be used after the mdev devices are created, but the src
> > > > > > > > > > > > > >     and target mdev devices are not necessarily of the same mdev type
> > > > > > > > > > > > > > (The second location is newly added in v5, in order to keep consistent
> > > > > > > > > > > > > > with the migration_version node for migratable pass-through devices)
> > > > > > > > > > > > >
> > > > > > > > > > > > > What is the relationship between those two attributes?
> > > > > > > > > > > > >
> > > > > > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the same
> > > > > > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> > > > > > > > > > > > non-mdev devices.
> > > > > > > > > > > >
> > > > > > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > > > > > > >
> > > > > > > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > > > > > > refers to mdev devices.
> > > > > > > > > > >
> > > > > > > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > > > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > > > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > > > > > > than creating (3).
> > > > > > > > >
> > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > which means they should be the same mdev type.
> > > > > > > > >
> > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a management
> > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > libvirt to probe every device with this attribute in the system?  Is
> > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > possible migrate-able devices?
> > > > > > > > >
> > > > > > > > yes, management tool needs to guess and test migration compatibility
> > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs to
> > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > (e.g. their pciids are equal). otherwise, it's still enumerating
> > > > > > > > possibilities.
> > > > > > > > 
> > > > > > > > on the other hand, for two mdevs,
> > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > mdev1 <-> mdev2.
> > > > > > > 
> > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > the management tool can take advantage of this interface to know if two
> > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > or mix.
> > > > > > 
> > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > configuration to the target vm.
> > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > same mdev type).
> > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > and mdev types are equal" ?
> > > > > > 
> > > > > > 
> > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > 
> > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > and test it in target migration version under target dev node. 
> > > > > > 
> > > > > > after all, through this interface we just help the upper layer
> > > > > > knowing available options through reading and testing, and they decide
> > > > > > to use it or not.
> > > > > > 
> > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > usage then such tradeoff might be worthwhile...
> > > > > > >
> > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > difference to phys<->mdev, right?
> > > > > > I think the vendor string for a mdev device is something like:
> > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > that for a phys device is something like:
> > > > > > "PCIID + software version".
> > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > to a mdev device according it supports it or not.
> > > > > 
> > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > with this clever mdev name setup that you could migrate between two
> > > > > different models in a series, or to a newer model, as long as they
> > > > > both supported the same mdev view.
> > > > > 
> > > > hi Dave
> > > > the migration_version string is transparent to userspace, and is
> > > > completely defined by vendor driver.
> > > > I put it there just as an example of how vendor driver may implement it.
> > > > e.g.
> > > > the src migration_version string is "src PCIID + src software version", 
> > > > then when this string is write to target migration_version node,
> > > > the vendor driver in the target device will compare it with its own
> > > > device info and software version.
> > > > If different models are allowed, the write just succeeds even
> > > > PCIIDs in src and target are different.
> > > > 
> > > > so, it is the vendor driver to define whether two devices are able to
> > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > provides vendor driver full flexibility.
> > > > 
> > > > do you think it's good?
> > > 
> > > Yeh that's OK; I guess it's going to need to have a big table in their
> > > with all the PCIIDs in.
> > > The alternative would be to abstract it a little; e.g. to say it's
> > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > clock speed etc - but yes you might be right htat PCIIDs might be best
> > > for checking for quirks.
> > >
> > glad that you are agreed with it:)
> > I think the vendor driver still can choose a way to abstract a little
> > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > migration_string would be something like "Intel-gpu-core-v4 + instance
> > number + software version".
> > IOW, they can choose anything they think appropriate to identify migration
> > compatibility of a device.
> > But Alex is right, we have to prevent namespace overlapping. So I think
> > we need to ensure src and target devices are from the same vendors.
> > or, any other ideas?
> 
> That's why I kept the 'Intel' in that example; or PCI vendor ID; I was
Yes, it's a good idea!
Could we add a line in the doc saying that it is up to the vendor driver
to add a unique string to avoid namespace collisions?

> only really trying to say that within one vendors range there are often
> a lot of PCI-IDs that have really minor variations.
Yes. I also prefer to include PCI-IDs.
BTW, sometimes even the same PCI-ID does not guarantee that two devices are
identical or migration compatible. For example, two local NVMe devices may
have the same PCI-ID but be configured for two different remote NVMe
devices. The vendor driver then needs to add extra info besides PCI-IDs.
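
To make the NVMe case concrete, here is a rough sketch of a check that looks
at more than the PCI-IDs. It is only an illustration: struct mig_ident, its
fields, the "backend" string and the policy itself are all made up for this
mail and are not taken from the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-device record; none of these fields come from the patchset. */
struct mig_ident {
	uint16_t vendor;      /* PCI vendor ID */
	uint16_t device;      /* PCI device ID */
	uint32_t sw_version;  /* vendor driver's migration format version */
	char backend[32];     /* extra info, e.g. which remote NVMe target is attached */
};

/*
 * Example policy only: same PCI IDs, same backend, and the destination
 * software version must not be older than the source's.
 */
bool mig_ident_compatible(const struct mig_ident *src,
			  const struct mig_ident *dst)
{
	assert(src != NULL && dst != NULL);

	if (src->vendor != dst->vendor || src->device != dst->device)
		return false;
	/* Equal PCI IDs are not enough: the attached backend must match too. */
	if (strncmp(src->backend, dst->backend, sizeof(src->backend)) != 0)
		return false;
	return dst->sw_version >= src->sw_version;
}
```

A real vendor driver is free to relax or tighten any of these rules; the
point is only that the backend comparison cannot be derived from PCI-IDs.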

Thanks
Yan

> 
> 
> > 
> > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > > > > > devices, but I think this bring a lot of questions that we need to
> > > > > > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > > > > > compatibility to that target device?
> > > > > > > > I'm not sure if there's a layer above libvirt would select a target
> > > > > > > > device. but if there is such a layer (even it's human), we need to
> > > > > > > > provide an interface for them to know whether their decision is suitable
> > > > > > > > for migration. The migration_version interface provides a potential to
> > > > > > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > We also need to consider that this expands the namespace.  If we no
> > > > > > > > > longer require matching types as the first level of comparison, then
> > > > > > > > > vendor migration strings can theoretically collide.  How do we
> > > > > > > > > coordinate that can't happen?  Thanks,
> > > > > > > > yes, it's indeed a problem.
> > > > > > > > could only allowing migration beteen devices from the same vendor be a
> > > > > > > > good
> > > > > > > > prerequisite?
> > > > > > > > 
> > > > > > > > Thanks
> > > > > > > > Yan
> > > > > > > > >
> > > > > > > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > > > > > >
> > > > > > > > > > > > no. (2) does not reply on (1).
> > > > > > > > > > >
> > > > > > > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > > > > > > migration, it feels a bit odd.
> > > > > > > > > > >
> > > > > > > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > > > > > > to vendor driver.
> > > > > > > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > > > > > > migration compatibility from migration_version under mdev_type should
> > > > > > > > be
> > > > > > > > > > consistent with that from migration_version under device node" ?
> > > > > > > > > >
> > > > > > > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > > > > > > it so chooses?
> > > > > > > > > > > > >
> > > > > > > > > > > > I think it can completely reply on (2) if compatibility check before
> > > > > > > > > > > > mdev creation is not required.
> > > > > > > > > > > >
> > > > > > > > > > > > > If devices with a different mdev type are indeed compatible, it
> > > > > > > > seems
> > > > > > > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > > > > > > created, as (1) does not apply?
> > > > > > > > > > > > yes, I think so.
> > > > > > > > > > >
> > > > > > > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > > > > > > ahead and creates devices.
> > > > > > > > > > >
> > > > > > > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > > > > > > type before it knows what mdev device to generate ?
> > > > > > > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > > > > > > but it has not created target vm and the target mdev device.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > One of my worries is that the existence of an attribute with the
> > > > > > > > same
> > > > > > > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > > > > > > isn't a problem.
> > > > > > > > > > > > >
> > > > > > > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > > > > > > I guess the same name is necessary?
> > > > > > > > > > >
> > > > > > > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > > > > > > same name?
> > > > > > > > > > so change (1) to migration_type_version and (2) to
> > > > > > > > > > migration_instance_version?
> > > > > > > > > > But as they are under different locations, could that location imply
> > > > > > > > > > enough information?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Yan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > intel-gvt-dev mailing list
> > > > > > > > intel-gvt-dev@lists.freedesktop.org
> > > > > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> > > > > > 
> > > > > --
> > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > > 
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-29  7:26                           ` Yan Zhao
@ 2020-04-29  8:22                             ` Dr. David Alan Gilbert
  2020-04-29  9:35                               ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Dr. David Alan Gilbert @ 2020-04-29  8:22 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

* Yan Zhao (yan.y.zhao@intel.com) wrote:
> On Tue, Apr 28, 2020 at 10:14:37PM +0800, Dr. David Alan Gilbert wrote:
> > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > On Mon, Apr 27, 2020 at 11:37:43PM +0800, Dr. David Alan Gilbert wrote:
> > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> > > > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > > > > > > From: Yan Zhao
> > > > > > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > > > > > > 
> > > > > > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > > > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patchset introduces a migration_version attribute under sysfs
> > > > > > > > > of VFIO
> > > > > > > > > > > > > > > Mediated devices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This migration_version attribute is used to check migration
> > > > > > > > > compatibility
> > > > > > > > > > > > > > > between two mdev devices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > > > > > > >     which can be used even before device creation, but only for
> > > > > > > > > mdev
> > > > > > > > > > > > > > >     devices of the same mdev type.
> > > > > > > > > > > > > > > (2) under mdev device node,
> > > > > > > > > > > > > > >     which can only be used after the mdev devices are created, but
> > > > > > > > > the src
> > > > > > > > > > > > > > >     and target mdev devices are not necessarily be of the same
> > > > > > > > > mdev type
> > > > > > > > > > > > > > > (The second location is newly added in v5, in order to keep
> > > > > > > > > consistent
> > > > > > > > > > > > > > > with the migration_version node for migratable pass-though
> > > > > > > > > devices)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What is the relationship between those two attributes?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the
> > > > > > > > > same
> > > > > > > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev
> > > > > > > > > devices and
> > > > > > > > > > > > > non-mdev devices.
> > > > > > > > > > > > >
> > > > > > > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > > > > > > > >
> > > > > > > > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > > > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > > > > > > > refers to mdev devices.
> > > > > > > > > > > >
> > > > > > > > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > > > > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > > > > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > > > > > > > than creating (3).
> > > > > > > > > >
> > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > >
> > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > > > > > > > > management
> > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > possible migrate-able devices?
> > > > > > > > > >
> > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > to
> > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > possibilities.
> > > > > > > > > 
> > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > mdev1 <-> mdev2.
> > > > > > > > 
> > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > or mix.
> > > > > > > 
> > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > configuration to the target vm.
> > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > same mdev type).
> > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > and mdev types are equal" ?
> > > > > > > 
> > > > > > > 
> > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > 
> > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > and test it in target migration version under target dev node. 
> > > > > > > 
> > > > > > > after all, through this interface we just help the upper layer
> > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > to use it or not.
> > > > > > > 
> > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > usage then such tradeoff might be worthywhile...
> > > > > > > >
> > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > difference to phys<->mdev, right?
> > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > that for a phys device is something like:
> > > > > > > "PCIID + software version".
> > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > to a mdev device according it supports it or not.
> > > > > > 
> > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > different models in a series, or to a newer model, as long as they
> > > > > > both supported the same mdev view.
> > > > > > 
> > > > > hi Dave
> > > > > the migration_version string is transparent to userspace, and is
> > > > > completely defined by vendor driver.
> > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > e.g.
> > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > then when this string is write to target migration_version node,
> > > > > the vendor driver in the target device will compare it with its own
> > > > > device info and software version.
> > > > > If different models are allowed, the write just succeeds even
> > > > > PCIIDs in src and target are different.
> > > > > 
> > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > provides vendor driver full flexibility.
> > > > > 
> > > > > do you think it's good?
> > > > 
> > > > Yeh that's OK; I guess it's going to need to have a big table in their
> > > > with all the PCIIDs in.
> > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > clock speed etc - but yes you might be right htat PCIIDs might be best
> > > > for checking for quirks.
> > > >
> > > glad that you are agreed with it:)
> > > I think the vendor driver still can choose a way to abstract a little
> > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > number + software version".
> > > IOW, they can choose anything they think appropriate to identify migration
> > > compatibility of a device.
> > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > we need to ensure src and target devices are from the same vendors.
> > > or, any other ideas?
> > 
> > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was
> Yes, it's a good idea!
> could we add a line in the doc saying that
> it is the vendor driver to add a unique string to avoid namespace
> collision?

So why don't we split the difference; let's say that it should start with
the hex PCI Vendor ID.
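
Something along these lines, as a sketch only. The "<vendor>-<payload>"
layout, the lower-case four-digit hex form and both helper names are
assumptions made up for illustration; nothing in the series defines them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: "<vendor: 4 lower-case hex digits>-<vendor-defined payload>". */
#define VENDOR_PREFIX_LEN 5 /* four hex digits plus '-' */

/* Build a version string that starts with the hex PCI vendor ID. */
int mig_version_format(char *buf, size_t len, uint16_t vendor,
		       const char *payload)
{
	int n;

	assert(buf != NULL && payload != NULL);
	n = snprintf(buf, len, "%04x-%s", (unsigned int)vendor, payload);
	/* Treat truncation as an error rather than emitting a partial string. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Return 0 if the string carries our vendor ID (and point *payload at the
 * vendor-defined remainder), -1 otherwise. A foreign vendor's string is
 * rejected before any vendor-specific parsing is attempted.
 */
int mig_version_check_vendor(const char *version, uint16_t my_vendor,
			     const char **payload)
{
	char prefix[VENDOR_PREFIX_LEN + 1];

	assert(version != NULL);
	snprintf(prefix, sizeof(prefix), "%04x-", (unsigned int)my_vendor);
	if (strncmp(version, prefix, VENDOR_PREFIX_LEN) != 0)
		return -1;
	if (payload)
		*payload = version + VENDOR_PREFIX_LEN;
	return 0;
}
```

With the prefix checked first, two vendors can pick identical payloads
without their strings ever comparing as compatible.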

> > only really trying to say that within one vendors range there are often
> > a lot of PCI-IDs that have really minor variations.
> Yes. I also prefer to include PCI-IDs.
> BTW, sometimes even the same PCI-ID does not guarantee two devices are of no
> difference or are migration compatible. for example, two local NVMe
> devices may have the same PCI-ID but are configured to two different remote NVMe
> devices. the vendor driver needs to add extra info besides PCI-IDs then.

Ah, yes that's an interesting example.

Dave

> Thanks
> Yan
> 
> > 
> > 
> > > 
> > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > > > > > > devices, but I think this bring a lot of questions that we need to
> > > > > > > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > > > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > > > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > > > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > > > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > > > > > > compatibility to that target device?
> > > > > > > > > I'm not sure if there's a layer above libvirt would select a target
> > > > > > > > > device. but if there is such a layer (even it's human), we need to
> > > > > > > > > provide an interface for them to know whether their decision is suitable
> > > > > > > > > for migration. The migration_version interface provides a potential to
> > > > > > > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > We also need to consider that this expands the namespace.  If we no
> > > > > > > > > > longer require matching types as the first level of comparison, then
> > > > > > > > > > vendor migration strings can theoretically collide.  How do we
> > > > > > > > > > coordinate that can't happen?  Thanks,
> > > > > > > > > yes, it's indeed a problem.
> > > > > > > > > could only allowing migration beteen devices from the same vendor be a
> > > > > > > > > good
> > > > > > > > > prerequisite?
> > > > > > > > > 
> > > > > > > > > Thanks
> > > > > > > > > Yan
> > > > > > > > > >
> > > > > > > > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > no. (2) does not reply on (1).
> > > > > > > > > > > >
> > > > > > > > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > > > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > > > > > > > migration, it feels a bit odd.
> > > > > > > > > > > >
> > > > > > > > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > > > > > > > to vendor driver.
> > > > > > > > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > > > > > > > migration compatibility from migration_version under mdev_type should
> > > > > > > > > be
> > > > > > > > > > > consistent with that from migration_version under device node" ?
> > > > > > > > > > >
> > > > > > > > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > > > > > > > it so chooses?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I think it can completely reply on (2) if compatibility check before
> > > > > > > > > > > > > mdev creation is not required.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > If devices with a different mdev type are indeed compatible, it
> > > > > > > > > seems
> > > > > > > > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > > > > > > > created, as (1) does not apply?
> > > > > > > > > > > > > yes, I think so.
> > > > > > > > > > > >
> > > > > > > > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > > > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > > > > > > > ahead and creates devices.
> > > > > > > > > > > >
> > > > > > > > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > > > > > > > type before it knows what mdev device to generate ?
> > > > > > > > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > > > > > > > but it has not created target vm and the target mdev device.
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > One of my worries is that the existence of an attribute with the
> > > > > > > > > same
> > > > > > > > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > > > > > > > isn't a problem.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > > > > > > > I guess the same name is necessary?
> > > > > > > > > > > >
> > > > > > > > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > > > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > > > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > > > > > > > same name?
> > > > > > > > > > > so change (1) to migration_type_version and (2) to
> > > > > > > > > > > migration_instance_version?
> > > > > > > > > > > But as they are under different locations, could that location imply
> > > > > > > > > > > enough information?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Yan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > 
> > > > > > --
> > > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > > > 
> > > > > 
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-29  8:22                             ` Dr. David Alan Gilbert
@ 2020-04-29  9:35                               ` Yan Zhao
  2020-04-29  9:48                                 ` Dr. David Alan Gilbert
  2020-04-29 14:13                                 ` Eric Blake
  0 siblings, 2 replies; 40+ messages in thread
From: Yan Zhao @ 2020-04-29  9:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

On Wed, Apr 29, 2020 at 04:22:01PM +0800, Dr. David Alan Gilbert wrote:
> * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > On Tue, Apr 28, 2020 at 10:14:37PM +0800, Dr. David Alan Gilbert wrote:
> > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > On Mon, Apr 27, 2020 at 11:37:43PM +0800, Dr. David Alan Gilbert wrote:
> > > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > > On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> > > > > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > > > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > > > > > > > From: Yan Zhao
> > > > > > > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > > > > > > > 
> > > > > > > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > > > > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This patchset introduces a migration_version attribute under sysfs
> > > > > > > > > > of VFIO
> > > > > > > > > > > > > > > > Mediated devices.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This migration_version attribute is used to check migration
> > > > > > > > > > compatibility
> > > > > > > > > > > > > > > > between two mdev devices.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > > > > > > > >     which can be used even before device creation, but only for
> > > > > > > > > > mdev
> > > > > > > > > > > > > > > >     devices of the same mdev type.
> > > > > > > > > > > > > > > > (2) under mdev device node,
> > > > > > > > > > > > > > > >     which can only be used after the mdev devices are created, but
> > > > > > > > > > the src
> > > > > > > > > > > > > > > >     and target mdev devices are not necessarily be of the same
> > > > > > > > > > mdev type
> > > > > > > > > > > > > > > > (The second location is newly added in v5, in order to keep
> > > > > > > > > > consistent
> > > > > > > > > > > > > > > > with the migration_version node for migratable pass-though
> > > > > > > > > > devices)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What is the relationship between those two attributes?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the
> > > > > > > > > > same
> > > > > > > > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev
> > > > > > > > > > devices and
> > > > > > > > > > > > > > non-mdev devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > > > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > > > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > > > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > > > > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > > > > > > > > refers to mdev devices.
> > > > > > > > > > > > >
> > > > > > > > > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > > > > > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > > > > > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > > > > > > > > than creating (3).
> > > > > > > > > > >
> > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > >
> > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a management
> > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > libvirt to probe every device with this attribute in the system?  Is
> > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > >
> > > > > > > > > > yes, management tool needs to guess and test migration compatibility
> > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs to
> > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > (e.g. their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > possibilities.
> > > > > > > > > > 
> > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > mdev1 <-> mdev2.
> > > > > > > > > 
> > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > or mix.
> > > > > > > > 
> > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > configuration to the target vm.
> > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > same mdev type).
> > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > and mdev types are equal" ?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > 
> > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > 
> > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > to use it or not.
> > > > > > > > 
> > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and
> > > > > > > > > phys<->phys migration? If a customer does want to migrate between a
> > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev
> > > > > > > > > instance (with the same type as the source mdev) instead of using vendor
> > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant
> > > > > > > > > usage then such tradeoff might be worthwhile...
> > > > > > > > >
> > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > difference to phys<->mdev, right?
> > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > that for a phys device is something like:
> > > > > > > > "PCIID + software version".
> > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > to a mdev device according it supports it or not.
> > > > > > > 
> > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > both supported the same mdev view.
> > > > > > > 
> > > > > > hi Dave
> > > > > > the migration_version string is transparent to userspace, and is
> > > > > > completely defined by vendor driver.
> > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > e.g.
> > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > then when this string is written to target migration_version node,
> > > > > > the vendor driver in the target device will compare it with its own
> > > > > > device info and software version.
> > > > > > If different models are allowed, the write just succeeds even if
> > > > > > PCIIDs in src and target are different.
> > > > > > 
> > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > provides vendor driver full flexibility.
> > > > > > 
> > > > > > do you think it's good?
> > > > > 
> > > > > Yeh that's OK; I guess it's going to need to have a big table in there
> > > > > with all the PCIIDs in.
> > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > clock speed etc - but yes you might be right that PCIIDs might be best
> > > > > for checking for quirks.
> > > > >
> > > > glad that you agree with it :)
> > > > I think the vendor driver still can choose a way to abstract a little
> > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > number + software version".
> > > > IOW, they can choose anything they think appropriate to identify migration
> > > > compatibility of a device.
> > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > we need to ensure src and target devices are from the same vendors.
> > > > or, any other ideas?
> > > 
> > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was
> > Yes, it's a good idea!
> > could we add a line in the doc saying that
> > it is the vendor driver to add a unique string to avoid namespace
> > collision?
> 
> So why don't we split the difference; lets say that it should start with
> the hex PCI Vendor ID.
>
The problem is for mdev devices, if the parent devices are not PCI devices, 
they don't have PCI vendor IDs.
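
A minimal sketch of that limitation, assuming a hypothetical
"<hex PCI vendor ID>-<vendor-defined body>" layout (neither the layout nor
the function names below come from the patchset; they only illustrate where
a non-PCI parent falls through):

```python
def make_migration_version(pci_vendor_id, body):
    """Compose '<4 hex digits>-<body>'; the layout is a hypothetical example."""
    if pci_vendor_id is None:
        # Parent is not a PCI device, so there is no PCI vendor ID to prefix.
        raise ValueError("parent device has no PCI vendor ID")
    return "%04x-%s" % (pci_vendor_id, body)


def same_vendor_namespace(src_version, dst_version):
    """True if both strings carry the same vendor prefix before the first '-'."""
    return src_version.split("-", 1)[0] == dst_version.split("-", 1)[0]
```

With this layout a PCI parent such as vendor 0x8086 gets an "8086-..."
string, while an mdev whose parent is not a PCI device has nothing to put
in the prefix, which is the open question here.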

Thanks
Yan


> > > only really trying to say that within one vendors range there are often
> > > a lot of PCI-IDs that have really minor variations.
> > Yes. I also prefer to include PCI-IDs.
> > BTW, sometimes even the same PCI-ID does not guarantee two devices are of no
> > difference or are migration compatible. for example, two local NVMe
> > devices may have the same PCI-ID but are configured to two different remote NVMe
> > devices. the vendor driver needs to add extra info besides PCI-IDs then.
> 
> Ah, yes that's an interesting example.
> 
> Dave
> 
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > > > > > > > devices, but I think this bring a lot of questions that we need to
> > > > > > > > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > > > > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > > > > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > > > > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > > > > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > > > > > > > compatibility to that target device?
> > > > > > > > > > I'm not sure if there's a layer above libvirt would select a target
> > > > > > > > > > device. but if there is such a layer (even it's human), we need to
> > > > > > > > > > provide an interface for them to know whether their decision is suitable
> > > > > > > > > > for migration. The migration_version interface provides a potential to
> > > > > > > > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > We also need to consider that this expands the namespace.  If we no
> > > > > > > > > > > longer require matching types as the first level of comparison, then
> > > > > > > > > > > vendor migration strings can theoretically collide.  How do we
> > > > > > > > > > > coordinate that can't happen?  Thanks,
> > > > > > > > > > yes, it's indeed a problem.
> > > > > > > > > > could only allowing migration between devices from the same vendor be a good
> > > > > > > > > > prerequisite?
> > > > > > > > > > 
> > > > > > > > > > Thanks
> > > > > > > > > > Yan
> > > > > > > > > > >
> > > > > > > > > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > no. (2) does not rely on (1).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > > > > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > > > > > > > > migration, it feels a bit odd.
> > > > > > > > > > > > >
> > > > > > > > > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > > > > > > > > to vendor driver.
> > > > > > > > > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > > > > > > > > migration compatibility from migration_version under mdev_type should be
> > > > > > > > > > > > consistent with that from migration_version under device node" ?
> > > > > > > > > > > >
> > > > > > > > > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > > > > > > > > it so chooses?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think it can completely rely on (2) if compatibility check before
> > > > > > > > > > > > > > mdev creation is not required.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If devices with a different mdev type are indeed compatible, it seems
> > > > > > > > > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > > > > > > > > created, as (1) does not apply?
> > > > > > > > > > > > > > yes, I think so.
> > > > > > > > > > > > >
> > > > > > > > > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > > > > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > > > > > > > > ahead and creates devices.
> > > > > > > > > > > > >
> > > > > > > > > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > > > > > > > > type before it knows what mdev device to generate ?
> > > > > > > > > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > > > > > > > > but it has not created target vm and the target mdev device.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One of my worries is that the existence of an attribute with the same
> > > > > > > > > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > > > > > > > > isn't a problem.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > > > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > > > > > > > > I guess the same name is necessary?
> > > > > > > > > > > > >
> > > > > > > > > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > > > > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > > > > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > > > > > > > > same name?
> > > > > > > > > > > > so change (1) to migration_type_version and (2) to
> > > > > > > > > > > > migration_instance_version?
> > > > > > > > > > > > But as they are under different locations, could that location imply
> > > > > > > > > > > > enough information?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > > Yan
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > 
> > > > > > > --
> > > > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

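For reference, the read-then-write check discussed in this thread (read the
source device's migration_version, then write that string to the target's
migration_version) could be sketched from userspace roughly as follows. This
is only a sketch: the helper names are hypothetical, the attribute paths are
supplied by the caller, and the only behavior assumed from the proposal is
that the write fails when the vendor driver rejects the source string.

```python
import os


def read_migration_version(attr_path):
    """Return the opaque version string from a migration_version attribute."""
    with open(attr_path, "r") as f:
        return f.read().strip()


def is_migration_compatible(src_attr, dst_attr):
    """Write the source string to the target attribute; True if accepted.

    Userspace never parses the string. The vendor driver behind dst_attr
    decides, and is assumed to signal incompatibility by failing the write.
    An unreadable source attribute is also treated as "not compatible".
    """
    try:
        version = read_migration_version(src_attr)
        fd = os.open(dst_attr, os.O_WRONLY)
        try:
            os.write(fd, version.encode())
        finally:
            os.close(fd)
    except OSError:
        return False
    return True
```

The same helper would apply to either location of the attribute (under the
mdev_type node or under the device node), since both are described as being
read and written the same way.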
^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-29  9:35                               ` Yan Zhao
@ 2020-04-29  9:48                                 ` Dr. David Alan Gilbert
  2020-04-30  0:39                                   ` Yan Zhao
  2020-04-29 14:13                                 ` Eric Blake
  1 sibling, 1 reply; 40+ messages in thread
From: Dr. David Alan Gilbert @ 2020-04-29  9:48 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

* Yan Zhao (yan.y.zhao@intel.com) wrote:
> On Wed, Apr 29, 2020 at 04:22:01PM +0800, Dr. David Alan Gilbert wrote:
> > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > On Tue, Apr 28, 2020 at 10:14:37PM +0800, Dr. David Alan Gilbert wrote:
> > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > On Mon, Apr 27, 2020 at 11:37:43PM +0800, Dr. David Alan Gilbert wrote:
> > > > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > > > On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> > > > > > > > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > > > > > > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > > > > > > > > From: Yan Zhao
> > > > > > > > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > > > > > > > > 
> > > > > > > > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > > > > > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > > > > > > > > > > > > > Mediated devices.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This migration_version attribute is used to check migration compatibility
> > > > > > > > > > > > > > > > > between two mdev devices.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > > > > > > > > >     which can be used even before device creation, but only for mdev
> > > > > > > > > > > > > > > > >     devices of the same mdev type.
> > > > > > > > > > > > > > > > > (2) under mdev device node,
> > > > > > > > > > > > > > > > >     which can only be used after the mdev devices are created, but the src
> > > > > > > > > > > > > > > > >     and target mdev devices are not necessarily of the same mdev type
> > > > > > > > > > > > > > > > > (The second location is newly added in v5, in order to keep consistent
> > > > > > > > > > > > > > > > > with the migration_version node for migratable pass-through devices)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What is the relationship between those two attributes?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep the same
> > > > > > > > > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> > > > > > > > > > > > > > > non-mdev devices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > > > > > > > > is binding to vfio-pci, but is able to register migration region and do
> > > > > > > > > > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > > > > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > > > > > > > > > It is not able to provide (1) as there're no mdev devices involved.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ok, creating an alternate attribute for non-mdev devices makes sense.
> > > > > > > > > > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > > > > > > > > > refers to mdev devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > > > > > > > > > and I think a possible usage is to migrate between a non-mdev device and
> > > > > > > > > > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > > > > > > > > > than creating (3).
> > > > > > > > > > > >
> > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > >
> > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a management
> > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > libvirt to probe every device with this attribute in the system?  Is
> > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > >
> > > > > > > > > > > yes, management tool needs to guess and test migration compatibility
> > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs to
> > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > (e.g. their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > possibilities.
> > > > > > > > > > > 
> > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > mdev1 <-> mdev2.
> > > > > > > > > > 
> > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > or mix.
> > > > > > > > > 
> > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > configuration to the target vm.
> > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > same mdev type).
> > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > and mdev types are equal" ?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > 
> > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > 
> > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > to use it or not.
> > > > > > > > > 
> > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and
> > > > > > > > > > phys<->phys migration? If a customer does want to migrate between a
> > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev
> > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor
> > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant
> > > > > > > > > > usage then such tradeoff might be worthwhile...
> > > > > > > > > >
> > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > that for a phys device is something like:
> > > > > > > > > "PCIID + software version".
> > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > to a mdev device according it supports it or not.
> > > > > > > > 
> > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > both supported the same mdev view.
> > > > > > > > 
> > > > > > > hi Dave
> > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > completely defined by vendor driver.
> > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > e.g.
> > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > then when this string is written to target migration_version node,
> > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > device info and software version.
> > > > > > > If different models are allowed, the write just succeeds even if
> > > > > > > PCIIDs in src and target are different.
> > > > > > > 
> > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > provides vendor driver full flexibility.
> > > > > > > 
> > > > > > > do you think it's good?
> > > > > > 
> > > > > > Yeh that's OK; I guess it's going to need to have a big table in there
> > > > > > with all the PCIIDs in.
> > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > clock speed etc - but yes you might be right that PCIIDs might be best
> > > > > > for checking for quirks.
> > > > > >
> > > > > glad that you agree with it :)
> > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > number + software version".
> > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > compatibility of a device.
> > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > we need to ensure src and target devices are from the same vendors.
> > > > > or, any other ideas?
> > > > 
> > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was
> > > Yes, it's a good idea!
> > > could we add a line in the doc saying that
> > > it is the vendor driver to add a unique string to avoid namespace
> > > collision?
> > 
> > So why don't we split the difference; lets say that it should start with
> > the hex PCI Vendor ID.
> >
> The problem is for mdev devices, if the parent devices are not PCI devices, 
> they don't have PCI vendor IDs.

Hmm, it would be best not to invent a whole new way of giving unique
identifiers for vendors if we can avoid it.

Dave

> Thanks
> Yan
> 
> 
> > > > only really trying to say that within one vendors range there are often
> > > > a lot of PCI-IDs that have really minor variations.
> > > Yes. I also prefer to include PCI-IDs.
> > > BTW, sometimes even the same PCI-ID does not guarantee two devices are of no
> > > difference or are migration compatible. for example, two local NVMe
> > > devices may have the same PCI-ID but are configured to two different remote NVMe
> > > devices. the vendor driver needs to add extra info besides PCI-IDs then.
> > 
> > Ah, yes that's an interesting example.
> > 
> > Dave
> > 
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > > > > > > > > devices, but I think this bring a lot of questions that we need to
> > > > > > > > > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > > > > > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > > > > > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > > > > > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > > > > > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > > > > > > > > compatibility to that target device?
> > > > > > > > > > > I'm not sure if there's a layer above libvirt would select a target
> > > > > > > > > > > device. but if there is such a layer (even it's human), we need to
> > > > > > > > > > > provide an interface for them to know whether their decision is suitable
> > > > > > > > > > > for migration. The migration_version interface provides a potential to
> > > > > > > > > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > We also need to consider that this expands the namespace.  If we no
> > > > > > > > > > > > longer require matching types as the first level of comparison, then
> > > > > > > > > > > > vendor migration strings can theoretically collide.  How do we
> > > > > > > > > > > > coordinate that can't happen?  Thanks,
> > > > > > > > > > > yes, it's indeed a problem.
> > > > > > > > > > > could only allowing migration between devices from the same vendor be a good
> > > > > > > > > > > prerequisite?
> > > > > > > > > > > 
> > > > > > > > > > > Thanks
> > > > > > > > > > > Yan
> > > > > > > > > > > >
> > > > > > > > > > > > > > > > Is existence (and compatibility) of (1) a pre-req for possible
> > > > > > > > > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > no. (2) does not rely on (1).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hm. Non-existence of (1) seems to imply "this type does not support
> > > > > > > > > > > > > > migration". If an mdev created for such a type suddenly does support
> > > > > > > > > > > > > > migration, it feels a bit odd.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > yes. but I think if the condition happens, it should be reported a bug
> > > > > > > > > > > > > to vendor driver.
> > > > > > > > > > > > > should I add a line in the doc like "vendor driver should ensure that the
> > > > > > > > > > > > > migration compatibility from migration_version under mdev_type should
> > > > > > > > > > > be
> > > > > > > > > > > > > consistent with that from migration_version under device node" ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > (It obviously cannot be a prereq for what I called (3) above.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Does userspace need to check (1) or can it completely rely on (2), if
> > > > > > > > > > > > > > > > it so chooses?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think it can completely reply on (2) if compatibility check before
> > > > > > > > > > > > > > > mdev creation is not required.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If devices with a different mdev type are indeed compatible, it
> > > > > > > > > > > seems
> > > > > > > > > > > > > > > > userspace can only find out after the devices have actually been
> > > > > > > > > > > > > > > > created, as (1) does not apply?
> > > > > > > > > > > > > > > yes, I think so.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > How useful would it be for userspace to even look at (1) in that case?
> > > > > > > > > > > > > > It only knows if things have a chance of working if it actually goes
> > > > > > > > > > > > > > ahead and creates devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > hmm, is it useful for userspace to test the migration_version under mdev
> > > > > > > > > > > > > type before it knows what mdev device to generate ?
> > > > > > > > > > > > > like when the userspace wants to migrate an mdev device in src vm,
> > > > > > > > > > > > > but it has not created target vm and the target mdev device.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > One of my worries is that the existence of an attribute with the
> > > > > > > > > > > same
> > > > > > > > > > > > > > > > name in two similar locations might lead to confusion. But maybe it
> > > > > > > > > > > > > > > > isn't a problem.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, I have the same feeling. but as (2) is for sysfs interface
> > > > > > > > > > > > > > > consistency, to make it transparent to userspace tools like libvirt,
> > > > > > > > > > > > > > > I guess the same name is necessary?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What do we actually need here, I wonder? (1) and (2) seem to serve
> > > > > > > > > > > > > > slightly different purposes, while (2) and what I called (3) have the
> > > > > > > > > > > > > > same purpose. Is it important to userspace that (1) and (2) have the
> > > > > > > > > > > > > > same name?
> > > > > > > > > > > > > so change (1) to migration_type_version and (2) to
> > > > > > > > > > > > > migration_instance_version?
> > > > > > > > > > > > > But as they are under different locations, could that location imply
> > > > > > > > > > > > > enough information?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > Yan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-29  9:35                               ` Yan Zhao
  2020-04-29  9:48                                 ` Dr. David Alan Gilbert
@ 2020-04-29 14:13                                 ` Eric Blake
  2020-04-30  0:45                                   ` Yan Zhao
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Blake @ 2020-04-29 14:13 UTC (permalink / raw)
  To: Yan Zhao, Dr. David Alan Gilbert
  Cc: Cornelia Huck, cjia, kvm, linux-doc, libvir-list, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, kwankhede, eauger, Liu, Yi L, corbet,
	Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue, Tian, Kevin,
	Zeng, Xin, zhenyuw, jonathan.davies, Alex Williamson,
	intel-gvt-dev, Liu, Changpeng, berrange, eskultet, linux-kernel,
	Wang, Zhi A, dinechin, He, Shaopeng

[meta-comment]

On 4/29/20 4:35 AM, Yan Zhao wrote:
> On Wed, Apr 29, 2020 at 04:22:01PM +0800, Dr. David Alan Gilbert wrote:
[...]
>>>>>>>>>>>>>>>>> This patchset introduces a migration_version attribute under sysfs
>>>>>>>>>>> of VFIO
>>>>>>>>>>>>>>>>> Mediated devices.

Hmm, several pages with up to 16 levels of quoting, with editors making 
the lines ragged, all before I get to the real meat of the email. 
Remember, it's okay to trim content,...

>> So why don't we split the difference; let's say that it should start with
>> the hex PCI Vendor ID.
>>
> The problem is for mdev devices, if the parent devices are not PCI devices,
> they don't have PCI vendor IDs.

...to just what you are replying to.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-29  9:48                                 ` Dr. David Alan Gilbert
@ 2020-04-30  0:39                                   ` Yan Zhao
  2020-06-02 22:55                                     ` Alex Williamson
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-04-30  0:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Tian, Kevin, Alex Williamson, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger,
	corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic, aik,
	felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev,
	Liu, Changpeng, berrange, Cornelia Huck, linux-kernel, Wang,
	Zhi A, jonathan.davies, He, Shaopeng

On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
<snip>
> > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > > > > > > > > > > > management
> > > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > >
> > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > to
> > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > possibilities.
> > > > > > > > > > > > 
> > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > mdev1 <-> mdev2.
> > > > > > > > > > > 
> > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > or mix.
> > > > > > > > > > 
> > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > configuration to the target vm.
> > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > same mdev type).
> > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out. 
> > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > 
> > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > > 
> > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > to use it or not.
> > > > > > > > > > 
> > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > > > > usage then such tradeoff might be worthywhile...
> > > > > > > > > > >
> > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > "PCIID + software version".
> > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > to a mdev device according it supports it or not.
> > > > > > > > > 
> > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > both supported the same mdev view.
> > > > > > > > > 
> > > > > > > > hi Dave
> > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > completely defined by vendor driver.
> > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > e.g.
> > > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > device info and software version.
> > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > PCIIDs in src and target are different.
> > > > > > > > 
> > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > provides vendor driver full flexibility.
> > > > > > > > 
> > > > > > > > do you think it's good?
> > > > > > > 
> > > > > > > Yeh that's OK; I guess it's going to need to have a big table in their
> > > > > > > with all the PCIIDs in.
> > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > > clock speed etc - but yes you might be right htat PCIIDs might be best
> > > > > > > for checking for quirks.
> > > > > > >
> > > > > > glad that you are agreed with it:)
> > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > number + software version".
> > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > compatibility of a device.
> > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > or, any other ideas?
> > > > > 
> > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was
> > > > Yes, it's a good idea!
> > > > could we add a line in the doc saying that
> > > > it is the vendor driver to add a unique string to avoid namespace
> > > > collision?
> > > 
> > > So why don't we split the difference; let's say that it should start with
> > > the hex PCI Vendor ID.
> > >
> > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > they don't have PCI vendor IDs.
> 
> Hmm it would be best not to invent a whole new way of giving unique
> identifiers for vendors if we can.
> 
What about leveraging the flags in the vfio device info?

#define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
#define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
#define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
#define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
#define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
#define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */

Then, for the migration_version string, the first 64 bits are for the device
type and the second 64 bits are for the device ID. E.g., for PCI devices, it
could be
VFIO_DEVICE_FLAGS_PCI + PCI ID.

Currently, in the doc, we only define that PCI devices use the PCI ID as the
second 64 bits. In the future, if other types of devices want to support
migration, they can define their own device ID part, e.g. use the ACPI ID as
the second 64 bits...

Sounds good?
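
A minimal sketch of the layout proposed above, written in Python for brevity.
The flag values come from the VFIO_DEVICE_FLAGS_* defines; the big-endian
packing, the helper names, the way the PCI ID is folded into 64 bits, and the
trailing vendor payload are all assumptions made for illustration, since the
proposal does not fix an encoding.

```python
import struct

# Flag values copied from the VFIO_DEVICE_FLAGS_* defines quoted above.
VFIO_DEVICE_FLAGS_PCI = 1 << 1
VFIO_DEVICE_FLAGS_CCW = 1 << 4


def pack_prefix(device_type, device_id):
    """Pack the proposed 128-bit prefix: 64-bit device type, 64-bit device ID.

    Big-endian byte order is an assumption; the proposal does not name one.
    """
    return struct.pack(">QQ", device_type, device_id)


def unpack_prefix(version):
    """Split the first 16 bytes of a version string back into (type, id)."""
    return struct.unpack(">QQ", version[:16])


# Hypothetical PCI ID: vendor 0x8086, device 0x1234, folded as vendor << 16 | device.
pci_id = (0x8086 << 16) | 0x1234
prefix = pack_prefix(VFIO_DEVICE_FLAGS_PCI, pci_id)

# Hypothetical vendor-defined remainder that follows the fixed prefix.
vendor_payload = b"sw-v1"
version = prefix + vendor_payload
```

With such a prefix, two strings from different device types can never compare
equal in their first 16 bytes, which is the collision the prefix is meant to
rule out.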

Thanks
Yan

> > 
> > 
> > > > > only really trying to say that within one vendors range there are often
> > > > > a lot of PCI-IDs that have really minor variations.
> > > > Yes. I also prefer to include PCI-IDs.
> > > > BTW, sometimes even the same PCI-ID does not guarantee two devices are of no
> > > > difference or are migration compatible. for example, two local NVMe
> > > > devices may have the same PCI-ID but are configured to two different remote NVMe
> > > > devices. the vendor driver needs to add extra info besides PCI-IDs then.
> > > 
> > > Ah, yes that's an interesting example.
> > > 
> > > Dave
> > > 
> > > > 
> > > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > > I agree that there was a gap in the previous proposal for non-mdev
> > > > > > > > > > > > > devices, but I think this bring a lot of questions that we need to
> > > > > > > > > > > > > puzzle through and libvirt will need to re-evaluate how they might
> > > > > > > > > > > > > decide to pick a migration target device.  For example, I'm sure
> > > > > > > > > > > > > libvirt would reject any policy decisions regarding picking a physical
> > > > > > > > > > > > > device versus an mdev device.  Had we previously left it that only a
> > > > > > > > > > > > > layer above libvirt would select a target device and libvirt only tests
> > > > > > > > > > > > > compatibility to that target device?
> > > > > > > > > > > > I'm not sure if there's a layer above libvirt would select a target
> > > > > > > > > > > > device. but if there is such a layer (even it's human), we need to
> > > > > > > > > > > > provide an interface for them to know whether their decision is suitable
> > > > > > > > > > > > for migration. The migration_version interface provides a potential to
> > > > > > > > > > > > allow mdev->phys migration, even libvirt may currently reject it.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > > We also need to consider that this expands the namespace.  If we no
> > > > > > > > > > > > > longer require matching types as the first level of comparison, then
> > > > > > > > > > > > > vendor migration strings can theoretically collide.  How do we
> > > > > > > > > > > > > coordinate that can't happen?  Thanks,
> > > > > > > > > > > > yes, it's indeed a problem.
> > > > > > > > > > > > could only allowing migration beteen devices from the same vendor be a
> > > > > > > > > > > > good
> > > > > > > > > > > > prerequisite?
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks
> > > > > > > > > > > > Yan
> > > > > > > > > > > > >
<snip>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-29 14:13                                 ` Eric Blake
@ 2020-04-30  0:45                                   ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2020-04-30  0:45 UTC (permalink / raw)
  To: Eric Blake
  Cc: Dr. David Alan Gilbert, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger, Liu,
	Yi L, corbet, Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue,
	Tian, Kevin, eskultet, Zeng, Xin, zhenyuw, dinechin,
	Alex Williamson, intel-gvt-dev, Liu, Changpeng, berrange,
	Cornelia Huck, linux-kernel, Wang, Zhi A, jonathan.davies, He,
	Shaopeng

On Wed, Apr 29, 2020 at 10:13:01PM +0800, Eric Blake wrote:
> [meta-comment]
> 
> On 4/29/20 4:35 AM, Yan Zhao wrote:
> > On Wed, Apr 29, 2020 at 04:22:01PM +0800, Dr. David Alan Gilbert wrote:
> [...]
> >>>>>>>>>>>>>>>>> This patchset introduces a migration_version attribute under sysfs
> >>>>>>>>>>> of VFIO
> >>>>>>>>>>>>>>>>> Mediated devices.
> 
> Hmm, several pages with up to 16 levels of quoting, with editors making 
> the lines ragged, all before I get to the real meat of the email. 
> Remember, it's okay to trim content,...
> 
> >> So why don't we split the difference; let's say that it should start with
> >> the hex PCI Vendor ID.
> >>
> > The problem is for mdev devices, if the parent devices are not PCI devices,
> > they don't have PCI vendor IDs.
> 
> ...to just what you are replying to.
>
Sorry about that. Next time I'll try to strike a better balance between
keeping the conversation background and getting to the real meat of the email.

Thanks for the reminder.
Yan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-04-30  0:39                                   ` Yan Zhao
@ 2020-06-02 22:55                                     ` Alex Williamson
  2020-06-03  3:19                                       ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2020-06-02 22:55 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Dr. David Alan Gilbert, Tian, Kevin, cjia, kvm, linux-doc,
	libvir-list, Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede,
	eauger, corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic,
	aik, felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin,
	intel-gvt-dev, Liu, Changpeng, berrange, Cornelia Huck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Wed, 29 Apr 2020 20:39:50 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> <snip>
> > > > > > > glad that you are agreed with it:)
> > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > number + software version".
> > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > compatibility of a device.
> > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > or, any other ideas?  
> > > > > > 
> > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was  
> > > > > Yes, it's a good idea!
> > > > > could we add a line in the doc saying that
> > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > collision?  
> > > > 
> > > > So why don't we split the difference; let's say that it should start with
> > > > the hex PCI Vendor ID.
> > > >  
> > > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > > they don't have PCI vendor IDs.  
> > 
> > Hmm it would be best not to invent a whole new way of giving unique
> > identifiers for vendors if we can.
> >   
> what about leveraging the flags in vfio device info ?
> 
> #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
> #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
> #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
> #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
> #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
> #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */
> 
> Then for migration_version string,
> The first 64 bits are for device type, the second 64 bits are for device id.
> e.g.
> for PCI devices, it could be
> VFIO_DEVICE_FLAGS_PCI + PCI ID.
> 
> Currently in the doc, we only define PCI devices to use PCI ID as the second
> 64 bits. In future, if other types of devices want to support migration,
> they can define their own parts of device id. e.g. use ACPI ID as the
> second 64-bit...
> 
> sounds good?

[dead thread resurrection alert]

Not really.  We're deep into territory that we were trying to avoid.
We had previously defined the version string as opaque (not
transparent) specifically because we did not want userspace to make
assumptions about compatibility based on the content of the string.  It
was 100% left to the vendor driver to determine compatibility.  The
mdev type was the full extent of the first level filter that userspace
could use to narrow the set of potentially compatible devices.  If we
remove that due to physical device migration support, I'm not sure how
we simplify the problem for userspace.

We need to step away from PCI IDs and parent devices.  We're not
designing a solution that only works for PCI; there's no guarantee that
parent devices are similar or even from the same vendor.

Does the mdev type sufficiently solve the problem for mdev devices?  If
so, then what can we learn from it and how can we apply an equivalence
to physical devices?  For example, should a vfio bus driver (vfio-pci
or vfio-mdev) expose vfio_migration_type and vfio_migration_version
attributes under the device in sysfs where the _type provides the first
level, user transparent, matching string (ex. mdev type for mdev
devices) while the _version provides the user opaque, vendor known
compatibility test?

This pushes the problem out to the drivers where we can perhaps
incorporate the module name to avoid collisions.  For example Yan's
vendor extension proposal makes use of vfio-pci with extension modules
loaded via an alias incorporating the PCI vendor and device ID.  So
vfio-pci might use a type of "vfio-pci:$ALIAS".
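
As a rough illustration of the two-level idea above (a sketch only, not an
existing kernel or libvirt API): the attribute names vfio_migration_type and
vfio_migration_version are the ones floated in this mail, while the helper
names and the alias value used in any call are hypothetical. Userspace would
compare the _type strings itself and never look inside the _version string.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a first-level type string of the form
 * "<bus driver>:<alias>", mirroring the "vfio-pci:$ALIAS" example.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int build_migration_type(char *buf, size_t len,
                                const char *bus_driver, const char *alias)
{
    int n = snprintf(buf, len, "%s:%s", bus_driver, alias);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * First-level filter: the type string is user transparent, so userspace
 * may compare it directly. Returns 1 on an exact match, 0 otherwise.
 * The version string is deliberately not a parameter here; it stays
 * opaque and is only ever handed to the vendor driver.
 */
static int migration_type_matches(const char *src_type, const char *dst_type)
{
    return strcmp(src_type, dst_type) == 0;
}
```

Only devices that pass migration_type_matches() would then go on to the
vendor-defined version test.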

It's still a bit messy that someone needs to go evaluate all these
types between devices that exist and mdev devices that might exist if
created, but I don't have any good ideas to resolve that (maybe a new
class hierarchy?).  Thanks,

Alex


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-02 22:55                                     ` Alex Williamson
@ 2020-06-03  3:19                                       ` Yan Zhao
  2020-06-03  3:55                                         ` Alex Williamson
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-06-03  3:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Dr. David Alan Gilbert, Tian, Kevin, cjia, kvm, linux-doc,
	libvir-list, Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede,
	eauger, corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic,
	aik, felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin,
	intel-gvt-dev, Liu, Changpeng, berrange, Cornelia Huck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Tue, Jun 02, 2020 at 04:55:27PM -0600, Alex Williamson wrote:
> On Wed, 29 Apr 2020 20:39:50 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> > <snip>
> > > > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a  
> > > > > > > > > > > > > > management  
> > > > > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > > > >  
> > > > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > > > possibilities.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > > > mdev1 <-> mdev2.  
> > > > > > > > > > > > > 
> > > > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.  
> > > > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > > > or mix.
> > > > > > > > > > > > 
> > > > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > > > configuration to the target vm.
> > > > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > > > same mdev type).
> > > > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > > > 
> > > > > > > > > > > >   
> > > > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out.   
> > > > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > > > 
> > > > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > > > > 
> > > > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > > > to use it or not.
> > > > > > > > > > > >   
> > > > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > > > > > > usage then such tradeoff might be worthywhile...
> > > > > > > > > > > > >  
> > > > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > > > "PCIID + software version".
> > > > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > > > to a mdev device according it supports it or not.  
> > > > > > > > > > > 
> > > > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > > > both supported the same mdev view.
> > > > > > > > > > >   
> > > > > > > > > > hi Dave
> > > > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > > > completely defined by vendor driver.
> > > > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > > > e.g.
> > > > > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > > > device info and software version.
> > > > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > > > PCIIDs in src and target are different.
> > > > > > > > > > 
> > > > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > > > provides vendor driver full flexibility.
> > > > > > > > > > 
> > > > > > > > > > do you think it's good?  
> > > > > > > > > 
> > > > > > > > > Yeh that's OK; I guess it's going to need to have a big table in their
> > > > > > > > > with all the PCIIDs in.
> > > > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > > > > > clock speed etc - but yes you might be right that PCIIDs might be best
> > > > > > > > > for checking for quirks.
> > > > > > > > >  
> > > > > > > > glad that you are agreed with it:)
> > > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > > number + software version".
> > > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > > compatibility of a device.
> > > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > > or, any other ideas?  
> > > > > > > 
> > > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was  
> > > > > > Yes, it's a good idea!
> > > > > > could we add a line in the doc saying that
> > > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > > collision?  
> > > > > 
> > > > > So why don't we split the difference; lets say that it should start with
> > > > > the hex PCI Vendor ID.
> > > > >  
> > > > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > > > they don't have PCI vendor IDs.  
> > > 
> > > Hmm it would be best not to invent a whole new way of giving unique
> > > > identifiers for vendors if we can.
> > >   
> > what about leveraging the flags in vfio device info ?
> > 
> > #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
> > #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
> > #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
> > #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
> > #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
> > #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */
> > 
> > Then for migration_version string,
> > The first 64 bits are for device type, the second 64 bits are for device id.
> > e.g.
> > for PCI devices, it could be
> > VFIO_DEVICE_FLAGS_PCI + PCI ID.
> > 
> > Currently in the doc, we only define PCI devices to use PCI ID as the second
> > 64 bits. In future, if other types of devices want to support migration,
> > they can define their own parts of device id. e.g. use ACPI ID as the
> > second 64-bit...
> > 
> > sounds good?
> 
> [dead thread resurrection alert]
> 
> Not really.  We're deep into territory that we were trying to avoid.
> We had previously defined the version string as opaque (not
> transparent) specifically because we did not want userspace to make
> assumptions about compatibility based on the content of the string.  It
> was 100% left to the vendor driver to determine compatibility.  The
> mdev type was the full extent of the first level filter that userspace
> could use to narrow the set of potentially compatible devices.  If we
> remove that due to physical device migration support, I'm not sure how
> we simplify the problem for userspace.
> 
> We need to step away from PCI IDs and parent devices.  We're not
> designing a solution that only works for PCI, there's no guarantee that
> parent devices are similar or even from the same vendor.
> 
> Does the mdev type sufficiently solve the problem for mdev devices?  If
> so, then what can we learn from it and how can we apply an equivalence
> to physical devices?  For example, should a vfio bus driver (vfio-pci
> or vfio-mdev) expose vfio_migration_type and vfio_migration_version
> attributes under the device in sysfs where the _type provides the first
> level, user transparent, matching string (ex. mdev type for mdev
> devices) while the _version provides the user opaque, vendor known
> compatibility test?
> 
> This pushes the problem out to the drivers where we can perhaps
> incorporate the module name to avoid collisions.  For example Yan's
> vendor extension proposal makes use of vfio-pci with extension modules
> loaded via an alias incorporating the PCI vendor and device ID.  So
> vfio-pci might use a type of "vfio-pci:$ALIAS".
> 
> It's still a bit messy that someone needs to go evaluate all these
> types between devices that exist and mdev devices that might exist if
> created, but I don't have any good ideas to resolve that (maybe a new
> class hierarchy?).  Thanks,

hi Alex

Yes, even with the same mdev_type, the user still has to enumerate all parent
devices and test between the supported mdev_types to know whether two mdev
devices are compatible.
Maybe this is not a problem? In practice, it is the administrator who
specifies the two devices, and the management tool reports the compatibility
result. The management tool is not required to pre-test and set up a
compatibility map beforehand.

If so, then the only problem left is namespace collision.
Given that the migration_version node is exported by the vendor driver,
maybe it can also embed its module name in the migration version string,
like "i915" in "i915-GVTg_V5_8", as you suggested above.

With the module name as the first mandatory field in the version string, and
setting aside the enumeration/testing problem, we can happily unify migration
across mdev and phys devices. E.g. it is possible to migrate between
VFs in SR-IOV and mdevs in SIOV to achieve backwards compatibility.

Thanks
Yan



> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-03  3:19                                       ` Yan Zhao
@ 2020-06-03  3:55                                         ` Alex Williamson
  2020-06-03  5:24                                           ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2020-06-03  3:55 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Dr. David Alan Gilbert, Tian, Kevin, cjia, kvm, linux-doc,
	libvir-list, Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede,
	eauger, corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic,
	aik, felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin,
	intel-gvt-dev, Liu, Changpeng, berrange, Cornelia Huck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Tue, 2 Jun 2020 23:19:48 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Tue, Jun 02, 2020 at 04:55:27PM -0600, Alex Williamson wrote:
> > On Wed, 29 Apr 2020 20:39:50 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> > > <snip>  
> > > > > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a    
> > > > > > > > > > > > > > > management    
> > > > > > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > > > > >    
> > > > > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > > > > possibilities.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > > > > mdev1 <-> mdev2.    
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.    
> > > > > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > > > > or mix.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > > > > configuration to the target vm.
> > > > > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > > > > same mdev type).
> > > > > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > > > > 
> > > > > > > > > > > > >     
> > > > > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out.     
> > > > > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > > > > to use it or not.
> > > > > > > > > > > > >     
> > > > > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > > > > > > > usage then such tradeoff might be worthywhile...
> > > > > > > > > > > > > >    
> > > > > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > > > > "PCIID + software version".
> > > > > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > > > > to a mdev device according it supports it or not.    
> > > > > > > > > > > > 
> > > > > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > > > > both supported the same mdev view.
> > > > > > > > > > > >     
> > > > > > > > > > > hi Dave
> > > > > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > > > > completely defined by vendor driver.
> > > > > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > > > > e.g.
> > > > > > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > > > > device info and software version.
> > > > > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > > > > PCIIDs in src and target are different.
> > > > > > > > > > > 
> > > > > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > > > > provides vendor driver full flexibility.
> > > > > > > > > > > 
> > > > > > > > > > > do you think it's good?    
> > > > > > > > > > 
> > > > > > > > > > Yeh that's OK; I guess it's going to need to have a big table in their
> > > > > > > > > > with all the PCIIDs in.
> > > > > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > > > > > > clock speed etc - but yes you might be right that PCIIDs might be best
> > > > > > > > > > for checking for quirks.
> > > > > > > > > >    
> > > > > > > > > glad that you are agreed with it:)
> > > > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > > > number + software version".
> > > > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > > > compatibility of a device.
> > > > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > > > or, any other ideas?    
> > > > > > > > 
> > > > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was    
> > > > > > > Yes, it's a good idea!
> > > > > > > could we add a line in the doc saying that
> > > > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > > > collision?    
> > > > > > 
> > > > > > So why don't we split the difference; lets say that it should start with
> > > > > > the hex PCI Vendor ID.
> > > > > >    
> > > > > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > > > > they don't have PCI vendor IDs.    
> > > > 
> > > > Hmm it would be best not to invent a whole new way of giving unique
> > > > > identifiers for vendors if we can.
> > > >     
> > > what about leveraging the flags in vfio device info ?
> > > 
> > > #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
> > > #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
> > > #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
> > > #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
> > > #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
> > > #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */
> > > 
> > > Then for migration_version string,
> > > The first 64 bits are for device type, the second 64 bits are for device id.
> > > e.g.
> > > for PCI devices, it could be
> > > VFIO_DEVICE_FLAGS_PCI + PCI ID.
> > > 
> > > Currently in the doc, we only define PCI devices to use PCI ID as the second
> > > 64 bits. In future, if other types of devices want to support migration,
> > > they can define their own parts of device id. e.g. use ACPI ID as the
> > > second 64-bit...
> > > 
> > > sounds good?  
> > 
> > [dead thread resurrection alert]
> > 
> > Not really.  We're deep into territory that we were trying to avoid.
> > We had previously defined the version string as opaque (not
> > transparent) specifically because we did not want userspace to make
> > assumptions about compatibility based on the content of the string.  It
> > was 100% left to the vendor driver to determine compatibility.  The
> > mdev type was the full extent of the first level filter that userspace
> > could use to narrow the set of potentially compatible devices.  If we
> > remove that due to physical device migration support, I'm not sure how
> > we simplify the problem for userspace.
> > 
> > We need to step away from PCI IDs and parent devices.  We're not
> > designing a solution that only works for PCI, there's no guarantee that
> > parent devices are similar or even from the same vendor.
> > 
> > Does the mdev type sufficiently solve the problem for mdev devices?  If
> > so, then what can we learn from it and how can we apply an equivalence
> > to physical devices?  For example, should a vfio bus driver (vfio-pci
> > or vfio-mdev) expose vfio_migration_type and vfio_migration_version
> > attributes under the device in sysfs where the _type provides the first
> > level, user transparent, matching string (ex. mdev type for mdev
> > devices) while the _version provides the user opaque, vendor known
> > compatibility test?
> > 
> > This pushes the problem out to the drivers where we can perhaps
> > incorporate the module name to avoid collisions.  For example Yan's
> > vendor extension proposal makes use of vfio-pci with extension modules
> > loaded via an alias incorporating the PCI vendor and device ID.  So
> > vfio-pci might use a type of "vfio-pci:$ALIAS".
> > 
> > It's still a bit messy that someone needs to go evaluate all these
> > types between devices that exist and mdev devices that might exist if
> > created, but I don't have any good ideas to resolve that (maybe a new
> > class hierarchy?).  Thanks,  
> 
> hi Alex
> 
> yes, with the same mdev_type, user still has to enumerate all parent
> devices and test between the supported mdev_types to know whether two mdev
> devices are compatible.
> maybe this is not a problem? in reality, it is the administrator that
> specifies two devices and the management tool reports the compatibility
> result. management tool is not required to pre-test and set up the
> compatibility map beforehand.

But giving the management tools some indication that a migration has a
chance of working is exactly the purpose of this interface.
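
For reference, the read-then-write probe described earlier in this thread
(read the source's migration_version, write it to the target's
migration_version, and treat a successful write as "compatible") could look
roughly like the sketch below. The function name is hypothetical, the two
attribute paths are supplied by the caller so no sysfs layout is assumed,
and the 4096-byte buffer size is an assumption.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Sketch of the probe: pass the source's version string, untouched,
 * to the target's migration_version attribute.
 * Returns  1 if the target accepted the string (write succeeded),
 *          0 if the target rejected it (write failed or was short),
 *         -1 if either attribute could not be opened or the source
 *            could not be read.
 */
static int migration_version_probe(const char *src_attr, const char *dst_attr)
{
    char version[4096];   /* assumed upper bound on the attribute size */
    ssize_t len, written;
    int fd;

    fd = open(src_attr, O_RDONLY);
    if (fd < 0)
        return -1;
    len = read(fd, version, sizeof(version));
    close(fd);
    if (len <= 0)
        return -1;

    /* The string is opaque to userspace: no parsing, no comparison. */
    fd = open(dst_attr, O_WRONLY);
    if (fd < 0)
        return -1;
    written = write(fd, version, (size_t)len);
    close(fd);

    return written == len ? 1 : 0;
}
```

A management tool would only run this after any first-level filtering, since
the result says nothing about why a pair of devices is incompatible.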
 
> If so, then the only problem left is namespace collision. 
> given that the migration_version node is exported by vendor driver,
> maybe it can also embed its module name in the migration version string,
> like "i915" in "i915-GVTg_V5_8", as you suggested above.

No, we've already decided that the version string is opaque, the user
is not to attempt to infer anything from it.  That's why I've suggested
another attribute in sysfs that does present type information that a
user can compare.  Thanks,

Alex

> with module name as the first mandatory field in version string and
> skipping the enumeration/testing problem, we can happily unify migration
> across mdev and phys devices. e.g. it is possible to migrate between
> VFs in sriov and mdevs in siov to achieve backwards compatibility.
> 
> Thanks
> Yan
> 
> 
> 
> >   
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-03  3:55                                         ` Alex Williamson
@ 2020-06-03  5:24                                           ` Yan Zhao
  2020-06-03 16:26                                             ` Alex Williamson
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-06-03  5:24 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Dr. David Alan Gilbert, Tian, Kevin, cjia, kvm, linux-doc,
	libvir-list, Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede,
	eauger, corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic,
	aik, felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin,
	intel-gvt-dev, Liu, Changpeng, berrange, Cornelia Huck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Tue, Jun 02, 2020 at 09:55:28PM -0600, Alex Williamson wrote:
> On Tue, 2 Jun 2020 23:19:48 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Tue, Jun 02, 2020 at 04:55:27PM -0600, Alex Williamson wrote:
> > > On Wed, 29 Apr 2020 20:39:50 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> > > > <snip>  
> > > > > > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a    
> > > > > > > > > > > > > > > > management    
> > > > > > > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > > > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > > > > > >    
> > > > > > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > > > > > possibilities.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > > > > > mdev1 <-> mdev2.    
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.    
> > > > > > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > > > > > or mix.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > > > > > configuration to the target vm.
> > > > > > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > > > > > same mdev type).
> > > > > > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out.     
> > > > > > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > > > > > to use it or not.
> > > > > > > > > > > > > >     
> > > > > > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > > > > > > > > usage then such tradeoff might be worthwhile...
> > > > > > > > > > > > > > >    
> > > > > > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > > > > > "PCIID + software version".
> > > > > > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > > > > > to a mdev device according it supports it or not.    
> > > > > > > > > > > > > 
> > > > > > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > > > > > both supported the same mdev view.
> > > > > > > > > > > > >     
> > > > > > > > > > > > hi Dave
> > > > > > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > > > > > completely defined by vendor driver.
> > > > > > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > > > > > e.g.
> > > > > > > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > > > > > device info and software version.
> > > > > > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > > > > > PCIIDs in src and target are different.
> > > > > > > > > > > > 
> > > > > > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > > > > > provides vendor driver full flexibility.
> > > > > > > > > > > > 
> > > > > > > > > > > > do you think it's good?    
> > > > > > > > > > > 
> > > > > > > > > > > Yeh that's OK; I guess it's going to need to have a big table in there
> > > > > > > > > > > with all the PCIIDs in.
> > > > > > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > > > > > > clock speed etc - but yes you might be right that PCIIDs might be best
> > > > > > > > > > > for checking for quirks.
> > > > > > > > > > >    
> > > > > > > > > > glad that you are agreed with it:)
> > > > > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > > > > number + software version".
> > > > > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > > > > compatibility of a device.
> > > > > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > > > > or, any other ideas?    
> > > > > > > > > 
> > > > > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was    
> > > > > > > > Yes, it's a good idea!
> > > > > > > > could we add a line in the doc saying that
> > > > > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > > > > collision?    
> > > > > > > 
> > > > > > > So why don't we split the difference; lets say that it should start with
> > > > > > > the hex PCI Vendor ID.
> > > > > > >    
> > > > > > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > > > > > they don't have PCI vendor IDs.    
> > > > > 
> > > > > Hmm it would be best not to invent a whole new way of giving unique
> > > > > identifiers for vendors if we can.
> > > > >     
> > > > what about leveraging the flags in vfio device info ?
> > > > 
> > > > #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
> > > > #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
> > > > #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
> > > > #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
> > > > #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
> > > > #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */
> > > > 
> > > > Then for migration_version string,
> > > > The first 64 bits are for device type, the second 64 bits are for device id.
> > > > e.g.
> > > > for PCI devices, it could be
> > > > VFIO_DEVICE_FLAGS_PCI + PCI ID.
> > > > 
> > > > Currently in the doc, we only define PCI devices to use PCI ID as the second
> > > > 64 bits. In future, if other types of devices want to support migration,
> > > > they can define their own parts of device id. e.g. use ACPI ID as the
> > > > second 64-bit...
> > > > 
> > > > sounds good?  
> > > 
> > > [dead thread resurrection alert]
> > > 
> > > Not really.  We're deep into territory that we were trying to avoid.
> > > We had previously defined the version string as opaque (not
> > > transparent) specifically because we did not want userspace to make
> > > assumptions about compatibility based on the content of the string.  It
> > > was 100% left to the vendor driver to determine compatibility.  The
> > > mdev type was the full extent of the first level filter that userspace
> > > could use to narrow the set of potentially compatible devices.  If we
> > > remove that due to physical device migration support, I'm not sure how
> > > we simplify the problem for userspace.
> > > 
> > > We need to step away from PCI IDs and parent devices.  We're not
> > > designing a solution that only works for PCI, there's no guarantee that
> > > parent devices are similar or even from the same vendor.
> > > 
> > > Does the mdev type sufficiently solve the problem for mdev devices?  If
> > > so, then what can we learn from it and how can we apply an equivalence
> > > to physical devices?  For example, should a vfio bus driver (vfio-pci
> > > or vfio-mdev) expose vfio_migration_type and vfio_migration_version
> > > attributes under the device in sysfs where the _type provides the first
> > > level, user transparent, matching string (ex. mdev type for mdev
> > > devices) while the _version provides the user opaque, vendor known
> > > compatibility test?
> > > 
> > > This pushes the problem out to the drivers where we can perhaps
> > > incorporate the module name to avoid collisions.  For example Yan's
> > > vendor extension proposal makes use of vfio-pci with extension modules
> > > loaded via an alias incorporating the PCI vendor and device ID.  So
> > > vfio-pci might use a type of "vfio-pci:$ALIAS".
> > > 
> > > It's still a bit messy that someone needs to go evaluate all these
> > > types between devices that exist and mdev devices that might exist if
> > > created, but I don't have any good ideas to resolve that (maybe a new
> > > class hierarchy?).  Thanks,  
> > 
> > hi Alex
> > 
> > yes, with the same mdev_type, user still has to enumerate all parent
> > devices and test between the supported mdev_types to know whether two mdev
> > devices are compatible.
> > maybe this is not a problem? in reality, it is the administrator that
> > specifies two devices and the management tool feedbacks compatibility
> > result. management tool is not required to pre-test and setup the
> > compatibility map beforehand.
> 
> That's exactly the purpose of this interface though is to give the
> management tools some indication that a migration has a chance of
> working.
>  
> > If so, then the only problem left is namespace collision. 
> > given that the migration_version nodes is exported by vendor driver,
> > maybe it can also embed its module name in the migration version string,
> > like "i915" in "i915-GVTg_V5_8", as you suggested above.
> 
> No, we've already decided that the version string is opaque, the user
> is not to attempt to infer anything from it.  That's why I've suggested
> another attribute in sysfs that does present type information that a
> user can compare.  Thanks,
> 
> Alex
>
OK, got it.
One more thing I want to confirm: do you think it's a necessary
restriction that "The mdev devices are of the same type"?
Could mdev and phys devices both expose "vfio_migration_type" and
"vfio_migration_version" under the device node in sysfs, so that the
check is not confined to mdev_type? (e.g. when aggregator is enabled,
two mdevs of the same mdev_type may not actually be compatible, while
two mdevs of different mdev_types may be compatible.)

For mdev devices, we could still expose the vfio_migration_version
attribute under mdev_type, for detection before the mdev is created.

Thanks
Yan
> > with module name as the first mandatory field in version string and
> > skipping the enumeration/testing problem, we can happily unify migration
> > across mdev and phys devices. e.g. it is possible to migrate between
> > VFs in sriov and mdevs in siov to achieve backwards compatibility.
> > 
> > Thanks
> > Yan
> > 
> > 
> > 
> > >   
> > 
> 
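
[Editorial sketch, not part of the original mail.]  The two-level check
discussed above (compare a user-visible type string first, then hand the
opaque version string to the target's driver) could look roughly like the
following from userspace.  This is only a sketch under stated assumptions:
"vfio_migration_type" and "vfio_migration_version" are the attribute names
proposed in this thread, not an existing kernel ABI; treating "types are
equal" as the first-level test is an assumption; the directory arguments
and the helper names read_attr(), write_attr() and migration_compatible()
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read a one-line sysfs-style attribute into buf, dropping the trailing
 * newline.  Returns 0 on success, -1 on any failure. */
int read_attr(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(buf != NULL && len > 0);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Write val to an attribute.  For a real sysfs node the driver's store
 * callback may reject the value, which shows up as a failed write. */
int write_attr(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    int rc;

    if (!f)
        return -1;
    rc = (fputs(val, f) == EOF) ? -1 : 0;
    if (fclose(f) == EOF)       /* buffered data is flushed here */
        rc = -1;
    return rc;
}

/* Two-level check (assumed semantics): compare the user-visible type
 * strings first, then let the target's driver judge the opaque source
 * version string.  The version string is never parsed here. */
bool migration_compatible(const char *src_dir, const char *dst_dir)
{
    char path[512], src_type[256], dst_type[256], src_ver[256];

    snprintf(path, sizeof(path), "%s/vfio_migration_type", src_dir);
    if (read_attr(path, src_type, sizeof(src_type)))
        return false;
    snprintf(path, sizeof(path), "%s/vfio_migration_type", dst_dir);
    if (read_attr(path, dst_type, sizeof(dst_type)))
        return false;
    if (strcmp(src_type, dst_type) != 0)
        return false;           /* first-level filter, no probe needed */

    snprintf(path, sizeof(path), "%s/vfio_migration_version", src_dir);
    if (read_attr(path, src_ver, sizeof(src_ver)))
        return false;
    snprintf(path, sizeof(path), "%s/vfio_migration_version", dst_dir);
    return write_attr(path, src_ver) == 0;
}
```

The point of the split is that only the type string is ever compared by
userspace; the version string is passed through untouched, which keeps it
opaque as requested earlier in the thread.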


* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-03  5:24                                           ` Yan Zhao
@ 2020-06-03 16:26                                             ` Alex Williamson
  2020-06-05 10:22                                               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2020-06-03 16:26 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Dr. David Alan Gilbert, Tian, Kevin, cjia, kvm, linux-doc,
	libvir-list, Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede,
	eauger, corbet, Liu, Yi L, eskultet, Yang, Ziye, mlevitsk, pasic,
	aik, felipe, Ken.Xue, Zeng, Xin, zhenyuw, dinechin,
	intel-gvt-dev, Liu, Changpeng, berrange, Cornelia Huck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Wed, 3 Jun 2020 01:24:43 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Tue, Jun 02, 2020 at 09:55:28PM -0600, Alex Williamson wrote:
> > On Tue, 2 Jun 2020 23:19:48 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > On Tue, Jun 02, 2020 at 04:55:27PM -0600, Alex Williamson wrote:  
> > > > On Wed, 29 Apr 2020 20:39:50 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >     
> > > > > On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> > > > > <snip>    
> > > > > > > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a management
> > > > > > > > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > > > > > > > libvirt to probe every device with this attribute in the system?  Is
> > > > > > > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > > > > > > >      
> > > > > > > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > > > > > > possibilities.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > > > > > > mdev1 <-> mdev2.      
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.      
> > > > > > > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > > > > > > or mix.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > > > > > > configuration to the target vm.
> > > > > > > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > > > > > > same mdev type).
> > > > > > > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > >       
> > > > > > > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out.       
> > > > > > > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > > > > > > to use it or not.
> > > > > > > > > > > > > > >       
> > > > > > > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > > > > > > > > > usage then such tradeoff might be worthwhile...
> > > > > > > > > > > > > > > >      
> > > > > > > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > > > > > > "PCIID + software version".
> > > > > > > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > > > > > > to a mdev device according it supports it or not.      
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > > > > > > both supported the same mdev view.
> > > > > > > > > > > > > >       
> > > > > > > > > > > > > hi Dave
> > > > > > > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > > > > > > completely defined by vendor driver.
> > > > > > > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > > > > > > device info and software version.
> > > > > > > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > > > > > > PCIIDs in src and target are different.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > > > > > > provides vendor driver full flexibility.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > do you think it's good?      
> > > > > > > > > > > > 
> > > > > > > > > > > > Yeh that's OK; I guess it's going to need to have a big table in there
> > > > > > > > > > > > with all the PCIIDs in.
> > > > > > > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > > > > > > > clock speed etc - but yes you might be right that PCIIDs might be best
> > > > > > > > > > > > for checking for quirks.
> > > > > > > > > > > >      
> > > > > > > > > > > glad that you are agreed with it:)
> > > > > > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > > > > > number + software version".
> > > > > > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > > > > > compatibility of a device.
> > > > > > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > > > > > or, any other ideas?      
> > > > > > > > > > 
> > > > > > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was      
> > > > > > > > > Yes, it's a good idea!
> > > > > > > > > could we add a line in the doc saying that
> > > > > > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > > > > > collision?      
> > > > > > > > 
> > > > > > > > So why don't we split the difference; lets say that it should start with
> > > > > > > > the hex PCI Vendor ID.
> > > > > > > >      
> > > > > > > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > > > > > > they don't have PCI vendor IDs.      
> > > > > > 
> > > > > > Hmm it would be best not to invent a whole new way of giving unique
> > > > > > identifiers for vendors if we can.
> > > > > >       
> > > > > what about leveraging the flags in vfio device info ?
> > > > > 
> > > > > #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
> > > > > #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
> > > > > #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
> > > > > #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
> > > > > #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
> > > > > #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */
> > > > > 
> > > > > Then for migration_version string,
> > > > > The first 64 bits are for device type, the second 64 bits are for device id.
> > > > > e.g.
> > > > > for PCI devices, it could be
> > > > > VFIO_DEVICE_FLAGS_PCI + PCI ID.
> > > > > 
> > > > > Currently in the doc, we only define PCI devices to use PCI ID as the second
> > > > > 64 bits. In future, if other types of devices want to support migration,
> > > > > they can define their own parts of device id. e.g. use ACPI ID as the
> > > > > second 64-bit...
> > > > > 
> > > > > sounds good?    
> > > > 
> > > > [dead thread resurrection alert]
> > > > 
> > > > Not really.  We're deep into territory that we were trying to avoid.
> > > > We had previously defined the version string as opaque (not
> > > > transparent) specifically because we did not want userspace to make
> > > > assumptions about compatibility based on the content of the string.  It
> > > > was 100% left to the vendor driver to determine compatibility.  The
> > > > mdev type was the full extent of the first level filter that userspace
> > > > could use to narrow the set of potentially compatible devices.  If we
> > > > remove that due to physical device migration support, I'm not sure how
> > > > we simplify the problem for userspace.
> > > > 
> > > > We need to step away from PCI IDs and parent devices.  We're not
> > > > designing a solution that only works for PCI, there's no guarantee that
> > > > parent devices are similar or even from the same vendor.
> > > > 
> > > > Does the mdev type sufficiently solve the problem for mdev devices?  If
> > > > so, then what can we learn from it and how can we apply an equivalence
> > > > to physical devices?  For example, should a vfio bus driver (vfio-pci
> > > > or vfio-mdev) expose vfio_migration_type and vfio_migration_version
> > > > attributes under the device in sysfs where the _type provides the first
> > > > level, user transparent, matching string (ex. mdev type for mdev
> > > > devices) while the _version provides the user opaque, vendor known
> > > > compatibility test?
> > > > 
> > > > This pushes the problem out to the drivers where we can perhaps
> > > > incorporate the module name to avoid collisions.  For example Yan's
> > > > vendor extension proposal makes use of vfio-pci with extension modules
> > > > loaded via an alias incorporating the PCI vendor and device ID.  So
> > > > vfio-pci might use a type of "vfio-pci:$ALIAS".
> > > > 
> > > > It's still a bit messy that someone needs to go evaluate all these
> > > > types between devices that exist and mdev devices that might exist if
> > > > created, but I don't have any good ideas to resolve that (maybe a new
> > > > class hierarchy?).  Thanks,    
> > > 
> > > hi Alex
> > > 
> > > yes, with the same mdev_type, user still has to enumerate all parent
> > > devices and test between the supported mdev_types to know whether two mdev
> > > devices are compatible.
> > > maybe this is not a problem? in reality, it is the administrator that
> > > specifies two devices and the management tool feedbacks compatibility
> > > result. management tool is not required to pre-test and setup the
> > > compatibility map beforehand.  
> > 
> > That's exactly the purpose of this interface though is to give the
> > management tools some indication that a migration has a chance of
> > working.
> >    
> > > If so, then the only problem left is namespace collision. 
> > > given that the migration_version nodes is exported by vendor driver,
> > > maybe it can also embed its module name in the migration version string,
> > > like "i915" in "i915-GVTg_V5_8", as you suggested above.  
> > 
> > No, we've already decided that the version string is opaque, the user
> > is not to attempt to infer anything from it.  That's why I've suggested
> > another attribute in sysfs that does present type information that a
> > user can compare.  Thanks,
> > 
> > Alex
> >  
> OK, got it.
> One more thing I want to confirm: do you think it's a necessary
> restriction that "The mdev devices are of the same type"?
> Could mdev and phys devices both expose "vfio_migration_type" and
> "vfio_migration_version" under the device node in sysfs, so that the
> check is not confined to mdev_type? (e.g. when aggregator is enabled,
> two mdevs of the same mdev_type may not actually be compatible, while
> two mdevs of different mdev_types may be compatible.)
> 
> For mdev devices, we could still expose the vfio_migration_version
> attribute under mdev_type, for detection before the mdev is created.

I tried to simplify the problem a bit, but we keep going backwards.  If
the requirement is that potentially any source device can migrate to any
target device, and we cannot provide any means other than writing an
opaque source string into a version attribute on the target and
evaluating the result to determine compatibility, then we're requiring
userspace to do an exhaustive search to find a potential match.  That
sucks.  We don't have an agreed proposal for aggregation, and even this
exhaustive search mechanism doesn't solve that problem, e.g. the target
type may be able to support a compatible aggregation, but the user
might find after they've created the device that their aggregation was
wrong and the resulting device doesn't even match the version
compatibility of the parent type.  We're arguing our way into an
unsolvable problem, and unless we can simplify it, I'm afraid there's no
solution; we're just going to have a bad interface for the user to test
compatibility, which is not really acceptable.  Thanks,

Alex
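
[Editorial sketch, not part of the original mail.]  The exhaustive-search
concern above can be made concrete with a small in-memory model.  Nothing
here touches sysfs: struct candidate, probe() and find_target() are
hypothetical names, and probe() merely stands in for "write the opaque
source version string to the target's version attribute and see whether
the write is accepted".  Passing src_type == NULL models the case where
userspace has no first-level filter and must probe every candidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a candidate target device. */
struct candidate {
    const char *type;     /* user-comparable type string */
    const char *accepts;  /* the one source version this target accepts */
};

/* Stands in for writing the source version string into the target's
 * version attribute and checking whether the write succeeded. */
bool probe(const struct candidate *dst, const char *src_version)
{
    return strcmp(dst->accepts, src_version) == 0;
}

/* Return the index of the first compatible target, or -1 if none.
 * *probes counts how many opaque version tests were needed.
 * src_type == NULL means "no type filter": every candidate is probed. */
int find_target(const struct candidate *c, size_t n, const char *src_type,
                const char *src_version, size_t *probes)
{
    assert(c != NULL && src_version != NULL && probes != NULL);
    *probes = 0;
    for (size_t i = 0; i < n; i++) {
        if (src_type && strcmp(c[i].type, src_type) != 0)
            continue;           /* cheap string compare, no probe */
        (*probes)++;
        if (probe(&c[i], src_version))
            return (int)i;
    }
    return -1;
}
```

With candidates {typeA,a1}, {typeA,a2}, {typeB,b1}, {typeB,b2} and a source
of type typeB with version b2, the unfiltered search needs four probes and
the filtered one needs two.  The model deliberately ignores the harder
cases raised above (mdevs that do not exist yet, and aggregation).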



* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-03 16:26                                             ` Alex Williamson
@ 2020-06-05 10:22                                               ` Dr. David Alan Gilbert
  2020-06-05 14:31                                                 ` Alex Williamson
  0 siblings, 1 reply; 40+ messages in thread
From: Dr. David Alan Gilbert @ 2020-06-05 10:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yan Zhao, Cornelia Huck, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger, Liu,
	Yi L, corbet, Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue,
	Tian, Kevin, Zeng, Xin, zhenyuw, jonathan.davies, intel-gvt-dev,
	Liu, Changpeng, berrange, eskultet, linux-kernel, Wang, Zhi A,
	dinechin, He, Shaopeng

* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Wed, 3 Jun 2020 01:24:43 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Tue, Jun 02, 2020 at 09:55:28PM -0600, Alex Williamson wrote:
> > > On Tue, 2 Jun 2020 23:19:48 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > On Tue, Jun 02, 2020 at 04:55:27PM -0600, Alex Williamson wrote:  
> > > > > On Wed, 29 Apr 2020 20:39:50 -0400
> > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > >     
> > > > > > On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> > > > > > <snip>    
> > > > > > > > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a management
> > > > > > > > > > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > > > > > > > > > libvirt to probe every device with this attribute in the system?  Is
> > > > > > > > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > > > > > > > >      
> > > > > > > > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > > > > > > > possibilities.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > > > > > > > mdev1 <-> mdev2.      
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.      
> > > > > > > > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > > > > > > > or mix.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > > > > > > > configuration to the target vm.
> > > > > > > > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > > > > > > > same mdev type).
> > > > > > > > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > >       
> > > > > > > > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out.       
> > > > > > > > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > > > > > > > to use it or not.
> > > > > > > > > > > > > > > >       
> > > > > > > > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > > > > > > > > > > > usage then such tradeoff might be worthwhile...
> > > > > > > > > > > > > > > > >      
> > > > > > > > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > > > > > > > "PCIID + software version".
> > > > > > > > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > > > > > > > to a mdev device according it supports it or not.      
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > > > > > > > both supported the same mdev view.
> > > > > > > > > > > > > > >       
> > > > > > > > > > > > > > hi Dave
> > > > > > > > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > > > > > > > completely defined by vendor driver.
> > > > > > > > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > > > > > > > device info and software version.
> > > > > > > > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > > > > > > > PCIIDs in src and target are different.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > > > > > > > provides vendor driver full flexibility.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > do you think it's good?      
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > Yeh that's OK; I guess it's going to need to have a big table in there
> > > > > > > > > > > > > with all the PCIIDs in.
> > > > > > > > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > > > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > > > > > > > > > clock speed etc - but yes you might be right that PCIIDs might be best
> > > > > > > > > > > > > for checking for quirks.
> > > > > > > > > > > > >      
> > > > > > > > > > > > glad that you are agreed with it:)
> > > > > > > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > > > > > > number + software version".
> > > > > > > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > > > > > > compatibility of a device.
> > > > > > > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > > > > > > or, any other ideas?      
> > > > > > > > > > > 
> > > > > > > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was      
> > > > > > > > > > Yes, it's a good idea!
> > > > > > > > > > could we add a line in the doc saying that
> > > > > > > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > > > > > > collision?      
> > > > > > > > > 
> > > > > > > > > So why don't we split the difference; lets say that it should start with
> > > > > > > > > the hex PCI Vendor ID.
> > > > > > > > >      
> > > > > > > > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > > > > > > > they don't have PCI vendor IDs.      
> > > > > > > 
> > > > > > > Hmm it would be best not to invent a whole new way of giving unique
> > > > > > > > identifiers for vendors if we can.
> > > > > > >       
> > > > > > what about leveraging the flags in vfio device info ?
> > > > > > 
> > > > > > #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
> > > > > > #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
> > > > > > #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
> > > > > > #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
> > > > > > #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
> > > > > > #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */
> > > > > > 
> > > > > > Then for migration_version string,
> > > > > > The first 64 bits are for device type, the second 64 bits are for device id.
> > > > > > e.g.
> > > > > > for PCI devices, it could be
> > > > > > VFIO_DEVICE_FLAGS_PCI + PCI ID.
> > > > > > 
> > > > > > Currently in the doc, we only define PCI devices to use PCI ID as the second
> > > > > > 64 bits. In future, if other types of devices want to support migration,
> > > > > > they can define their own parts of device id. e.g. use ACPI ID as the
> > > > > > second 64-bit...
> > > > > > 
> > > > > > sounds good?    
> > > > > 
> > > > > [dead thread resurrection alert]
> > > > > 
> > > > > Not really.  We're deep into territory that we were trying to avoid.
> > > > > We had previously defined the version string as opaque (not
> > > > > transparent) specifically because we did not want userspace to make
> > > > > assumptions about compatibility based on the content of the string.  It
> > > > > was 100% left to the vendor driver to determine compatibility.  The
> > > > > mdev type was the full extent of the first level filter that userspace
> > > > > could use to narrow the set of potentially compatible devices.  If we
> > > > > remove that due to physical device migration support, I'm not sure how
> > > > > we simplify the problem for userspace.
> > > > > 
> > > > > We need to step away from PCI IDs and parent devices.  We're not
> > > > > designing a solution that only works for PCI, there's no guarantee that
> > > > > parent devices are similar or even from the same vendor.
> > > > > 
> > > > > Does the mdev type sufficiently solve the problem for mdev devices?  If
> > > > > so, then what can we learn from it and how can we apply an equivalence
> > > > > to physical devices?  For example, should a vfio bus driver (vfio-pci
> > > > > or vfio-mdev) expose vfio_migration_type and vfio_migration_version
> > > > > attributes under the device in sysfs where the _type provides the first
> > > > > level, user transparent, matching string (ex. mdev type for mdev
> > > > > devices) while the _version provides the user opaque, vendor known
> > > > > compatibility test?
> > > > > 
> > > > > This pushes the problem out to the drivers where we can perhaps
> > > > > incorporate the module name to avoid collisions.  For example Yan's
> > > > > vendor extension proposal makes use of vfio-pci with extension modules
> > > > > loaded via an alias incorporating the PCI vendor and device ID.  So
> > > > > vfio-pci might use a type of "vfio-pci:$ALIAS".
> > > > > 
> > > > > It's still a bit messy that someone needs to go evaluate all these
> > > > > types between devices that exist and mdev devices that might exist if
> > > > > created, but I don't have any good ideas to resolve that (maybe a new
> > > > > class hierarchy?).  Thanks,    
> > > > 
> > > > hi Alex
> > > > 
> > > > yes, with the same mdev_type, user still has to enumerate all parent
> > > > devices and test between the supported mdev_types to know whether two mdev
> > > > devices are compatible.
> > > > maybe this is not a problem? in reality, it is the administrator that
> > > > specifies two devices and the management tool feedbacks compatibility
> > > > result. management tool is not required to pre-test and setup the
> > > > compatibility map beforehand.  
> > > 
> > > That's exactly the purpose of this interface though is to give the
> > > management tools some indication that a migration has a chance of
> > > working.
> > >    
> > > > If so, then the only problem left is namespace collision. 
> > > > given that the migration_version nodes is exported by vendor driver,
> > > > maybe it can also embed its module name in the migration version string,
> > > > like "i915" in "i915-GVTg_V5_8", as you suggested above.  
> > > 
> > > No, we've already decided that the version string is opaque, the user
> > > is not to attempt to infer anything from it.  That's why I've suggested
> > > another attribute in sysfs that does present type information that a
> > > user can compare.  Thanks,
> > > 
> > > Alex
> > >  
> > ok. got it.
> > one more thing I want to confirm is that do you think it's a necessary
> > restriction that "The mdev devices are of the same type" ?
> > could mdev and phys devices both expose "vfio_migration_type" and
> > "vfio_migration_version" under device sysfs so that it may not be
> > confined in mdev_type? (e.g. when aggregator is enabled, though two
> > mdevs are of the same mdev_type, they are not actually compatible; and
> > two mdevs are compatible though their mdev_type is not equal.) 
> > 
> > for mdev devices, we could still expose vfio_migration_version
> > attribute under mdev_type for detection before mdev generated.
> 
> I tried to simplify the problem a bit, but we keep going backwards.  If
> the requirement is that potentially any source device can migrate to any
> target device and we cannot provide any means other than writing an
> opaque source string into a version attribute on the target and
> evaluating the result to determine compatibility, then we're requiring
> userspace to do an exhaustive search to find a potential match.  That
> sucks. 

Why is the mechanism a 'write and test'?  Why isn't it a 'write and ask'?
i.e. the destination tells the driver what type it's received from the
source, and the driver replies with a set of compatible configurations
(in some preferred order).
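
To make the difference concrete, here is a minimal Python sketch of the two
mechanisms.  It is only an in-memory model: ToyTarget, ToyDriver,
exhaustive_search and the "v1"/"cfg-a" strings are all hypothetical names
made up for illustration, and nothing here touches sysfs or any real VFIO
interface.

```python
# Hypothetical in-memory model of the two mechanisms discussed above.
# None of these names exist in the kernel, libvirt or QEMU.


class ToyTarget:
    """Stands in for one target device/configuration (hypothetical)."""

    def __init__(self, name, accepted):
        self.name = name
        # Opaque source version strings this target would accept.
        self.accepted = set(accepted)

    def write_and_test(self, src_version):
        """Models writing the source string into the target's version
        attribute: the only answer is success (True) or failure (False)."""
        return src_version in self.accepted


def exhaustive_search(src_version, targets):
    """'Write and test': userspace has to probe every candidate itself.

    Returns the matching target names and the number of probes made."""
    probes = 0
    matches = []
    for target in targets:
        probes += 1
        if target.write_and_test(src_version):
            matches.append(target.name)
    return matches, probes


class ToyDriver:
    """Stands in for a driver that can answer a query (hypothetical)."""

    def __init__(self, targets):
        # Assumed to be stored in the driver's preferred order.
        self.targets = list(targets)

    def write_and_ask(self, src_version):
        """'Write and ask': a single query from userspace returns every
        compatible configuration, in the driver's preferred order."""
        return [t.name for t in self.targets if t.write_and_test(src_version)]


targets = [
    ToyTarget("cfg-a", {"v1"}),
    ToyTarget("cfg-b", {"v1", "v2"}),
    ToyTarget("cfg-c", {"v3"}),
]

matches, probes = exhaustive_search("v1", targets)
answer = ToyDriver(targets).write_and_ask("v1")
```

In this toy model both paths find the same configurations, but with
'write and test' userspace makes one probe per candidate, while with
'write and ask' it makes one query and the enumeration stays inside the
driver, which is the only party that understands the string anyway.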

It's also not clear to me why the name has to be that opaque;
I agree it's only got to be understood by the driver but that doesn't
seem to be a reason for the driver to make it purposely obfuscated.
I wouldn't expect a user to be able to parse it necessarily; but would
expect something that would be useful for an error message.

Dave

> We don't have an agreed proposal for aggregation and even this
> exhaustive search mechanism doesn't solve that problem, ex. the target
> type may be able to support a compatible aggregation, but the user
> might find after they've created the device that their aggregation was
> wrong and the resulting device doesn't even match the version
> compatibility of the parent type.  We're arguing our way into an
> unsolvable problem and unless we can simplify it, I'm afraid there's no
> solution, we're just going to have a bad interface for the user to test
> compatibility, which is not really acceptable.  Thanks,
> 
> Alex
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-05 10:22                                               ` Dr. David Alan Gilbert
@ 2020-06-05 14:31                                                 ` Alex Williamson
  2020-06-05 14:39                                                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2020-06-05 14:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Yan Zhao, Cornelia Huck, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger, Liu,
	Yi L, corbet, Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue,
	Tian, Kevin, Zeng, Xin, zhenyuw, jonathan.davies, intel-gvt-dev,
	Liu, Changpeng, berrange, eskultet, linux-kernel, Wang, Zhi A,
	dinechin, He, Shaopeng

On Fri, 5 Jun 2020 11:22:24 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Alex Williamson (alex.williamson@redhat.com) wrote:
> > On Wed, 3 Jun 2020 01:24:43 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > On Tue, Jun 02, 2020 at 09:55:28PM -0600, Alex Williamson wrote:  
> > > > On Tue, 2 Jun 2020 23:19:48 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >     
> > > > > On Tue, Jun 02, 2020 at 04:55:27PM -0600, Alex Williamson wrote:    
> > > > > > On Wed, 29 Apr 2020 20:39:50 -0400
> > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > >       
> > > > > > > On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> > > > > > > <snip>      
> > > > > > > > > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a management
> > > > > > > > > > > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > > > > > > > > > > libvirt to probe every device with this attribute in the system?  Is
> > > > > > > > > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > > > > > > > > >        
> > > > > > > > > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > > > > > > > > possibilities.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > > > > > > > > mdev1 <-> mdev2.        
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.        
> > > > > > > > > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > > > > > > > > or mix.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > > > > > > > > configuration to the target vm.
> > > > > > > > > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > > > > > > > > same mdev type).
> > > > > > > > > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > >         
> > > > > > > > > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out.         
> > > > > > > > > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > > > > > > > > to use it or not.
> > > > > > > > > > > > > > > > >         
> > > > > > > > > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > > > > > > > > > > > > usage then such tradeoff might be worthwhile...
> > > > > > > > > > > > > > > > > >        
> > > > > > > > > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > > > > > > > > "PCIID + software version".
> > > > > > > > > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > > > > > > > > to a mdev device according it supports it or not.        
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > > > > > > > > both supported the same mdev view.
> > > > > > > > > > > > > > > >         
> > > > > > > > > > > > > > > hi Dave
> > > > > > > > > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > > > > > > > > completely defined by vendor driver.
> > > > > > > > > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > > > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > > > > > > > > device info and software version.
> > > > > > > > > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > > > > > > > > PCIIDs in src and target are different.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > > > > > > > > provides vendor driver full flexibility.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > do you think it's good?        
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Yeh that's OK; I guess it's going to need to have a big table in there
> > > > > > > > > > > > > > with all the PCIIDs in.
> > > > > > > > > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > > > > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > > > > > > > > > > clock speed etc - but yes you might be right that PCIIDs might be best
> > > > > > > > > > > > > > for checking for quirks.
> > > > > > > > > > > > > >        
> > > > > > > > > > > > > glad that you are agreed with it:)
> > > > > > > > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > > > > > > > number + software version".
> > > > > > > > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > > > > > > > compatibility of a device.
> > > > > > > > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > > > > > > > or, any other ideas?        
> > > > > > > > > > > > 
> > > > > > > > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was        
> > > > > > > > > > > Yes, it's a good idea!
> > > > > > > > > > > could we add a line in the doc saying that
> > > > > > > > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > > > > > > > collision?        
> > > > > > > > > > 
> > > > > > > > > > So why don't we split the difference; lets say that it should start with
> > > > > > > > > > the hex PCI Vendor ID.
> > > > > > > > > >        
> > > > > > > > > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > > > > > > > > they don't have PCI vendor IDs.        
> > > > > > > > 
> > > > > > > > Hmm it would be best not to invent a whole new way of giving unique
> > > > > > > > > identifiers for vendors if we can.
> > > > > > > >         
> > > > > > > what about leveraging the flags in vfio device info ?
> > > > > > > 
> > > > > > > #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
> > > > > > > #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
> > > > > > > #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
> > > > > > > #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
> > > > > > > #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
> > > > > > > #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */
> > > > > > > 
> > > > > > > Then for migration_version string,
> > > > > > > The first 64 bits are for device type, the second 64 bits are for device id.
> > > > > > > e.g.
> > > > > > > for PCI devices, it could be
> > > > > > > VFIO_DEVICE_FLAGS_PCI + PCI ID.
> > > > > > > 
> > > > > > > Currently in the doc, we only define PCI devices to use PCI ID as the second
> > > > > > > 64 bits. In future, if other types of devices want to support migration,
> > > > > > > they can define their own parts of device id. e.g. use ACPI ID as the
> > > > > > > second 64-bit...
> > > > > > > 
> > > > > > > sounds good?      
> > > > > > 
> > > > > > [dead thread resurrection alert]
> > > > > > 
> > > > > > Not really.  We're deep into territory that we were trying to avoid.
> > > > > > We had previously defined the version string as opaque (not
> > > > > > transparent) specifically because we did not want userspace to make
> > > > > > assumptions about compatibility based on the content of the string.  It
> > > > > > was 100% left to the vendor driver to determine compatibility.  The
> > > > > > mdev type was the full extent of the first level filter that userspace
> > > > > > could use to narrow the set of potentially compatible devices.  If we
> > > > > > remove that due to physical device migration support, I'm not sure how
> > > > > > we simplify the problem for userspace.
> > > > > > 
> > > > > > We need to step away from PCI IDs and parent devices.  We're not
> > > > > > designing a solution that only works for PCI, there's no guarantee that
> > > > > > parent devices are similar or even from the same vendor.
> > > > > > 
> > > > > > Does the mdev type sufficiently solve the problem for mdev devices?  If
> > > > > > so, then what can we learn from it and how can we apply an equivalence
> > > > > > to physical devices?  For example, should a vfio bus driver (vfio-pci
> > > > > > or vfio-mdev) expose vfio_migration_type and vfio_migration_version
> > > > > > attributes under the device in sysfs where the _type provides the first
> > > > > > level, user transparent, matching string (ex. mdev type for mdev
> > > > > > devices) while the _version provides the user opaque, vendor known
> > > > > > compatibility test?
> > > > > > 
> > > > > > This pushes the problem out to the drivers where we can perhaps
> > > > > > incorporate the module name to avoid collisions.  For example Yan's
> > > > > > vendor extension proposal makes use of vfio-pci with extension modules
> > > > > > loaded via an alias incorporating the PCI vendor and device ID.  So
> > > > > > vfio-pci might use a type of "vfio-pci:$ALIAS".
> > > > > > 
> > > > > > It's still a bit messy that someone needs to go evaluate all these
> > > > > > types between devices that exist and mdev devices that might exist if
> > > > > > created, but I don't have any good ideas to resolve that (maybe a new
> > > > > > class hierarchy?).  Thanks,      
> > > > > 
> > > > > hi Alex
> > > > > 
> > > > > yes, with the same mdev_type, user still has to enumerate all parent
> > > > > devices and test between the supported mdev_types to know whether two mdev
> > > > > devices are compatible.
> > > > > maybe this is not a problem? in reality, it is the administrator that
> > > > > specifies two devices and the management tool feedbacks compatibility
> > > > > result. management tool is not required to pre-test and setup the
> > > > > compatibility map beforehand.    
> > > > 
> > > > That's exactly the purpose of this interface though is to give the
> > > > management tools some indication that a migration has a chance of
> > > > working.
> > > >      
> > > > > If so, then the only problem left is namespace collision. 
> > > > > given that the migration_version nodes is exported by vendor driver,
> > > > > maybe it can also embed its module name in the migration version string,
> > > > > like "i915" in "i915-GVTg_V5_8", as you suggested above.    
> > > > 
> > > > No, we've already decided that the version string is opaque, the user
> > > > is not to attempt to infer anything from it.  That's why I've suggested
> > > > another attribute in sysfs that does present type information that a
> > > > user can compare.  Thanks,
> > > > 
> > > > Alex
> > > >    
> > > ok. got it.
> > > one more thing I want to confirm is that do you think it's a necessary
> > > restriction that "The mdev devices are of the same type" ?
> > > could mdev and phys devices both expose "vfio_migration_type" and
> > > "vfio_migration_version" under device sysfs so that it may not be
> > > confined in mdev_type? (e.g. when aggregator is enabled, though two
> > > mdevs are of the same mdev_type, they are not actually compatible; and
> > > two mdevs are compatible though their mdev_type is not equal.) 
> > > 
> > > for mdev devices, we could still expose vfio_migration_version
> > > attribute under mdev_type for detection before mdev generated.  
> > 
> > I tried to simplify the problem a bit, but we keep going backwards.  If
> > the requirement is that potentially any source device can migrate to any
> > target device and we cannot provide any means other than writing an
> > opaque source string into a version attribute on the target and
> > evaluating the result to determine compatibility, then we're requiring
> > userspace to do an exhaustive search to find a potential match.  That
> > sucks.   
> 
> Why is the mechanism a 'write and test' why isn't it a 'write and ask'?
> i.e. the destination tells the driver what type it's received from the
> source, and the driver replies with a set of compatible configurations
> (in some preferred order).

A 'write and ask' interface would imply some sort of session in order
to not be racy with concurrent users.  More likely this would imply an
ioctl interface, which I don't think we have in sysfs.  Where do we
host this ioctl?
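
A toy model may make the race concrete.  This is a minimal Python sketch,
not kernel code: SharedAttr, store() and show() are hypothetical names, and
the compatibility table is made up.  It assumes the attribute keeps the last
written query in one slot shared by every user, as a plain attribute without
per-open state would.

```python
# Toy model of a "write and ask" attribute with a single shared slot.
# SharedAttr, store() and show() are hypothetical names for illustration.

class SharedAttr:
    def __init__(self, compat_table):
        self.compat_table = compat_table  # source id -> compatible target ids
        self.last_query = None            # shared by every user of the node

    def store(self, source_id):
        # Models a write to the attribute: remember the question.
        self.last_query = source_id

    def show(self):
        # Models a read: answer whatever question was written last.
        return self.compat_table.get(self.last_query, [])


attr = SharedAttr({"type-A": ["target-1"], "type-B": ["target-2"]})

attr.store("type-A")         # user 1 asks about type-A
attr.store("type-B")         # user 2 asks before user 1 has read
user1_answer = attr.show()   # user 1 now reads the answer to user 2's question
```

In this model user 1 ends up with the answer for type-B, which is the kind
of interleaving a session would have to rule out.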

> It's also not clear to me why the name has to be that opaque;
> I agree it's only got to be understood by the driver but that doesn't
> seem to be a reason for the driver to make it purposely obfuscated.
> I wouldn't expect a user to be able to parse it necessarily; but would
> expect something that would be useful for an error message.

If the name is not opaque, then we're going to rat hole on the format
and the fields and evolving that format for every feature a vendor
decides they want the user to be able to parse out of the version
string.  Then we require a full specification of the string in order
that it be parsed according to a standard such that we don't break
users inferring features in subtly different ways.

This is a lot like the problems with mdev description attributes,
libvirt complains they can't use description because there's no
standard formatting, but even with two vendors describing the same class
of device we don't have an agreed set of things to expose in the
description attribute.  Thanks,

Alex



* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-05 14:31                                                 ` Alex Williamson
@ 2020-06-05 14:39                                                   ` Dr. David Alan Gilbert
  2020-06-10  0:37                                                     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Dr. David Alan Gilbert @ 2020-06-05 14:39 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yan Zhao, Cornelia Huck, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger, Liu,
	Yi L, corbet, Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue,
	Tian, Kevin, Zeng, Xin, zhenyuw, jonathan.davies, intel-gvt-dev,
	Liu, Changpeng, berrange, eskultet, linux-kernel, Wang, Zhi A,
	dinechin, He, Shaopeng

* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Fri, 5 Jun 2020 11:22:24 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Alex Williamson (alex.williamson@redhat.com) wrote:
> > > On Wed, 3 Jun 2020 01:24:43 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > On Tue, Jun 02, 2020 at 09:55:28PM -0600, Alex Williamson wrote:  
> > > > > On Tue, 2 Jun 2020 23:19:48 -0400
> > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > >     
> > > > > > On Tue, Jun 02, 2020 at 04:55:27PM -0600, Alex Williamson wrote:    
> > > > > > > On Wed, 29 Apr 2020 20:39:50 -0400
> > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > >       
> > > > > > > > On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> > > > > > > > <snip>      
> > > > > > > > > > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > > > > > > > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a        
> > > > > > > > > > > > > > > > > > > > management        
> > > > > > > > > > > > > > > > > > > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > > > > > > > > > > > > > > > > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > > > > > > > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > > > > > > > > > >        
> > > > > > > > > > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > > > > > > > > > possibilities.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > > > > > > > > > mdev1 <-> mdev2.        
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > > > > > > > > > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.        
> > > > > > > > > > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > > > > > > > > > or mix.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > > > > > > > > > configuration to the target vm.
> > > > > > > > > > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > > > > > > > > > same mdev type).
> > > > > > > > > > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > >         
> > > > > > > > > > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out.         
> > > > > > > > > > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > > > > > > > > > and test it in target migration version under target dev node. 
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > > > > > > > > > to use it or not.
> > > > > > > > > > > > > > > > > >         
> > > > > > > > > > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and 
> > > > > > > > > > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a 
> > > > > > > > > > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev 
> > > > > > > > > > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor 
> > > > > > > > > > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant 
> > > > > > > > > > > > > > > > > > > usage then such tradeoff might be worthywhile...
> > > > > > > > > > > > > > > > > > >        
> > > > > > > > > > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > > > > > > > > > "PCIID + software version".
> > > > > > > > > > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > > > > > > > > > to a mdev device according it supports it or not.        
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > > > > > > > > > both supported the same mdev view.
> > > > > > > > > > > > > > > > >         
> > > > > > > > > > > > > > > > hi Dave
> > > > > > > > > > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > > > > > > > > > completely defined by vendor driver.
> > > > > > > > > > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > > > the src migration_version string is "src PCIID + src software version", 
> > > > > > > > > > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > > > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > > > > > > > > > device info and software version.
> > > > > > > > > > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > > > > > > > > > PCIIDs in src and target are different.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > > > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > > > > > > > > > provides vendor driver full flexibility.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > do you think it's good?        
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Yeh that's OK; I guess it's going to need to have a big table in their
> > > > > > > > > > > > > > > with all the PCIIDs in.
> > > > > > > > > > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > > > > > > > > > an Intel-gpu-core-v4  and then it would be less worried about the exact
> > > > > > > > > > > > > > > clock speed etc - but yes you might be right htat PCIIDs might be best
> > > > > > > > > > > > > > > for checking for quirks.
> > > > > > > > > > > > > > >        
> > > > > > > > > > > > > > glad that you are agreed with it:)
> > > > > > > > > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > > > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > > > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > > > > > > > > number + software version".
> > > > > > > > > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > > > > > > > > compatibility of a device.
> > > > > > > > > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > > > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > > > > > > > > or, any other ideas?        
> > > > > > > > > > > > > 
> > > > > > > > > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was        
> > > > > > > > > > > > Yes, it's a good idea!
> > > > > > > > > > > > could we add a line in the doc saying that
> > > > > > > > > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > > > > > > > > collision?        
> > > > > > > > > > > 
> > > > > > > > > > > So why don't we split the difference; lets say that it should start with
> > > > > > > > > > > the hex PCI Vendor ID.
> > > > > > > > > > >        
> > > > > > > > > > The problem is for mdev devices, if the parent devices are not PCI devices, 
> > > > > > > > > > they don't have PCI vendor IDs.        
> > > > > > > > > 
> > > > > > > > > Hmm it would be best not to invent a whole new way of giving unique
> > > > > > > > > idenitifiers for vendors if we can.
> > > > > > > > >         
> > > > > > > > what about leveraging the flags in vfio device info ?
> > > > > > > > 
> > > > > > > > #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
> > > > > > > > #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
> > > > > > > > #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
> > > > > > > > #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
> > > > > > > > #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
> > > > > > > > #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */
> > > > > > > > 
> > > > > > > > Then for migration_version string,
> > > > > > > > The first 64 bits are for device type, the second 64 bits are for device id.
> > > > > > > > e.g.
> > > > > > > > for PCI devices, it could be
> > > > > > > > VFIO_DEVICE_FLAGS_PCI + PCI ID.
> > > > > > > > 
> > > > > > > > Currently in the doc, we only define PCI devices to use PCI ID as the second
> > > > > > > > 64 bits. In future, if other types of devices want to support migration,
> > > > > > > > they can define their own parts of device id. e.g. use ACPI ID as the
> > > > > > > > second 64-bit...
> > > > > > > > 
> > > > > > > > sounds good?      
> > > > > > > 
> > > > > > > [dead thread resurrection alert]
> > > > > > > 
> > > > > > > Not really.  We're deep into territory that we were trying to avoid.
> > > > > > > We had previously defined the version string as opaque (not
> > > > > > > transparent) specifically because we did not want userspace to make
> > > > > > > assumptions about compatibility based on the content of the string.  It
> > > > > > > was 100% left to the vendor driver to determine compatibility.  The
> > > > > > > mdev type was the full extent of the first level filter that userspace
> > > > > > > could use to narrow the set of potentially compatible devices.  If we
> > > > > > > remove that due to physical device migration support, I'm not sure how
> > > > > > > we simplify the problem for userspace.
> > > > > > > 
> > > > > > > We need to step away from PCI IDs and parent devices.  We're not
> > > > > > > designing a solution that only works for PCI, there's no guarantee that
> > > > > > > parent devices are similar or even from the same vendor.
> > > > > > > 
> > > > > > > Does the mdev type sufficiently solve the problem for mdev devices?  If
> > > > > > > so, then what can we learn from it and how can we apply an equivalence
> > > > > > > to physical devices?  For example, should a vfio bus driver (vfio-pci
> > > > > > > or vfio-mdev) expose vfio_migration_type and vfio_migration_version
> > > > > > > attributes under the device in sysfs where the _type provides the first
> > > > > > > level, user transparent, matching string (ex. mdev type for mdev
> > > > > > > devices) while the _version provides the user opaque, vendor known
> > > > > > > compatibility test?
> > > > > > > 
> > > > > > > This pushes the problem out to the drivers where we can perhaps
> > > > > > > incorporate the module name to avoid collisions.  For example Yan's
> > > > > > > vendor extension proposal makes use of vfio-pci with extension modules
> > > > > > > loaded via an alias incorporating the PCI vendor and device ID.  So
> > > > > > > vfio-pci might use a type of "vfio-pci:$ALIAS".
> > > > > > > 
> > > > > > > It's still a bit messy that someone needs to go evaluate all these
> > > > > > > types between devices that exist and mdev devices that might exist if
> > > > > > > created, but I don't have any good ideas to resolve that (maybe a new
> > > > > > > class hierarchy?).  Thanks,      
> > > > > > 
> > > > > > hi Alex
> > > > > > 
> > > > > > yes, with the same mdev_type, user still has to enumerate all parent
> > > > > > devices and test between the supported mdev_types to know whether two mdev
> > > > > > devices are compatible.
> > > > > > maybe this is not a problem? in reality, it is the administrator that
> > > > > > specifies two devices and the management tool feedbacks compatibility
> > > > > > result. management tool is not required to pre-test and setup the
> > > > > > compatibility map beforehand.    
> > > > > 
> > > > > That's exactly the purpose of this interface though is to give the
> > > > > management tools some indication that a migration has a chance of
> > > > > working.
> > > > >      
> > > > > > If so, then the only problem left is namespace collision. 
> > > > > > given that the migration_version nodes is exported by vendor driver,
> > > > > > maybe it can also embed its module name in the migration version string,
> > > > > > like "i915" in "i915-GVTg_V5_8", as you suggested above.    
> > > > > 
> > > > > No, we've already decided that the version string is opaque, the user
> > > > > is not to attempt to infer anything from it.  That's why I've suggested
> > > > > another attribute in sysfs that does present type information that a
> > > > > user can compare.  Thanks,
> > > > > 
> > > > > Alex
> > > > >    
> > > > ok. got it.
> > > > one more thing I want to confirm is that do you think it's a necessary
> > > > restriction that "The mdev devices are of the same type" ?
> > > > could mdev and phys devices both expose "vfio_migration_type" and
> > > > "vfio_migration_version" under device sysfs so that it may not be
> > > > confined in mdev_type? (e.g. when aggregator is enabled, though two
> > > > mdevs are of the same mdev_type, they are not actually compatible; and
> > > > two mdevs are compatible though their mdev_type is not equal.) 
> > > > 
> > > > for mdev devices, we could still expose vfio_migration_version
> > > > attribute under mdev_type for detection before mdev generated.  
> > > 
> > > I tried to simplify the problem a bit, but we keep going backwards.  If
> > > the requirement is that potentially any source device can migrate to any
> > > target device and we cannot provide any means other than writing an
> > > opaque source string into a version attribute on the target and
> > > evaluating the result to determine compatibility, then we're requiring
> > > userspace to do an exhaustive search to find a potential match.  That
> > > sucks.   
> > 
> > Why is the mechanism a 'write and test' why isn't it a 'write and ask'?
> > i.e. the destination tells the driver what type it's received from the
> > source, and the driver replies with a set of compatible configurations
> > (in some preferred order).
> 
> A 'write and ask' interface would imply some sort of session in order
> to not be racy with concurrent users.  More likely this would imply an
> ioctl interface, which I don't think we have in sysfs.  Where do we
> host this ioctl?

Or one fd?
  f=open()
  write(f, "The ID I want")
  do {
     read(f, ...)  -> The IDs we're offering that are compatible
  } while (!eof)
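
The same idea as a toy Python model, assuming the state of a query lives
with the open file rather than with the node.  Node, Session, read_all()
and the ids are all hypothetical; this only models the protocol shape and
does not describe an existing kernel interface.

```python
# Toy model of the single-fd idea: each open() gets its own session state.
# Node, Session, read_all() and the ids below are hypothetical.

class Session:
    def __init__(self, compat_table):
        self._compat_table = compat_table
        self._pending = []            # answers queued for this fd only

    def write(self, wanted_id):
        # The question is stored per session, not on the shared node.
        self._pending = list(self._compat_table.get(wanted_id, []))

    def read(self):
        # Return one compatible id per read; None models EOF.
        return self._pending.pop(0) if self._pending else None


class Node:
    def __init__(self, compat_table):
        self._compat_table = compat_table

    def open(self):
        return Session(self._compat_table)


def read_all(session):
    # The do { read } while (!eof) loop from the pseudocode above.
    offers = []
    while True:
        offer = session.read()
        if offer is None:
            return offers
        offers.append(offer)


node = Node({"type-A": ["target-1", "target-3"], "type-B": ["target-2"]})
s1, s2 = node.open(), node.open()
s1.write("type-A")
s2.write("type-B")        # interleaved with s1, but in its own session
offers_1 = read_all(s1)
offers_2 = read_all(s2)
```

Each session only ever sees answers to its own question, so interleaved
users do not disturb each other in this model.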

> > It's also not clear to me why the name has to be that opaque;
> > I agree it's only got to be understood by the driver but that doesn't
> > seem to be a reason for the driver to make it purposely obfuscated.
> > I wouldn't expect a user to be able to parse it necessarily; but would
> > expect something that would be useful for an error message.
> 
> If the name is not opaque, then we're going to rat hole on the format
> and the fields and evolving that format for every feature a vendor
> decides they want the user to be able to parse out of the version
> string.  Then we require a full specification of the string in order
> that it be parsed according to a standard such that we don't break
> users inferring features in subtly different ways.
> 
> This is a lot like the problems with mdev description attributes,
> libvirt complains they can't use description because there's no
> standard formatting, but even with two vendors describing the same class
> of device we don't have an agreed set of things to expose in the
> description attribute.  Thanks,

I'm not suggesting anything in any way machine parsable; just something
human readable that you can present in a menu/choice/configuration/error
message.  The text would be down to the vendor, and I'd suggest it start
with the vendor name just as a disambiguator and to make it obvious when
we get it grossly wrong.

Dave

> Alex
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-05 14:39                                                   ` Dr. David Alan Gilbert
@ 2020-06-10  0:37                                                     ` Yan Zhao
  2020-06-19 22:40                                                       ` Alex Williamson
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2020-06-10  0:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Alex Williamson, cjia, kvm, linux-doc, libvir-list, Zhengxiao.zx,
	shuangtai.tst, qemu-devel, kwankhede, eauger, Liu, Yi L, corbet,
	Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue, Tian, Kevin,
	eskultet, Zeng, Xin, zhenyuw, dinechin, intel-gvt-dev, Liu,
	Changpeng, berrange, Cornelia Huck, linux-kernel, Wang, Zhi A,
	jonathan.davies, He, Shaopeng

On Fri, Jun 05, 2020 at 03:39:50PM +0100, Dr. David Alan Gilbert wrote:
> > > > I tried to simplify the problem a bit, but we keep going backwards.  If
> > > > the requirement is that potentially any source device can migrate to any
> > > > target device and we cannot provide any means other than writing an
> > > > opaque source string into a version attribute on the target and
> > > > evaluating the result to determine compatibility, then we're requiring
> > > > userspace to do an exhaustive search to find a potential match.  That
> > > > sucks.   
> > >
hi Alex and Dave,
do you think it's good for us to put aside physical devices and mdev aggregation
for the moment, and use Alex's original idea that

+  Userspace should regard two mdev devices compatible when ALL of below
+  conditions are met:
+  (0) The mdev devices are of the same type
+  (1) success when reading from migration_version attribute of one mdev device.
+  (2) success when writing migration_version string of one mdev device to
+  migration_version attribute of the other mdev device.
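
A rough userspace sketch of conditions (0)-(2), in Python.  The helper name
and the choice to pass the type names and attribute paths in as arguments
are assumptions for illustration; the version string is treated as opaque
and is only passed through, never parsed.

```python
# Sketch of the three-step userspace check listed above.
# The helper name and its arguments are hypothetical; error handling is
# reduced to "any OSError means not compatible".

def mdevs_look_compatible(src_type, dst_type, src_attr, dst_attr):
    # (0) both mdev devices must be of the same type
    if src_type != dst_type:
        return False
    # (1) reading the source's migration_version must succeed
    try:
        with open(src_attr, "r") as f:
            version = f.read()
    except OSError:
        return False
    # (2) writing that string to the target's attribute must succeed;
    #     the vendor driver is expected to reject an incompatible string
    try:
        with open(dst_attr, "w") as f:
            f.write(version)
    except OSError:
        return False
    return True
```

A management tool would call this with the two mdev type names and the
migration_version paths of the two devices.  A False result only says the
check failed, not why.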

and what about adding another sysfs attribute where vendors can list
recommended migration-compatible device types? e.g.
#cat /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_8/migration_compatible_devices
parent id: 8086 591d
mdev_type: i915-GVTg_V5_8

vendors are free to define the format and content of this migration_compatible_devices
attribute, and it does not even need to be a full list.

before libvirt or the user does live migration, they have to read and test the
migration_version attributes of the src/target devices to check migration compatibility.

Thanks
Yan


> > > Why is the mechanism a 'write and test' why isn't it a 'write and ask'?
> > > i.e. the destination tells the driver what type it's received from the
> > > source, and the driver replies with a set of compatible configurations
> > > (in some preferred order).
> > 
> > A 'write and ask' interface would imply some sort of session in order
> > to not be racy with concurrent users.  More likely this would imply an
> > ioctl interface, which I don't think we have in sysfs.  Where do we
> > host this ioctl?
> 
> Or one fd?
>   f=open()
>   write(f, "The ID I want")
>   do {
>      read(f, ...)  -> The IDs we're offering that are compatible
>   } while (!eof)
> 
> > > It's also not clear to me why the name has to be that opaque;
> > > I agree it's only got to be understood by the driver but that doesn't
> > > seem to be a reason for the driver to make it purposely obfuscated.
> > > I wouldn't expect a user to be able to parse it necessarily; but would
> > > expect something that would be useful for an error message.
> > 
> > If the name is not opaque, then we're going to rat hole on the format
> > and the fields and evolving that format for every feature a vendor
> > decides they want the user to be able to parse out of the version
> > string.  Then we require a full specification of the string in order
> > that it be parsed according to a standard such that we don't break
> > users inferring features in subtly different ways.
> > 
> > This is a lot like the problems with mdev description attributes,
> > libvirt complains they can't use description because there's no
> > standard formatting, but even with two vendors describing the same class
> > of device we don't have an agreed set of things to expose in the
> > description attribute.  Thanks,
> 
> I'm not suggesting anything in any way machine parsable; just something
> human readable that you can present in a menu/choice/configuration/error
> message.  The text would be down to the vendor, and I'd suggest it start
> with the vendor name just as a disambiguator and to make it obvious when
> we get it grossly wrong.
> 
> Dave
> 
> > Alex
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-10  0:37                                                     ` Yan Zhao
@ 2020-06-19 22:40                                                       ` Alex Williamson
  2020-06-22  2:28                                                         ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Williamson @ 2020-06-19 22:40 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Dr. David Alan Gilbert, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger, Liu,
	Yi L, corbet, Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue,
	Tian, Kevin, eskultet, Zeng, Xin, zhenyuw, dinechin,
	intel-gvt-dev, Liu, Changpeng, berrange, Cornelia Huck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Tue, 9 Jun 2020 20:37:31 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Fri, Jun 05, 2020 at 03:39:50PM +0100, Dr. David Alan Gilbert wrote:
> > > > > I tried to simplify the problem a bit, but we keep going backwards.  If
> > > > > the requirement is that potentially any source device can migrate to any
> > > > > target device and we cannot provide any means other than writing an
> > > > > opaque source string into a version attribute on the target and
> > > > > evaluating the result to determine compatibility, then we're requiring
> > > > > userspace to do an exhaustive search to find a potential match.  That
> > > > > sucks.     
> > > >  
> hi Alex and Dave,
> do you think it's good for us to put aside physical devices and mdev aggregation
> for the moment, and use Alex's original idea that
> 
> +  Userspace should regard two mdev devices compatible when ALL of below
> +  conditions are met:
> +  (0) The mdev devices are of the same type
> +  (1) success when reading from migration_version attribute of one mdev device.
> +  (2) success when writing migration_version string of one mdev device to
> +  migration_version attribute of the other mdev device.

I think Pandora's box is already opened, if we can't articulate how
this solution would evolve to support features that we know are coming,
why should we proceed with this approach?  We've already seen interest
in breaking rule (0) in this thread, so we can't focus the solution on
mdev devices.

Maybe the best we can do is to compare one instance of a device to
another instance of a device, without any capability to predict
compatibility prior to creating devices, in the case of mdev.  The
string would need to include not only the device and vendor driver
compatibility, but also anything that has modified the state of the
device, such as creation time or post-creation time configuration.  The
user is left on their own for creating a compatible device, or
filtering devices to determine which might be, or which might generate,
compatible devices.  It's not much of a solution, I wonder if anyone
would even use it.

> and what about adding another sysfs attribute for vendors to put
> recommended migration compatible device type. e.g.
> #cat /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_8/migration_compatible_devices
> parent id: 8086 591d
> mdev_type: i915-GVTg_V5_8
> 
> vendors are free to define the format and content of this migration_compatible_devices
> and it doesn't even have to be a full list.
> 
> before libvirt or a user does live migration, they have to read and test
> migration_version attributes of src/target devices to check migration compatibility.

AFAICT, free-form, vendor defined attributes are useless to libvirt.
Vendors could already put this information in the description attribute
and have it ignored by userspace tools due to the lack of defined
format.  It's also not clear what value this provides when it's
necessarily incomplete, a driver written today cannot know what future
drivers might be compatible with its migration data.  Thanks,

Alex



* Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
  2020-06-19 22:40                                                       ` Alex Williamson
@ 2020-06-22  2:28                                                         ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2020-06-22  2:28 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Dr. David Alan Gilbert, cjia, kvm, linux-doc, libvir-list,
	Zhengxiao.zx, shuangtai.tst, qemu-devel, kwankhede, eauger, Liu,
	Yi L, corbet, Yang, Ziye, mlevitsk, pasic, aik, felipe, Ken.Xue,
	Tian, Kevin, eskultet, Zeng, Xin, zhenyuw, dinechin,
	intel-gvt-dev, Liu, Changpeng, berrange, Cornelia Huck,
	linux-kernel, Wang, Zhi A, jonathan.davies, He, Shaopeng

On Fri, Jun 19, 2020 at 04:40:46PM -0600, Alex Williamson wrote:
> On Tue, 9 Jun 2020 20:37:31 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Fri, Jun 05, 2020 at 03:39:50PM +0100, Dr. David Alan Gilbert wrote:
> > > > > > I tried to simplify the problem a bit, but we keep going backwards.  If
> > > > > > the requirement is that potentially any source device can migrate to any
> > > > > > target device and we cannot provide any means other than writing an
> > > > > > opaque source string into a version attribute on the target and
> > > > > > evaluating the result to determine compatibility, then we're requiring
> > > > > > userspace to do an exhaustive search to find a potential match.  That
> > > > > > sucks.     
> > > > >  
> > hi Alex and Dave,
> > do you think it's good for us to put aside physical devices and mdev aggregation
> > for the moment, and use Alex's original idea that
> > 
> > +  Userspace should regard two mdev devices compatible when ALL of below
> > +  conditions are met:
> > +  (0) The mdev devices are of the same type
> > +  (1) success when reading from migration_version attribute of one mdev device.
> > +  (2) success when writing migration_version string of one mdev device to
> > +  migration_version attribute of the other mdev device.
> 
> I think Pandora's box is already opened, if we can't articulate how
> this solution would evolve to support features that we know are coming,
> why should we proceed with this approach?  We've already seen interest
> in breaking rule (0) in this thread, so we can't focus the solution on
> mdev devices.
> 
> Maybe the best we can do is to compare one instance of a device to
> another instance of a device, without any capability to predict
> > compatibility prior to creating devices, in the case of mdev.  The
> string would need to include not only the device and vendor driver
> compatibility, but also anything that has modified the state of the
> device, such as creation time or post-creation time configuration.  The
> user is left on their own for creating a compatible device, or
> filtering devices to determine which might be, or which might generate,
> compatible devices.  It's not much of a solution, I wonder if anyone
> would even use it.
> 
> > and what about adding another sysfs attribute for vendors to put
> > recommended migration compatible device type. e.g.
> > #cat /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_8/migration_compatible_devices
> > parent id: 8086 591d
> > mdev_type: i915-GVTg_V5_8
> > 
> > vendors are free to define the format and content of this migration_compatible_devices
> > and it doesn't even have to be a full list.
> > 
> > before libvirt or a user does live migration, they have to read and test
> > migration_version attributes of src/target devices to check migration compatibility.
> 
> AFAICT, free-form, vendor defined attributes are useless to libvirt.
> Vendors could already put this information in the description attribute
> and have it ignored by userspace tools due to the lack of defined
> format.  It's also not clear what value this provides when it's
> necessarily incomplete, a driver written today cannot know what future
> drivers might be compatible with its migration data.  Thanks,
>
Hi Alex,
maybe the problem can be divided into two pieces:
(1) how to create/locate two migration compatible devices. For normal
users, the most common and safest way to do it is to find an exact duplicate
of the source device. So for mdev, it's probably to create a target mdev
with the same parent pci id, mdev type and creation parameters as the
source mdev; and for physical devices, it's to locate a target device with the
same pci id as the source device, plus some extra constraints (e.g. the
target NVMe device is configured to the same remote device as the source
NVMe device; or the target QAT device supports the same encryption
algorithm set as the source QAT device...).
I think a possible solution for this piece is to let vendor drivers provide a
creating/locating script to find such an exact duplicate of the source device.
Then, before libvirt does live migration, it can use this script to
create a target vm whose configuration exactly duplicates that of the source vm.
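
As a rough sketch of what such a locating step might compare (illustrative
only: the MdevDesc class, its field names and find_exact_duplicates() are
hypothetical and not part of this series or of any existing tool), an
exact-duplicate match on parent pci id, mdev type and creation parameters
could look like:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MdevDesc:
    """Hypothetical description of an mdev, for illustration only."""
    parent_pci_id: str                                # e.g. "8086:591d"
    mdev_type: str                                    # e.g. "i915-GVTg_V5_8"
    create_params: Tuple[Tuple[str, str], ...] = ()   # vendor-specific key/value pairs


def find_exact_duplicates(src: MdevDesc,
                          candidates: Iterable[MdevDesc]) -> List[MdevDesc]:
    """Return the candidates whose three fields all equal those of src."""
    return [c for c in candidates if c == src]
```

Anything vendor specific (e.g. the NVMe or QAT constraints above) would have
to be folded into the creation parameters or into extra fields.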

(2) how to identify that two devices are migration compatible after they are
created, even if they are not exactly identical (e.g. their parent
devices differ slightly in hardware SKU). This identification is
necessary even after step (1), when libvirt has created/located two
identical devices and is about to start live migration.
Also, users are free to create/configure target devices and use the
read-and-test interfaces defined in this series to check if they are
live migration compatible.
The read-and-test behavior in this patch set gives vendor drivers the
freedom to decide whether to support migration only between exactly identical
devices or also between devices with minor
differences.
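
From userspace, the read-and-test check could be sketched roughly as below.
This is only an illustration under assumptions: is_migration_compatible() is
a hypothetical helper, and the two directories passed in are assumed to be the
sysfs nodes that carry the migration_version attribute (the mdev device node
or the mdev_type node); the exact paths and error codes are up to the final
interface.

```python
import os


def is_migration_compatible(src_node: str, dst_node: str) -> bool:
    """Read migration_version from src_node and write it to dst_node.

    Both arguments are directories assumed to contain a migration_version
    attribute. Any failure to read or write is treated as "not compatible".
    """
    try:
        with open(os.path.join(src_node, "migration_version")) as f:
            version = f.read()
        # A rejected write raises OSError, either at write time or when the
        # buffered data is flushed on close; both happen inside this try.
        with open(os.path.join(dst_node, "migration_version"), "w") as f:
            f.write(version)
    except OSError:
        return False
    return True
```

The helper only reports success or failure; it does not try to interpret the
version string, which stays opaque to userspace.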

So, do you think we can let this series focus on the second piece of
the problem and leave the first piece to a future series?

Thanks
Yan


end of thread, other threads:[~2020-06-22  2:38 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
2020-04-13  5:52 [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Yan Zhao
2020-04-13  5:54 ` [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node) Yan Zhao
2020-04-15  7:28   ` Erik Skultety
2020-04-15  8:58     ` Yan Zhao
2020-04-13  5:54 ` [PATCH v5 2/4] drm/i915/gvt: export migration_version to mdev sysfs " Yan Zhao
2020-04-13  5:55 ` [PATCH v5 3/4] vfio/mdev: add migration_version attribute for mdev (under mdev device node) Yan Zhao
2020-04-15  7:42   ` Erik Skultety
2020-04-15  9:02     ` Yan Zhao
2020-04-13  5:55 ` [PATCH v5 4/4] drm/i915/gvt: export migration_version to mdev sysfs " Yan Zhao
2020-04-17  8:44 ` [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Cornelia Huck
2020-04-17  9:52   ` Yan Zhao
2020-04-17 11:24     ` Cornelia Huck
2020-04-20  1:24       ` Yan Zhao
2020-04-20 22:56         ` Alex Williamson
2020-04-21  2:37           ` Yan Zhao
2020-04-21 12:08             ` Tian, Kevin
2020-04-22  7:36               ` Yan Zhao
2020-04-24 19:10                 ` Dr. David Alan Gilbert
2020-04-26  1:36                   ` Yan Zhao
2020-04-27 15:37                     ` Dr. David Alan Gilbert
2020-04-28  0:54                       ` Yan Zhao
2020-04-28 14:14                         ` Dr. David Alan Gilbert
2020-04-29  7:26                           ` Yan Zhao
2020-04-29  8:22                             ` Dr. David Alan Gilbert
2020-04-29  9:35                               ` Yan Zhao
2020-04-29  9:48                                 ` Dr. David Alan Gilbert
2020-04-30  0:39                                   ` Yan Zhao
2020-06-02 22:55                                     ` Alex Williamson
2020-06-03  3:19                                       ` Yan Zhao
2020-06-03  3:55                                         ` Alex Williamson
2020-06-03  5:24                                           ` Yan Zhao
2020-06-03 16:26                                             ` Alex Williamson
2020-06-05 10:22                                               ` Dr. David Alan Gilbert
2020-06-05 14:31                                                 ` Alex Williamson
2020-06-05 14:39                                                   ` Dr. David Alan Gilbert
2020-06-10  0:37                                                     ` Yan Zhao
2020-06-19 22:40                                                       ` Alex Williamson
2020-06-22  2:28                                                         ` Yan Zhao
2020-04-29 14:13                                 ` Eric Blake
2020-04-30  0:45                                   ` Yan Zhao
