* [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2
@ 2022-05-30 17:07 Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug Avihai Horon
                   ` (11 more replies)
  0 siblings, 12 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

Hello,

Following the acceptance of VFIO migration protocol v2 in the kernel,
this series implements VFIO migration according to the new v2 protocol
and replaces the now-deprecated v1 implementation.

The main differences between v1 and v2 migration protocols are:
1. VFIO device state is represented as a finite state machine instead of
   a bitmap.

2. The migration interface with the kernel uses the VFIO_DEVICE_FEATURE
   ioctl and regular read() and write() calls instead of the migration
   region used in v1.

3. Migration protocol v2 currently doesn't support the pre-copy phase of
   migration.
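
To make the first difference concrete, here is a minimal sketch that
contrasts a bitmap-encoded state with a finite state machine. All names
(DEMO_*, demo_*) are hypothetical, and the arc table is an illustrative
assumption rather than a copy of the kernel's definition; see [1] for
the authoritative description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical v1-style encoding: independent bits combined into a mask. */
#define DEMO_V1_RUNNING  (1u << 0)
#define DEMO_V1_SAVING   (1u << 1)
#define DEMO_V1_RESUMING (1u << 2)

/* Hypothetical v2-style encoding: the device is in exactly one state. */
typedef enum {
    DEMO_V2_ERROR,
    DEMO_V2_STOP,
    DEMO_V2_RUNNING,
    DEMO_V2_STOP_COPY,
    DEMO_V2_RESUMING,
    DEMO_V2_NUM_STATES
} DemoV2State;

/* Bitmap style: a phase such as pre-copy is a combination of bits. */
static bool demo_v1_is_precopy(uint32_t state)
{
    uint32_t want = DEMO_V1_SAVING | DEMO_V1_RUNNING;

    return (state & want) == want;
}

/* FSM style: only listed arcs are legal (illustrative table, an assumption). */
static const bool demo_v2_arcs[DEMO_V2_NUM_STATES][DEMO_V2_NUM_STATES] = {
    [DEMO_V2_RUNNING][DEMO_V2_STOP]   = true,
    [DEMO_V2_STOP][DEMO_V2_RUNNING]   = true,
    [DEMO_V2_STOP][DEMO_V2_STOP_COPY] = true,
    [DEMO_V2_STOP_COPY][DEMO_V2_STOP] = true,
    [DEMO_V2_STOP][DEMO_V2_RESUMING]  = true,
    [DEMO_V2_RESUMING][DEMO_V2_STOP]  = true,
};

static bool demo_v2_arc_allowed(DemoV2State from, DemoV2State to)
{
    return demo_v2_arcs[from][to];
}
```

With the bitmap, any combination of bits can be written, valid or not;
with the FSM, an unsupported transition can be rejected by a table
lookup.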

Full description of the v2 protocol and the differences from v1 can be
found here [1].

Patches 1-3 are preparatory: they fix bugs and add a QEMUFile function
that is used later in the series.

Patches 4-6 refactor v1 protocol code to make it easier to add v2
protocol.

Patches 7-11 implement v2 protocol and remove v1 protocol.

Thanks.

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/

Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
- Split the big patch that replaced v1 with v2 into several patches as
  suggested by Joao, to make review easier.
- Change warn_report to warn_report_once when container doesn't support
  dirty tracking.
- Add Reviewed-by tag.

Avihai Horon (11):
  vfio/migration: Fix NULL pointer dereference bug
  vfio/migration: Skip pre-copy if dirty page tracking is not supported
  migration/qemu-file: Add qemu_file_get_to_fd()
  vfio/common: Change vfio_devices_all_running_and_saving() logic to
    equivalent one
  vfio/migration: Move migration v1 logic to vfio_migration_init()
  vfio/migration: Rename functions/structs related to v1 protocol
  vfio/migration: Implement VFIO migration protocol v2
  vfio/migration: Remove VFIO migration protocol v1
  vfio/migration: Reset device if setting recover state fails
  vfio: Alphabetize migration section of VFIO trace-events file
  docs/devel: Align vfio-migration docs to VFIO migration v2

 docs/devel/vfio-migration.rst |  77 ++--
 hw/vfio/common.c              |  21 +-
 hw/vfio/migration.c           | 640 ++++++++--------------------------
 hw/vfio/trace-events          |  25 +-
 include/hw/vfio/vfio-common.h |   8 +-
 migration/migration.c         |   5 +
 migration/migration.h         |   3 +
 migration/qemu-file.c         |  34 ++
 migration/qemu-file.h         |   1 +
 9 files changed, 252 insertions(+), 562 deletions(-)

-- 
2.21.3




* [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported Avihai Horon
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

As part of its error flow, vfio_vmstate_change() accesses
MigrationState->to_dst_file without any checks. This can cause a NULL
pointer dereference if the error flow is taken and
MigrationState->to_dst_file is not set.

For example, this can happen if the VM is started or stopped outside of
a migration and the vfio_vmstate_change() error flow is taken, as
MigrationState->to_dst_file is not set at that time.

Fix it by checking that MigrationState->to_dst_file is set before using
it.
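
The guard can be illustrated in isolation with the following sketch. The
types and functions (DemoFile, DemoMigrationState, demo_*) are
hypothetical stand-ins for QEMUFile, MigrationState and
qemu_file_set_error(); only the shape of the check mirrors the fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the outgoing migration stream. */
typedef struct DemoFile {
    int last_error;
} DemoFile;

/* Hypothetical stand-in for the migration state; the stream is NULL
 * whenever no migration is in progress. */
typedef struct DemoMigrationState {
    DemoFile *to_dst_file;
} DemoMigrationState;

/* Dereferences f unconditionally, so callers must not pass NULL. */
static void demo_file_set_error(DemoFile *f, int err)
{
    f->last_error = err;
}

/* Record err on the stream only if one exists.
 * Returns 1 if the error was recorded, 0 if there was no stream. */
static int demo_report_error(DemoMigrationState *s, int err)
{
    if (s->to_dst_file) {
        demo_file_set_error(s->to_dst_file, err);
        return 1;
    }
    return 0;
}
```

Calling demo_report_error() on a state whose to_dst_file is NULL is a
no-op instead of a crash, which is the behavior the fix is after.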

Fixes: 02a7e71b1e5b ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 hw/vfio/migration.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a6ad1f8945..34f9f894ed 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -744,7 +744,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
          */
         error_report("%s: Failed to set device state 0x%x", vbasedev->name,
                      (migration->device_state & mask) | value);
-        qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+        if (migrate_get_current()->to_dst_file) {
+            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+        }
     }
     vbasedev->migration->vm_running = running;
     trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
-- 
2.21.3




* [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-05-30 17:12   ` Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd() Avihai Horon
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

Currently, if the IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked completely. This is because a DMA-capable
VFIO device can dirty RAM pages without notifying QEMU, thus breaking
the migration.

However, this doesn't mean that migration can't be done at all. If the
migration pre-copy phase is skipped, the VFIO device doesn't get a
chance to dirty RAM pages that have already been migrated, thus
eliminating the problem described above.

Hence, in such a case, allow migration but skip the pre-copy phase.
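
A toy model of the resulting control flow is sketched below. DemoMig and
the demo_* functions are hypothetical, and "halve the pending data each
round" is only an assumption standing in for real pre-copy progress; the
point is that the skip flag jumps straight to completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DEMO_ITERATE_RESUME,
    DEMO_ITERATE_BREAK
} DemoIterateState;

/* Hypothetical, simplified migration state. */
typedef struct DemoMig {
    bool skip_precopy;     /* set when dirty tracking is unavailable */
    uint64_t pending;      /* bytes still to be sent */
    uint64_t threshold;    /* switch to completion at or below this */
    int precopy_rounds;    /* number of pre-copy iterations performed */
    bool completed;
} DemoMig;

static DemoIterateState demo_iteration_run(DemoMig *s)
{
    /* Skip pre-copy entirely: go directly to the completion phase. */
    if (s->skip_precopy) {
        s->completed = true;
        return DEMO_ITERATE_BREAK;
    }

    /* Little enough data left: complete the migration. */
    if (s->pending <= s->threshold) {
        s->completed = true;
        return DEMO_ITERATE_BREAK;
    }

    /* One pre-copy round (toy model: half of the data gets sent). */
    s->pending /= 2;
    s->precopy_rounds++;
    return DEMO_ITERATE_RESUME;
}

static void demo_migrate(DemoMig *s)
{
    while (demo_iteration_run(s) == DEMO_ITERATE_RESUME) {
        /* keep iterating until completion */
    }
}
```

With skip_precopy set, demo_migrate() finishes with zero pre-copy
rounds, so all remaining data is transferred during completion, when the
device can no longer dirty pages that were already sent.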

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c   | 9 ++++++++-
 migration/migration.c | 5 +++++
 migration/migration.h | 3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 34f9f894ed..d8f9b086c2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
     struct vfio_region_info *info = NULL;
     int ret = -ENOTSUP;
 
-    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+    if (!vbasedev->enable_migration) {
         goto add_blocker;
     }
 
+    if (!container->dirty_pages_supported) {
+        warn_report_once(
+            "%s: IOMMU of the device's VFIO container doesn't support dirty page tracking, migration pre-copy phase will be skipped",
+            vbasedev->name);
+        migrate_get_current()->skip_precopy = true;
+    }
+
     ret = vfio_get_dev_region_info(vbasedev,
                                    VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
                                    VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
diff --git a/migration/migration.c b/migration/migration.c
index 31739b2af9..217f0e3e94 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3636,6 +3636,11 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     uint64_t pending_size, pend_pre, pend_compat, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
+    if (s->skip_precopy) {
+        migration_completion(s);
+        return MIG_ITERATE_BREAK;
+    }
+
     qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
                               &pend_compat, &pend_post);
     pending_size = pend_pre + pend_compat + pend_post;
diff --git a/migration/migration.h b/migration/migration.h
index 485d58b95f..0920a0950e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -332,6 +332,9 @@ struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /* Whether to skip pre-copy phase of migration or not */
+    bool skip_precopy;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
-- 
2.21.3




* [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd()
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 04/11] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one Avihai Horon
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

Add a new function, qemu_file_get_to_fd(), that reads data from a
QEMUFile and writes it straight into a given fd.

This will be used later in VFIO migration code.
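
The same drain loop can be sketched outside of QEMU as follows.
DemoSource and the demo_* functions are hypothetical stand-ins for
QEMUFile and its buffer refill; the tiny 8-byte buffer is an assumption
chosen only to force several refills. Partial writes are handled by
advancing the buffer index by whatever write() accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffered source standing in for a QEMUFile. */
typedef struct DemoSource {
    const unsigned char *data;   /* backing bytes of the "stream" */
    size_t len;                  /* total bytes available */
    size_t pos;                  /* bytes already pulled into buf */
    unsigned char buf[8];        /* small buffer to exercise refills */
    size_t buf_size;             /* valid bytes in buf */
    size_t buf_index;            /* next unread byte in buf */
} DemoSource;

/* Refill buf from the backing data; returns bytes loaded (0 at end). */
static ssize_t demo_fill_buffer(DemoSource *s)
{
    size_t n = s->len - s->pos;

    if (n > sizeof(s->buf)) {
        n = sizeof(s->buf);
    }
    memcpy(s->buf, s->data + s->pos, n);
    s->pos += n;
    s->buf_size = n;
    s->buf_index = 0;
    return (ssize_t)n;
}

/* Read size bytes from s and write them to fd.
 * Returns 0 on success, -1 on write failure or premature end of data. */
static int demo_get_to_fd(DemoSource *s, int fd, size_t size)
{
    while (size) {
        size_t pending = s->buf_size - s->buf_index;
        size_t chunk;
        ssize_t rc;

        if (!pending) {
            if (demo_fill_buffer(s) == 0) {
                return -1;               /* source ended early */
            }
            continue;
        }

        chunk = pending < size ? pending : size;
        rc = write(fd, s->buf + s->buf_index, chunk);
        if (rc <= 0) {
            return -1;
        }
        s->buf_index += (size_t)rc;      /* may be a partial write */
        size -= (size_t)rc;
    }

    return 0;
}
```

A caller would typically pass a pipe or device fd; asking for more bytes
than the source holds makes the function fail after writing what it
could.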

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 migration/qemu-file.c | 34 ++++++++++++++++++++++++++++++++++
 migration/qemu-file.h |  1 +
 2 files changed, 35 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1479cddad9..cad3d32eb3 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -867,3 +867,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
 {
     return file->has_ioc ? QIO_CHANNEL(file->opaque) : NULL;
 }
+
+/*
+ * Read size bytes from QEMUFile f and write them to fd.
+ */
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
+{
+    while (size) {
+        size_t pending = f->buf_size - f->buf_index;
+        ssize_t rc;
+
+        if (!pending) {
+            rc = qemu_fill_buffer(f);
+            if (rc < 0) {
+                return rc;
+            }
+            if (rc == 0) {
+                return -1;
+            }
+            continue;
+        }
+
+        rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
+        if (rc < 0) {
+            return rc;
+        }
+        if (rc == 0) {
+            return -1;
+        }
+        f->buf_index += rc;
+        size -= rc;
+    }
+
+    return 0;
+}
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 3f36d4dc8c..dd26037450 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -162,6 +162,7 @@ int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-- 
2.21.3




* [PATCH v2 04/11] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (2 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd() Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 05/11] vfio/migration: Move migration v1 logic to vfio_migration_init() Avihai Horon
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

vfio_devices_all_running_and_saving() is used to check if migration is
in the pre-copy phase. This is done by checking if migration is in the
setup or active state and if all VFIO devices are in the pre-copy state,
i.e. _SAVING | _RUNNING.

VFIO migration protocol v2 currently doesn't support the pre-copy phase,
so it doesn't have an equivalent VFIO pre-copy state like v1 has.

As preparation for adding the v2 protocol and to make things easier,
change the vfio_devices_all_running_and_saving() logic to check if
migration is active and if all VFIO devices are in the running state.

The new logic avoids using the VFIO pre-copy state and is equivalent to
the previous one, indicating that migration is in the pre-copy phase. No
functional changes intended.
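
The claimed equivalence can be sketched with two toy predicates. The
DEMO_* bits and demo_* functions are hypothetical, and the sketch rests
on an assumption that is not stated in this patch: while migration is
active, every running device also has its saving bit set, so testing the
running bit alone gives the same answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v1-style state bits. */
#define DEMO_RUNNING (1u << 0)
#define DEMO_SAVING  (1u << 1)

/* Old-style check: every device must be both saving and running. */
static bool demo_old_check(bool mig_active, const uint32_t *states, size_t n)
{
    const uint32_t want = DEMO_SAVING | DEMO_RUNNING;
    size_t i;

    if (!mig_active) {
        return false;
    }
    for (i = 0; i < n; i++) {
        if ((states[i] & want) != want) {
            return false;
        }
    }
    return true;
}

/* New-style check: migration is active and every device is running. */
static bool demo_new_check(bool mig_active, const uint32_t *states, size_t n)
{
    size_t i;

    if (!mig_active) {
        return false;
    }
    for (i = 0; i < n; i++) {
        if (!(states[i] & DEMO_RUNNING)) {
            return false;
        }
    }
    return true;
}
```

Under the stated assumption the two predicates agree on every input that
can occur; they would differ only for a device that is running without
saving while migration is active.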

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/common.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 29982c7af8..bbc6d375de 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
+#include "migration/misc.h"
 #include "sysemu/tpm.h"
 
 VFIOGroupList vfio_group_list =
@@ -363,13 +364,16 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
     return true;
 }
 
-static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
+/*
+ * Check if all VFIO devices are running and migration is active, which is
+ * essentially equivalent to the migration being in pre-copy phase.
+ */
+static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 {
     VFIOGroup *group;
     VFIODevice *vbasedev;
-    MigrationState *ms = migrate_get_current();
 
-    if (!migration_is_setup_or_active(ms->state)) {
+    if (!migration_is_active(migrate_get_current())) {
         return false;
     }
 
@@ -381,8 +385,7 @@ static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
                 return false;
             }
 
-            if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
-                (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
+            if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
                 continue;
             } else {
                 return false;
@@ -461,7 +464,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
     };
 
     if (iotlb && container->dirty_pages_supported &&
-        vfio_devices_all_running_and_saving(container)) {
+        vfio_devices_all_running_and_mig_active(container)) {
         return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
     }
 
-- 
2.21.3




* [PATCH v2 05/11] vfio/migration: Move migration v1 logic to vfio_migration_init()
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (3 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 04/11] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 06/11] vfio/migration: Rename functions/structs related to v1 protocol Avihai Horon
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

Move the vfio_get_dev_region_info() logic from vfio_migration_probe() to
vfio_migration_init(). This logic is specific to the v1 protocol, and
moving it will make it easier to add the v2 protocol implementation
later. No functional changes intended.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c  | 30 +++++++++++++++---------------
 hw/vfio/trace-events |  2 +-
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index d8f9b086c2..8a0deed0e4 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -789,14 +789,14 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
     vbasedev->migration = NULL;
 }
 
-static int vfio_migration_init(VFIODevice *vbasedev,
-                               struct vfio_region_info *info)
+static int vfio_migration_init(VFIODevice *vbasedev)
 {
     int ret;
     Object *obj;
     VFIOMigration *migration;
     char id[256] = "";
     g_autofree char *path = NULL, *oid = NULL;
+    struct vfio_region_info *info = NULL;
 
     if (!vbasedev->ops->vfio_get_object) {
         return -EINVAL;
@@ -807,6 +807,14 @@ static int vfio_migration_init(VFIODevice *vbasedev,
         return -EINVAL;
     }
 
+    ret = vfio_get_dev_region_info(vbasedev,
+                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+                                   &info);
+    if (ret) {
+        return ret;
+    }
+
     vbasedev->migration = g_new0(VFIOMigration, 1);
 
     ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
@@ -824,6 +832,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
         goto err;
     }
 
+    g_free(info);
+
     migration = vbasedev->migration;
     migration->vbasedev = vbasedev;
 
@@ -846,6 +856,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
     return 0;
 
 err:
+    g_free(info);
     vfio_migration_exit(vbasedev);
     return ret;
 }
@@ -860,7 +871,6 @@ int64_t vfio_mig_bytes_transferred(void)
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
     VFIOContainer *container = vbasedev->group->container;
-    struct vfio_region_info *info = NULL;
     int ret = -ENOTSUP;
 
     if (!vbasedev->enable_migration) {
@@ -874,27 +884,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
         migrate_get_current()->skip_precopy = true;
     }
 
-    ret = vfio_get_dev_region_info(vbasedev,
-                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-                                   &info);
+    ret = vfio_migration_init(vbasedev);
     if (ret) {
         goto add_blocker;
     }
 
-    ret = vfio_migration_init(vbasedev, info);
-    if (ret) {
-        goto add_blocker;
-    }
-
-    trace_vfio_migration_probe(vbasedev->name, info->index);
-    g_free(info);
+    trace_vfio_migration_probe(vbasedev->name);
     return 0;
 
 add_blocker:
     error_setg(&vbasedev->migration_blocker,
                "VFIO device doesn't support migration");
-    g_free(info);
 
     ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
     if (ret < 0) {
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 582882db91..438402b619 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -148,7 +148,7 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
 vfio_display_edid_write_error(void) ""
 
 # migration.c
-vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
+vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
-- 
2.21.3




* [PATCH v2 06/11] vfio/migration: Rename functions/structs related to v1 protocol
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (4 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 05/11] vfio/migration: Move migration v1 logic to vfio_migration_init() Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

To avoid name collisions, rename functions and structs related to VFIO
migration protocol v1. This will allow the two protocols to co-exist
once the v2 protocol is added, until v1 is removed. No functional
changes intended.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/common.c              |  6 +--
 hw/vfio/migration.c           | 82 +++++++++++++++++------------------
 hw/vfio/trace-events          |  2 +-
 include/hw/vfio/vfio-common.h |  2 +-
 4 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index bbc6d375de..a3dd8221ed 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,8 +355,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
                 return false;
             }
 
-            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
-                && (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
+            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+                (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
                 return false;
             }
         }
@@ -385,7 +385,7 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
                 return false;
             }
 
-            if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
+            if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
                 continue;
             } else {
                 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 8a0deed0e4..e40aa0ad80 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -107,8 +107,8 @@ static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
  * an error is returned.
  */
 
-static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
-                                    uint32_t value)
+static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
+                                       uint32_t value)
 {
     VFIOMigration *migration = vbasedev->migration;
     VFIORegion *region = &migration->region;
@@ -145,7 +145,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
         return ret;
     }
 
-    migration->device_state = device_state;
+    migration->device_state_v1 = device_state;
     trace_vfio_migration_set_state(vbasedev->name, device_state);
     return 0;
 }
@@ -260,8 +260,8 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
     return ret;
 }
 
-static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
-                            uint64_t data_size)
+static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
+                               uint64_t data_size)
 {
     VFIORegion *region = &vbasedev->migration->region;
     uint64_t data_offset = 0, size, report_size;
@@ -288,7 +288,7 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
             data_size = 0;
         }
 
-        trace_vfio_load_state_device_data(vbasedev->name, data_offset, size);
+        trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, size);
 
         while (size) {
             void *buf;
@@ -394,7 +394,7 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
     return qemu_file_get_error(f);
 }
 
-static void vfio_migration_cleanup(VFIODevice *vbasedev)
+static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
 
@@ -405,7 +405,7 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
 
 /* ---------------------------------------------------------------------- */
 
-static int vfio_save_setup(QEMUFile *f, void *opaque)
+static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -431,8 +431,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
         }
     }
 
-    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-                                   VFIO_DEVICE_STATE_V1_SAVING);
+    ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
+                                      VFIO_DEVICE_STATE_V1_SAVING);
     if (ret) {
         error_report("%s: Failed to set state SAVING", vbasedev->name);
         return ret;
@@ -448,11 +448,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void vfio_save_cleanup(void *opaque)
+static void vfio_v1_save_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
 
-    vfio_migration_cleanup(vbasedev);
+    vfio_migration_v1_cleanup(vbasedev);
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
@@ -524,15 +524,15 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
+static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
     uint64_t data_size;
     int ret;
 
-    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
-                                   VFIO_DEVICE_STATE_V1_SAVING);
+    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
+                                      VFIO_DEVICE_STATE_V1_SAVING);
     if (ret) {
         error_report("%s: Failed to set state STOP and SAVING",
                      vbasedev->name);
@@ -569,7 +569,8 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING, 0);
+    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING,
+                                      0);
     if (ret) {
         error_report("%s: Failed to set state STOPPED", vbasedev->name);
         return ret;
@@ -592,7 +593,7 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
     }
 }
 
-static int vfio_load_setup(QEMUFile *f, void *opaque)
+static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -608,8 +609,8 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
         }
     }
 
-    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
-                                   VFIO_DEVICE_STATE_V1_RESUMING);
+    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
+                                      VFIO_DEVICE_STATE_V1_RESUMING);
     if (ret) {
         error_report("%s: Failed to set state RESUMING", vbasedev->name);
         if (migration->region.mmaps) {
@@ -619,11 +620,11 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
     return ret;
 }
 
-static int vfio_load_cleanup(void *opaque)
+static int vfio_v1_load_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
 
-    vfio_migration_cleanup(vbasedev);
+    vfio_migration_v1_cleanup(vbasedev);
     trace_vfio_load_cleanup(vbasedev->name);
     return 0;
 }
@@ -661,7 +662,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             uint64_t data_size = qemu_get_be64(f);
 
             if (data_size) {
-                ret = vfio_load_buffer(f, vbasedev, data_size);
+                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
                 if (ret < 0) {
                     return ret;
                 }
@@ -682,21 +683,21 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
-static SaveVMHandlers savevm_vfio_handlers = {
-    .save_setup = vfio_save_setup,
-    .save_cleanup = vfio_save_cleanup,
+static SaveVMHandlers savevm_vfio_v1_handlers = {
+    .save_setup = vfio_v1_save_setup,
+    .save_cleanup = vfio_v1_save_cleanup,
     .save_live_pending = vfio_save_pending,
     .save_live_iterate = vfio_save_iterate,
-    .save_live_complete_precopy = vfio_save_complete_precopy,
+    .save_live_complete_precopy = vfio_v1_save_complete_precopy,
     .save_state = vfio_save_state,
-    .load_setup = vfio_load_setup,
-    .load_cleanup = vfio_load_cleanup,
+    .load_setup = vfio_v1_load_setup,
+    .load_cleanup = vfio_v1_load_cleanup,
     .load_state = vfio_load_state,
 };
 
 /* ---------------------------------------------------------------------- */
 
-static void vfio_vmstate_change(void *opaque, bool running, RunState state)
+static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -736,21 +737,21 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
         }
     }
 
-    ret = vfio_migration_set_state(vbasedev, mask, value);
+    ret = vfio_migration_v1_set_state(vbasedev, mask, value);
     if (ret) {
         /*
          * Migration should be aborted in this case, but vm_state_notify()
          * currently does not support reporting failures.
          */
         error_report("%s: Failed to set device state 0x%x", vbasedev->name,
-                     (migration->device_state & mask) | value);
+                     (migration->device_state_v1 & mask) | value);
         if (migrate_get_current()->to_dst_file) {
             qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
         }
     }
     vbasedev->migration->vm_running = running;
     trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
-            (migration->device_state & mask) | value);
+            (migration->device_state_v1 & mask) | value);
 }
 
 static void vfio_migration_state_notifier(Notifier *notifier, void *data)
@@ -769,10 +770,10 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_FAILED:
         bytes_transferred = 0;
-        ret = vfio_migration_set_state(vbasedev,
-                                       ~(VFIO_DEVICE_STATE_V1_SAVING |
-                                         VFIO_DEVICE_STATE_V1_RESUMING),
-                                       VFIO_DEVICE_STATE_V1_RUNNING);
+        ret = vfio_migration_v1_set_state(vbasedev,
+                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
+                                            VFIO_DEVICE_STATE_V1_RESUMING),
+                                          VFIO_DEVICE_STATE_V1_RUNNING);
         if (ret) {
             error_report("%s: Failed to set state RUNNING", vbasedev->name);
         }
@@ -845,12 +846,11 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
     strpadcpy(id, sizeof(id), path, '\0');
 
-    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
-                         vbasedev);
+    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
+                         &savevm_vfio_v1_handlers, vbasedev);
 
-    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
-                                                           vfio_vmstate_change,
-                                                           vbasedev);
+    migration->vm_state = qdev_add_vm_change_state_handler(
+        vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
     return 0;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 438402b619..ac8b04f52a 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -162,7 +162,7 @@ vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
+vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e573f5a9f1..bbaf72ba00 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -62,7 +62,7 @@ typedef struct VFIOMigration {
     struct VFIODevice *vbasedev;
     VMChangeStateEntry *vm_state;
     VFIORegion region;
-    uint32_t device_state;
+    uint32_t device_state_v1;
     int vm_running;
     Notifier migration_state;
     uint64_t pending_bytes;
-- 
2.21.3




* [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (5 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 06/11] vfio/migration: Rename functions/structs related to v1 protocol Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-06-14 11:08   ` Joao Martins
  2022-07-18 15:12   ` Jason Gunthorpe
  2022-05-30 17:07 ` [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

Add an implementation of VFIO migration protocol v2. The v1 and v2
protocols co-exist after this patch; the next patch removes v1.

There are several main differences between v1 and v2 protocols:
- VFIO device state is now represented as a finite state machine instead
  of a bitmap.

- The migration interface with the kernel now uses the
  VFIO_DEVICE_FEATURE ioctl and normal read() and write() instead of
  the migration region.

- VFIO migration protocol v2 currently doesn't support the pre-copy
  phase of migration.
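
To illustrate the first point, here is a minimal sketch of how a
"device is running" check differs between a bitmap and a state machine.
All SKETCH_*/sketch_* names and the numeric values are hypothetical
stand-ins chosen for this example; they are not the kernel uAPI
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the v1 bit flags; values are illustrative. */
#define SKETCH_V1_RUNNING  (1u << 0)
#define SKETCH_V1_SAVING   (1u << 1)
#define SKETCH_V1_RESUMING (1u << 2)

/* Hypothetical stand-in for the v2 states named in this patch. */
enum sketch_mig_state {
    SKETCH_STATE_ERROR,
    SKETCH_STATE_STOP,
    SKETCH_STATE_RUNNING,
    SKETCH_STATE_STOP_COPY,
    SKETCH_STATE_RESUMING,
    SKETCH_STATE_RUNNING_P2P,
};

/* v1 style: "running" is one bit that may be combined with other bits. */
static bool sketch_v1_is_running(uint32_t device_state)
{
    return (device_state & SKETCH_V1_RUNNING) != 0;
}

/* v2 style: the device is in exactly one state, so compare by equality. */
static bool sketch_v2_is_running(enum sketch_mig_state state)
{
    return state == SKETCH_STATE_RUNNING ||
           state == SKETCH_STATE_RUNNING_P2P;
}

/* Small usage example exercising both checks. */
void sketch_self_check(void)
{
    assert(sketch_v1_is_running(SKETCH_V1_RUNNING | SKETCH_V1_SAVING));
    assert(!sketch_v1_is_running(SKETCH_V1_SAVING));
    assert(sketch_v2_is_running(SKETCH_STATE_RUNNING));
    assert(!sketch_v2_is_running(SKETCH_STATE_STOP));
}
```

The v2 check mirrors the condition this patch adds in hw/vfio/common.c,
where both RUNNING and RUNNING_P2P count as running.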

A detailed description of VFIO migration protocol v2 and its
differences from v1 can be found in [1].

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
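
As a rough sketch of the "normal read()" data path mentioned above, the
loop below drains a file descriptor in fixed-size chunks until read()
returns 0, which is how end-of-stream is detected. sketch_drain_fd() is
a hypothetical helper written for this example only; the EINTR retry is
an assumption of the sketch and is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd in chunks of at most buf_size bytes
 * until end-of-stream. Returns the total number of bytes read, or -1 on
 * a read error.
 */
ssize_t sketch_drain_fd(int fd, void *buf, size_t buf_size)
{
    ssize_t total = 0;

    assert(buf != NULL && buf_size > 0);

    for (;;) {
        ssize_t n = read(fd, buf, buf_size);

        if (n < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted before any data arrived; retry */
            }
            return -1;
        }
        if (n == 0) {
            return total;   /* read() returning 0 marks end-of-stream */
        }
        /* A real sender would forward the n bytes in buf here. */
        total += n;
    }
}
```

In this patch the equivalent per-chunk step is vfio_save_block(), which
additionally frames each chunk with VFIO_MIG_FLAG_DEV_DATA_STATE and its
size before writing it to the QEMUFile.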

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/common.c              |  19 +-
 hw/vfio/migration.c           | 365 ++++++++++++++++++++++++++++++----
 hw/vfio/trace-events          |   2 +
 include/hw/vfio/vfio-common.h |   5 +
 4 files changed, 354 insertions(+), 37 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a3dd8221ed..5541133ec9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,10 +355,18 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
                 return false;
             }
 
-            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+            if (!migration->v2 &&
+                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
                 (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
                 return false;
             }
+
+            if (migration->v2 &&
+                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
+                return false;
+            }
         }
     }
     return true;
@@ -385,7 +393,14 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
                 return false;
             }
 
-            if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
+            if (!migration->v2 &&
+                migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
+                continue;
+            }
+
+            if (migration->v2 &&
+                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
                 continue;
             } else {
                 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e40aa0ad80..de68eadb09 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -44,8 +44,83 @@
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
 
+#define VFIO_MIG_DATA_BUFFER_SIZE (1024 * 1024)
+
 static int64_t bytes_transferred;
 
+static const char *mig_state_to_str(enum vfio_device_mig_state state)
+{
+    switch (state) {
+    case VFIO_DEVICE_STATE_ERROR:
+        return "ERROR";
+    case VFIO_DEVICE_STATE_STOP:
+        return "STOP";
+    case VFIO_DEVICE_STATE_RUNNING:
+        return "RUNNING";
+    case VFIO_DEVICE_STATE_STOP_COPY:
+        return "STOP_COPY";
+    case VFIO_DEVICE_STATE_RESUMING:
+        return "RESUMING";
+    case VFIO_DEVICE_STATE_RUNNING_P2P:
+        return "RUNNING_P2P";
+    default:
+        return "UNKNOWN STATE";
+    }
+}
+
+static int vfio_migration_set_state(VFIODevice *vbasedev,
+                                    enum vfio_device_mig_state new_state,
+                                    enum vfio_device_mig_state recover_state)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+                              sizeof(struct vfio_device_feature_mig_state),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (void *)buf;
+    struct vfio_device_feature_mig_state *mig_state = (void *)feature->data;
+    int ret;
+
+    feature->argsz = sizeof(buf);
+    feature->flags =
+        VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
+    mig_state->device_state = new_state;
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+    if (ret) {
+        /* Try to put the device in some good state */
+        mig_state->device_state = recover_state;
+        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+            hw_error("%s: Device in error state, can't recover",
+                     vbasedev->name);
+        }
+
+        error_report("%s: Failed changing device state to %s", vbasedev->name,
+                     mig_state_to_str(new_state));
+        migration->device_state = recover_state;
+
+        return -1;
+    }
+
+    if (mig_state->data_fd != -1) {
+        if (migration->data_fd != -1) {
+            /*
+             * This can happen if the device is asynchronously reset and
+             * terminates a data transfer.
+             */
+            error_report("%s: data_fd out of sync", vbasedev->name);
+            close(mig_state->data_fd);
+
+            return -1;
+        }
+
+        migration->data_fd = mig_state->data_fd;
+    }
+    migration->device_state = new_state;
+
+    trace_vfio_migration_set_state(vbasedev->name, new_state);
+
+    return 0;
+}
+
 static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
                                   off_t off, bool iswrite)
 {
@@ -260,6 +335,22 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
     return ret;
 }
 
+static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
+                            uint64_t data_size)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    int ret;
+
+    ret = qemu_file_get_to_fd(f, migration->data_fd, data_size);
+    if (ret) {
+        return ret;
+    }
+
+    trace_vfio_load_state_device_data(vbasedev->name, data_size);
+
+    return 0;
+}
+
 static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
                                uint64_t data_size)
 {
@@ -394,6 +485,14 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
     return qemu_file_get_error(f);
 }
 
+static void vfio_migration_cleanup(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    close(migration->data_fd);
+    migration->data_fd = -1;
+}
+
 static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
@@ -405,6 +504,18 @@ static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
 
 /* ---------------------------------------------------------------------- */
 
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+
+    trace_vfio_save_setup(vbasedev->name);
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+    return qemu_file_get_error(f);
+}
+
 static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -448,6 +559,14 @@ static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
+static void vfio_save_cleanup(void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+
+    vfio_migration_cleanup(vbasedev);
+    trace_vfio_save_cleanup(vbasedev->name);
+}
+
 static void vfio_v1_save_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -524,6 +643,69 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
     return 0;
 }
 
+/* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
+static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
+{
+    ssize_t data_size;
+
+    data_size = read(migration->data_fd, migration->data_buffer,
+                     migration->data_buffer_size);
+    if (data_size < 0) {
+        return -1;
+    }
+    if (data_size == 0) {
+        return 1;
+    }
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
+    qemu_put_be64(f, data_size);
+    qemu_put_buffer_async(f, migration->data_buffer, data_size, false);
+    qemu_fflush(f);
+    bytes_transferred += data_size;
+
+    trace_vfio_save_block(migration->vbasedev->name, data_size);
+
+    return qemu_file_get_error(f);
+}
+
+static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    enum vfio_device_mig_state recover_state;
+    int ret;
+
+    /* We reach here with device state STOP or STOP_COPY only */
+    recover_state = VFIO_DEVICE_STATE_STOP;
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
+                                   recover_state);
+    if (ret) {
+        return ret;
+    }
+
+    do {
+        ret = vfio_save_block(f, vbasedev->migration);
+        if (ret < 0) {
+            return ret;
+        }
+    } while (!ret);
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        return ret;
+    }
+
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+                                   recover_state);
+    if (ret) {
+        return ret;
+    }
+
+    trace_vfio_save_complete_precopy(vbasedev->name);
+
+    return 0;
+}
+
 static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -593,6 +775,14 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
     }
 }
 
+static int vfio_load_setup(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+
+    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
+                                   vbasedev->migration->device_state);
+}
+
 static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -620,6 +810,15 @@ static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
     return ret;
 }
 
+static int vfio_load_cleanup(void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+
+    vfio_migration_cleanup(vbasedev);
+    trace_vfio_load_cleanup(vbasedev->name);
+    return 0;
+}
+
 static int vfio_v1_load_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -662,7 +861,11 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             uint64_t data_size = qemu_get_be64(f);
 
             if (data_size) {
-                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
+                if (vbasedev->migration->v2) {
+                    ret = vfio_load_buffer(f, vbasedev, data_size);
+                } else {
+                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
+                }
                 if (ret < 0) {
                     return ret;
                 }
@@ -683,6 +886,16 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
+static SaveVMHandlers savevm_vfio_handlers = {
+    .save_setup = vfio_save_setup,
+    .save_cleanup = vfio_save_cleanup,
+    .save_live_complete_precopy = vfio_save_complete_precopy,
+    .save_state = vfio_save_state,
+    .load_setup = vfio_load_setup,
+    .load_cleanup = vfio_load_cleanup,
+    .load_state = vfio_load_state,
+};
+
 static SaveVMHandlers savevm_vfio_v1_handlers = {
     .save_setup = vfio_v1_save_setup,
     .save_cleanup = vfio_v1_save_cleanup,
@@ -697,6 +910,34 @@ static SaveVMHandlers savevm_vfio_v1_handlers = {
 
 /* ---------------------------------------------------------------------- */
 
+static void vfio_vmstate_change(void *opaque, bool running, RunState state)
+{
+    VFIODevice *vbasedev = opaque;
+    enum vfio_device_mig_state new_state;
+    int ret;
+
+    if (running) {
+        new_state = VFIO_DEVICE_STATE_RUNNING;
+    } else {
+        new_state = VFIO_DEVICE_STATE_STOP;
+    }
+
+    ret = vfio_migration_set_state(vbasedev, new_state,
+                                   VFIO_DEVICE_STATE_ERROR);
+    if (ret) {
+        /*
+         * Migration should be aborted in this case, but vm_state_notify()
+         * currently does not support reporting failures.
+         */
+        if (migrate_get_current()->to_dst_file) {
+            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+        }
+    }
+
+    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
+                              new_state);
+}
+
 static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
 {
     VFIODevice *vbasedev = opaque;
@@ -770,12 +1011,17 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_FAILED:
         bytes_transferred = 0;
-        ret = vfio_migration_v1_set_state(vbasedev,
-                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
-                                            VFIO_DEVICE_STATE_V1_RESUMING),
-                                          VFIO_DEVICE_STATE_V1_RUNNING);
-        if (ret) {
-            error_report("%s: Failed to set state RUNNING", vbasedev->name);
+        if (migration->v2) {
+            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
+                                     VFIO_DEVICE_STATE_ERROR);
+        } else {
+            ret = vfio_migration_v1_set_state(vbasedev,
+                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
+                                                VFIO_DEVICE_STATE_V1_RESUMING),
+                                              VFIO_DEVICE_STATE_V1_RUNNING);
+            if (ret) {
+                error_report("%s: Failed to set state RUNNING", vbasedev->name);
+            }
         }
     }
 }
@@ -784,12 +1030,35 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
 
-    vfio_region_exit(&migration->region);
-    vfio_region_finalize(&migration->region);
+    if (migration->v2) {
+        g_free(migration->data_buffer);
+    } else {
+        vfio_region_exit(&migration->region);
+        vfio_region_finalize(&migration->region);
+    }
     g_free(vbasedev->migration);
     vbasedev->migration = NULL;
 }
 
+static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
+{
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+                                  sizeof(struct vfio_device_feature_migration),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (void *)buf;
+    struct vfio_device_feature_migration *mig = (void *)feature->data;
+
+    feature->argsz = sizeof(buf);
+    feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
+    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        return -EOPNOTSUPP;
+    }
+
+    *mig_flags = mig->flags;
+
+    return 0;
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev)
 {
     int ret;
@@ -798,6 +1067,7 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     char id[256] = "";
     g_autofree char *path = NULL, *oid = NULL;
     struct vfio_region_info *info = NULL;
+    uint64_t mig_flags;
 
     if (!vbasedev->ops->vfio_get_object) {
         return -EINVAL;
@@ -808,32 +1078,48 @@ static int vfio_migration_init(VFIODevice *vbasedev)
         return -EINVAL;
     }
 
-    ret = vfio_get_dev_region_info(vbasedev,
-                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-                                   &info);
-    if (ret) {
-        return ret;
-    }
+    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
+    if (!ret) {
+        /* Migration v2 */
+        /* Basic migration functionality must be supported */
+        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
+            return -EOPNOTSUPP;
+        }
+        vbasedev->migration = g_new0(VFIOMigration, 1);
+        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
+        vbasedev->migration->data_buffer =
+            g_malloc0(vbasedev->migration->data_buffer_size);
+        vbasedev->migration->data_fd = -1;
+        vbasedev->migration->v2 = true;
+    } else {
+        /* Migration v1 */
+        ret = vfio_get_dev_region_info(vbasedev,
+                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+                                       &info);
+        if (ret) {
+            return ret;
+        }
 
-    vbasedev->migration = g_new0(VFIOMigration, 1);
+        vbasedev->migration = g_new0(VFIOMigration, 1);
 
-    ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
-                            info->index, "migration");
-    if (ret) {
-        error_report("%s: Failed to setup VFIO migration region %d: %s",
-                     vbasedev->name, info->index, strerror(-ret));
-        goto err;
-    }
+        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
+                                info->index, "migration");
+        if (ret) {
+            error_report("%s: Failed to setup VFIO migration region %d: %s",
+                         vbasedev->name, info->index, strerror(-ret));
+            goto err;
+        }
 
-    if (!vbasedev->migration->region.size) {
-        error_report("%s: Invalid zero-sized VFIO migration region %d",
-                     vbasedev->name, info->index);
-        ret = -EINVAL;
-        goto err;
-    }
+        if (!vbasedev->migration->region.size) {
+            error_report("%s: Invalid zero-sized VFIO migration region %d",
+                         vbasedev->name, info->index);
+            ret = -EINVAL;
+            goto err;
+        }
 
-    g_free(info);
+        g_free(info);
+    }
 
     migration = vbasedev->migration;
     migration->vbasedev = vbasedev;
@@ -846,11 +1132,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
     strpadcpy(id, sizeof(id), path, '\0');
 
-    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
-                         &savevm_vfio_v1_handlers, vbasedev);
+    if (migration->v2) {
+        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
+                             &savevm_vfio_handlers, vbasedev);
+
+        migration->vm_state = qdev_add_vm_change_state_handler(
+            vbasedev->dev, vfio_vmstate_change, vbasedev);
+    } else {
+        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
+                             &savevm_vfio_v1_handlers, vbasedev);
+
+        migration->vm_state = qdev_add_vm_change_state_handler(
+            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
+    }
 
-    migration->vm_state = qdev_add_vm_change_state_handler(
-        vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
     return 0;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index ac8b04f52a..6e8c5958b9 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -163,6 +163,8 @@ vfio_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
 vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
+vfio_load_state_device_data(const char *name, uint64_t data_size) " (%s) size 0x%"PRIx64
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
+vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index bbaf72ba00..2ec3346fea 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -66,6 +66,11 @@ typedef struct VFIOMigration {
     int vm_running;
     Notifier migration_state;
     uint64_t pending_bytes;
+    enum vfio_device_mig_state device_state;
+    int data_fd;
+    void *data_buffer;
+    size_t data_buffer_size;
+    bool v2;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
-- 
2.21.3




* [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (6 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-09-19  8:35   ` liulongfang via
  2022-09-19  9:41   ` Philippe Mathieu-Daudé via
  2022-05-30 17:07 ` [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails Avihai Horon
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

Now that the v2 protocol implementation has been added, remove the
deprecated v1 implementation.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/common.c              |  19 +-
 hw/vfio/migration.c           | 698 +---------------------------------
 hw/vfio/trace-events          |   5 -
 include/hw/vfio/vfio-common.h |   5 -
 4 files changed, 24 insertions(+), 703 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5541133ec9..00c6cb0ffe 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,14 +355,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
                 return false;
             }
 
-            if (!migration->v2 &&
-                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
-                (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
-                return false;
-            }
-
-            if (migration->v2 &&
-                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
                 (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
                  migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
                 return false;
@@ -393,14 +386,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
                 return false;
             }
 
-            if (!migration->v2 &&
-                migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
-                continue;
-            }
-
-            if (migration->v2 &&
-                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
-                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
+            if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+                migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P) {
                 continue;
             } else {
                 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index de68eadb09..852759e6ca 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -121,220 +121,6 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
     return 0;
 }
 
-static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
-                                  off_t off, bool iswrite)
-{
-    int ret;
-
-    ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
-                    pread(vbasedev->fd, val, count, off);
-    if (ret < count) {
-        error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
-                     HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
-                     vbasedev->name, off, strerror(errno));
-        return (ret < 0) ? ret : -EINVAL;
-    }
-    return 0;
-}
-
-static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
-                       off_t off, bool iswrite)
-{
-    int ret, done = 0;
-    __u8 *tbuf = buf;
-
-    while (count) {
-        int bytes = 0;
-
-        if (count >= 8 && !(off % 8)) {
-            bytes = 8;
-        } else if (count >= 4 && !(off % 4)) {
-            bytes = 4;
-        } else if (count >= 2 && !(off % 2)) {
-            bytes = 2;
-        } else {
-            bytes = 1;
-        }
-
-        ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
-        if (ret) {
-            return ret;
-        }
-
-        count -= bytes;
-        done += bytes;
-        off += bytes;
-        tbuf += bytes;
-    }
-    return done;
-}
-
-#define vfio_mig_read(f, v, c, o)       vfio_mig_rw(f, (__u8 *)v, c, o, false)
-#define vfio_mig_write(f, v, c, o)      vfio_mig_rw(f, (__u8 *)v, c, o, true)
-
-#define VFIO_MIG_STRUCT_OFFSET(f)       \
-                                 offsetof(struct vfio_device_migration_info, f)
-/*
- * Change the device_state register for device @vbasedev. Bits set in @mask
- * are preserved, bits set in @value are set, and bits not set in either @mask
- * or @value are cleared in device_state. If the register cannot be accessed,
- * the resulting state would be invalid, or the device enters an error state,
- * an error is returned.
- */
-
-static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
-                                       uint32_t value)
-{
-    VFIOMigration *migration = vbasedev->migration;
-    VFIORegion *region = &migration->region;
-    off_t dev_state_off = region->fd_offset +
-                          VFIO_MIG_STRUCT_OFFSET(device_state);
-    uint32_t device_state;
-    int ret;
-
-    ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
-                        dev_state_off);
-    if (ret < 0) {
-        return ret;
-    }
-
-    device_state = (device_state & mask) | value;
-
-    if (!VFIO_DEVICE_STATE_VALID(device_state)) {
-        return -EINVAL;
-    }
-
-    ret = vfio_mig_write(vbasedev, &device_state, sizeof(device_state),
-                         dev_state_off);
-    if (ret < 0) {
-        int rret;
-
-        rret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
-                             dev_state_off);
-
-        if ((rret < 0) || (VFIO_DEVICE_STATE_IS_ERROR(device_state))) {
-            hw_error("%s: Device in error state 0x%x", vbasedev->name,
-                     device_state);
-            return rret ? rret : -EIO;
-        }
-        return ret;
-    }
-
-    migration->device_state_v1 = device_state;
-    trace_vfio_migration_set_state(vbasedev->name, device_state);
-    return 0;
-}
-
-static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,
-                                   uint64_t data_size, uint64_t *size)
-{
-    void *ptr = NULL;
-    uint64_t limit = 0;
-    int i;
-
-    if (!region->mmaps) {
-        if (size) {
-            *size = MIN(data_size, region->size - data_offset);
-        }
-        return ptr;
-    }
-
-    for (i = 0; i < region->nr_mmaps; i++) {
-        VFIOMmap *map = region->mmaps + i;
-
-        if ((data_offset >= map->offset) &&
-            (data_offset < map->offset + map->size)) {
-
-            /* check if data_offset is within sparse mmap areas */
-            ptr = map->mmap + data_offset - map->offset;
-            if (size) {
-                *size = MIN(data_size, map->offset + map->size - data_offset);
-            }
-            break;
-        } else if ((data_offset < map->offset) &&
-                   (!limit || limit > map->offset)) {
-            /*
-             * data_offset is not within sparse mmap areas, find size of
-             * non-mapped area. Check through all list since region->mmaps list
-             * is not sorted.
-             */
-            limit = map->offset;
-        }
-    }
-
-    if (!ptr && size) {
-        *size = limit ? MIN(data_size, limit - data_offset) : data_size;
-    }
-    return ptr;
-}
-
-static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
-{
-    VFIOMigration *migration = vbasedev->migration;
-    VFIORegion *region = &migration->region;
-    uint64_t data_offset = 0, data_size = 0, sz;
-    int ret;
-
-    ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
-                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
-    if (ret < 0) {
-        return ret;
-    }
-
-    ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size),
-                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
-    if (ret < 0) {
-        return ret;
-    }
-
-    trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
-                           migration->pending_bytes);
-
-    qemu_put_be64(f, data_size);
-    sz = data_size;
-
-    while (sz) {
-        void *buf;
-        uint64_t sec_size;
-        bool buf_allocated = false;
-
-        buf = get_data_section_size(region, data_offset, sz, &sec_size);
-
-        if (!buf) {
-            buf = g_try_malloc(sec_size);
-            if (!buf) {
-                error_report("%s: Error allocating buffer ", __func__);
-                return -ENOMEM;
-            }
-            buf_allocated = true;
-
-            ret = vfio_mig_read(vbasedev, buf, sec_size,
-                                region->fd_offset + data_offset);
-            if (ret < 0) {
-                g_free(buf);
-                return ret;
-            }
-        }
-
-        qemu_put_buffer(f, buf, sec_size);
-
-        if (buf_allocated) {
-            g_free(buf);
-        }
-        sz -= sec_size;
-        data_offset += sec_size;
-    }
-
-    ret = qemu_file_get_error(f);
-
-    if (!ret && size) {
-        *size = data_size;
-    }
-
-    bytes_transferred += data_size;
-    return ret;
-}
-
 static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
                             uint64_t data_size)
 {
@@ -351,96 +137,6 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
     return 0;
 }
 
-static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
-                               uint64_t data_size)
-{
-    VFIORegion *region = &vbasedev->migration->region;
-    uint64_t data_offset = 0, size, report_size;
-    int ret;
-
-    do {
-        ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
-                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
-        if (ret < 0) {
-            return ret;
-        }
-
-        if (data_offset + data_size > region->size) {
-            /*
-             * If data_size is greater than the data section of migration region
-             * then iterate the write buffer operation. This case can occur if
-             * size of migration region at destination is smaller than size of
-             * migration region at source.
-             */
-            report_size = size = region->size - data_offset;
-            data_size -= size;
-        } else {
-            report_size = size = data_size;
-            data_size = 0;
-        }
-
-        trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, size);
-
-        while (size) {
-            void *buf;
-            uint64_t sec_size;
-            bool buf_alloc = false;
-
-            buf = get_data_section_size(region, data_offset, size, &sec_size);
-
-            if (!buf) {
-                buf = g_try_malloc(sec_size);
-                if (!buf) {
-                    error_report("%s: Error allocating buffer ", __func__);
-                    return -ENOMEM;
-                }
-                buf_alloc = true;
-            }
-
-            qemu_get_buffer(f, buf, sec_size);
-
-            if (buf_alloc) {
-                ret = vfio_mig_write(vbasedev, buf, sec_size,
-                        region->fd_offset + data_offset);
-                g_free(buf);
-
-                if (ret < 0) {
-                    return ret;
-                }
-            }
-            size -= sec_size;
-            data_offset += sec_size;
-        }
-
-        ret = vfio_mig_write(vbasedev, &report_size, sizeof(report_size),
-                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
-        if (ret < 0) {
-            return ret;
-        }
-    } while (data_size);
-
-    return 0;
-}
-
-static int vfio_update_pending(VFIODevice *vbasedev)
-{
-    VFIOMigration *migration = vbasedev->migration;
-    VFIORegion *region = &migration->region;
-    uint64_t pending_bytes = 0;
-    int ret;
-
-    ret = vfio_mig_read(vbasedev, &pending_bytes, sizeof(pending_bytes),
-                    region->fd_offset + VFIO_MIG_STRUCT_OFFSET(pending_bytes));
-    if (ret < 0) {
-        migration->pending_bytes = 0;
-        return ret;
-    }
-
-    migration->pending_bytes = pending_bytes;
-    trace_vfio_update_pending(vbasedev->name, pending_bytes);
-    return 0;
-}
-
 static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -493,15 +189,6 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
     migration->data_fd = -1;
 }
 
-static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
-{
-    VFIOMigration *migration = vbasedev->migration;
-
-    if (migration->region.mmaps) {
-        vfio_region_unmap(&migration->region);
-    }
-}
-
 /* ---------------------------------------------------------------------- */
 
 static int vfio_save_setup(QEMUFile *f, void *opaque)
@@ -516,49 +203,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
     return qemu_file_get_error(f);
 }
 
-static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    int ret;
-
-    trace_vfio_save_setup(vbasedev->name);
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
-
-    if (migration->region.mmaps) {
-        /*
-         * Calling vfio_region_mmap() from migration thread. Memory API called
-         * from this function require locking the iothread when called from
-         * outside the main loop thread.
-         */
-        qemu_mutex_lock_iothread();
-        ret = vfio_region_mmap(&migration->region);
-        qemu_mutex_unlock_iothread();
-        if (ret) {
-            error_report("%s: Failed to mmap VFIO migration region: %s",
-                         vbasedev->name, strerror(-ret));
-            error_report("%s: Falling back to slow path", vbasedev->name);
-        }
-    }
-
-    ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-                                      VFIO_DEVICE_STATE_V1_SAVING);
-    if (ret) {
-        error_report("%s: Failed to set state SAVING", vbasedev->name);
-        return ret;
-    }
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
-
-    ret = qemu_file_get_error(f);
-    if (ret) {
-        return ret;
-    }
-
-    return 0;
-}
-
 static void vfio_save_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -567,82 +211,6 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_v1_save_cleanup(void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-
-    vfio_migration_v1_cleanup(vbasedev);
-    trace_vfio_save_cleanup(vbasedev->name);
-}
-
-static void vfio_save_pending(QEMUFile *f, void *opaque,
-                              uint64_t threshold_size,
-                              uint64_t *res_precopy_only,
-                              uint64_t *res_compatible,
-                              uint64_t *res_postcopy_only)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    int ret;
-
-    ret = vfio_update_pending(vbasedev);
-    if (ret) {
-        return;
-    }
-
-    *res_precopy_only += migration->pending_bytes;
-
-    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
-                            *res_postcopy_only, *res_compatible);
-}
-
-static int vfio_save_iterate(QEMUFile *f, void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    uint64_t data_size;
-    int ret;
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
-
-    if (migration->pending_bytes == 0) {
-        ret = vfio_update_pending(vbasedev);
-        if (ret) {
-            return ret;
-        }
-
-        if (migration->pending_bytes == 0) {
-            qemu_put_be64(f, 0);
-            qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
-            /* indicates data finished, goto complete phase */
-            return 1;
-        }
-    }
-
-    ret = vfio_save_buffer(f, vbasedev, &data_size);
-    if (ret) {
-        error_report("%s: vfio_save_buffer failed %s", vbasedev->name,
-                     strerror(errno));
-        return ret;
-    }
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
-
-    ret = qemu_file_get_error(f);
-    if (ret) {
-        return ret;
-    }
-
-    /*
-     * Reset pending_bytes as .save_live_pending is not called during savevm or
-     * snapshot case, in such case vfio_update_pending() at the start of this
-     * function updates pending_bytes.
-     */
-    migration->pending_bytes = 0;
-    trace_vfio_save_iterate(vbasedev->name, data_size);
-    return 0;
-}
-
 /* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
 static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
 {
@@ -706,62 +274,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    uint64_t data_size;
-    int ret;
-
-    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
-                                      VFIO_DEVICE_STATE_V1_SAVING);
-    if (ret) {
-        error_report("%s: Failed to set state STOP and SAVING",
-                     vbasedev->name);
-        return ret;
-    }
-
-    ret = vfio_update_pending(vbasedev);
-    if (ret) {
-        return ret;
-    }
-
-    while (migration->pending_bytes > 0) {
-        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
-        ret = vfio_save_buffer(f, vbasedev, &data_size);
-        if (ret < 0) {
-            error_report("%s: Failed to save buffer", vbasedev->name);
-            return ret;
-        }
-
-        if (data_size == 0) {
-            break;
-        }
-
-        ret = vfio_update_pending(vbasedev);
-        if (ret) {
-            return ret;
-        }
-    }
-
-    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
-
-    ret = qemu_file_get_error(f);
-    if (ret) {
-        return ret;
-    }
-
-    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING,
-                                      0);
-    if (ret) {
-        error_report("%s: Failed to set state STOPPED", vbasedev->name);
-        return ret;
-    }
-
-    trace_vfio_save_complete_precopy(vbasedev->name);
-    return ret;
-}
-
 static void vfio_save_state(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -783,33 +295,6 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
                                    vbasedev->migration->device_state);
 }
 
-static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    int ret = 0;
-
-    if (migration->region.mmaps) {
-        ret = vfio_region_mmap(&migration->region);
-        if (ret) {
-            error_report("%s: Failed to mmap VFIO migration region %d: %s",
-                         vbasedev->name, migration->region.nr,
-                         strerror(-ret));
-            error_report("%s: Falling back to slow path", vbasedev->name);
-        }
-    }
-
-    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
-                                      VFIO_DEVICE_STATE_V1_RESUMING);
-    if (ret) {
-        error_report("%s: Failed to set state RESUMING", vbasedev->name);
-        if (migration->region.mmaps) {
-            vfio_region_unmap(&migration->region);
-        }
-    }
-    return ret;
-}
-
 static int vfio_load_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
@@ -819,15 +304,6 @@ static int vfio_load_cleanup(void *opaque)
     return 0;
 }
 
-static int vfio_v1_load_cleanup(void *opaque)
-{
-    VFIODevice *vbasedev = opaque;
-
-    vfio_migration_v1_cleanup(vbasedev);
-    trace_vfio_load_cleanup(vbasedev->name);
-    return 0;
-}
-
 static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
 {
     VFIODevice *vbasedev = opaque;
@@ -861,11 +337,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             uint64_t data_size = qemu_get_be64(f);
 
             if (data_size) {
-                if (vbasedev->migration->v2) {
-                    ret = vfio_load_buffer(f, vbasedev, data_size);
-                } else {
-                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
-                }
+                ret = vfio_load_buffer(f, vbasedev, data_size);
                 if (ret < 0) {
                     return ret;
                 }
@@ -896,18 +368,6 @@ static SaveVMHandlers savevm_vfio_handlers = {
     .load_state = vfio_load_state,
 };
 
-static SaveVMHandlers savevm_vfio_v1_handlers = {
-    .save_setup = vfio_v1_save_setup,
-    .save_cleanup = vfio_v1_save_cleanup,
-    .save_live_pending = vfio_save_pending,
-    .save_live_iterate = vfio_save_iterate,
-    .save_live_complete_precopy = vfio_v1_save_complete_precopy,
-    .save_state = vfio_save_state,
-    .load_setup = vfio_v1_load_setup,
-    .load_cleanup = vfio_v1_load_cleanup,
-    .load_state = vfio_load_state,
-};
-
 /* ---------------------------------------------------------------------- */
 
 static void vfio_vmstate_change(void *opaque, bool running, RunState state)
@@ -938,70 +398,12 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
                               new_state);
 }
 
-static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
-{
-    VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
-    uint32_t value, mask;
-    int ret;
-
-    if (vbasedev->migration->vm_running == running) {
-        return;
-    }
-
-    if (running) {
-        /*
-         * Here device state can have one of _SAVING, _RESUMING or _STOP bit.
-         * Transition from _SAVING to _RUNNING can happen if there is migration
-         * failure, in that case clear _SAVING bit.
-         * Transition from _RESUMING to _RUNNING occurs during resuming
-         * phase, in that case clear _RESUMING bit.
-         * In both the above cases, set _RUNNING bit.
-         */
-        mask = ~VFIO_DEVICE_STATE_MASK;
-        value = VFIO_DEVICE_STATE_V1_RUNNING;
-    } else {
-        /*
-         * Here device state could be either _RUNNING or _SAVING|_RUNNING. Reset
-         * _RUNNING bit
-         */
-        mask = ~VFIO_DEVICE_STATE_V1_RUNNING;
-
-        /*
-         * When VM state transition to stop for savevm command, device should
-         * start saving data.
-         */
-        if (state == RUN_STATE_SAVE_VM) {
-            value = VFIO_DEVICE_STATE_V1_SAVING;
-        } else {
-            value = 0;
-        }
-    }
-
-    ret = vfio_migration_v1_set_state(vbasedev, mask, value);
-    if (ret) {
-        /*
-         * Migration should be aborted in this case, but vm_state_notify()
-         * currently does not support reporting failures.
-         */
-        error_report("%s: Failed to set device state 0x%x", vbasedev->name,
-                     (migration->device_state_v1 & mask) | value);
-        if (migrate_get_current()->to_dst_file) {
-            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
-        }
-    }
-    vbasedev->migration->vm_running = running;
-    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
-            (migration->device_state_v1 & mask) | value);
-}
-
 static void vfio_migration_state_notifier(Notifier *notifier, void *data)
 {
     MigrationState *s = data;
     VFIOMigration *migration = container_of(notifier, VFIOMigration,
                                             migration_state);
     VFIODevice *vbasedev = migration->vbasedev;
-    int ret;
 
     trace_vfio_migration_state_notifier(vbasedev->name,
                                         MigrationStatus_str(s->state));
@@ -1011,31 +413,14 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_FAILED:
         bytes_transferred = 0;
-        if (migration->v2) {
-            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
-                                     VFIO_DEVICE_STATE_ERROR);
-        } else {
-            ret = vfio_migration_v1_set_state(vbasedev,
-                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
-                                                VFIO_DEVICE_STATE_V1_RESUMING),
-                                              VFIO_DEVICE_STATE_V1_RUNNING);
-            if (ret) {
-                error_report("%s: Failed to set state RUNNING", vbasedev->name);
-            }
-        }
+        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
+                                 VFIO_DEVICE_STATE_ERROR);
     }
 }
 
 static void vfio_migration_exit(VFIODevice *vbasedev)
 {
-    VFIOMigration *migration = vbasedev->migration;
-
-    if (migration->v2) {
-        g_free(migration->data_buffer);
-    } else {
-        vfio_region_exit(&migration->region);
-        vfio_region_finalize(&migration->region);
-    }
+    g_free(vbasedev->migration->data_buffer);
     g_free(vbasedev->migration);
     vbasedev->migration = NULL;
 }
@@ -1066,7 +451,6 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     VFIOMigration *migration;
     char id[256] = "";
     g_autofree char *path = NULL, *oid = NULL;
-    struct vfio_region_info *info = NULL;
     uint64_t mig_flags;
 
     if (!vbasedev->ops->vfio_get_object) {
@@ -1079,48 +463,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
 
     ret = vfio_migration_query_flags(vbasedev, &mig_flags);
-    if (!ret) {
-        /* Migration v2 */
-        /* Basic migration functionality must be supported */
-        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
-            return -EOPNOTSUPP;
-        }
-        vbasedev->migration = g_new0(VFIOMigration, 1);
-        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
-        vbasedev->migration->data_buffer =
-            g_malloc0(vbasedev->migration->data_buffer_size);
-        vbasedev->migration->data_fd = -1;
-        vbasedev->migration->v2 = true;
-    } else {
-        /* Migration v1 */
-        ret = vfio_get_dev_region_info(vbasedev,
-                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-                                       &info);
-        if (ret) {
-            return ret;
-        }
-
-        vbasedev->migration = g_new0(VFIOMigration, 1);
-
-        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
-                                info->index, "migration");
-        if (ret) {
-            error_report("%s: Failed to setup VFIO migration region %d: %s",
-                         vbasedev->name, info->index, strerror(-ret));
-            goto err;
-        }
-
-        if (!vbasedev->migration->region.size) {
-            error_report("%s: Invalid zero-sized VFIO migration region %d",
-                         vbasedev->name, info->index);
-            ret = -EINVAL;
-            goto err;
-        }
+    if (ret) {
+        return ret;
+    }
 
-        g_free(info);
+    /* Basic migration functionality must be supported */
+    if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
+        return -EOPNOTSUPP;
     }
 
+    vbasedev->migration = g_new0(VFIOMigration, 1);
+    vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
+    vbasedev->migration->data_buffer =
+        g_malloc0(vbasedev->migration->data_buffer_size);
+    vbasedev->migration->data_fd = -1;
     migration = vbasedev->migration;
     migration->vbasedev = vbasedev;
 
@@ -1132,28 +488,16 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     }
     strpadcpy(id, sizeof(id), path, '\0');
 
-    if (migration->v2) {
-        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
-                             &savevm_vfio_handlers, vbasedev);
-
-        migration->vm_state = qdev_add_vm_change_state_handler(
-            vbasedev->dev, vfio_vmstate_change, vbasedev);
-    } else {
-        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
-                             &savevm_vfio_v1_handlers, vbasedev);
-
-        migration->vm_state = qdev_add_vm_change_state_handler(
-            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
-    }
+    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
+                         vbasedev);
 
+    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
+                                                           vfio_vmstate_change,
+                                                           vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
-    return 0;
 
-err:
-    g_free(info);
-    vfio_migration_exit(vbasedev);
-    return ret;
+    return 0;
 }
 
 /* ---------------------------------------------------------------------- */
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 6e8c5958b9..a24ea7d8b0 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -154,15 +154,10 @@ vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
 vfio_save_setup(const char *name) " (%s)"
 vfio_save_cleanup(const char *name) " (%s)"
-vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
-vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
 vfio_save_device_config_state(const char *name) " (%s)"
-vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
-vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
 vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
 vfio_load_state_device_data(const char *name, uint64_t data_size) " (%s) size 0x%"PRIx64
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 2ec3346fea..76d470178f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -61,16 +61,11 @@ typedef struct VFIORegion {
 typedef struct VFIOMigration {
     struct VFIODevice *vbasedev;
     VMChangeStateEntry *vm_state;
-    VFIORegion region;
-    uint32_t device_state_v1;
-    int vm_running;
     Notifier migration_state;
-    uint64_t pending_bytes;
     enum vfio_device_mig_state device_state;
     int data_fd;
     void *data_buffer;
     size_t data_buffer_size;
-    bool v2;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
-- 
2.21.3




* [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (7 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-10-11  1:41   ` liulongfang via
  2022-05-30 17:07 ` [PATCH v2 10/11] vfio: Alphabetize migration section of VFIO trace-events file Avihai Horon
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

If vfio_migration_set_state() fails to put the device in the requested
state, it tries to put it in a recover state. If setting the recover
state fails as well, hw_error() is triggered and the VM is aborted.

To improve user experience and avoid VM data loss, reset the device with
VFIO_DEVICE_RESET instead of aborting the VM. The VM is aborted only if
the reset fails as well.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
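
Reviewer note: the fallback order this patch introduces can be sketched
as below. This is a minimal, self-contained model and not the QEMU code.
The names `struct dev_ops`, `enum outcome`, `set_state_with_fallback()`
and the `stub_*` helpers are hypothetical; the callbacks stand in for the
VFIO_DEVICE_FEATURE and VFIO_DEVICE_RESET ioctls. The distinct outcome
codes are for illustration only: the patch returns -1 after a successful
reset and calls hw_error() when the reset fails.

```c
#include <assert.h>

/* Hypothetical outcome codes, used only to make each path visible. */
enum outcome {
    OUTCOME_OK = 0,         /* requested state was set */
    OUTCOME_RECOVERED = -1, /* requested state failed, recover state set */
    OUTCOME_RESET = -2,     /* recover state failed, device was reset */
    OUTCOME_FATAL = -3      /* reset failed too (hw_error() in the patch) */
};

/* Hypothetical callbacks standing in for the two ioctls; 0 is success. */
struct dev_ops {
    int (*set_state)(int state);
    int (*reset)(void);
};

/* Try the requested state, then the recover state, then a device reset. */
static enum outcome set_state_with_fallback(const struct dev_ops *ops,
                                            int new_state, int recover_state)
{
    assert(ops && ops->set_state && ops->reset);

    if (ops->set_state(new_state) == 0) {
        return OUTCOME_OK;
    }
    if (ops->set_state(recover_state) == 0) {
        return OUTCOME_RECOVERED;
    }
    if (ops->reset() == 0) {
        return OUTCOME_RESET;
    }
    return OUTCOME_FATAL;
}

/* Example stubs: a device that accepts only state 0, and failing ones. */
static int stub_set_only_zero(int state) { return state == 0 ? 0 : -1; }
static int stub_set_fail(int state) { (void)state; return -1; }
static int stub_reset_ok(void) { return 0; }
static int stub_reset_fail(void) { return -1; }
```

With these stubs, a device whose set_state always fails but whose reset
succeeds takes the OUTCOME_RESET path, which corresponds to the new
error_report() and return -1 added by this patch.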

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 852759e6ca..6c34502611 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -89,8 +89,16 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
         /* Try to put the device in some good state */
         mig_state->device_state = recover_state;
         if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
-            hw_error("%s: Device in error state, can't recover",
-                     vbasedev->name);
+            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
+                hw_error("%s: Device in error state, can't recover",
+                         vbasedev->name);
+            }
+
+            error_report(
+                "%s: Device was reset due to failure in changing device state to recover state %s",
+                vbasedev->name, mig_state_to_str(recover_state));
+
+            return -1;
         }
 
         error_report("%s: Failed changing device state to %s", vbasedev->name,
-- 
2.21.3




* [PATCH v2 10/11] vfio: Alphabetize migration section of VFIO trace-events file
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (8 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-05-30 17:07 ` [PATCH v2 11/11] docs/devel: Align vfio-migration docs to VFIO migration v2 Avihai Horon
  2022-06-07 17:44 ` [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
  11 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

Sort the migration section of the VFIO trace-events file alphabetically
and move two misplaced traces to the common.c section.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/trace-events | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a24ea7d8b0..d3cba59bfd 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -119,6 +119,8 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
+vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
+vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
 
 # platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
@@ -148,18 +150,16 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
 vfio_display_edid_write_error(void) ""
 
 # migration.c
+vfio_load_cleanup(const char *name) " (%s)"
+vfio_load_device_config_state(const char *name) " (%s)"
+vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
+vfio_load_state_device_data(const char *name, uint64_t data_size) " (%s) size 0x%"PRIx64
 vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
-vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
-vfio_save_setup(const char *name) " (%s)"
+vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_cleanup(const char *name) " (%s)"
-vfio_save_device_config_state(const char *name) " (%s)"
 vfio_save_complete_precopy(const char *name) " (%s)"
-vfio_load_device_config_state(const char *name) " (%s)"
-vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_load_state_device_data(const char *name, uint64_t data_size) " (%s) size 0x%"PRIx64
-vfio_load_cleanup(const char *name) " (%s)"
-vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
-vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
-vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
+vfio_save_device_config_state(const char *name) " (%s)"
+vfio_save_setup(const char *name) " (%s)"
+vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
-- 
2.21.3




* [PATCH v2 11/11] docs/devel: Align vfio-migration docs to VFIO migration v2
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (9 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 10/11] vfio: Alphabetize migration section of VFIO trace-events file Avihai Horon
@ 2022-05-30 17:07 ` Avihai Horon
  2022-06-07 17:44 ` [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
  11 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:07 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta, Avihai Horon

Align the vfio-migration documentation to VFIO migration protocol v2.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 docs/devel/vfio-migration.rst | 77 +++++++++++++++--------------------
 1 file changed, 33 insertions(+), 44 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 9ff6163c88..09744af5a6 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,46 +7,35 @@ the guest is running on source host and restoring this saved state on the
 destination host. This document details how saving and restoring of VFIO
 devices is done in QEMU.
 
-Migration of VFIO devices consists of two phases: the optional pre-copy phase,
-and the stop-and-copy phase. The pre-copy phase is iterative and allows to
-accommodate VFIO devices that have a large amount of data that needs to be
-transferred. The iterative pre-copy phase of migration allows for the guest to
-continue whilst the VFIO device state is transferred to the destination, this
-helps to reduce the total downtime of the VM. VFIO devices can choose to skip
-the pre-copy phase of migration by returning pending_bytes as zero during the
-pre-copy phase.
+Migration of VFIO devices currently consists of a single stop-and-copy phase.
+During the stop-and-copy phase the guest is stopped and the entire VFIO device
+data is transferred to the destination.
+
+The pre-copy phase of migration is currently not supported for VFIO devices,
+so VFIO device data is not transferred during pre-copy phase.
 
 A detailed description of the UAPI for VFIO device migration can be found in
-the comment for the ``vfio_device_migration_info`` structure in the header
-file linux-headers/linux/vfio.h.
+the comment for the ``vfio_device_mig_state`` structure in the header file
+linux-headers/linux/vfio.h.
 
 VFIO implements the device hooks for the iterative approach as follows:
 
-* A ``save_setup`` function that sets up the migration region and sets _SAVING
-  flag in the VFIO device state.
-
-* A ``load_setup`` function that sets up the migration region on the
-  destination and sets _RESUMING flag in the VFIO device state.
-
-* A ``save_live_pending`` function that reads pending_bytes from the vendor
-  driver, which indicates the amount of data that the vendor driver has yet to
-  save for the VFIO device.
+* A ``save_setup`` function that sets up migration on the source.
 
-* A ``save_live_iterate`` function that reads the VFIO device's data from the
-  vendor driver through the migration region during iterative phase.
+* A ``load_setup`` function that sets the VFIO device on the destination in
+  _RESUMING state.
 
 * A ``save_state`` function to save the device config space if it is present.
 
-* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
-  VFIO device state and iteratively copies the remaining data for the VFIO
-  device until the vendor driver indicates that no data remains (pending bytes
-  is zero).
+* A ``save_live_complete_precopy`` function that sets the VFIO device in
+  _STOP_COPY state and iteratively copies the data for the VFIO device until
+  the vendor driver indicates that no data remains.
 
 * A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above
+  sections that are generated by the save functions above.
 
 * ``cleanup`` functions for both save and load that perform any migration
-  related cleanup, including unmapping the migration region
+  related cleanup.
 
 
 The VFIO migration code uses a VM state change handler to change the VFIO
@@ -71,13 +60,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver
 can also be written by the device. There is currently no device or IOMMU
 support for dirty page tracking in hardware.
 
-By default, dirty pages are tracked when the device is in pre-copy as well as
-stop-and-copy phase. So, a page pinned by the vendor driver will be copied to
-the destination in both phases. Copying dirty pages in pre-copy phase helps
-QEMU to predict if it can achieve its downtime tolerances. If QEMU during
-pre-copy phase keeps finding dirty pages continuously, then it understands
-that even in stop-and-copy phase, it is likely to find dirty pages and can
-predict the downtime accordingly.
+By default, dirty pages are tracked during pre-copy as well as stop-and-copy
+phase. So, a page pinned by the vendor driver will be copied to the destination
+in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
+it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
+finding dirty pages continuously, then it understands that even in stop-and-copy
+phase, it is likely to find dirty pages and can predict the downtime
+accordingly.
 
 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
 which disables querying the dirty bitmap during pre-copy phase. If it is set to
@@ -111,23 +100,23 @@ Live migration save path
                                   |
                      migrate_init spawns migration_thread
                 Migration thread then calls each device's .save_setup()
-                    (RUNNING, _SETUP, _RUNNING|_SAVING)
+                       (RUNNING, _SETUP, _RUNNING)
                                   |
-                    (RUNNING, _ACTIVE, _RUNNING|_SAVING)
-             If device is active, get pending_bytes by .save_live_pending()
-          If total pending_bytes >= threshold_size, call .save_live_iterate()
-                  Data of VFIO device for pre-copy phase is copied
+                      (RUNNING, _ACTIVE, _RUNNING)
+         Migration thread calls each .save_live_pending() handler
+  If total pending_bytes >= threshold_size, call each .save_live_iterate() handler
+          Data of this iteration for pre-copy phase is copied
         Iterate till total pending bytes converge and are less than threshold
                                   |
   On migration completion, vCPU stops and calls .save_live_complete_precopy for
-   each active device. The VFIO device is then transitioned into _SAVING state
-                   (FINISH_MIGRATE, _DEVICE, _SAVING)
+  each active device. The VFIO device is then transitioned into _STOP_COPY state
+                  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
                                   |
      For the VFIO device, iterate in .save_live_complete_precopy until
                          pending data is 0
-                   (FINISH_MIGRATE, _DEVICE, _STOPPED)
+                   (FINISH_MIGRATE, _DEVICE, _STOP)
                                   |
-                 (FINISH_MIGRATE, _COMPLETED, _STOPPED)
+                 (FINISH_MIGRATE, _COMPLETED, _STOP)
              Migraton thread schedules cleanup bottom half and exits
 
 Live migration resume path
@@ -136,7 +125,7 @@ Live migration resume path
 ::
 
               Incoming migration calls .load_setup for each device
-                       (RESTORE_VM, _ACTIVE, _STOPPED)
+                       (RESTORE_VM, _ACTIVE, _STOP)
                                  |
        For each device, .load_state is called for that device section data
                        (RESTORE_VM, _ACTIVE, _RESUMING)
-- 
2.21.3




* Re: [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported
  2022-05-30 17:07 ` [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported Avihai Horon
@ 2022-05-30 17:12   ` Avihai Horon
  2022-06-07 17:53     ` Avihai Horon
  0 siblings, 1 reply; 31+ messages in thread
From: Avihai Horon @ 2022-05-30 17:12 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 5/30/2022 8:07 PM, Avihai Horon wrote:
> Currently, if IOMMU of a VFIO container doesn't support dirty page
> tracking, migration is blocked completely. This is because a DMA-able
> VFIO device can dirty RAM pages without updating QEMU about it, thus
> breaking the migration.
>
> However, this doesn't mean that migration can't be done at all. If
> migration pre-copy phase is skipped, the VFIO device doesn't have a
> chance to dirty RAM pages that have been migrated already, thus
> eliminating the problem previously mentioned.
>
> Hence, in such case allow migration but skip pre-copy phase.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>   hw/vfio/migration.c   | 9 ++++++++-
>   migration/migration.c | 5 +++++
>   migration/migration.h | 3 +++
>   3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 34f9f894ed..d8f9b086c2 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>       struct vfio_region_info *info = NULL;
>       int ret = -ENOTSUP;
>   
> -    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
> +    if (!vbasedev->enable_migration) {
>           goto add_blocker;
>       }
>   
> +    if (!container->dirty_pages_supported) {
> +        warn_report_once(
> +            "%s: IOMMU of the device's VFIO container doesn't support dirty page tracking, migration pre-copy phase will be skipped",
> +            vbasedev->name);
> +        migrate_get_current()->skip_precopy = true;
> +    }
> +
>       ret = vfio_get_dev_region_info(vbasedev,
>                                      VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>                                      VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> diff --git a/migration/migration.c b/migration/migration.c
> index 31739b2af9..217f0e3e94 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3636,6 +3636,11 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>       uint64_t pending_size, pend_pre, pend_compat, pend_post;
>       bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>   
> +    if (s->skip_precopy) {
> +        migration_completion(s);
> +        return MIG_ITERATE_BREAK;
> +    }
> +
>       qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
>                                 &pend_compat, &pend_post);
>       pending_size = pend_pre + pend_compat + pend_post;
> diff --git a/migration/migration.h b/migration/migration.h
> index 485d58b95f..0920a0950e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -332,6 +332,9 @@ struct MigrationState {
>        * This save hostname when out-going migration starts
>        */
>       char *hostname;
> +
> +    /* Whether to skip pre-copy phase of migration or not */
> +    bool skip_precopy;
>   };
>   
>   void migrate_set_state(int *state, int old_state, int new_state);

This patch still has the problem that it doesn't respect configured 
downtime limit.

Maybe adding an option to set "no downtime limit" will solve it?
Then we can allow migration with VFIO device that doesn't support dirty 
tracking only if this option is set.
Can we use migration param downtime_limit with value 0 to mark "no 
downtime limit"? Does it make sense?
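
A minimal sketch of the gating idea above, only to make the proposal
concrete. Treating a downtime limit of 0 as "no downtime limit" is the
suggestion under discussion here, not existing QEMU behavior, and the
names sketch_vfio_migration_allowed and SKETCH_NO_DOWNTIME_LIMIT are
hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Proposed sentinel: a limit of 0 would mean "no downtime limit". */
#define SKETCH_NO_DOWNTIME_LIMIT 0

/*
 * Decide whether migration of a VFIO device may proceed.
 * With dirty page tracking, migration is always allowed. Without it,
 * migration is allowed only if the user explicitly accepted unbounded
 * downtime.
 */
static bool sketch_vfio_migration_allowed(bool dirty_tracking_supported,
                                          uint64_t downtime_limit_ms)
{
    if (dirty_tracking_supported) {
        return true;
    }
    return downtime_limit_ms == SKETCH_NO_DOWNTIME_LIMIT;
}
```

With such a check, a device without dirty tracking and a non-zero limit
would still get a migration blocker, as it does today.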

Do you have other ideas how to solve this issue?

Thanks!




* Re: [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2
  2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
                   ` (10 preceding siblings ...)
  2022-05-30 17:07 ` [PATCH v2 11/11] docs/devel: Align vfio-migration docs to VFIO migration v2 Avihai Horon
@ 2022-06-07 17:44 ` Avihai Horon
  2022-06-07 21:32   ` Alex Williamson
  11 siblings, 1 reply; 31+ messages in thread
From: Avihai Horon @ 2022-06-07 17:44 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 5/30/2022 8:07 PM, Avihai Horon wrote:
> Hello,
>
> Following VFIO migration protocol v2 acceptance in kernel, this series
> implements VFIO migration according to the new v2 protocol and replaces
> the now deprecated v1 implementation.
>
> The main differences between v1 and v2 migration protocols are:
> 1. VFIO device state is represented as a finite state machine instead of
>     a bitmap.
>
> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
>     ioctl and normal read() and write() instead of the migration region
>     used in v1.
>
> 3. Migration protocol v2 currently doesn't support the pre-copy phase of
>     migration.
>
> Full description of the v2 protocol and the differences from v1 can be
> found here [1].
>
> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
> that will be used later.
>
> Patches 4-6 refactor v1 protocol code to make it easier to add v2
> protocol.
>
> Patches 7-11 implement v2 protocol and remove v1 protocol.
>
> Thanks.
>
> [1]
> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
>
> Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
> - Split the big patch that replaced v1 with v2 into several patches as
>    suggested by Joao, to make review easier.
> - Change warn_report to warn_report_once when container doesn't support
>    dirty tracking.
> - Add Reviewed-by tag.
>
> Avihai Horon (11):
>    vfio/migration: Fix NULL pointer dereference bug
>    vfio/migration: Skip pre-copy if dirty page tracking is not supported
>    migration/qemu-file: Add qemu_file_get_to_fd()
>    vfio/common: Change vfio_devices_all_running_and_saving() logic to
>      equivalent one
>    vfio/migration: Move migration v1 logic to vfio_migration_init()
>    vfio/migration: Rename functions/structs related to v1 protocol
>    vfio/migration: Implement VFIO migration protocol v2
>    vfio/migration: Remove VFIO migration protocol v1
>    vfio/migration: Reset device if setting recover state fails
>    vfio: Alphabetize migration section of VFIO trace-events file
>    docs/devel: Align vfio-migration docs to VFIO migration v2
>
>   docs/devel/vfio-migration.rst |  77 ++--
>   hw/vfio/common.c              |  21 +-
>   hw/vfio/migration.c           | 640 ++++++++--------------------------
>   hw/vfio/trace-events          |  25 +-
>   include/hw/vfio/vfio-common.h |   8 +-
>   migration/migration.c         |   5 +
>   migration/migration.h         |   3 +
>   migration/qemu-file.c         |  34 ++
>   migration/qemu-file.h         |   1 +
>   9 files changed, 252 insertions(+), 562 deletions(-)
>
Ping.




* Re: [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported
  2022-05-30 17:12   ` Avihai Horon
@ 2022-06-07 17:53     ` Avihai Horon
  0 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-06-07 17:53 UTC (permalink / raw)
  To: qemu-devel, Cornelia Huck, Alex Williamson, Juan Quintela,
	Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 5/30/2022 8:12 PM, Avihai Horon wrote:
>
> On 5/30/2022 8:07 PM, Avihai Horon wrote:
>> Currently, if IOMMU of a VFIO container doesn't support dirty page
>> tracking, migration is blocked completely. This is because a DMA-able
>> VFIO device can dirty RAM pages without updating QEMU about it, thus
>> breaking the migration.
>>
>> However, this doesn't mean that migration can't be done at all. If
>> migration pre-copy phase is skipped, the VFIO device doesn't have a
>> chance to dirty RAM pages that have been migrated already, thus
>> eliminating the problem previously mentioned.
>>
>> Hence, in such case allow migration but skip pre-copy phase.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   hw/vfio/migration.c   | 9 ++++++++-
>>   migration/migration.c | 5 +++++
>>   migration/migration.h | 3 +++
>>   3 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 34f9f894ed..d8f9b086c2 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, 
>> Error **errp)
>>       struct vfio_region_info *info = NULL;
>>       int ret = -ENOTSUP;
>>   -    if (!vbasedev->enable_migration || 
>> !container->dirty_pages_supported) {
>> +    if (!vbasedev->enable_migration) {
>>           goto add_blocker;
>>       }
>>   +    if (!container->dirty_pages_supported) {
>> +        warn_report_once(
>> +            "%s: IOMMU of the device's VFIO container doesn't 
>> support dirty page tracking, migration pre-copy phase will be skipped",
>> +            vbasedev->name);
>> +        migrate_get_current()->skip_precopy = true;
>> +    }
>> +
>>       ret = vfio_get_dev_region_info(vbasedev,
>> VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>> VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 31739b2af9..217f0e3e94 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3636,6 +3636,11 @@ static MigIterateState 
>> migration_iteration_run(MigrationState *s)
>>       uint64_t pending_size, pend_pre, pend_compat, pend_post;
>>       bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>>   +    if (s->skip_precopy) {
>> +        migration_completion(s);
>> +        return MIG_ITERATE_BREAK;
>> +    }
>> +
>>       qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, 
>> &pend_pre,
>>                                 &pend_compat, &pend_post);
>>       pending_size = pend_pre + pend_compat + pend_post;
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 485d58b95f..0920a0950e 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -332,6 +332,9 @@ struct MigrationState {
>>        * This save hostname when out-going migration starts
>>        */
>>       char *hostname;
>> +
>> +    /* Whether to skip pre-copy phase of migration or not */
>> +    bool skip_precopy;
>>   };
>>     void migrate_set_state(int *state, int old_state, int new_state);
>
> This patch still has the problem that it doesn't respect configured 
> downtime limit.
>
> Maybe adding an option to set "no downtime limit" will solve it?
> Then we can allow migration with VFIO device that doesn't support 
> dirty tracking only if this option is set.
> Can we use migration param downtime_limit with value 0 to mark "no 
> downtime limit"? Does it make sense?
>
> Do you have other ideas how to solve this issue?
>
What about letting QEMU mark all RAM dirty instead of the kernel? This
has the same effect but doesn't need kernel support.
Is this a reasonable approach?
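
A rough sketch of what "mark all RAM dirty" could look like on the QEMU
side, assuming one bit per page in an array of 64-bit words. The helper
names and the bitmap layout are made up for illustration and are not
taken from the series:

```c
#include <stddef.h>
#include <stdint.h>

/* Set bits [first, first + count) in a bitmap of 64-bit words. */
static void sketch_bitmap_set_range(uint64_t *bitmap, size_t first,
                                    size_t count)
{
    for (size_t i = first; i < first + count; i++) {
        bitmap[i / 64] |= UINT64_C(1) << (i % 64);
    }
}

/*
 * Mark every page of a region of `size` bytes as dirty, instead of
 * querying a dirty bitmap from the kernel. Returns the number of pages
 * marked. The caller must provide a bitmap large enough for the region.
 */
static size_t sketch_mark_all_dirty(uint64_t *bitmap, uint64_t size,
                                    uint64_t page_size)
{
    size_t pages = (size_t)((size + page_size - 1) / page_size);

    sketch_bitmap_set_range(bitmap, 0, pages);
    return pages;
}
```

Every dirty sync would then report the whole region as dirty, so
pre-copy never converges for that RAM, which is the same behavior as
the current perpetual dirtying done in the kernel.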

Thanks.




* Re: [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2
  2022-06-07 17:44 ` [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
@ 2022-06-07 21:32   ` Alex Williamson
  2022-06-13 11:21     ` Avihai Horon
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2022-06-07 21:32 UTC (permalink / raw)
  To: Avihai Horon
  Cc: qemu-devel, Cornelia Huck, Juan Quintela,
	Dr . David Alan Gilbert, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Mark Bloch, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta

On Tue, 7 Jun 2022 20:44:23 +0300
Avihai Horon <avihaih@nvidia.com> wrote:

> On 5/30/2022 8:07 PM, Avihai Horon wrote:
> > Hello,
> >
> > Following VFIO migration protocol v2 acceptance in kernel, this series
> > implements VFIO migration according to the new v2 protocol and replaces
> > the now deprecated v1 implementation.
> >
> > The main differences between v1 and v2 migration protocols are:
> > 1. VFIO device state is represented as a finite state machine instead of
> >     a bitmap.
> >
> > 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
> >     ioctl and normal read() and write() instead of the migration region
> >     used in v1.
> >
> > 3. Migration protocol v2 currently doesn't support the pre-copy phase of
> >     migration.
> >
> > Full description of the v2 protocol and the differences from v1 can be
> > found here [1].
> >
> > Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
> > that will be used later.
> >
> > Patches 4-6 refactor v1 protocol code to make it easier to add v2
> > protocol.
> >
> > Patches 7-11 implement v2 protocol and remove v1 protocol.
> >
> > Thanks.
> >
> > [1]
> > https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
> >
> > Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
> > - Split the big patch that replaced v1 with v2 into several patches as
> >    suggested by Joao, to make review easier.
> > - Change warn_report to warn_report_once when container doesn't support
> >    dirty tracking.
> > - Add Reviewed-by tag.
> >
> > Avihai Horon (11):
> >    vfio/migration: Fix NULL pointer dereference bug
> >    vfio/migration: Skip pre-copy if dirty page tracking is not supported
> >    migration/qemu-file: Add qemu_file_get_to_fd()
> >    vfio/common: Change vfio_devices_all_running_and_saving() logic to
> >      equivalent one
> >    vfio/migration: Move migration v1 logic to vfio_migration_init()
> >    vfio/migration: Rename functions/structs related to v1 protocol
> >    vfio/migration: Implement VFIO migration protocol v2
> >    vfio/migration: Remove VFIO migration protocol v1
> >    vfio/migration: Reset device if setting recover state fails
> >    vfio: Alphabetize migration section of VFIO trace-events file
> >    docs/devel: Align vfio-migration docs to VFIO migration v2
> >
> >   docs/devel/vfio-migration.rst |  77 ++--
> >   hw/vfio/common.c              |  21 +-
> >   hw/vfio/migration.c           | 640 ++++++++--------------------------
> >   hw/vfio/trace-events          |  25 +-
> >   include/hw/vfio/vfio-common.h |   8 +-
> >   migration/migration.c         |   5 +
> >   migration/migration.h         |   3 +
> >   migration/qemu-file.c         |  34 ++
> >   migration/qemu-file.h         |   1 +
> >   9 files changed, 252 insertions(+), 562 deletions(-)
> >  
> Ping.

Based on the changelog, this seems like a mostly cosmetic spin and I
don't see that all of the discussion threads from v1 were resolved to
everyone's satisfaction.  I'm certainly still uncomfortable with the
pre-copy behavior and I thought there were still some action items to
figure out whether an SLA is present and vet the solution with
management tools.  Thanks,

Alex




* Re: [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2
  2022-06-07 21:32   ` Alex Williamson
@ 2022-06-13 11:21     ` Avihai Horon
  2022-06-17 21:51       ` Alex Williamson
  0 siblings, 1 reply; 31+ messages in thread
From: Avihai Horon @ 2022-06-13 11:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: qemu-devel, Cornelia Huck, Juan Quintela,
	Dr . David Alan Gilbert, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Mark Bloch, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta


On 6/8/2022 12:32 AM, Alex Williamson wrote:
> On Tue, 7 Jun 2022 20:44:23 +0300
> Avihai Horon <avihaih@nvidia.com> wrote:
>
>> On 5/30/2022 8:07 PM, Avihai Horon wrote:
>>> Hello,
>>>
>>> Following VFIO migration protocol v2 acceptance in kernel, this series
>>> implements VFIO migration according to the new v2 protocol and replaces
>>> the now deprecated v1 implementation.
>>>
>>> The main differences between v1 and v2 migration protocols are:
>>> 1. VFIO device state is represented as a finite state machine instead of
>>>      a bitmap.
>>>
>>> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
>>>      ioctl and normal read() and write() instead of the migration region
>>>      used in v1.
>>>
>>> 3. Migration protocol v2 currently doesn't support the pre-copy phase of
>>>      migration.
>>>
>>> Full description of the v2 protocol and the differences from v1 can be
>>> found here [1].
>>>
>>> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
>>> that will be used later.
>>>
>>> Patches 4-6 refactor v1 protocol code to make it easier to add v2
>>> protocol.
>>>
>>> Patches 7-11 implement v2 protocol and remove v1 protocol.
>>>
>>> Thanks.
>>>
>>> [1]
>>> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
>>>
>>> Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
>>> - Split the big patch that replaced v1 with v2 into several patches as
>>>     suggested by Joao, to make review easier.
>>> - Change warn_report to warn_report_once when container doesn't support
>>>     dirty tracking.
>>> - Add Reviewed-by tag.
>>>
>>> Avihai Horon (11):
>>>     vfio/migration: Fix NULL pointer dereference bug
>>>     vfio/migration: Skip pre-copy if dirty page tracking is not supported
>>>     migration/qemu-file: Add qemu_file_get_to_fd()
>>>     vfio/common: Change vfio_devices_all_running_and_saving() logic to
>>>       equivalent one
>>>     vfio/migration: Move migration v1 logic to vfio_migration_init()
>>>     vfio/migration: Rename functions/structs related to v1 protocol
>>>     vfio/migration: Implement VFIO migration protocol v2
>>>     vfio/migration: Remove VFIO migration protocol v1
>>>     vfio/migration: Reset device if setting recover state fails
>>>     vfio: Alphabetize migration section of VFIO trace-events file
>>>     docs/devel: Align vfio-migration docs to VFIO migration v2
>>>
>>>    docs/devel/vfio-migration.rst |  77 ++--
>>>    hw/vfio/common.c              |  21 +-
>>>    hw/vfio/migration.c           | 640 ++++++++--------------------------
>>>    hw/vfio/trace-events          |  25 +-
>>>    include/hw/vfio/vfio-common.h |   8 +-
>>>    migration/migration.c         |   5 +
>>>    migration/migration.h         |   3 +
>>>    migration/qemu-file.c         |  34 ++
>>>    migration/qemu-file.h         |   1 +
>>>    9 files changed, 252 insertions(+), 562 deletions(-)
>>>
>> Ping.
> Based on the changelog, this seems like a mostly cosmetic spin and I
> don't see that all of the discussion threads from v1 were resolved to
> everyone's satisfaction.  I'm certainly still uncomfortable with the
> pre-copy behavior and I thought there were still some action items to
> figure out whether an SLA is present and vet the solution with
> management tools.  Thanks,

Yes.
OK, so let's clear things up and reach an agreement before I prepare the 
v3 series.

There are three topics that came up in previous discussion:

 1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug.
    Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage
    outside migration thread.
    This code existed before and I fixed a NULL pointer dereference that
    I encountered.
    I suggested that later we can refactor VMChangeStateHandler to
    return error.
    I prefer not to do this refactor right now because I am not sure
    it's as straightforward a change as it might seem - if some notifier
    fails and we abort do_vm_stop/vm_prepare_start in the middle, can
    this leave the VM in some unstable state?
    We plan to leave it as is and not do the refactor as part of this
    series.
    Are you ok with this?

 2. [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page
    tracking is not supported.
    As previously discussed, this patch doesn't consider the configured
    downtime limit.
    One way to fix it is to allow such migration only when "no SLA" (no
    downtime limit) is set. AFAIK today there is no way that one can set
    "no SLA".
    If we go with this option, we change normal flow of migration
    (skipping pre-copy) and might need to change management tools.

    Instead, what about letting QEMU VFIO code mark all pages dirty
    (instead of kernel)?
    This way we don’t skip pre-copy and we get the same behavior we have
    now of perpetually dirtying all RAM, which respects SLA.
    If we go with this option, do we need to block migration when IOMMU
    is sPAPR TCE?
    Until now migration would be blocked because sPAPR TCE doesn't report
    dirty_pages_supported cap, but going with this option we will allow
    migration even when dirty_pages_supported cap is not set (and let
    QEMU dirty all pages).

 3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd().
    Juan expressed his concern about the amount of data that will go
    through main migration thread.

    This is already the case in v1 protocol - VFIO devices send all
    their data in the main migration thread. Note that like in v1
    protocol, here as well the data is sent in small sized chunks, each
    with a header.
    This patch just aims to eliminate an extra copy.

    We plan to leave it as is. Is this ok?
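
To illustrate what "small sized chunks, each with a header" means for
topic 3, here is a self-contained sketch that frames a buffer into
chunks. Each chunk gets a big-endian 64-bit flag and a big-endian 64-bit
size, and the stream ends with an end marker. The flag values and
function names are placeholders for illustration, not the ones used in
hw/vfio/migration.c:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder flag values, chosen only for this sketch. */
#define SKETCH_FLAG_DATA UINT64_C(0x1)
#define SKETCH_FLAG_END  UINT64_C(0x2)

/* Store v at out in big-endian order; returns the bytes written (8). */
static size_t sketch_put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

/*
 * Frame `len` bytes from `src` as a sequence of
 * [flag][size][payload] chunks of at most `chunk_size` payload bytes,
 * followed by an end marker. Returns the number of bytes written to
 * `out`, or 0 if `out_cap` is too small or `chunk_size` is 0.
 */
static size_t sketch_frame_chunks(const uint8_t *src, size_t len,
                                  size_t chunk_size, uint8_t *out,
                                  size_t out_cap)
{
    size_t pos = 0;

    if (chunk_size == 0) {
        return 0;
    }

    while (len > 0) {
        size_t n = len < chunk_size ? len : chunk_size;

        if (out_cap - pos < 16 + n) {
            return 0;
        }
        pos += sketch_put_be64(out + pos, SKETCH_FLAG_DATA);
        pos += sketch_put_be64(out + pos, n);
        memcpy(out + pos, src, n);
        pos += n;
        src += n;
        len -= n;
    }

    if (out_cap - pos < 8) {
        return 0;
    }
    pos += sketch_put_be64(out + pos, SKETCH_FLAG_END);
    return pos;
}
```

Because each chunk is bounded, the migration thread never has to hold
more than one chunk of device data at a time, regardless of the total
device state size.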

Thanks.




* Re: [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2
  2022-05-30 17:07 ` [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
@ 2022-06-14 11:08   ` Joao Martins
  2022-06-14 16:34     ` Avihai Horon
  2022-07-18 15:12   ` Jason Gunthorpe
  1 sibling, 1 reply; 31+ messages in thread
From: Joao Martins @ 2022-06-14 11:08 UTC (permalink / raw)
  To: Avihai Horon
  Cc: Yishai Hadas, Jason Gunthorpe, Mark Bloch, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Cornelia Huck, qemu-devel,
	Alex Williamson, Dr . David Alan Gilbert, Juan Quintela

On 5/30/22 18:07, Avihai Horon wrote:
> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    enum vfio_device_mig_state recover_state;
> +    int ret;
> +
> +    /* We reach here with device state STOP or STOP_COPY only */
> +    recover_state = VFIO_DEVICE_STATE_STOP;
> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
> +                                   recover_state);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    do {
> +        ret = vfio_save_block(f, vbasedev->migration);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    } while (!ret);
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
> +                                   recover_state);

Is it expected that you are setting VFIO_DEVICE_STATE_STOP while
@recover_state is the same value (VFIO_DEVICE_STATE_STOP) ?
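
To make the question concrete, here is a toy model of a "set state with
a recover state" helper. It is not the vfio_migration_set_state() from
this patch; it assumes that a failed transition is followed by an
attempt to enter the recover state, and that the device is reset if
that fails too. The device is a plain struct and all names are
hypothetical:

```c
/* Simplified stand-ins for the device states named in the patch. */
enum sketch_state {
    SKETCH_STOP,
    SKETCH_RUNNING,
    SKETCH_STOP_COPY,
};

/* Hypothetical in-memory device; no ioctl is involved. */
struct sketch_dev {
    enum sketch_state state;
    int fail_count; /* how many upcoming transitions should fail */
    int resets;     /* how many times the device was reset */
};

/* Attempt one transition; fails while fail_count is positive. */
static int sketch_try_set(struct sketch_dev *d, enum sketch_state s)
{
    if (d->fail_count > 0) {
        d->fail_count--;
        return -1;
    }
    d->state = s;
    return 0;
}

/*
 * Try new_state; on failure fall back to recover_state; if that fails
 * as well, reset the device. The model assumes that a reset returns
 * the device to the running state. Returns 0 only if new_state was
 * reached directly.
 */
static int sketch_set_state(struct sketch_dev *d,
                            enum sketch_state new_state,
                            enum sketch_state recover_state)
{
    if (sketch_try_set(d, new_state) == 0) {
        return 0;
    }
    if (sketch_try_set(d, recover_state) != 0) {
        d->state = SKETCH_RUNNING;
        d->resets++;
    }
    return -1;
}
```

In this model, passing the same value for both arguments means that a
failed transition is simply retried once before the device is reset.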

> +    if (ret) {
> +        return ret;
> +    }
> +
> +    trace_vfio_save_complete_precopy(vbasedev->name);
> +
> +    return 0;
> +}
> +
>  static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -593,6 +775,14 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
>      }
>  }
>  
> +static int vfio_load_setup(QEMUFile *f, void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +
> +    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
> +                                   vbasedev->migration->device_state);
> +}
> +
>  static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -620,6 +810,15 @@ static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>      return ret;
>  }
>  
> +static int vfio_load_cleanup(void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +
> +    vfio_migration_cleanup(vbasedev);
> +    trace_vfio_load_cleanup(vbasedev->name);
> +    return 0;
> +}
> +
>  static int vfio_v1_load_cleanup(void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -662,7 +861,11 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>              uint64_t data_size = qemu_get_be64(f);
>  
>              if (data_size) {
> -                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
> +                if (vbasedev->migration->v2) {
> +                    ret = vfio_load_buffer(f, vbasedev, data_size);
> +                } else {
> +                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
> +                }
>                  if (ret < 0) {
>                      return ret;
>                  }
> @@ -683,6 +886,16 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>      return ret;
>  }
>  
> +static SaveVMHandlers savevm_vfio_handlers = {
> +    .save_setup = vfio_save_setup,
> +    .save_cleanup = vfio_save_cleanup,
> +    .save_live_complete_precopy = vfio_save_complete_precopy,
> +    .save_state = vfio_save_state,
> +    .load_setup = vfio_load_setup,
> +    .load_cleanup = vfio_load_cleanup,
> +    .load_state = vfio_load_state,
> +};
> +
>  static SaveVMHandlers savevm_vfio_v1_handlers = {
>      .save_setup = vfio_v1_save_setup,
>      .save_cleanup = vfio_v1_save_cleanup,
> @@ -697,6 +910,34 @@ static SaveVMHandlers savevm_vfio_v1_handlers = {
>  
>  /* ---------------------------------------------------------------------- */
>  
> +static void vfio_vmstate_change(void *opaque, bool running, RunState state)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    enum vfio_device_mig_state new_state;
> +    int ret;
> +
> +    if (running) {
> +        new_state = VFIO_DEVICE_STATE_RUNNING;
> +    } else {
> +        new_state = VFIO_DEVICE_STATE_STOP;
> +    }
> +
> +    ret = vfio_migration_set_state(vbasedev, new_state,
> +                                   VFIO_DEVICE_STATE_ERROR);
> +    if (ret) {
> +        /*
> +         * Migration should be aborted in this case, but vm_state_notify()
> +         * currently does not support reporting failures.
> +         */
> +        if (migrate_get_current()->to_dst_file) {
> +            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> +        }
> +    }
> +
> +    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
> +                              new_state);
> +}
> +
>  static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -770,12 +1011,17 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>      case MIGRATION_STATUS_CANCELLED:
>      case MIGRATION_STATUS_FAILED:
>          bytes_transferred = 0;
> -        ret = vfio_migration_v1_set_state(vbasedev,
> -                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
> -                                            VFIO_DEVICE_STATE_V1_RESUMING),
> -                                          VFIO_DEVICE_STATE_V1_RUNNING);
> -        if (ret) {
> -            error_report("%s: Failed to set state RUNNING", vbasedev->name);
> +        if (migration->v2) {
> +            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> +                                     VFIO_DEVICE_STATE_ERROR);

Perhaps you are discarding the error?

Shouldn't it be:

	ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
					VFIO_DEVICE_STATE_ERROR);

> +        } else {
> +            ret = vfio_migration_v1_set_state(vbasedev,
> +                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
> +                                                VFIO_DEVICE_STATE_V1_RESUMING),
> +                                              VFIO_DEVICE_STATE_V1_RUNNING);
> +            if (ret) {
> +                error_report("%s: Failed to set state RUNNING", vbasedev->name);
> +            }

Perhaps this error_report and its condition are in the wrong scope?

Shouldn't it be more like this:

if (migration->v2) {
    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
                                   VFIO_DEVICE_STATE_ERROR);
} else {
    ret = vfio_migration_v1_set_state(vbasedev,
                                      ~(VFIO_DEVICE_STATE_V1_SAVING |
                                        VFIO_DEVICE_STATE_V1_RESUMING),
                                      VFIO_DEVICE_STATE_V1_RUNNING);
}

if (ret) {
    error_report("%s: Failed to set state RUNNING", vbasedev->name);
}

>          }
>      }
>  }
> @@ -784,12 +1030,35 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
>  {
>      VFIOMigration *migration = vbasedev->migration;
>  
> -    vfio_region_exit(&migration->region);
> -    vfio_region_finalize(&migration->region);
> +    if (migration->v2) {
> +        g_free(migration->data_buffer);
> +    } else {
> +        vfio_region_exit(&migration->region);
> +        vfio_region_finalize(&migration->region);
> +    }
>      g_free(vbasedev->migration);
>      vbasedev->migration = NULL;
>  }
>  
> +static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
> +{
> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
> +                                  sizeof(struct vfio_device_feature_migration),
> +                              sizeof(uint64_t))] = {};
> +    struct vfio_device_feature *feature = (void *)buf;
> +    struct vfio_device_feature_migration *mig = (void *)feature->data;
> +
> +    feature->argsz = sizeof(buf);
> +    feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +        return -EOPNOTSUPP;
> +    }
> +
> +    *mig_flags = mig->flags;
> +
> +    return 0;
> +}
> +
>  static int vfio_migration_init(VFIODevice *vbasedev)
>  {
>      int ret;
> @@ -798,6 +1067,7 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      char id[256] = "";
>      g_autofree char *path = NULL, *oid = NULL;
>      struct vfio_region_info *info = NULL;
> +    uint64_t mig_flags;
>  
>      if (!vbasedev->ops->vfio_get_object) {
>          return -EINVAL;
> @@ -808,32 +1078,48 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>          return -EINVAL;
>      }
>  
> -    ret = vfio_get_dev_region_info(vbasedev,
> -                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> -                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> -                                   &info);
> -    if (ret) {
> -        return ret;
> -    }
> +    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
> +    if (!ret) {
> +        /* Migration v2 */
> +        /* Basic migration functionality must be supported */
> +        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> +            return -EOPNOTSUPP;
> +        }
> +        vbasedev->migration = g_new0(VFIOMigration, 1);
> +        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
> +        vbasedev->migration->data_buffer =
> +            g_malloc0(vbasedev->migration->data_buffer_size);
> +        vbasedev->migration->data_fd = -1;
> +        vbasedev->migration->v2 = true;
> +    } else {
> +        /* Migration v1 */
> +        ret = vfio_get_dev_region_info(vbasedev,
> +                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> +                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> +                                       &info);
> +        if (ret) {
> +            return ret;
> +        }
>  
> -    vbasedev->migration = g_new0(VFIOMigration, 1);
> +        vbasedev->migration = g_new0(VFIOMigration, 1);
>  
> -    ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
> -                            info->index, "migration");
> -    if (ret) {
> -        error_report("%s: Failed to setup VFIO migration region %d: %s",
> -                     vbasedev->name, info->index, strerror(-ret));
> -        goto err;
> -    }
> +        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
> +                                info->index, "migration");
> +        if (ret) {
> +            error_report("%s: Failed to setup VFIO migration region %d: %s",
> +                         vbasedev->name, info->index, strerror(-ret));
> +            goto err;
> +        }
>  
> -    if (!vbasedev->migration->region.size) {
> -        error_report("%s: Invalid zero-sized VFIO migration region %d",
> -                     vbasedev->name, info->index);
> -        ret = -EINVAL;
> -        goto err;
> -    }
> +        if (!vbasedev->migration->region.size) {
> +            error_report("%s: Invalid zero-sized VFIO migration region %d",
> +                         vbasedev->name, info->index);
> +            ret = -EINVAL;
> +            goto err;
> +        }
>  
> -    g_free(info);
> +        g_free(info);
> +    }
>  
>      migration = vbasedev->migration;
>      migration->vbasedev = vbasedev;
> @@ -846,11 +1132,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      }
>      strpadcpy(id, sizeof(id), path, '\0');
>  
> -    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> -                         &savevm_vfio_v1_handlers, vbasedev);
> +    if (migration->v2) {
> +        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> +                             &savevm_vfio_handlers, vbasedev);
> +
> +        migration->vm_state = qdev_add_vm_change_state_handler(
> +            vbasedev->dev, vfio_vmstate_change, vbasedev);
> +    } else {
> +        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> +                             &savevm_vfio_v1_handlers, vbasedev);
> +
> +        migration->vm_state = qdev_add_vm_change_state_handler(
> +            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
> +    }
>  
> -    migration->vm_state = qdev_add_vm_change_state_handler(
> -        vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
>      migration->migration_state.notify = vfio_migration_state_notifier;
>      add_migration_state_change_notifier(&migration->migration_state);
>      return 0;
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index ac8b04f52a..6e8c5958b9 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -163,6 +163,8 @@ vfio_save_complete_precopy(const char *name) " (%s)"
>  vfio_load_device_config_state(const char *name) " (%s)"
>  vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
>  vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
> +vfio_load_state_device_data(const char *name, uint64_t data_size) " (%s) size 0x%"PRIx64
>  vfio_load_cleanup(const char *name) " (%s)"
>  vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
>  vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
> +vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index bbaf72ba00..2ec3346fea 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -66,6 +66,11 @@ typedef struct VFIOMigration {
>      int vm_running;
>      Notifier migration_state;
>      uint64_t pending_bytes;
> +    enum vfio_device_mig_state device_state;
> +    int data_fd;
> +    void *data_buffer;
> +    size_t data_buffer_size;
> +    bool v2;
>  } VFIOMigration;
>  
>  typedef struct VFIOAddressSpace {



* Re: [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2
  2022-06-14 11:08   ` Joao Martins
@ 2022-06-14 16:34     ` Avihai Horon
  2022-06-14 17:24       ` Joao Martins
  0 siblings, 1 reply; 31+ messages in thread
From: Avihai Horon @ 2022-06-14 16:34 UTC (permalink / raw)
  To: Joao Martins
  Cc: Yishai Hadas, Jason Gunthorpe, Mark Bloch, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Cornelia Huck, qemu-devel,
	Alex Williamson, Dr . David Alan Gilbert, Juan Quintela


On 6/14/2022 2:08 PM, Joao Martins wrote:
> On 5/30/22 18:07, Avihai Horon wrote:
>> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    enum vfio_device_mig_state recover_state;
>> +    int ret;
>> +
>> +    /* We reach here with device state STOP or STOP_COPY only */
>> +    recover_state = VFIO_DEVICE_STATE_STOP;
>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
>> +                                   recover_state);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    do {
>> +        ret = vfio_save_block(f, vbasedev->migration);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    } while (!ret);
>> +
>> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> +    ret = qemu_file_get_error(f);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> +                                   recover_state);
> Is it expected that you are setting VFIO_DEVICE_STATE_STOP while
> @recover_state is the same value (VFIO_DEVICE_STATE_STOP) ?


Yes.
A transition from STOP_COPY to any other state first passes through the
STOP state (the kernel does this internally), so STOP is the only
sensible choice for the recover state.
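
To make the recover-state idea concrete, here is a toy model of it. This
is only a sketch under stated assumptions: the body of
vfio_migration_set_state() is not shown in this part of the thread, so
the fallback logic below is an assumed reading of its (new_state,
recover_state) arguments, and every toy_* name is hypothetical and not
part of QEMU or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for enum vfio_device_mig_state values. */
enum toy_state { TOY_ERROR, TOY_STOP, TOY_RUNNING, TOY_STOP_COPY };

struct toy_device {
    enum toy_state state;
    bool fail_next;            /* test hook: make the next transition fail */
};

/* Pretend kernel call: returns 0 on success, -1 on failure. */
static int toy_kernel_set_state(struct toy_device *dev, enum toy_state s)
{
    assert(dev != NULL);
    if (dev->fail_next) {
        dev->fail_next = false;
        return -1;
    }
    dev->state = s;
    return 0;
}

/*
 * Assumed semantics: try new_state; if that fails, report the failure
 * and try recover_state instead. Returns 0 only if new_state was reached.
 */
static int toy_set_state(struct toy_device *dev, enum toy_state new_state,
                         enum toy_state recover_state)
{
    if (toy_kernel_set_state(dev, new_state) == 0) {
        return 0;
    }
    fprintf(stderr, "toy: failed to set state %d, recovering to %d\n",
            (int)new_state, (int)recover_state);
    if (toy_kernel_set_state(dev, recover_state) != 0) {
        dev->state = TOY_ERROR;    /* nothing left to fall back to */
    }
    return -1;
}
```

In this model, calling toy_set_state(&dev, TOY_STOP, TOY_STOP) from
TOY_STOP_COPY simply retries the same target once, which matches the
point above that STOP is where the device would pass through anyway.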

>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    trace_vfio_save_complete_precopy(vbasedev->name);
>> +
>> +    return 0;
>> +}
>> +
>>   static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -593,6 +775,14 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
>>       }
>>   }
>>
>> +static int vfio_load_setup(QEMUFile *f, void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +
>> +    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
>> +                                   vbasedev->migration->device_state);
>> +}
>> +
>>   static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -620,6 +810,15 @@ static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>>       return ret;
>>   }
>>
>> +static int vfio_load_cleanup(void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +
>> +    vfio_migration_cleanup(vbasedev);
>> +    trace_vfio_load_cleanup(vbasedev->name);
>> +    return 0;
>> +}
>> +
>>   static int vfio_v1_load_cleanup(void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -662,7 +861,11 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>               uint64_t data_size = qemu_get_be64(f);
>>
>>               if (data_size) {
>> -                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
>> +                if (vbasedev->migration->v2) {
>> +                    ret = vfio_load_buffer(f, vbasedev, data_size);
>> +                } else {
>> +                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
>> +                }
>>                   if (ret < 0) {
>>                       return ret;
>>                   }
>> @@ -683,6 +886,16 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>       return ret;
>>   }
>>
>> +static SaveVMHandlers savevm_vfio_handlers = {
>> +    .save_setup = vfio_save_setup,
>> +    .save_cleanup = vfio_save_cleanup,
>> +    .save_live_complete_precopy = vfio_save_complete_precopy,
>> +    .save_state = vfio_save_state,
>> +    .load_setup = vfio_load_setup,
>> +    .load_cleanup = vfio_load_cleanup,
>> +    .load_state = vfio_load_state,
>> +};
>> +
>>   static SaveVMHandlers savevm_vfio_v1_handlers = {
>>       .save_setup = vfio_v1_save_setup,
>>       .save_cleanup = vfio_v1_save_cleanup,
>> @@ -697,6 +910,34 @@ static SaveVMHandlers savevm_vfio_v1_handlers = {
>>
>>   /* ---------------------------------------------------------------------- */
>>
>> +static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    enum vfio_device_mig_state new_state;
>> +    int ret;
>> +
>> +    if (running) {
>> +        new_state = VFIO_DEVICE_STATE_RUNNING;
>> +    } else {
>> +        new_state = VFIO_DEVICE_STATE_STOP;
>> +    }
>> +
>> +    ret = vfio_migration_set_state(vbasedev, new_state,
>> +                                   VFIO_DEVICE_STATE_ERROR);
>> +    if (ret) {
>> +        /*
>> +         * Migration should be aborted in this case, but vm_state_notify()
>> +         * currently does not support reporting failures.
>> +         */
>> +        if (migrate_get_current()->to_dst_file) {
>> +            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
>> +        }
>> +    }
>> +
>> +    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
>> +                              new_state);
>> +}
>> +
>>   static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -770,12 +1011,17 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>>       case MIGRATION_STATUS_CANCELLED:
>>       case MIGRATION_STATUS_FAILED:
>>           bytes_transferred = 0;
>> -        ret = vfio_migration_v1_set_state(vbasedev,
>> -                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
>> -                                            VFIO_DEVICE_STATE_V1_RESUMING),
>> -                                          VFIO_DEVICE_STATE_V1_RUNNING);
>> -        if (ret) {
>> -            error_report("%s: Failed to set state RUNNING", vbasedev->name);
>> +        if (migration->v2) {
>> +            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>> +                                     VFIO_DEVICE_STATE_ERROR);
> Perhaps you are discarding the error?
>
> Shouldn't it be:
>
>          err =  vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>                                          VFIO_DEVICE_STATE_ERROR);
>
>> +        } else {
>> +            ret = vfio_migration_v1_set_state(vbasedev,
>> +                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
>> +                                                VFIO_DEVICE_STATE_V1_RESUMING),
>> +                                              VFIO_DEVICE_STATE_V1_RUNNING);
>> +            if (ret) {
>> +                error_report("%s: Failed to set state RUNNING", vbasedev->name);
>> +            }
> Perhaps this error_report and condition is in the wrong scope?
>
> Shouldn't it be more like this:
>
> if (migration->v2) {
>          ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>                                   VFIO_DEVICE_STATE_ERROR);
> } else {
>          ret = vfio_migration_v1_set_state(vbasedev,
>                                            ~(VFIO_DEVICE_STATE_V1_SAVING |
>                                              VFIO_DEVICE_STATE_V1_RESUMING),
>                                            VFIO_DEVICE_STATE_V1_RUNNING);
> }
>
>
> if (ret) {
>      error_report("%s: Failed to set state RUNNING", vbasedev->name);
> }


It was intentionally discarded.
The v1 code uses the return value only to decide whether to print an
error message.
In the v2 code the error message is printed inside
vfio_migration_set_state(), so the return value is not needed here.
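
A small sketch of the difference, with hypothetical toy_* names and a
counter standing in for error_report() so the behaviour can be checked.
It assumes only what is stated above: the v1 setter is silent and the
caller reports, while the v2 setter reports by itself.

```c
#include <assert.h>
#include <stdio.h>

static int reports;    /* number of error messages emitted so far */

static void toy_error_report(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    reports++;
}

/* v1 style: silent on failure, the caller must check the return value. */
static int toy_v1_set_state(int fail)
{
    return fail ? -1 : 0;
}

/* v2 style: reports its own failure before returning. */
static int toy_v2_set_state(int fail)
{
    if (fail) {
        toy_error_report("toy: failed to set state RUNNING");
        return -1;
    }
    return 0;
}

/* Caller modelled loosely on the cancelled/failed path of the notifier. */
static void toy_notifier(int v2, int fail)
{
    if (v2) {
        (void)toy_v2_set_state(fail);    /* return value unused on purpose */
    } else if (toy_v1_set_state(fail)) {
        toy_error_report("toy: failed to set state RUNNING");
    }
}
```

Either way exactly one message is printed per failure; hoisting the
check out of the else branch would print two messages on the v2 path.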

>>           }
>>       }
>>   }
>> @@ -784,12 +1030,35 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
>>   {
>>       VFIOMigration *migration = vbasedev->migration;
>>
>> -    vfio_region_exit(&migration->region);
>> -    vfio_region_finalize(&migration->region);
>> +    if (migration->v2) {
>> +        g_free(migration->data_buffer);
>> +    } else {
>> +        vfio_region_exit(&migration->region);
>> +        vfio_region_finalize(&migration->region);
>> +    }
>>       g_free(vbasedev->migration);
>>       vbasedev->migration = NULL;
>>   }
>>
>> +static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
>> +{
>> +    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
>> +                                  sizeof(struct vfio_device_feature_migration),
>> +                              sizeof(uint64_t))] = {};
>> +    struct vfio_device_feature *feature = (void *)buf;
>> +    struct vfio_device_feature_migration *mig = (void *)feature->data;
>> +
>> +    feature->argsz = sizeof(buf);
>> +    feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
>> +    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    *mig_flags = mig->flags;
>> +
>> +    return 0;
>> +}
>> +
>>   static int vfio_migration_init(VFIODevice *vbasedev)
>>   {
>>       int ret;
>> @@ -798,6 +1067,7 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>>       char id[256] = "";
>>       g_autofree char *path = NULL, *oid = NULL;
>>       struct vfio_region_info *info = NULL;
>> +    uint64_t mig_flags;
>>
>>       if (!vbasedev->ops->vfio_get_object) {
>>           return -EINVAL;
>> @@ -808,32 +1078,48 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>>           return -EINVAL;
>>       }
>>
>> -    ret = vfio_get_dev_region_info(vbasedev,
>> -                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>> -                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
>> -                                   &info);
>> -    if (ret) {
>> -        return ret;
>> -    }
>> +    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
>> +    if (!ret) {
>> +        /* Migration v2 */
>> +        /* Basic migration functionality must be supported */
>> +        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
>> +            return -EOPNOTSUPP;
>> +        }
>> +        vbasedev->migration = g_new0(VFIOMigration, 1);
>> +        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
>> +        vbasedev->migration->data_buffer =
>> +            g_malloc0(vbasedev->migration->data_buffer_size);
>> +        vbasedev->migration->data_fd = -1;
>> +        vbasedev->migration->v2 = true;
>> +    } else {
>> +        /* Migration v1 */
>> +        ret = vfio_get_dev_region_info(vbasedev,
>> +                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>> +                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
>> +                                       &info);
>> +        if (ret) {
>> +            return ret;
>> +        }
>>
>> -    vbasedev->migration = g_new0(VFIOMigration, 1);
>> +        vbasedev->migration = g_new0(VFIOMigration, 1);
>>
>> -    ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
>> -                            info->index, "migration");
>> -    if (ret) {
>> -        error_report("%s: Failed to setup VFIO migration region %d: %s",
>> -                     vbasedev->name, info->index, strerror(-ret));
>> -        goto err;
>> -    }
>> +        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
>> +                                info->index, "migration");
>> +        if (ret) {
>> +            error_report("%s: Failed to setup VFIO migration region %d: %s",
>> +                         vbasedev->name, info->index, strerror(-ret));
>> +            goto err;
>> +        }
>>
>> -    if (!vbasedev->migration->region.size) {
>> -        error_report("%s: Invalid zero-sized VFIO migration region %d",
>> -                     vbasedev->name, info->index);
>> -        ret = -EINVAL;
>> -        goto err;
>> -    }
>> +        if (!vbasedev->migration->region.size) {
>> +            error_report("%s: Invalid zero-sized VFIO migration region %d",
>> +                         vbasedev->name, info->index);
>> +            ret = -EINVAL;
>> +            goto err;
>> +        }
>>
>> -    g_free(info);
>> +        g_free(info);
>> +    }
>>
>>       migration = vbasedev->migration;
>>       migration->vbasedev = vbasedev;
>> @@ -846,11 +1132,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>>       }
>>       strpadcpy(id, sizeof(id), path, '\0');
>>
>> -    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
>> -                         &savevm_vfio_v1_handlers, vbasedev);
>> +    if (migration->v2) {
>> +        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
>> +                             &savevm_vfio_handlers, vbasedev);
>> +
>> +        migration->vm_state = qdev_add_vm_change_state_handler(
>> +            vbasedev->dev, vfio_vmstate_change, vbasedev);
>> +    } else {
>> +        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
>> +                             &savevm_vfio_v1_handlers, vbasedev);
>> +
>> +        migration->vm_state = qdev_add_vm_change_state_handler(
>> +            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
>> +    }
>>
>> -    migration->vm_state = qdev_add_vm_change_state_handler(
>> -        vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
>>       migration->migration_state.notify = vfio_migration_state_notifier;
>>       add_migration_state_change_notifier(&migration->migration_state);
>>       return 0;
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index ac8b04f52a..6e8c5958b9 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -163,6 +163,8 @@ vfio_save_complete_precopy(const char *name) " (%s)"
>>   vfio_load_device_config_state(const char *name) " (%s)"
>>   vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
>>   vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
>> +vfio_load_state_device_data(const char *name, uint64_t data_size) " (%s) size 0x%"PRIx64
>>   vfio_load_cleanup(const char *name) " (%s)"
>>   vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
>>   vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
>> +vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index bbaf72ba00..2ec3346fea 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -66,6 +66,11 @@ typedef struct VFIOMigration {
>>       int vm_running;
>>       Notifier migration_state;
>>       uint64_t pending_bytes;
>> +    enum vfio_device_mig_state device_state;
>> +    int data_fd;
>> +    void *data_buffer;
>> +    size_t data_buffer_size;
>> +    bool v2;
>>   } VFIOMigration;
>>
>>   typedef struct VFIOAddressSpace {



* Re: [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2
  2022-06-14 16:34     ` Avihai Horon
@ 2022-06-14 17:24       ` Joao Martins
  2022-06-15  6:40         ` Avihai Horon
  0 siblings, 1 reply; 31+ messages in thread
From: Joao Martins @ 2022-06-14 17:24 UTC (permalink / raw)
  To: Avihai Horon
  Cc: Yishai Hadas, Jason Gunthorpe, Mark Bloch, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Cornelia Huck, qemu-devel,
	Alex Williamson, Dr . David Alan Gilbert, Juan Quintela



On 6/14/22 17:34, Avihai Horon wrote:
> 
> On 6/14/2022 2:08 PM, Joao Martins wrote:
>> On 5/30/22 18:07, Avihai Horon wrote:
>>> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>> +{
>>> +    VFIODevice *vbasedev = opaque;
>>> +    enum vfio_device_mig_state recover_state;
>>> +    int ret;
>>> +
>>> +    /* We reach here with device state STOP or STOP_COPY only */
>>> +    recover_state = VFIO_DEVICE_STATE_STOP;
>>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
>>> +                                   recover_state);
>>> +    if (ret) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    do {
>>> +        ret = vfio_save_block(f, vbasedev->migration);
>>> +        if (ret < 0) {
>>> +            return ret;
>>> +        }
>>> +    } while (!ret);
>>> +
>>> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>> +    ret = qemu_file_get_error(f);
>>> +    if (ret) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>>> +                                   recover_state);
>> Is it expected that you are setting VFIO_DEVICE_STATE_STOP while
>> @recover_state is the same value (VFIO_DEVICE_STATE_STOP) ?
> 
> 
> Yes.
> Transitioning to any other state from STOP_COPY will first go through 
> STOP state (this is done internally by kernel).
> So there is no better option for the recover state but STOP.
> 
I was thinking about the ERROR state, given that you can transition there
from any state, but I wasn't sure whether it's appropriate to take that arc
during the stop-copy phase of migration.

>>> +    if (ret) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    trace_vfio_save_complete_precopy(vbasedev->name);
>>> +
>>> +    return 0;

Just a cosmetic nit: you could probably rewrite these last few lines as:

	if (!ret) {
	    trace_vfio_save_complete_precopy(vbasedev->name);
	}

	return ret;

That lets you avoid the double return path.

>>> +}
>>> +
>>>   static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
>>>   {
>>>       VFIODevice *vbasedev = opaque;
>>> @@ -593,6 +775,14 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
>>>       }
>>>   }
>>>
>>> +static int vfio_load_setup(QEMUFile *f, void *opaque)
>>> +{
>>> +    VFIODevice *vbasedev = opaque;
>>> +
>>> +    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
>>> +                                   vbasedev->migration->device_state);
>>> +}
>>> +
>>>   static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>>>   {
>>>       VFIODevice *vbasedev = opaque;
>>> @@ -620,6 +810,15 @@ static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>>>       return ret;
>>>   }
>>>
>>> +static int vfio_load_cleanup(void *opaque)
>>> +{
>>> +    VFIODevice *vbasedev = opaque;
>>> +
>>> +    vfio_migration_cleanup(vbasedev);
>>> +    trace_vfio_load_cleanup(vbasedev->name);
>>> +    return 0;
>>> +}
>>> +
>>>   static int vfio_v1_load_cleanup(void *opaque)
>>>   {
>>>       VFIODevice *vbasedev = opaque;
>>> @@ -662,7 +861,11 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>>               uint64_t data_size = qemu_get_be64(f);
>>>
>>>               if (data_size) {
>>> -                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
>>> +                if (vbasedev->migration->v2) {
>>> +                    ret = vfio_load_buffer(f, vbasedev, data_size);
>>> +                } else {
>>> +                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
>>> +                }
>>>                   if (ret < 0) {
>>>                       return ret;
>>>                   }
>>> @@ -683,6 +886,16 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>>       return ret;
>>>   }
>>>
>>> +static SaveVMHandlers savevm_vfio_handlers = {
>>> +    .save_setup = vfio_save_setup,
>>> +    .save_cleanup = vfio_save_cleanup,
>>> +    .save_live_complete_precopy = vfio_save_complete_precopy,
>>> +    .save_state = vfio_save_state,
>>> +    .load_setup = vfio_load_setup,
>>> +    .load_cleanup = vfio_load_cleanup,
>>> +    .load_state = vfio_load_state,
>>> +};
>>> +
>>>   static SaveVMHandlers savevm_vfio_v1_handlers = {
>>>       .save_setup = vfio_v1_save_setup,
>>>       .save_cleanup = vfio_v1_save_cleanup,
>>> @@ -697,6 +910,34 @@ static SaveVMHandlers savevm_vfio_v1_handlers = {
>>>
>>>   /* ---------------------------------------------------------------------- */
>>>
>>> +static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>>> +{
>>> +    VFIODevice *vbasedev = opaque;
>>> +    enum vfio_device_mig_state new_state;
>>> +    int ret;
>>> +
>>> +    if (running) {
>>> +        new_state = VFIO_DEVICE_STATE_RUNNING;
>>> +    } else {
>>> +        new_state = VFIO_DEVICE_STATE_STOP;
>>> +    }
>>> +
>>> +    ret = vfio_migration_set_state(vbasedev, new_state,
>>> +                                   VFIO_DEVICE_STATE_ERROR);
>>> +    if (ret) {
>>> +        /*
>>> +         * Migration should be aborted in this case, but vm_state_notify()
>>> +         * currently does not support reporting failures.
>>> +         */
>>> +        if (migrate_get_current()->to_dst_file) {
>>> +            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
>>> +        }
>>> +    }
>>> +
>>> +    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
>>> +                              new_state);
>>> +}
>>> +
>>>   static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
>>>   {
>>>       VFIODevice *vbasedev = opaque;
>>> @@ -770,12 +1011,17 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>>>       case MIGRATION_STATUS_CANCELLED:
>>>       case MIGRATION_STATUS_FAILED:
>>>           bytes_transferred = 0;
>>> -        ret = vfio_migration_v1_set_state(vbasedev,
>>> -                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
>>> -                                            VFIO_DEVICE_STATE_V1_RESUMING),
>>> -                                          VFIO_DEVICE_STATE_V1_RUNNING);
>>> -        if (ret) {
>>> -            error_report("%s: Failed to set state RUNNING", vbasedev->name);
>>> +        if (migration->v2) {
>>> +            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>>> +                                     VFIO_DEVICE_STATE_ERROR);
>> Perhaps you are discarding the error?
>>
>> Shouldn't it be:
>>
>>          err =  vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>>                                          VFIO_DEVICE_STATE_ERROR);
>>
>>> +        } else {
>>> +            ret = vfio_migration_v1_set_state(vbasedev,
>>> +                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
>>> +                                                VFIO_DEVICE_STATE_V1_RESUMING),
>>> +                                              VFIO_DEVICE_STATE_V1_RUNNING);
>>> +            if (ret) {
>>> +                error_report("%s: Failed to set state RUNNING", vbasedev->name);
>>> +            }
>> Perhaps this error_report and condition is in the wrong scope?
>>
>> Shouldn't it be more like this:
>>
>> if (migration->v2) {
>>          ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>>                                   VFIO_DEVICE_STATE_ERROR);
>> } else {
>>          ret = vfio_migration_v1_set_state(vbasedev,
>>                                            ~(VFIO_DEVICE_STATE_V1_SAVING |
>>                                              VFIO_DEVICE_STATE_V1_RESUMING),
>>                                            VFIO_DEVICE_STATE_V1_RUNNING);
>> }
>>
>>
>> if (ret) {
>>      error_report("%s: Failed to set state RUNNING", vbasedev->name);
>> }
> 
> 
> It was intentionally discarded.
> The return value is used by v1 code to determine whether to print an 
> error message or not.
> In v2 code the error message print is done inside 
> vfio_migration_set_state(), so there is no
> need for the return value here.
> 
Oh yes, I forgot that other print.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2
  2022-06-14 17:24       ` Joao Martins
@ 2022-06-15  6:40         ` Avihai Horon
  0 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-06-15  6:40 UTC (permalink / raw)
  To: Joao Martins
  Cc: Yishai Hadas, Jason Gunthorpe, Mark Bloch, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Cornelia Huck, qemu-devel,
	Alex Williamson, Dr . David Alan Gilbert, Juan Quintela


On 6/14/2022 8:24 PM, Joao Martins wrote:
> On 6/14/22 17:34, Avihai Horon wrote:
>> On 6/14/2022 2:08 PM, Joao Martins wrote:
>>> On 5/30/22 18:07, Avihai Horon wrote:
>>>> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>>> +{
>>>> +    VFIODevice *vbasedev = opaque;
>>>> +    enum vfio_device_mig_state recover_state;
>>>> +    int ret;
>>>> +
>>>> +    /* We reach here with device state STOP or STOP_COPY only */
>>>> +    recover_state = VFIO_DEVICE_STATE_STOP;
>>>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
>>>> +                                   recover_state);
>>>> +    if (ret) {
>>>> +        return ret;
>>>> +    }
>>>> +
>>>> +    do {
>>>> +        ret = vfio_save_block(f, vbasedev->migration);
>>>> +        if (ret < 0) {
>>>> +            return ret;
>>>> +        }
>>>> +    } while (!ret);
>>>> +
>>>> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>>> +    ret = qemu_file_get_error(f);
>>>> +    if (ret) {
>>>> +        return ret;
>>>> +    }
>>>> +
>>>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>>>> +                                   recover_state);
>>> Is it expected that you are setting VFIO_DEVICE_STATE_STOP while
>>> @recover_state is the same value (VFIO_DEVICE_STATE_STOP) ?
>>
>> Yes.
>> Transitioning to any other state from STOP_COPY will first go through
>> STOP state (this is done internally by kernel).
>> So there is no better option for the recover state but STOP.
>>
> I was thinking about ERROR state given that you can transition there
> from any state, but wasn't quite sure if it's appropriate to make that arc
> while in stop copy migration phase.

Moving to ERROR is possible but it will just fail, triggering a 
hw_error() (and with following patch triggering a device reset).
Failing to move to STOP recover state will go the same path - trigger a 
hw_error() and with following patch a device reset.

The only difference is that by moving to STOP recover state we try one 
more time to set it to STOP.
Probably it's a useless try, since we failed the first time.

We can change the recover state to ERROR if it makes the code clearer 
and avoids the extra try.
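
To make the comparison concrete, here is a minimal sketch of the two
recover policies. It is not the code from the patch: fake_dev,
fake_set_state() and set_state_with_recover() are hypothetical
stand-ins, and skipping the second attempt when the recover state is
ERROR is only an assumption about what "avoids the extra try" would
look like.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for enum vfio_device_mig_state values. */
enum mig_state {
    MIG_STATE_ERROR,
    MIG_STATE_STOP,
    MIG_STATE_RUNNING,
    MIG_STATE_STOP_COPY,
};

/* Hypothetical device model: the next fails_left transitions fail. */
struct fake_dev {
    enum mig_state state;
    int fails_left;
    int attempts;      /* number of transitions tried so far */
    bool needs_reset;  /* stands in for hw_error() / device reset */
};

static int fake_set_state(struct fake_dev *dev, enum mig_state state)
{
    dev->attempts++;
    if (dev->fails_left > 0) {
        dev->fails_left--;
        return -1;
    }
    dev->state = state;
    return 0;
}

/*
 * Try new_state. On failure fall back to recover_state, unless it is
 * ERROR, in which case no second transition is attempted (assumption).
 * The original failure is reported to the caller either way.
 */
static int set_state_with_recover(struct fake_dev *dev,
                                  enum mig_state new_state,
                                  enum mig_state recover_state)
{
    if (fake_set_state(dev, new_state) == 0) {
        return 0;
    }
    if (recover_state == MIG_STATE_ERROR ||
        fake_set_state(dev, recover_state) != 0) {
        dev->needs_reset = true;
    }
    return -1;
}
```

With a device that keeps failing, a STOP recover state costs two
attempts before the reset path is taken, while an ERROR recover state
costs one. Both end in the same place.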


>>>> +    if (ret) {
>>>> +        return ret;
>>>> +    }
>>>> +
>>>> +    trace_vfio_save_complete_precopy(vbasedev->name);
>>>> +
>>>> +    return 0;
> just a cosmetic nit: you could probably rewrite these last couple of lines as:
>
>          if (!ret) {
>              trace_vfio_save_complete_precopy(vbasedev->name);
>          }
>
>          return ret;
>
> Lets you avoid the double return path.

Ah thanks! Will change.

>>>> +}
>>>> +
>>>>    static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
>>>>    {
>>>>        VFIODevice *vbasedev = opaque;
>>>> @@ -593,6 +775,14 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
>>>>        }
>>>>    }
>>>>
>>>> +static int vfio_load_setup(QEMUFile *f, void *opaque)
>>>> +{
>>>> +    VFIODevice *vbasedev = opaque;
>>>> +
>>>> +    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
>>>> +                                   vbasedev->migration->device_state);
>>>> +}
>>>> +
>>>>    static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>>>>    {
>>>>        VFIODevice *vbasedev = opaque;
>>>> @@ -620,6 +810,15 @@ static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
>>>>        return ret;
>>>>    }
>>>>
>>>> +static int vfio_load_cleanup(void *opaque)
>>>> +{
>>>> +    VFIODevice *vbasedev = opaque;
>>>> +
>>>> +    vfio_migration_cleanup(vbasedev);
>>>> +    trace_vfio_load_cleanup(vbasedev->name);
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    static int vfio_v1_load_cleanup(void *opaque)
>>>>    {
>>>>        VFIODevice *vbasedev = opaque;
>>>> @@ -662,7 +861,11 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>>>                uint64_t data_size = qemu_get_be64(f);
>>>>
>>>>                if (data_size) {
>>>> -                ret = vfio_v1_load_buffer(f, vbasedev, data_size);
>>>> +                if (vbasedev->migration->v2) {
>>>> +                    ret = vfio_load_buffer(f, vbasedev, data_size);
>>>> +                } else {
>>>> +                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
>>>> +                }
>>>>                    if (ret < 0) {
>>>>                        return ret;
>>>>                    }
>>>> @@ -683,6 +886,16 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>>>        return ret;
>>>>    }
>>>>
>>>> +static SaveVMHandlers savevm_vfio_handlers = {
>>>> +    .save_setup = vfio_save_setup,
>>>> +    .save_cleanup = vfio_save_cleanup,
>>>> +    .save_live_complete_precopy = vfio_save_complete_precopy,
>>>> +    .save_state = vfio_save_state,
>>>> +    .load_setup = vfio_load_setup,
>>>> +    .load_cleanup = vfio_load_cleanup,
>>>> +    .load_state = vfio_load_state,
>>>> +};
>>>> +
>>>>    static SaveVMHandlers savevm_vfio_v1_handlers = {
>>>>        .save_setup = vfio_v1_save_setup,
>>>>        .save_cleanup = vfio_v1_save_cleanup,
>>>> @@ -697,6 +910,34 @@ static SaveVMHandlers savevm_vfio_v1_handlers = {
>>>>
>>>>    /* ---------------------------------------------------------------------- */
>>>>
>>>> +static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>>>> +{
>>>> +    VFIODevice *vbasedev = opaque;
>>>> +    enum vfio_device_mig_state new_state;
>>>> +    int ret;
>>>> +
>>>> +    if (running) {
>>>> +        new_state = VFIO_DEVICE_STATE_RUNNING;
>>>> +    } else {
>>>> +        new_state = VFIO_DEVICE_STATE_STOP;
>>>> +    }
>>>> +
>>>> +    ret = vfio_migration_set_state(vbasedev, new_state,
>>>> +                                   VFIO_DEVICE_STATE_ERROR);
>>>> +    if (ret) {
>>>> +        /*
>>>> +         * Migration should be aborted in this case, but vm_state_notify()
>>>> +         * currently does not support reporting failures.
>>>> +         */
>>>> +        if (migrate_get_current()->to_dst_file) {
>>>> +            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
>>>> +                              new_state);
>>>> +}
>>>> +
>>>>    static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
>>>>    {
>>>>        VFIODevice *vbasedev = opaque;
>>>> @@ -770,12 +1011,17 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>>>>        case MIGRATION_STATUS_CANCELLED:
>>>>        case MIGRATION_STATUS_FAILED:
>>>>            bytes_transferred = 0;
>>>> -        ret = vfio_migration_v1_set_state(vbasedev,
>>>> -                                          ~(VFIO_DEVICE_STATE_V1_SAVING |
>>>> -                                            VFIO_DEVICE_STATE_V1_RESUMING),
>>>> -                                          VFIO_DEVICE_STATE_V1_RUNNING);
>>>> -        if (ret) {
>>>> -            error_report("%s: Failed to set state RUNNING", vbasedev->name);
>>>> +        if (migration->v2) {
>>>> +            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>>>> +                                     VFIO_DEVICE_STATE_ERROR);
>>> Perhaps you are discarding the error?
>>>
>>> Shouldn't it be:
>>>
>>>           err =  vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>>>                                           VFIO_DEVICE_STATE_ERROR);
>>>
>>>> +        } else {
>>>> +            ret = vfio_migration_v1_set_state(vbasedev,
>>>> +                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
>>>> +                                                VFIO_DEVICE_STATE_V1_RESUMING),
>>>> +                                              VFIO_DEVICE_STATE_V1_RUNNING);
>>>> +            if (ret) {
>>>> +                error_report("%s: Failed to set state RUNNING", vbasedev->name);
>>>> +            }
>>> Perhaps this error_report and condition is in the wrong scope?
>>>
>>> Shouldn't it be more like this:
>>>
>>> if (migration->v2) {
>>>           ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
>>>                                    VFIO_DEVICE_STATE_ERROR);
>>> } else {
>>>           ret = vfio_migration_v1_set_state(vbasedev,
>>>                                             ~(VFIO_DEVICE_STATE_V1_SAVING |
>>>                                               VFIO_DEVICE_STATE_V1_RESUMING),
>>>                                             VFIO_DEVICE_STATE_V1_RUNNING);
>>> }
>>>
>>>
>>> if (ret) {
>>>       error_report("%s: Failed to set state RUNNING", vbasedev->name);
>>> }
>>
>> It was intentionally discarded.
>> The return value is used by v1 code to determine whether to print an
>> error message or not.
>> In v2 code the error message print is done inside
>> vfio_migration_set_state(), so there is no
>> need for the return value here.
>>
> Oh yes, I forgot that other print.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2
  2022-06-13 11:21     ` Avihai Horon
@ 2022-06-17 21:51       ` Alex Williamson
  2022-06-23 14:56         ` Jason Gunthorpe
  2022-06-27  7:36         ` Avihai Horon
  0 siblings, 2 replies; 31+ messages in thread
From: Alex Williamson @ 2022-06-17 21:51 UTC (permalink / raw)
  To: Avihai Horon
  Cc: qemu-devel, Cornelia Huck, Juan Quintela,
	Dr . David Alan Gilbert, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Mark Bloch, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta

On Mon, 13 Jun 2022 14:21:26 +0300
Avihai Horon <avihaih@nvidia.com> wrote:

> On 6/8/2022 12:32 AM, Alex Williamson wrote:
> > On Tue, 7 Jun 2022 20:44:23 +0300
> > Avihai Horon <avihaih@nvidia.com> wrote:
> >  
> >> On 5/30/2022 8:07 PM, Avihai Horon wrote:  
> >>> Hello,
> >>>
> >>> Following VFIO migration protocol v2 acceptance in kernel, this series
> >>> implements VFIO migration according to the new v2 protocol and replaces
> >>> the now deprecated v1 implementation.
> >>>
> >>> The main differences between v1 and v2 migration protocols are:
> >>> 1. VFIO device state is represented as a finite state machine instead of
> >>>      a bitmap.
> >>>
> >>> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
> >>>      ioctl and normal read() and write() instead of the migration region
> >>>      used in v1.
> >>>
> >>> 3. Migration protocol v2 currently doesn't support the pre-copy phase of
> >>>      migration.
> >>>
> >>> Full description of the v2 protocol and the differences from v1 can be
> >>> found here [1].
> >>>
> >>> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
> >>> that will be used later.
> >>>
> >>> Patches 4-6 refactor v1 protocol code to make it easier to add v2
> >>> protocol.
> >>>
> >>> Patches 7-11 implement v2 protocol and remove v1 protocol.
> >>>
> >>> Thanks.
> >>>
> >>> [1]
> >>> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
> >>>
> >>> Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
> >>> - Split the big patch that replaced v1 with v2 into several patches as
> >>>     suggested by Joao, to make review easier.
> >>> - Change warn_report to warn_report_once when container doesn't support
> >>>     dirty tracking.
> >>> - Add Reviewed-by tag.
> >>>
> >>> Avihai Horon (11):
> >>>     vfio/migration: Fix NULL pointer dereference bug
> >>>     vfio/migration: Skip pre-copy if dirty page tracking is not supported
> >>>     migration/qemu-file: Add qemu_file_get_to_fd()
> >>>     vfio/common: Change vfio_devices_all_running_and_saving() logic to
> >>>       equivalent one
> >>>     vfio/migration: Move migration v1 logic to vfio_migration_init()
> >>>     vfio/migration: Rename functions/structs related to v1 protocol
> >>>     vfio/migration: Implement VFIO migration protocol v2
> >>>     vfio/migration: Remove VFIO migration protocol v1
> >>>     vfio/migration: Reset device if setting recover state fails
> >>>     vfio: Alphabetize migration section of VFIO trace-events file
> >>>     docs/devel: Align vfio-migration docs to VFIO migration v2
> >>>
> >>>    docs/devel/vfio-migration.rst |  77 ++--
> >>>    hw/vfio/common.c              |  21 +-
> >>>    hw/vfio/migration.c           | 640 ++++++++--------------------------
> >>>    hw/vfio/trace-events          |  25 +-
> >>>    include/hw/vfio/vfio-common.h |   8 +-
> >>>    migration/migration.c         |   5 +
> >>>    migration/migration.h         |   3 +
> >>>    migration/qemu-file.c         |  34 ++
> >>>    migration/qemu-file.h         |   1 +
> >>>    9 files changed, 252 insertions(+), 562 deletions(-)
> >>>  
> >> Ping.  
> > Based on the changelog, this seems like a mostly cosmetic spin and I
> > don't see that all of the discussion threads from v1 were resolved to
> > everyone's satisfaction.  I'm certainly still uncomfortable with the
> > pre-copy behavior and I thought there were still some action items to
> > figure out whether an SLA is present and vet the solution with
> > management tools.  Thanks,  
> 
> Yes.
> OK, so let's clear things up and reach an agreement before I prepare the 
> v3 series.
> 
> There are three topics that came up in previous discussion:
> 
>  1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug.
>     Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage
>     outside migration thread.
>     This code existed before and I fixed a NULL pointer dereference that
>     I encountered.
>     I suggested that later we can refactor VMChangeStateHandler to
>     return error.
>     I prefer not to do this refactor right now because I am not sure
>     it's as straightforward a change as it might seem - if some notifier
>     fails and we abort do_vm_stop/vm_prepare_start in the middle, can
>     this leave the VM in some unstable state?
>     We plan to leave it as is and not do the refactor as part of this
>     series.
>     Are you ok with this?

I'll defer to Juan here, it's not 100% clear to me from the last reply
if he's looking for that sooner than later.  Juan?
 
>  2. [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page
>     tracking is not supported.
>     As previously discussed, this patch doesn't consider the configured
>     downtime limit.
>     One way to fix it is to allow such migration only when "no SLA" (no
>     downtime limit) is set. AFAIK today there is no way that one can set
>     "no SLA".
>     If we go with this option, we change normal flow of migration
>     (skipping pre-copy) and might need to change management tools.
> 
> Instead, what about letting QEMU VFIO code mark all pages dirty (instead 
> of kernel)?
> This way we don’t skip pre-copy and we get the same behavior we have now 
> of perpetually dirtying all RAM, which respects SLA.
> If we go with this option, do we need to block migration when IOMMU is 
> sPAPR TCE?
> Until now migration would be blocked because sPAPR TCE doesn't report 
> dirty_pages_supported cap, but going with this option we will allow 
> migration even when dirty_pages_supported cap is not set (and let QEMU 
> dirty all pages).

It's ok by me if QEMU vfio is the one that marks all mapped pages dirty
if the host interface provides no way to do so.  Would we toggle that
based on whether the device has bus-master enabled?

Regarding SPAPR, I'd tend to think that if we're dirtying in QEMU then
nothing prevents us from implementing the same there, but also I'm not
going to stand in the way of simply disabling migration for that IOMMU
backend unless someone speaks up that they think it deserves parity.
 
>  3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd().
>     Juan expressed his concern about the amount of data that will go
>     through main migration thread.
> 
> This is already the case in v1 protocol - VFIO devices send all their 
> data in the main migration thread. Note that like in v1 protocol, here 
> as well the data is sent in small sized chunks, each with a header.
> This patch just aims to eliminate an extra copy.
> 
> We plan to leave it as is. Is this ok?

I don't think we should lean too heavily on this being a bump from v1 to
v2 protocol as v1 was only ever experimental and hasn't been widely
used in practice AFAIK.  Again, I'll defer to the migration folks for
this, it requires their buy-in.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2
  2022-06-17 21:51       ` Alex Williamson
@ 2022-06-23 14:56         ` Jason Gunthorpe
  2022-06-27  7:36         ` Avihai Horon
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2022-06-23 14:56 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Avihai Horon, qemu-devel, Cornelia Huck, Juan Quintela,
	Dr . David Alan Gilbert, Joao Martins, Yishai Hadas, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On Fri, Jun 17, 2022 at 03:51:29PM -0600, Alex Williamson wrote:

> It's ok by me if QEMU vfio is the one that marks all mapped pages dirty
> if the host interface provides no way to do so.  Would we toggle that
> based on whether the device has bus-master enabled?

I don't think so, that is a very niche optimization, it would only
happen if a device is plugged in but never used.

If a device truly doesn't have bus master capability at all then its
VFIO migration driver should implement dirty reporting and report no
dirties.

> Regarding SPAPR, I'd tend to think that if we're dirtying in QEMU then
> nothing prevents us from implementing the same there, but also I'm not
> going to stand in the way of simply disabling migration for that IOMMU
> backend unless someone speaks up that they think it deserves parity.

If the VFIO device internal tracker is being used it should work with
SPAPR too.

The full algorithm should be to try to find a dirty tracker for each
VFIO migration device and if none is found then always dirty
everything at STOP_COPY.

iommufd will provide the only global dirty tracker, so if SPAPR or
legacy VFIO type1 is used without a device internal tracker then it
should do the all-dirties.
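
As a rough sketch of that selection logic, under my reading of the
paragraphs above: struct mig_dev, enum dirty_policy and
choose_dirty_policy() are hypothetical names for illustration only,
not existing QEMU or kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device view: does the device track its own dirties? */
struct mig_dev {
    bool has_internal_tracker;
};

enum dirty_policy {
    DIRTY_TRACKED,          /* every device is covered by some tracker */
    DIRTY_ALL_AT_STOP_COPY, /* no tracker found: dirty everything */
};

/*
 * global_tracker models an IOMMU-level tracker (the iommufd case).
 * Without it, a single device lacking an internal tracker is enough
 * to force the all-dirties fallback.
 */
static enum dirty_policy choose_dirty_policy(const struct mig_dev *devs,
                                             size_t n, bool global_tracker)
{
    size_t i;

    if (global_tracker) {
        return DIRTY_TRACKED;
    }
    for (i = 0; i < n; i++) {
        if (!devs[i].has_internal_tracker) {
            return DIRTY_ALL_AT_STOP_COPY;
        }
    }
    return DIRTY_TRACKED;
}
```

The container type (SPAPR or type1) does not appear in the decision at
all, which is the point: only the presence of a tracker matters.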

Jason


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2
  2022-06-17 21:51       ` Alex Williamson
  2022-06-23 14:56         ` Jason Gunthorpe
@ 2022-06-27  7:36         ` Avihai Horon
  1 sibling, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-06-27  7:36 UTC (permalink / raw)
  To: Alex Williamson, Juan Quintela
  Cc: qemu-devel, Cornelia Huck, Dr . David Alan Gilbert, Joao Martins,
	Yishai Hadas, Jason Gunthorpe, Mark Bloch, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta


On 6/18/2022 12:51 AM, Alex Williamson wrote:
> On Mon, 13 Jun 2022 14:21:26 +0300
> Avihai Horon <avihaih@nvidia.com> wrote:
>
>> On 6/8/2022 12:32 AM, Alex Williamson wrote:
>>> On Tue, 7 Jun 2022 20:44:23 +0300
>>> Avihai Horon <avihaih@nvidia.com> wrote:
>>>
>>>> On 5/30/2022 8:07 PM, Avihai Horon wrote:
>>>>> Hello,
>>>>>
>>>>> Following VFIO migration protocol v2 acceptance in kernel, this series
>>>>> implements VFIO migration according to the new v2 protocol and replaces
>>>>> the now deprecated v1 implementation.
>>>>>
>>>>> The main differences between v1 and v2 migration protocols are:
>>>>> 1. VFIO device state is represented as a finite state machine instead of
>>>>>       a bitmap.
>>>>>
>>>>> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
>>>>>       ioctl and normal read() and write() instead of the migration region
>>>>>       used in v1.
>>>>>
>>>>> 3. Migration protocol v2 currently doesn't support the pre-copy phase of
>>>>>       migration.
>>>>>
>>>>> Full description of the v2 protocol and the differences from v1 can be
>>>>> found here [1].
>>>>>
>>>>> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
>>>>> that will be used later.
>>>>>
>>>>> Patches 4-6 refactor v1 protocol code to make it easier to add v2
>>>>> protocol.
>>>>>
>>>>> Patches 7-11 implement v2 protocol and remove v1 protocol.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
>>>>>
>>>>> Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
>>>>> - Split the big patch that replaced v1 with v2 into several patches as
>>>>>      suggested by Joao, to make review easier.
>>>>> - Change warn_report to warn_report_once when container doesn't support
>>>>>      dirty tracking.
>>>>> - Add Reviewed-by tag.
>>>>>
>>>>> Avihai Horon (11):
>>>>>      vfio/migration: Fix NULL pointer dereference bug
>>>>>      vfio/migration: Skip pre-copy if dirty page tracking is not supported
>>>>>      migration/qemu-file: Add qemu_file_get_to_fd()
>>>>>      vfio/common: Change vfio_devices_all_running_and_saving() logic to
>>>>>        equivalent one
>>>>>      vfio/migration: Move migration v1 logic to vfio_migration_init()
>>>>>      vfio/migration: Rename functions/structs related to v1 protocol
>>>>>      vfio/migration: Implement VFIO migration protocol v2
>>>>>      vfio/migration: Remove VFIO migration protocol v1
>>>>>      vfio/migration: Reset device if setting recover state fails
>>>>>      vfio: Alphabetize migration section of VFIO trace-events file
>>>>>      docs/devel: Align vfio-migration docs to VFIO migration v2
>>>>>
>>>>>     docs/devel/vfio-migration.rst |  77 ++--
>>>>>     hw/vfio/common.c              |  21 +-
>>>>>     hw/vfio/migration.c           | 640 ++++++++--------------------------
>>>>>     hw/vfio/trace-events          |  25 +-
>>>>>     include/hw/vfio/vfio-common.h |   8 +-
>>>>>     migration/migration.c         |   5 +
>>>>>     migration/migration.h         |   3 +
>>>>>     migration/qemu-file.c         |  34 ++
>>>>>     migration/qemu-file.h         |   1 +
>>>>>     9 files changed, 252 insertions(+), 562 deletions(-)
>>>>>
>>>> Ping.
>>> Based on the changelog, this seems like a mostly cosmetic spin and I
>>> don't see that all of the discussion threads from v1 were resolved to
>>> everyone's satisfaction.  I'm certainly still uncomfortable with the
>>> pre-copy behavior and I thought there were still some action items to
>>> figure out whether an SLA is present and vet the solution with
>>> management tools.  Thanks,
>> Yes.
>> OK, so let's clear things up and reach an agreement before I prepare the
>> v3 series.
>>
>> There are three topics that came up in previous discussion:
>>
>>   1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug.
>>      Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage
>>      outside migration thread.
>>      This code existed before and I fixed a NULL pointer dereference that
>>      I encountered.
>>      I suggested that later we can refactor VMChangeStateHandler to
>>      return error.
>>      I prefer not to do this refactor right now because I am not sure
>>      it's as straightforward a change as it might seem - if some notifier
>>      fails and we abort do_vm_stop/vm_prepare_start in the middle, can
>>      this leave the VM in some unstable state?
>>      We plan to leave it as is and not do the refactor as part of this
>>      series.
>>      Are you ok with this?
> I'll defer to Juan here, it's not 100% clear to me from the last reply
> if he's looking for that sooner than later.  Juan?
<snip>
>>   3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd().
>>      Juan expressed his concern about the amount of data that will go
>>      through main migration thread.
>>
>> This is already the case in v1 protocol - VFIO devices send all their
>> data in the main migration thread. Note that like in v1 protocol, here
>> as well the data is sent in small sized chunks, each with a header.
>> This patch just aims to eliminate an extra copy.
>>
>> We plan to leave it as is. Is this ok?
> I don't think we should lean too heavily on this being a bump from v1 to
> v2 protocol as v1 was only ever experimental and hasn't been widely
> used in practice AFAIK.  Again, I'll defer to the migration folks for
> this, it requires their buy-in.  Thanks,

Ping.
Juan, can you respond to items 1 and 3?
Thanks.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2
  2022-05-30 17:07 ` [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
  2022-06-14 11:08   ` Joao Martins
@ 2022-07-18 15:12   ` Jason Gunthorpe
  2022-07-27 15:45     ` Avihai Horon
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 15:12 UTC (permalink / raw)
  To: Avihai Horon, Juan Quintela
  Cc: qemu-devel, Cornelia Huck, Alex Williamson,
	Dr . David Alan Gilbert, Joao Martins, Yishai Hadas, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On Mon, May 30, 2022 at 08:07:35PM +0300, Avihai Horon wrote:

> +/* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
> +static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
> +{
> +    ssize_t data_size;
> +
> +    data_size = read(migration->data_fd, migration->data_buffer,
> +                     migration->data_buffer_size);
> +    if (data_size < 0) {
> +        return -1;
> +    }
> +    if (data_size == 0) {
> +        return 1;
> +    }
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> +    qemu_put_be64(f, data_size);
> +    qemu_put_buffer_async(f, migration->data_buffer, data_size, false);
> +    qemu_fflush(f);
> +    bytes_transferred += data_size;
> +
> +    trace_vfio_save_block(migration->vbasedev->name, data_size);
> +
> +    return qemu_file_get_error(f);
> +}

We looked at this with an eye to "how much data is transferred" per
callback.

The above function is the basic data mover, and
'migration->data_buffer_size' is set to 1MB at the moment.

So, we produce up to 1MB VFIO_MIG_FLAG_DEV_DATA_STATE sections.

This series does not include the precopy support, but that will
include a precopy 'save_live_iterate' function like this:

static int vfio_save_iterate(QEMUFile *f, void *opaque)
{
    VFIODevice *vbasedev = opaque;
    VFIOMigration *migration = vbasedev->migration;
    int ret;

    ret = vfio_save_block(f, migration);
    if (ret < 0) {
        return ret;
    }
    if (ret == 1) {
        return 1;
    }
    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
    return 0;
}

Thus, during precopy this will never do more than 1MB per callback.

> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    enum vfio_device_mig_state recover_state;
> +    int ret;
> +
> +    /* We reach here with device state STOP or STOP_COPY only */
> +    recover_state = VFIO_DEVICE_STATE_STOP;
> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
> +                                   recover_state);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    do {
> +        ret = vfio_save_block(f, vbasedev->migration);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    } while (!ret);

This seems to be the main problem, where we chain together 1MB blocks
until all of the complete-precopy data has been sent. The above is
hooked to 'save_live_complete_precopy'.

So, if we want to break the above up into some 'save_iterate' like
function, do you have some advice how to do it? The above do/while
must happen after the VFIO_DEVICE_STATE_STOP_COPY.

For mlx5 the above loop will often be ~10MB's for small VMs and
100MB's for big VMs (big meaning making extensive use of RDMA
functionality), and this will not change with pre-copy support or not.
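
To put numbers on that, here is a small model of the two call
patterns. It is only a sketch: fake_stream, save_block() and
drain_stream() are hypothetical stand-ins that count bytes instead of
touching a QEMUFile, and BLOCK_SIZE mirrors the 1MB data_buffer_size
mentioned above.

```c
#include <stddef.h>

#define BLOCK_SIZE ((size_t)1024 * 1024)

/* Hypothetical device stream: only the amount of pending data matters. */
struct fake_stream {
    size_t remaining; /* bytes the device still has to hand over */
    size_t blocks;    /* sections emitted so far */
};

/*
 * Same return convention as vfio_save_block() for the non-error cases:
 * 1 at end-of-stream, 0 if one block (at most BLOCK_SIZE) was moved.
 */
static int save_block(struct fake_stream *s)
{
    size_t n = s->remaining < BLOCK_SIZE ? s->remaining : BLOCK_SIZE;

    if (n == 0) {
        return 1;
    }
    s->remaining -= n;
    s->blocks++;
    return 0;
}

/*
 * Models the do/while loop: keep moving blocks until end-of-stream.
 * Returns how many blocks were chained inside this single callback.
 */
static size_t drain_stream(struct fake_stream *s)
{
    size_t count = 0;

    while (save_block(s) == 0) {
        count++;
    }
    return count;
}
```

A 10MB device state means ten sections chained in one callback, and
100MB means a hundred, whereas an iterate-style caller would invoke
save_block() once and return.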

Is it still a problem?

For other devices, like a GPU, I would imagine pre-copy support is
implemented and this will be a smaller post-precopy residual.

Jason


* Re: [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2
  2022-07-18 15:12   ` Jason Gunthorpe
@ 2022-07-27 15:45     ` Avihai Horon
  0 siblings, 0 replies; 31+ messages in thread
From: Avihai Horon @ 2022-07-27 15:45 UTC (permalink / raw)
  To: Jason Gunthorpe, Juan Quintela
  Cc: qemu-devel, Cornelia Huck, Alex Williamson,
	Dr . David Alan Gilbert, Joao Martins, Yishai Hadas, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 7/18/2022 6:12 PM, Jason Gunthorpe wrote:
> On Mon, May 30, 2022 at 08:07:35PM +0300, Avihai Horon wrote:
>
>> +/* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
>> +static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
>> +{
>> +    ssize_t data_size;
>> +
>> +    data_size = read(migration->data_fd, migration->data_buffer,
>> +                     migration->data_buffer_size);
>> +    if (data_size < 0) {
>> +        return -1;
>> +    }
>> +    if (data_size == 0) {
>> +        return 1;
>> +    }
>> +
>> +    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
>> +    qemu_put_be64(f, data_size);
>> +    qemu_put_buffer_async(f, migration->data_buffer, data_size, false);
>> +    qemu_fflush(f);
>> +    bytes_transferred += data_size;
>> +
>> +    trace_vfio_save_block(migration->vbasedev->name, data_size);
>> +
>> +    return qemu_file_get_error(f);
>> +}
> We looked at this with an eye to "how much data is transferred" per
> callback.
>
> The above function is the basic data mover, and
> 'migration->data_buffer_size' is set to 1MB at the moment.
>
> So, we produce up to 1MB VFIO_MIG_FLAG_DEV_DATA_STATE sections.
>
> This series does not include the precopy support, but that will
> include a precopy 'save_live_iterate' function like this:
>
> static int vfio_save_iterate(QEMUFile *f, void *opaque)
> {
>      VFIODevice *vbasedev = opaque;
>      VFIOMigration *migration = vbasedev->migration;
>      int ret;
>
>      ret = vfio_save_block(f, migration);
>      if (ret < 0) {
>          return ret;
>      }
>      if (ret == 1) {
>          return 1;
>      }
>      qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>      return 0;
> }
>
> Thus, during precopy this will never do more than 1MB per callback.
>
>> +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    enum vfio_device_mig_state recover_state;
>> +    int ret;
>> +
>> +    /* We reach here with device state STOP or STOP_COPY only */
>> +    recover_state = VFIO_DEVICE_STATE_STOP;
>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
>> +                                   recover_state);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    do {
>> +        ret = vfio_save_block(f, vbasedev->migration);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    } while (!ret);
> This seems to be the main problem where we chain together 1MB blocks
> until the entire completed precopy data is completed. The above is
> hooked to 'save_live_complete_precopy'
>
> So, if we want to break the above up into some 'save_iterate' like
> function, do you have some advice how to do it? The above do/while
> must happen after the VFIO_DEVICE_STATE_STOP_COPY.

Ping.

Juan, AFAIU (and correct me if I am wrong) the problem on the source side
is that save_live_complete_precopy handlers are called with the iothread
lock held, so QEMU is non-responsive during this time.
On the destination side, we don't yield every now and then like the RAM
code does, so QEMU is non-responsive there as well.

Is it possible to solve this problem by letting the VFIO 
save_live_complete_precopy handler run outside the iothread lock?

For example, add a function to SaveVMHandlers that indicates whether 
this specific save_live_complete_precopy handler should run 
inside/outside iothread lock?
Or add a save_live_complete_precopy_nonblocking handler that runs 
outside the iothread lock?
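
A rough sketch of the first option, under stated assumptions: ToyLock,
ToyHandlers and the complete_outside_lock flag are all hypothetical, and
the real SaveVMHandlers has no such field. The toy lock only records
whether it is held, to show where the dispatcher would drop and retake it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only tracks whether it is held (iothread lock stand-in). */
typedef struct {
    bool held;
} ToyLock;

/* Hypothetical handler table; the real SaveVMHandlers has no such flag. */
typedef struct {
    int (*complete)(void *opaque, const ToyLock *lock);
    bool complete_outside_lock; /* assumed opt-in flag */
} ToyHandlers;

/* Called with the lock held; drops it around handlers that opted out. */
static int run_complete(const ToyHandlers *h, void *opaque, ToyLock *lock)
{
    int ret;

    assert(lock->held);
    if (h->complete_outside_lock) {
        lock->held = false;
    }
    ret = h->complete(opaque, lock);
    lock->held = true; /* caller always gets the lock back */
    return ret;
}

/* Example handler: reports whether it ran with the lock held. */
static int report_lock_state(void *opaque, const ToyLock *lock)
{
    (void)opaque;
    return lock->held ? 1 : 0;
}
```

Handlers that do not set the flag keep today's behaviour, so only the VFIO
handler would need to be audited for running without the lock.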

On destination side, since VFIO data is sent in chunks of 1MB, we can 
yield every now and then.

What do you think?

Thanks.

> For mlx5 the above loop will often be ~10MB's for small VMs and
> 100MB's for big VMs (big meaning making extensive use of RDMA
> functionality), and this will not change with pre-copy support or not.
>
> Is it still a problem?
>
> For other devices, like a GPU, I would imagine pre-copy support is
> implemented and this will be a smaller post-precopy residual.
>
> Jason


* Re: [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1
  2022-05-30 17:07 ` [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
@ 2022-09-19  8:35   ` liulongfang via
  2022-09-19 11:50     ` Alex Williamson
  2022-09-19  9:41   ` Philippe Mathieu-Daudé via
  1 sibling, 1 reply; 31+ messages in thread
From: liulongfang via @ 2022-09-19  8:35 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel, Cornelia Huck, Alex Williamson,
	Juan Quintela, Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta,
	Shameerali Kolothum Thodi, jiangkunkun

On 2022/5/31 1:07, Avihai Horon wrote:
> Now that v2 protocol implementation has been added, remove the
> deprecated v1 implementation.
>
"struct vfio_device_migration_info" still exists in vfio.h,
so why does QEMU need to delete the v1 implementation?

> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  hw/vfio/common.c              |  19 +-
>  hw/vfio/migration.c           | 698 +---------------------------------
>  hw/vfio/trace-events          |   5 -
>  include/hw/vfio/vfio-common.h |   5 -
>  4 files changed, 24 insertions(+), 703 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5541133ec9..00c6cb0ffe 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -355,14 +355,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>                  return false;
>              }
>  
> -            if (!migration->v2 &&
> -                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> -                (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
> -                return false;
> -            }
> -
> -            if (migration->v2 &&
> -                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> +            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
>                  (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
>                   migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
>                  return false;
> @@ -393,14 +386,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>                  return false;
>              }
>  
> -            if (!migration->v2 &&
> -                migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
> -                continue;
> -            }
> -
> -            if (migration->v2 &&
> -                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> -                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
> +            if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> +                migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P) {
>                  continue;
>              } else {
>                  return false;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index de68eadb09..852759e6ca 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -121,220 +121,6 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>      return 0;
>  }
>  
> -static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
> -                                  off_t off, bool iswrite)
> -{
> -    int ret;
> -
> -    ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
> -                    pread(vbasedev->fd, val, count, off);
> -    if (ret < count) {
> -        error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
> -                     HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
> -                     vbasedev->name, off, strerror(errno));
> -        return (ret < 0) ? ret : -EINVAL;
> -    }
> -    return 0;
> -}
> -
> -static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
> -                       off_t off, bool iswrite)
> -{
> -    int ret, done = 0;
> -    __u8 *tbuf = buf;
> -
> -    while (count) {
> -        int bytes = 0;
> -
> -        if (count >= 8 && !(off % 8)) {
> -            bytes = 8;
> -        } else if (count >= 4 && !(off % 4)) {
> -            bytes = 4;
> -        } else if (count >= 2 && !(off % 2)) {
> -            bytes = 2;
> -        } else {
> -            bytes = 1;
> -        }
> -
> -        ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        count -= bytes;
> -        done += bytes;
> -        off += bytes;
> -        tbuf += bytes;
> -    }
> -    return done;
> -}
> -
> -#define vfio_mig_read(f, v, c, o)       vfio_mig_rw(f, (__u8 *)v, c, o, false)
> -#define vfio_mig_write(f, v, c, o)      vfio_mig_rw(f, (__u8 *)v, c, o, true)
> -
> -#define VFIO_MIG_STRUCT_OFFSET(f)       \
> -                                 offsetof(struct vfio_device_migration_info, f)
> -/*
> - * Change the device_state register for device @vbasedev. Bits set in @mask
> - * are preserved, bits set in @value are set, and bits not set in either @mask
> - * or @value are cleared in device_state. If the register cannot be accessed,
> - * the resulting state would be invalid, or the device enters an error state,
> - * an error is returned.
> - */
> -
> -static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
> -                                       uint32_t value)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    off_t dev_state_off = region->fd_offset +
> -                          VFIO_MIG_STRUCT_OFFSET(device_state);
> -    uint32_t device_state;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
> -                        dev_state_off);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    device_state = (device_state & mask) | value;
> -
> -    if (!VFIO_DEVICE_STATE_VALID(device_state)) {
> -        return -EINVAL;
> -    }
> -
> -    ret = vfio_mig_write(vbasedev, &device_state, sizeof(device_state),
> -                         dev_state_off);
> -    if (ret < 0) {
> -        int rret;
> -
> -        rret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
> -                             dev_state_off);
> -
> -        if ((rret < 0) || (VFIO_DEVICE_STATE_IS_ERROR(device_state))) {
> -            hw_error("%s: Device in error state 0x%x", vbasedev->name,
> -                     device_state);
> -            return rret ? rret : -EIO;
> -        }
> -        return ret;
> -    }
> -
> -    migration->device_state_v1 = device_state;
> -    trace_vfio_migration_set_state(vbasedev->name, device_state);
> -    return 0;
> -}
> -
> -static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,
> -                                   uint64_t data_size, uint64_t *size)
> -{
> -    void *ptr = NULL;
> -    uint64_t limit = 0;
> -    int i;
> -
> -    if (!region->mmaps) {
> -        if (size) {
> -            *size = MIN(data_size, region->size - data_offset);
> -        }
> -        return ptr;
> -    }
> -
> -    for (i = 0; i < region->nr_mmaps; i++) {
> -        VFIOMmap *map = region->mmaps + i;
> -
> -        if ((data_offset >= map->offset) &&
> -            (data_offset < map->offset + map->size)) {
> -
> -            /* check if data_offset is within sparse mmap areas */
> -            ptr = map->mmap + data_offset - map->offset;
> -            if (size) {
> -                *size = MIN(data_size, map->offset + map->size - data_offset);
> -            }
> -            break;
> -        } else if ((data_offset < map->offset) &&
> -                   (!limit || limit > map->offset)) {
> -            /*
> -             * data_offset is not within sparse mmap areas, find size of
> -             * non-mapped area. Check through all list since region->mmaps list
> -             * is not sorted.
> -             */
> -            limit = map->offset;
> -        }
> -    }
> -
> -    if (!ptr && size) {
> -        *size = limit ? MIN(data_size, limit - data_offset) : data_size;
> -    }
> -    return ptr;
> -}
> -
> -static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    uint64_t data_offset = 0, data_size = 0, sz;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
> -                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size),
> -                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
> -                           migration->pending_bytes);
> -
> -    qemu_put_be64(f, data_size);
> -    sz = data_size;
> -
> -    while (sz) {
> -        void *buf;
> -        uint64_t sec_size;
> -        bool buf_allocated = false;
> -
> -        buf = get_data_section_size(region, data_offset, sz, &sec_size);
> -
> -        if (!buf) {
> -            buf = g_try_malloc(sec_size);
> -            if (!buf) {
> -                error_report("%s: Error allocating buffer ", __func__);
> -                return -ENOMEM;
> -            }
> -            buf_allocated = true;
> -
> -            ret = vfio_mig_read(vbasedev, buf, sec_size,
> -                                region->fd_offset + data_offset);
> -            if (ret < 0) {
> -                g_free(buf);
> -                return ret;
> -            }
> -        }
> -
> -        qemu_put_buffer(f, buf, sec_size);
> -
> -        if (buf_allocated) {
> -            g_free(buf);
> -        }
> -        sz -= sec_size;
> -        data_offset += sec_size;
> -    }
> -
> -    ret = qemu_file_get_error(f);
> -
> -    if (!ret && size) {
> -        *size = data_size;
> -    }
> -
> -    bytes_transferred += data_size;
> -    return ret;
> -}
> -
>  static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>                              uint64_t data_size)
>  {
> @@ -351,96 +137,6 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>      return 0;
>  }
>  
> -static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
> -                               uint64_t data_size)
> -{
> -    VFIORegion *region = &vbasedev->migration->region;
> -    uint64_t data_offset = 0, size, report_size;
> -    int ret;
> -
> -    do {
> -        ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
> -                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
> -        if (ret < 0) {
> -            return ret;
> -        }
> -
> -        if (data_offset + data_size > region->size) {
> -            /*
> -             * If data_size is greater than the data section of migration region
> -             * then iterate the write buffer operation. This case can occur if
> -             * size of migration region at destination is smaller than size of
> -             * migration region at source.
> -             */
> -            report_size = size = region->size - data_offset;
> -            data_size -= size;
> -        } else {
> -            report_size = size = data_size;
> -            data_size = 0;
> -        }
> -
> -        trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, size);
> -
> -        while (size) {
> -            void *buf;
> -            uint64_t sec_size;
> -            bool buf_alloc = false;
> -
> -            buf = get_data_section_size(region, data_offset, size, &sec_size);
> -
> -            if (!buf) {
> -                buf = g_try_malloc(sec_size);
> -                if (!buf) {
> -                    error_report("%s: Error allocating buffer ", __func__);
> -                    return -ENOMEM;
> -                }
> -                buf_alloc = true;
> -            }
> -
> -            qemu_get_buffer(f, buf, sec_size);
> -
> -            if (buf_alloc) {
> -                ret = vfio_mig_write(vbasedev, buf, sec_size,
> -                        region->fd_offset + data_offset);
> -                g_free(buf);
> -
> -                if (ret < 0) {
> -                    return ret;
> -                }
> -            }
> -            size -= sec_size;
> -            data_offset += sec_size;
> -        }
> -
> -        ret = vfio_mig_write(vbasedev, &report_size, sizeof(report_size),
> -                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    } while (data_size);
> -
> -    return 0;
> -}
> -
> -static int vfio_update_pending(VFIODevice *vbasedev)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    uint64_t pending_bytes = 0;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &pending_bytes, sizeof(pending_bytes),
> -                    region->fd_offset + VFIO_MIG_STRUCT_OFFSET(pending_bytes));
> -    if (ret < 0) {
> -        migration->pending_bytes = 0;
> -        return ret;
> -    }
> -
> -    migration->pending_bytes = pending_bytes;
> -    trace_vfio_update_pending(vbasedev->name, pending_bytes);
> -    return 0;
> -}
> -
>  static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -493,15 +189,6 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
>      migration->data_fd = -1;
>  }
>  
> -static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -
> -    if (migration->region.mmaps) {
> -        vfio_region_unmap(&migration->region);
> -    }
> -}
> -
>  /* ---------------------------------------------------------------------- */
>  
>  static int vfio_save_setup(QEMUFile *f, void *opaque)
> @@ -516,49 +203,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>      return qemu_file_get_error(f);
>  }
>  
> -static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret;
> -
> -    trace_vfio_save_setup(vbasedev->name);
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
> -
> -    if (migration->region.mmaps) {
> -        /*
> -         * Calling vfio_region_mmap() from migration thread. Memory API called
> -         * from this function require locking the iothread when called from
> -         * outside the main loop thread.
> -         */
> -        qemu_mutex_lock_iothread();
> -        ret = vfio_region_mmap(&migration->region);
> -        qemu_mutex_unlock_iothread();
> -        if (ret) {
> -            error_report("%s: Failed to mmap VFIO migration region: %s",
> -                         vbasedev->name, strerror(-ret));
> -            error_report("%s: Falling back to slow path", vbasedev->name);
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
> -                                      VFIO_DEVICE_STATE_V1_SAVING);
> -    if (ret) {
> -        error_report("%s: Failed to set state SAVING", vbasedev->name);
> -        return ret;
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    return 0;
> -}
> -
>  static void vfio_save_cleanup(void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -567,82 +211,6 @@ static void vfio_save_cleanup(void *opaque)
>      trace_vfio_save_cleanup(vbasedev->name);
>  }
>  
> -static void vfio_v1_save_cleanup(void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -
> -    vfio_migration_v1_cleanup(vbasedev);
> -    trace_vfio_save_cleanup(vbasedev->name);
> -}
> -
> -static void vfio_save_pending(QEMUFile *f, void *opaque,
> -                              uint64_t threshold_size,
> -                              uint64_t *res_precopy_only,
> -                              uint64_t *res_compatible,
> -                              uint64_t *res_postcopy_only)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret;
> -
> -    ret = vfio_update_pending(vbasedev);
> -    if (ret) {
> -        return;
> -    }
> -
> -    *res_precopy_only += migration->pending_bytes;
> -
> -    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
> -                            *res_postcopy_only, *res_compatible);
> -}
> -
> -static int vfio_save_iterate(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint64_t data_size;
> -    int ret;
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> -
> -    if (migration->pending_bytes == 0) {
> -        ret = vfio_update_pending(vbasedev);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        if (migration->pending_bytes == 0) {
> -            qemu_put_be64(f, 0);
> -            qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -            /* indicates data finished, goto complete phase */
> -            return 1;
> -        }
> -    }
> -
> -    ret = vfio_save_buffer(f, vbasedev, &data_size);
> -    if (ret) {
> -        error_report("%s: vfio_save_buffer failed %s", vbasedev->name,
> -                     strerror(errno));
> -        return ret;
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    /*
> -     * Reset pending_bytes as .save_live_pending is not called during savevm or
> -     * snapshot case, in such case vfio_update_pending() at the start of this
> -     * function updates pending_bytes.
> -     */
> -    migration->pending_bytes = 0;
> -    trace_vfio_save_iterate(vbasedev->name, data_size);
> -    return 0;
> -}
> -
>  /* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
>  static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
>  {
> @@ -706,62 +274,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint64_t data_size;
> -    int ret;
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
> -                                      VFIO_DEVICE_STATE_V1_SAVING);
> -    if (ret) {
> -        error_report("%s: Failed to set state STOP and SAVING",
> -                     vbasedev->name);
> -        return ret;
> -    }
> -
> -    ret = vfio_update_pending(vbasedev);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    while (migration->pending_bytes > 0) {
> -        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> -        ret = vfio_save_buffer(f, vbasedev, &data_size);
> -        if (ret < 0) {
> -            error_report("%s: Failed to save buffer", vbasedev->name);
> -            return ret;
> -        }
> -
> -        if (data_size == 0) {
> -            break;
> -        }
> -
> -        ret = vfio_update_pending(vbasedev);
> -        if (ret) {
> -            return ret;
> -        }
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING,
> -                                      0);
> -    if (ret) {
> -        error_report("%s: Failed to set state STOPPED", vbasedev->name);
> -        return ret;
> -    }
> -
> -    trace_vfio_save_complete_precopy(vbasedev->name);
> -    return ret;
> -}
> -
>  static void vfio_save_state(QEMUFile *f, void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -783,33 +295,6 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
>                                     vbasedev->migration->device_state);
>  }
>  
> -static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret = 0;
> -
> -    if (migration->region.mmaps) {
> -        ret = vfio_region_mmap(&migration->region);
> -        if (ret) {
> -            error_report("%s: Failed to mmap VFIO migration region %d: %s",
> -                         vbasedev->name, migration->region.nr,
> -                         strerror(-ret));
> -            error_report("%s: Falling back to slow path", vbasedev->name);
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
> -                                      VFIO_DEVICE_STATE_V1_RESUMING);
> -    if (ret) {
> -        error_report("%s: Failed to set state RESUMING", vbasedev->name);
> -        if (migration->region.mmaps) {
> -            vfio_region_unmap(&migration->region);
> -        }
> -    }
> -    return ret;
> -}
> -
>  static int vfio_load_cleanup(void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -819,15 +304,6 @@ static int vfio_load_cleanup(void *opaque)
>      return 0;
>  }
>  
> -static int vfio_v1_load_cleanup(void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -
> -    vfio_migration_v1_cleanup(vbasedev);
> -    trace_vfio_load_cleanup(vbasedev->name);
> -    return 0;
> -}
> -
>  static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -861,11 +337,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>              uint64_t data_size = qemu_get_be64(f);
>  
>              if (data_size) {
> -                if (vbasedev->migration->v2) {
> -                    ret = vfio_load_buffer(f, vbasedev, data_size);
> -                } else {
> -                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
> -                }
> +                ret = vfio_load_buffer(f, vbasedev, data_size);
>                  if (ret < 0) {
>                      return ret;
>                  }
> @@ -896,18 +368,6 @@ static SaveVMHandlers savevm_vfio_handlers = {
>      .load_state = vfio_load_state,
>  };
>  
> -static SaveVMHandlers savevm_vfio_v1_handlers = {
> -    .save_setup = vfio_v1_save_setup,
> -    .save_cleanup = vfio_v1_save_cleanup,
> -    .save_live_pending = vfio_save_pending,
> -    .save_live_iterate = vfio_save_iterate,
> -    .save_live_complete_precopy = vfio_v1_save_complete_precopy,
> -    .save_state = vfio_save_state,
> -    .load_setup = vfio_v1_load_setup,
> -    .load_cleanup = vfio_v1_load_cleanup,
> -    .load_state = vfio_load_state,
> -};
> -
>  /* ---------------------------------------------------------------------- */
>  
>  static void vfio_vmstate_change(void *opaque, bool running, RunState state)
> @@ -938,70 +398,12 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>                                new_state);
>  }
>  
> -static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint32_t value, mask;
> -    int ret;
> -
> -    if (vbasedev->migration->vm_running == running) {
> -        return;
> -    }
> -
> -    if (running) {
> -        /*
> -         * Here device state can have one of _SAVING, _RESUMING or _STOP bit.
> -         * Transition from _SAVING to _RUNNING can happen if there is migration
> -         * failure, in that case clear _SAVING bit.
> -         * Transition from _RESUMING to _RUNNING occurs during resuming
> -         * phase, in that case clear _RESUMING bit.
> -         * In both the above cases, set _RUNNING bit.
> -         */
> -        mask = ~VFIO_DEVICE_STATE_MASK;
> -        value = VFIO_DEVICE_STATE_V1_RUNNING;
> -    } else {
> -        /*
> -         * Here device state could be either _RUNNING or _SAVING|_RUNNING. Reset
> -         * _RUNNING bit
> -         */
> -        mask = ~VFIO_DEVICE_STATE_V1_RUNNING;
> -
> -        /*
> -         * When VM state transition to stop for savevm command, device should
> -         * start saving data.
> -         */
> -        if (state == RUN_STATE_SAVE_VM) {
> -            value = VFIO_DEVICE_STATE_V1_SAVING;
> -        } else {
> -            value = 0;
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, mask, value);
> -    if (ret) {
> -        /*
> -         * Migration should be aborted in this case, but vm_state_notify()
> -         * currently does not support reporting failures.
> -         */
> -        error_report("%s: Failed to set device state 0x%x", vbasedev->name,
> -                     (migration->device_state_v1 & mask) | value);
> -        if (migrate_get_current()->to_dst_file) {
> -            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> -        }
> -    }
> -    vbasedev->migration->vm_running = running;
> -    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
> -            (migration->device_state_v1 & mask) | value);
> -}
> -
>  static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>  {
>      MigrationState *s = data;
>      VFIOMigration *migration = container_of(notifier, VFIOMigration,
>                                              migration_state);
>      VFIODevice *vbasedev = migration->vbasedev;
> -    int ret;
>  
>      trace_vfio_migration_state_notifier(vbasedev->name,
>                                          MigrationStatus_str(s->state));
> @@ -1011,31 +413,14 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>      case MIGRATION_STATUS_CANCELLED:
>      case MIGRATION_STATUS_FAILED:
>          bytes_transferred = 0;
> -        if (migration->v2) {
> -            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> -                                     VFIO_DEVICE_STATE_ERROR);
> -        } else {
> -            ret = vfio_migration_v1_set_state(vbasedev,
> -                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
> -                                                VFIO_DEVICE_STATE_V1_RESUMING),
> -                                              VFIO_DEVICE_STATE_V1_RUNNING);
> -            if (ret) {
> -                error_report("%s: Failed to set state RUNNING", vbasedev->name);
> -            }
> -        }
> +        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> +                                 VFIO_DEVICE_STATE_ERROR);
>      }
>  }
>  
>  static void vfio_migration_exit(VFIODevice *vbasedev)
>  {
> -    VFIOMigration *migration = vbasedev->migration;
> -
> -    if (migration->v2) {
> -        g_free(migration->data_buffer);
> -    } else {
> -        vfio_region_exit(&migration->region);
> -        vfio_region_finalize(&migration->region);
> -    }
> +    g_free(vbasedev->migration->data_buffer);
>      g_free(vbasedev->migration);
>      vbasedev->migration = NULL;
>  }
> @@ -1066,7 +451,6 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      VFIOMigration *migration;
>      char id[256] = "";
>      g_autofree char *path = NULL, *oid = NULL;
> -    struct vfio_region_info *info = NULL;
>      uint64_t mig_flags;
>  
>      if (!vbasedev->ops->vfio_get_object) {
> @@ -1079,48 +463,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      }
>  
>      ret = vfio_migration_query_flags(vbasedev, &mig_flags);
> -    if (!ret) {
> -        /* Migration v2 */
> -        /* Basic migration functionality must be supported */
> -        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> -            return -EOPNOTSUPP;
> -        }
> -        vbasedev->migration = g_new0(VFIOMigration, 1);
> -        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
> -        vbasedev->migration->data_buffer =
> -            g_malloc0(vbasedev->migration->data_buffer_size);
> -        vbasedev->migration->data_fd = -1;
> -        vbasedev->migration->v2 = true;
> -    } else {
> -        /* Migration v1 */
> -        ret = vfio_get_dev_region_info(vbasedev,
> -                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> -                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> -                                       &info);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        vbasedev->migration = g_new0(VFIOMigration, 1);
> -
> -        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
> -                                info->index, "migration");
> -        if (ret) {
> -            error_report("%s: Failed to setup VFIO migration region %d: %s",
> -                         vbasedev->name, info->index, strerror(-ret));
> -            goto err;
> -        }
> -
> -        if (!vbasedev->migration->region.size) {
> -            error_report("%s: Invalid zero-sized VFIO migration region %d",
> -                         vbasedev->name, info->index);
> -            ret = -EINVAL;
> -            goto err;
> -        }
> +    if (ret) {
> +        return ret;
> +    }
>  
> -        g_free(info);
> +    /* Basic migration functionality must be supported */
> +    if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> +        return -EOPNOTSUPP;
>      }
>  
> +    vbasedev->migration = g_new0(VFIOMigration, 1);
> +    vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
> +    vbasedev->migration->data_buffer =
> +        g_malloc0(vbasedev->migration->data_buffer_size);
> +    vbasedev->migration->data_fd = -1;
>      migration = vbasedev->migration;
>      migration->vbasedev = vbasedev;
>  
> @@ -1132,28 +488,16 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      }
>      strpadcpy(id, sizeof(id), path, '\0');
>  
> -    if (migration->v2) {
> -        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> -                             &savevm_vfio_handlers, vbasedev);
> -
> -        migration->vm_state = qdev_add_vm_change_state_handler(
> -            vbasedev->dev, vfio_vmstate_change, vbasedev);
> -    } else {
> -        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> -                             &savevm_vfio_v1_handlers, vbasedev);
> -
> -        migration->vm_state = qdev_add_vm_change_state_handler(
> -            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
> -    }
> +    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
> +                         vbasedev);
>  
> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
> +                                                           vfio_vmstate_change,
> +                                                           vbasedev);
>      migration->migration_state.notify = vfio_migration_state_notifier;
>      add_migration_state_change_notifier(&migration->migration_state);
> -    return 0;
>  
> -err:
> -    g_free(info);
> -    vfio_migration_exit(vbasedev);
> -    return ret;
> +    return 0;
>  }
>  
>  /* ---------------------------------------------------------------------- */
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 6e8c5958b9..a24ea7d8b0 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -154,15 +154,10 @@ vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t
>  vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
>  vfio_save_setup(const char *name) " (%s)"
>  vfio_save_cleanup(const char *name) " (%s)"
> -vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
> -vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
>  vfio_save_device_config_state(const char *name) " (%s)"
> -vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
> -vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
>  vfio_save_complete_precopy(const char *name) " (%s)"
>  vfio_load_device_config_state(const char *name) " (%s)"
>  vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
> -vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
>  vfio_load_state_device_data(const char *name, uint64_t data_size) " (%s) size 0x%"PRIx64
>  vfio_load_cleanup(const char *name) " (%s)"
>  vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 2ec3346fea..76d470178f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -61,16 +61,11 @@ typedef struct VFIORegion {
>  typedef struct VFIOMigration {
>      struct VFIODevice *vbasedev;
>      VMChangeStateEntry *vm_state;
> -    VFIORegion region;
> -    uint32_t device_state_v1;
> -    int vm_running;
>      Notifier migration_state;
> -    uint64_t pending_bytes;
>      enum vfio_device_mig_state device_state;
>      int data_fd;
>      void *data_buffer;
>      size_t data_buffer_size;
> -    bool v2;
>  } VFIOMigration;
>  
>  typedef struct VFIOAddressSpace {
> 
Thanks,
Longfang.



* Re: [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1
  2022-05-30 17:07 ` [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
  2022-09-19  8:35   ` liulongfang via
@ 2022-09-19  9:41   ` Philippe Mathieu-Daudé via
  1 sibling, 0 replies; 31+ messages in thread
From: Philippe Mathieu-Daudé via @ 2022-09-19  9:41 UTC
  To: Avihai Horon, reviewer:Incompatible changes
  Cc: qemu-devel@nongnu.org Developers, Cornelia Huck, Alex Williamson,
	Juan Quintela, Dr . David Alan Gilbert, Joao Martins,
	Yishai Hadas, Jason Gunthorpe, Mark Bloch, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta

On Mon, May 30, 2022 at 7:56 PM Avihai Horon <avihaih@nvidia.com> wrote:
>
> Now that the v2 protocol implementation has been added, remove the
> deprecated v1 implementation.

Is this removal worth a note in docs/about/deprecated.rst?

> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  hw/vfio/common.c              |  19 +-
>  hw/vfio/migration.c           | 698 +---------------------------------
>  hw/vfio/trace-events          |   5 -
>  include/hw/vfio/vfio-common.h |   5 -
>  4 files changed, 24 insertions(+), 703 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5541133ec9..00c6cb0ffe 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -355,14 +355,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>                  return false;
>              }
>
> -            if (!migration->v2 &&
> -                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> -                (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
> -                return false;
> -            }
> -
> -            if (migration->v2 &&
> -                (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
> +            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
>                  (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
>                   migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
>                  return false;
> @@ -393,14 +386,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>                  return false;
>              }
>
> -            if (!migration->v2 &&
> -                migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
> -                continue;
> -            }
> -
> -            if (migration->v2 &&
> -                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> -                 migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
> +            if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> +                migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P) {
>                  continue;
>              } else {
>                  return false;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index de68eadb09..852759e6ca 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -121,220 +121,6 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>      return 0;
>  }
>
> -static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
> -                                  off_t off, bool iswrite)
> -{
> -    int ret;
> -
> -    ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
> -                    pread(vbasedev->fd, val, count, off);
> -    if (ret < count) {
> -        error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
> -                     HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
> -                     vbasedev->name, off, strerror(errno));
> -        return (ret < 0) ? ret : -EINVAL;
> -    }
> -    return 0;
> -}
> -
> -static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
> -                       off_t off, bool iswrite)
> -{
> -    int ret, done = 0;
> -    __u8 *tbuf = buf;
> -
> -    while (count) {
> -        int bytes = 0;
> -
> -        if (count >= 8 && !(off % 8)) {
> -            bytes = 8;
> -        } else if (count >= 4 && !(off % 4)) {
> -            bytes = 4;
> -        } else if (count >= 2 && !(off % 2)) {
> -            bytes = 2;
> -        } else {
> -            bytes = 1;
> -        }
> -
> -        ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        count -= bytes;
> -        done += bytes;
> -        off += bytes;
> -        tbuf += bytes;
> -    }
> -    return done;
> -}
> -
> -#define vfio_mig_read(f, v, c, o)       vfio_mig_rw(f, (__u8 *)v, c, o, false)
> -#define vfio_mig_write(f, v, c, o)      vfio_mig_rw(f, (__u8 *)v, c, o, true)
> -
> -#define VFIO_MIG_STRUCT_OFFSET(f)       \
> -                                 offsetof(struct vfio_device_migration_info, f)
> -/*
> - * Change the device_state register for device @vbasedev. Bits set in @mask
> - * are preserved, bits set in @value are set, and bits not set in either @mask
> - * or @value are cleared in device_state. If the register cannot be accessed,
> - * the resulting state would be invalid, or the device enters an error state,
> - * an error is returned.
> - */
> -
> -static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
> -                                       uint32_t value)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    off_t dev_state_off = region->fd_offset +
> -                          VFIO_MIG_STRUCT_OFFSET(device_state);
> -    uint32_t device_state;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
> -                        dev_state_off);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    device_state = (device_state & mask) | value;
> -
> -    if (!VFIO_DEVICE_STATE_VALID(device_state)) {
> -        return -EINVAL;
> -    }
> -
> -    ret = vfio_mig_write(vbasedev, &device_state, sizeof(device_state),
> -                         dev_state_off);
> -    if (ret < 0) {
> -        int rret;
> -
> -        rret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
> -                             dev_state_off);
> -
> -        if ((rret < 0) || (VFIO_DEVICE_STATE_IS_ERROR(device_state))) {
> -            hw_error("%s: Device in error state 0x%x", vbasedev->name,
> -                     device_state);
> -            return rret ? rret : -EIO;
> -        }
> -        return ret;
> -    }
> -
> -    migration->device_state_v1 = device_state;
> -    trace_vfio_migration_set_state(vbasedev->name, device_state);
> -    return 0;
> -}
> -
> -static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,
> -                                   uint64_t data_size, uint64_t *size)
> -{
> -    void *ptr = NULL;
> -    uint64_t limit = 0;
> -    int i;
> -
> -    if (!region->mmaps) {
> -        if (size) {
> -            *size = MIN(data_size, region->size - data_offset);
> -        }
> -        return ptr;
> -    }
> -
> -    for (i = 0; i < region->nr_mmaps; i++) {
> -        VFIOMmap *map = region->mmaps + i;
> -
> -        if ((data_offset >= map->offset) &&
> -            (data_offset < map->offset + map->size)) {
> -
> -            /* check if data_offset is within sparse mmap areas */
> -            ptr = map->mmap + data_offset - map->offset;
> -            if (size) {
> -                *size = MIN(data_size, map->offset + map->size - data_offset);
> -            }
> -            break;
> -        } else if ((data_offset < map->offset) &&
> -                   (!limit || limit > map->offset)) {
> -            /*
> -             * data_offset is not within sparse mmap areas, find size of
> -             * non-mapped area. Check through all list since region->mmaps list
> -             * is not sorted.
> -             */
> -            limit = map->offset;
> -        }
> -    }
> -
> -    if (!ptr && size) {
> -        *size = limit ? MIN(data_size, limit - data_offset) : data_size;
> -    }
> -    return ptr;
> -}
> -
> -static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    uint64_t data_offset = 0, data_size = 0, sz;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
> -                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size),
> -                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
> -                           migration->pending_bytes);
> -
> -    qemu_put_be64(f, data_size);
> -    sz = data_size;
> -
> -    while (sz) {
> -        void *buf;
> -        uint64_t sec_size;
> -        bool buf_allocated = false;
> -
> -        buf = get_data_section_size(region, data_offset, sz, &sec_size);
> -
> -        if (!buf) {
> -            buf = g_try_malloc(sec_size);
> -            if (!buf) {
> -                error_report("%s: Error allocating buffer ", __func__);
> -                return -ENOMEM;
> -            }
> -            buf_allocated = true;
> -
> -            ret = vfio_mig_read(vbasedev, buf, sec_size,
> -                                region->fd_offset + data_offset);
> -            if (ret < 0) {
> -                g_free(buf);
> -                return ret;
> -            }
> -        }
> -
> -        qemu_put_buffer(f, buf, sec_size);
> -
> -        if (buf_allocated) {
> -            g_free(buf);
> -        }
> -        sz -= sec_size;
> -        data_offset += sec_size;
> -    }
> -
> -    ret = qemu_file_get_error(f);
> -
> -    if (!ret && size) {
> -        *size = data_size;
> -    }
> -
> -    bytes_transferred += data_size;
> -    return ret;
> -}
> -
>  static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>                              uint64_t data_size)
>  {
> @@ -351,96 +137,6 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>      return 0;
>  }
>
> -static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
> -                               uint64_t data_size)
> -{
> -    VFIORegion *region = &vbasedev->migration->region;
> -    uint64_t data_offset = 0, size, report_size;
> -    int ret;
> -
> -    do {
> -        ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
> -                      region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
> -        if (ret < 0) {
> -            return ret;
> -        }
> -
> -        if (data_offset + data_size > region->size) {
> -            /*
> -             * If data_size is greater than the data section of migration region
> -             * then iterate the write buffer operation. This case can occur if
> -             * size of migration region at destination is smaller than size of
> -             * migration region at source.
> -             */
> -            report_size = size = region->size - data_offset;
> -            data_size -= size;
> -        } else {
> -            report_size = size = data_size;
> -            data_size = 0;
> -        }
> -
> -        trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, size);
> -
> -        while (size) {
> -            void *buf;
> -            uint64_t sec_size;
> -            bool buf_alloc = false;
> -
> -            buf = get_data_section_size(region, data_offset, size, &sec_size);
> -
> -            if (!buf) {
> -                buf = g_try_malloc(sec_size);
> -                if (!buf) {
> -                    error_report("%s: Error allocating buffer ", __func__);
> -                    return -ENOMEM;
> -                }
> -                buf_alloc = true;
> -            }
> -
> -            qemu_get_buffer(f, buf, sec_size);
> -
> -            if (buf_alloc) {
> -                ret = vfio_mig_write(vbasedev, buf, sec_size,
> -                        region->fd_offset + data_offset);
> -                g_free(buf);
> -
> -                if (ret < 0) {
> -                    return ret;
> -                }
> -            }
> -            size -= sec_size;
> -            data_offset += sec_size;
> -        }
> -
> -        ret = vfio_mig_write(vbasedev, &report_size, sizeof(report_size),
> -                        region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    } while (data_size);
> -
> -    return 0;
> -}
> -
> -static int vfio_update_pending(VFIODevice *vbasedev)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -    VFIORegion *region = &migration->region;
> -    uint64_t pending_bytes = 0;
> -    int ret;
> -
> -    ret = vfio_mig_read(vbasedev, &pending_bytes, sizeof(pending_bytes),
> -                    region->fd_offset + VFIO_MIG_STRUCT_OFFSET(pending_bytes));
> -    if (ret < 0) {
> -        migration->pending_bytes = 0;
> -        return ret;
> -    }
> -
> -    migration->pending_bytes = pending_bytes;
> -    trace_vfio_update_pending(vbasedev->name, pending_bytes);
> -    return 0;
> -}
> -
>  static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -493,15 +189,6 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
>      migration->data_fd = -1;
>  }
>
> -static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
> -{
> -    VFIOMigration *migration = vbasedev->migration;
> -
> -    if (migration->region.mmaps) {
> -        vfio_region_unmap(&migration->region);
> -    }
> -}
> -
>  /* ---------------------------------------------------------------------- */
>
>  static int vfio_save_setup(QEMUFile *f, void *opaque)
> @@ -516,49 +203,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>      return qemu_file_get_error(f);
>  }
>
> -static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret;
> -
> -    trace_vfio_save_setup(vbasedev->name);
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
> -
> -    if (migration->region.mmaps) {
> -        /*
> -         * Calling vfio_region_mmap() from migration thread. Memory API called
> -         * from this function require locking the iothread when called from
> -         * outside the main loop thread.
> -         */
> -        qemu_mutex_lock_iothread();
> -        ret = vfio_region_mmap(&migration->region);
> -        qemu_mutex_unlock_iothread();
> -        if (ret) {
> -            error_report("%s: Failed to mmap VFIO migration region: %s",
> -                         vbasedev->name, strerror(-ret));
> -            error_report("%s: Falling back to slow path", vbasedev->name);
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
> -                                      VFIO_DEVICE_STATE_V1_SAVING);
> -    if (ret) {
> -        error_report("%s: Failed to set state SAVING", vbasedev->name);
> -        return ret;
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    return 0;
> -}
> -
>  static void vfio_save_cleanup(void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -567,82 +211,6 @@ static void vfio_save_cleanup(void *opaque)
>      trace_vfio_save_cleanup(vbasedev->name);
>  }
>
> -static void vfio_v1_save_cleanup(void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -
> -    vfio_migration_v1_cleanup(vbasedev);
> -    trace_vfio_save_cleanup(vbasedev->name);
> -}
> -
> -static void vfio_save_pending(QEMUFile *f, void *opaque,
> -                              uint64_t threshold_size,
> -                              uint64_t *res_precopy_only,
> -                              uint64_t *res_compatible,
> -                              uint64_t *res_postcopy_only)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret;
> -
> -    ret = vfio_update_pending(vbasedev);
> -    if (ret) {
> -        return;
> -    }
> -
> -    *res_precopy_only += migration->pending_bytes;
> -
> -    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
> -                            *res_postcopy_only, *res_compatible);
> -}
> -
> -static int vfio_save_iterate(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint64_t data_size;
> -    int ret;
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> -
> -    if (migration->pending_bytes == 0) {
> -        ret = vfio_update_pending(vbasedev);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        if (migration->pending_bytes == 0) {
> -            qemu_put_be64(f, 0);
> -            qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -            /* indicates data finished, goto complete phase */
> -            return 1;
> -        }
> -    }
> -
> -    ret = vfio_save_buffer(f, vbasedev, &data_size);
> -    if (ret) {
> -        error_report("%s: vfio_save_buffer failed %s", vbasedev->name,
> -                     strerror(errno));
> -        return ret;
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    /*
> -     * Reset pending_bytes as .save_live_pending is not called during savevm or
> -     * snapshot case, in such case vfio_update_pending() at the start of this
> -     * function updates pending_bytes.
> -     */
> -    migration->pending_bytes = 0;
> -    trace_vfio_save_iterate(vbasedev->name, data_size);
> -    return 0;
> -}
> -
>  /* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
>  static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
>  {
> @@ -706,62 +274,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>      return 0;
>  }
>
> -static int vfio_v1_save_complete_precopy(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint64_t data_size;
> -    int ret;
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
> -                                      VFIO_DEVICE_STATE_V1_SAVING);
> -    if (ret) {
> -        error_report("%s: Failed to set state STOP and SAVING",
> -                     vbasedev->name);
> -        return ret;
> -    }
> -
> -    ret = vfio_update_pending(vbasedev);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    while (migration->pending_bytes > 0) {
> -        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
> -        ret = vfio_save_buffer(f, vbasedev, &data_size);
> -        if (ret < 0) {
> -            error_report("%s: Failed to save buffer", vbasedev->name);
> -            return ret;
> -        }
> -
> -        if (data_size == 0) {
> -            break;
> -        }
> -
> -        ret = vfio_update_pending(vbasedev);
> -        if (ret) {
> -            return ret;
> -        }
> -    }
> -
> -    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> -
> -    ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING,
> -                                      0);
> -    if (ret) {
> -        error_report("%s: Failed to set state STOPPED", vbasedev->name);
> -        return ret;
> -    }
> -
> -    trace_vfio_save_complete_precopy(vbasedev->name);
> -    return ret;
> -}
> -
>  static void vfio_save_state(QEMUFile *f, void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -783,33 +295,6 @@ static int vfio_load_setup(QEMUFile *f, void *opaque)
>                                     vbasedev->migration->device_state);
>  }
>
> -static int vfio_v1_load_setup(QEMUFile *f, void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    int ret = 0;
> -
> -    if (migration->region.mmaps) {
> -        ret = vfio_region_mmap(&migration->region);
> -        if (ret) {
> -            error_report("%s: Failed to mmap VFIO migration region %d: %s",
> -                         vbasedev->name, migration->region.nr,
> -                         strerror(-ret));
> -            error_report("%s: Falling back to slow path", vbasedev->name);
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
> -                                      VFIO_DEVICE_STATE_V1_RESUMING);
> -    if (ret) {
> -        error_report("%s: Failed to set state RESUMING", vbasedev->name);
> -        if (migration->region.mmaps) {
> -            vfio_region_unmap(&migration->region);
> -        }
> -    }
> -    return ret;
> -}
> -
>  static int vfio_load_cleanup(void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -819,15 +304,6 @@ static int vfio_load_cleanup(void *opaque)
>      return 0;
>  }
>
> -static int vfio_v1_load_cleanup(void *opaque)
> -{
> -    VFIODevice *vbasedev = opaque;
> -
> -    vfio_migration_v1_cleanup(vbasedev);
> -    trace_vfio_load_cleanup(vbasedev->name);
> -    return 0;
> -}
> -
>  static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>  {
>      VFIODevice *vbasedev = opaque;
> @@ -861,11 +337,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>              uint64_t data_size = qemu_get_be64(f);
>
>              if (data_size) {
> -                if (vbasedev->migration->v2) {
> -                    ret = vfio_load_buffer(f, vbasedev, data_size);
> -                } else {
> -                    ret = vfio_v1_load_buffer(f, vbasedev, data_size);
> -                }
> +                ret = vfio_load_buffer(f, vbasedev, data_size);
>                  if (ret < 0) {
>                      return ret;
>                  }
> @@ -896,18 +368,6 @@ static SaveVMHandlers savevm_vfio_handlers = {
>      .load_state = vfio_load_state,
>  };
>
> -static SaveVMHandlers savevm_vfio_v1_handlers = {
> -    .save_setup = vfio_v1_save_setup,
> -    .save_cleanup = vfio_v1_save_cleanup,
> -    .save_live_pending = vfio_save_pending,
> -    .save_live_iterate = vfio_save_iterate,
> -    .save_live_complete_precopy = vfio_v1_save_complete_precopy,
> -    .save_state = vfio_save_state,
> -    .load_setup = vfio_v1_load_setup,
> -    .load_cleanup = vfio_v1_load_cleanup,
> -    .load_state = vfio_load_state,
> -};
> -
>  /* ---------------------------------------------------------------------- */
>
>  static void vfio_vmstate_change(void *opaque, bool running, RunState state)
> @@ -938,70 +398,12 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>                                new_state);
>  }
>
> -static void vfio_v1_vmstate_change(void *opaque, bool running, RunState state)
> -{
> -    VFIODevice *vbasedev = opaque;
> -    VFIOMigration *migration = vbasedev->migration;
> -    uint32_t value, mask;
> -    int ret;
> -
> -    if (vbasedev->migration->vm_running == running) {
> -        return;
> -    }
> -
> -    if (running) {
> -        /*
> -         * Here device state can have one of _SAVING, _RESUMING or _STOP bit.
> -         * Transition from _SAVING to _RUNNING can happen if there is migration
> -         * failure, in that case clear _SAVING bit.
> -         * Transition from _RESUMING to _RUNNING occurs during resuming
> -         * phase, in that case clear _RESUMING bit.
> -         * In both the above cases, set _RUNNING bit.
> -         */
> -        mask = ~VFIO_DEVICE_STATE_MASK;
> -        value = VFIO_DEVICE_STATE_V1_RUNNING;
> -    } else {
> -        /*
> -         * Here device state could be either _RUNNING or _SAVING|_RUNNING. Reset
> -         * _RUNNING bit
> -         */
> -        mask = ~VFIO_DEVICE_STATE_V1_RUNNING;
> -
> -        /*
> -         * When VM state transition to stop for savevm command, device should
> -         * start saving data.
> -         */
> -        if (state == RUN_STATE_SAVE_VM) {
> -            value = VFIO_DEVICE_STATE_V1_SAVING;
> -        } else {
> -            value = 0;
> -        }
> -    }
> -
> -    ret = vfio_migration_v1_set_state(vbasedev, mask, value);
> -    if (ret) {
> -        /*
> -         * Migration should be aborted in this case, but vm_state_notify()
> -         * currently does not support reporting failures.
> -         */
> -        error_report("%s: Failed to set device state 0x%x", vbasedev->name,
> -                     (migration->device_state_v1 & mask) | value);
> -        if (migrate_get_current()->to_dst_file) {
> -            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> -        }
> -    }
> -    vbasedev->migration->vm_running = running;
> -    trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
> -            (migration->device_state_v1 & mask) | value);
> -}
> -
>  static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>  {
>      MigrationState *s = data;
>      VFIOMigration *migration = container_of(notifier, VFIOMigration,
>                                              migration_state);
>      VFIODevice *vbasedev = migration->vbasedev;
> -    int ret;
>
>      trace_vfio_migration_state_notifier(vbasedev->name,
>                                          MigrationStatus_str(s->state));
> @@ -1011,31 +413,14 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>      case MIGRATION_STATUS_CANCELLED:
>      case MIGRATION_STATUS_FAILED:
>          bytes_transferred = 0;
> -        if (migration->v2) {
> -            vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> -                                     VFIO_DEVICE_STATE_ERROR);
> -        } else {
> -            ret = vfio_migration_v1_set_state(vbasedev,
> -                                              ~(VFIO_DEVICE_STATE_V1_SAVING |
> -                                                VFIO_DEVICE_STATE_V1_RESUMING),
> -                                              VFIO_DEVICE_STATE_V1_RUNNING);
> -            if (ret) {
> -                error_report("%s: Failed to set state RUNNING", vbasedev->name);
> -            }
> -        }
> +        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
> +                                 VFIO_DEVICE_STATE_ERROR);
>      }
>  }
>
>  static void vfio_migration_exit(VFIODevice *vbasedev)
>  {
> -    VFIOMigration *migration = vbasedev->migration;
> -
> -    if (migration->v2) {
> -        g_free(migration->data_buffer);
> -    } else {
> -        vfio_region_exit(&migration->region);
> -        vfio_region_finalize(&migration->region);
> -    }
> +    g_free(vbasedev->migration->data_buffer);
>      g_free(vbasedev->migration);
>      vbasedev->migration = NULL;
>  }
> @@ -1066,7 +451,6 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      VFIOMigration *migration;
>      char id[256] = "";
>      g_autofree char *path = NULL, *oid = NULL;
> -    struct vfio_region_info *info = NULL;
>      uint64_t mig_flags;
>
>      if (!vbasedev->ops->vfio_get_object) {
> @@ -1079,48 +463,20 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      }
>
>      ret = vfio_migration_query_flags(vbasedev, &mig_flags);
> -    if (!ret) {
> -        /* Migration v2 */
> -        /* Basic migration functionality must be supported */
> -        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> -            return -EOPNOTSUPP;
> -        }
> -        vbasedev->migration = g_new0(VFIOMigration, 1);
> -        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
> -        vbasedev->migration->data_buffer =
> -            g_malloc0(vbasedev->migration->data_buffer_size);
> -        vbasedev->migration->data_fd = -1;
> -        vbasedev->migration->v2 = true;
> -    } else {
> -        /* Migration v1 */
> -        ret = vfio_get_dev_region_info(vbasedev,
> -                                       VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> -                                       VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> -                                       &info);
> -        if (ret) {
> -            return ret;
> -        }
> -
> -        vbasedev->migration = g_new0(VFIOMigration, 1);
> -
> -        ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
> -                                info->index, "migration");
> -        if (ret) {
> -            error_report("%s: Failed to setup VFIO migration region %d: %s",
> -                         vbasedev->name, info->index, strerror(-ret));
> -            goto err;
> -        }
> -
> -        if (!vbasedev->migration->region.size) {
> -            error_report("%s: Invalid zero-sized VFIO migration region %d",
> -                         vbasedev->name, info->index);
> -            ret = -EINVAL;
> -            goto err;
> -        }
> +    if (ret) {
> +        return ret;
> +    }
>
> -        g_free(info);
> +    /* Basic migration functionality must be supported */
> +    if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> +        return -EOPNOTSUPP;
>      }
>
> +    vbasedev->migration = g_new0(VFIOMigration, 1);
> +    vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
> +    vbasedev->migration->data_buffer =
> +        g_malloc0(vbasedev->migration->data_buffer_size);
> +    vbasedev->migration->data_fd = -1;
>      migration = vbasedev->migration;
>      migration->vbasedev = vbasedev;
>
> @@ -1132,28 +488,16 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>      }
>      strpadcpy(id, sizeof(id), path, '\0');
>
> -    if (migration->v2) {
> -        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> -                             &savevm_vfio_handlers, vbasedev);
> -
> -        migration->vm_state = qdev_add_vm_change_state_handler(
> -            vbasedev->dev, vfio_vmstate_change, vbasedev);
> -    } else {
> -        register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
> -                             &savevm_vfio_v1_handlers, vbasedev);
> -
> -        migration->vm_state = qdev_add_vm_change_state_handler(
> -            vbasedev->dev, vfio_v1_vmstate_change, vbasedev);
> -    }
> +    register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
> +                         vbasedev);
>
> +    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
> +                                                           vfio_vmstate_change,
> +                                                           vbasedev);
>      migration->migration_state.notify = vfio_migration_state_notifier;
>      add_migration_state_change_notifier(&migration->migration_state);
> -    return 0;
>
> -err:
> -    g_free(info);
> -    vfio_migration_exit(vbasedev);
> -    return ret;
> +    return 0;
>  }
>
>  /* ---------------------------------------------------------------------- */
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 6e8c5958b9..a24ea7d8b0 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -154,15 +154,10 @@ vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t
>  vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
>  vfio_save_setup(const char *name) " (%s)"
>  vfio_save_cleanup(const char *name) " (%s)"
> -vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
> -vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
>  vfio_save_device_config_state(const char *name) " (%s)"
> -vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
> -vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
>  vfio_save_complete_precopy(const char *name) " (%s)"
>  vfio_load_device_config_state(const char *name) " (%s)"
>  vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
> -vfio_v1_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
>  vfio_load_state_device_data(const char *name, uint64_t data_size) " (%s) size 0x%"PRIx64
>  vfio_load_cleanup(const char *name) " (%s)"
>  vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 2ec3346fea..76d470178f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -61,16 +61,11 @@ typedef struct VFIORegion {
>  typedef struct VFIOMigration {
>      struct VFIODevice *vbasedev;
>      VMChangeStateEntry *vm_state;
> -    VFIORegion region;
> -    uint32_t device_state_v1;
> -    int vm_running;
>      Notifier migration_state;
> -    uint64_t pending_bytes;
>      enum vfio_device_mig_state device_state;
>      int data_fd;
>      void *data_buffer;
>      size_t data_buffer_size;
> -    bool v2;
>  } VFIOMigration;
>
>  typedef struct VFIOAddressSpace {
> --
> 2.21.3
>
>
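
For readers skimming the diff: after this patch, vfio_migration_init() has a
single early-return path. A minimal standalone sketch of that control flow
follows. It is an illustration only, not the QEMU code:
check_migration_support() and SKETCH_MIGRATION_STOP_COPY are hypothetical
names, and the bit value is assumed to match VFIO_MIGRATION_STOP_COPY in the
kernel's <linux/vfio.h>.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Assumed to carry the same bit value as VFIO_MIGRATION_STOP_COPY in
 * <linux/vfio.h>. Redefined under a sketch-only name so the example builds
 * without recent kernel headers.
 */
#define SKETCH_MIGRATION_STOP_COPY (UINT64_C(1) << 0)

/*
 * Hypothetical helper mirroring the post-patch checks:
 *   - a failed flags query is propagated as-is (there is no v1 fallback);
 *   - a device that does not advertise stop-copy is rejected;
 *   - otherwise migration setup may proceed.
 */
static int check_migration_support(int query_ret, uint64_t mig_flags)
{
    if (query_ret) {
        return query_ret;
    }

    /* Basic migration functionality must be supported */
    if (!(mig_flags & SKETCH_MIGRATION_STOP_COPY)) {
        return -EOPNOTSUPP;
    }

    return 0;
}
```

With the v1 branch gone, a kernel that does not implement the v2 feature query
simply makes the init function return the query error.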



* Re: [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1
  2022-09-19  8:35   ` liulongfang via
@ 2022-09-19 11:50     ` Alex Williamson
  2022-09-19 12:58       ` Philippe Mathieu-Daudé via
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2022-09-19 11:50 UTC (permalink / raw)
  To: liulongfang
  Cc: Avihai Horon, qemu-devel, Cornelia Huck, Juan Quintela,
	Dr . David Alan Gilbert, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Mark Bloch, Maor Gottlieb, Kirti Wankhede,
	Tarun Gupta, Shameerali Kolothum Thodi, jiangkunkun

On Mon, 19 Sep 2022 16:35:49 +0800
liulongfang <liulongfang@huawei.com> wrote:

> On 2022/5/31 1:07, Avihai Horon Wrote:
> > Now that v2 protocol implementation has been added, remove the
> > deprecated v1 implementation.
> > "struct vfio_device_migration_info" still exists in vfio.h,  
> why does qemu need to delete v1 implementation?

It never progressed past experimental support, upstream never committed
to support it, it's dead code relative to the kernel specification now.
Thanks,

Alex




* Re: [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1
  2022-09-19 11:50     ` Alex Williamson
@ 2022-09-19 12:58       ` Philippe Mathieu-Daudé via
  0 siblings, 0 replies; 31+ messages in thread
From: Philippe Mathieu-Daudé via @ 2022-09-19 12:58 UTC (permalink / raw)
  To: Alex Williamson, Avihai Horon
  Cc: liulongfang, qemu-devel@nongnu.org Developers, Cornelia Huck,
	Juan Quintela, Dr . David Alan Gilbert, Joao Martins,
	Yishai Hadas, Jason Gunthorpe, Mark Bloch, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Shameerali Kolothum Thodi,
	jiangkunkun

On Mon, Sep 19, 2022 at 1:57 PM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Mon, 19 Sep 2022 16:35:49 +0800
> liulongfang <liulongfang@huawei.com> wrote:
>
> > On 2022/5/31 1:07, Avihai Horon Wrote:
> > > Now that v2 protocol implementation has been added, remove the
> > > deprecated v1 implementation.
> > > "struct vfio_device_migration_info" still exists in vfio.h,
> > why does qemu need to delete v1 implementation?
>
> It never progressed past experimental support, upstream never committed
> to support it, it's dead code relative to the kernel specification now.

Avihai, do you mind adding Alex's explanation to your commit description please?

> Thanks,
>
> Alex
>
>



* Re: [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails
  2022-05-30 17:07 ` [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails Avihai Horon
@ 2022-10-11  1:41   ` liulongfang via
  0 siblings, 0 replies; 31+ messages in thread
From: liulongfang via @ 2022-10-11  1:41 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel, Cornelia Huck, Alex Williamson,
	Juan Quintela, Dr . David Alan Gilbert
  Cc: Joao Martins, Yishai Hadas, Jason Gunthorpe, Mark Bloch,
	Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 2022/5/31 1:07, Avihai Horon wrote:
> If vfio_migration_set_state() fails to set the device in the requested
> state it tries to put it in a recover state. If setting the device in
> the recover state fails as well, hw_error is triggered and the VM is
> aborted.
> 
> To improve user experience and avoid VM data loss, reset the device with
> VFIO_RESET_DEVICE instead of aborting the VM.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  hw/vfio/migration.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 852759e6ca..6c34502611 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -89,8 +89,16 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>          /* Try to put the device in some good state */
>          mig_state->device_state = recover_state;
>          if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -            hw_error("%s: Device in error state, can't recover",
> -                     vbasedev->name);
> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
> +                hw_error("%s: Device in error state, can't recover",
> +                         vbasedev->name);
> +            }
> +
> +            error_report(
> +                "%s: Device was reset due to failure in changing device state to recover state %s",
> +                vbasedev->name, mig_state_to_str(recover_state));
> +
> +            return -1;
>          }
> 
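
The fallback order described in the commit message above (requested state,
then recover state, then device reset, then abort) can be sketched in
isolation as follows. This is a standalone illustration under stated
assumptions, not the QEMU implementation: set_state_with_recovery(), struct
dev_ops, the result codes and the fake device are all hypothetical, and the
callbacks merely stand in for the VFIO_DEVICE_FEATURE and VFIO_DEVICE_RESET
ioctls.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes, used only by this sketch. */
enum set_state_result {
    SET_STATE_OK,        /* requested state was reached */
    SET_STATE_RECOVERED, /* request failed, recover state was reached */
    SET_STATE_RESET,     /* both failed, device was reset instead */
    SET_STATE_FATAL,     /* reset failed too; the patch calls hw_error() here */
};

/* Hypothetical callbacks standing in for the two ioctls; 0 means success. */
struct dev_ops {
    int (*set_state)(void *opaque, int state);
    int (*reset)(void *opaque);
};

static enum set_state_result
set_state_with_recovery(const struct dev_ops *ops, void *opaque,
                        int new_state, int recover_state)
{
    assert(ops && ops->set_state && ops->reset);

    if (ops->set_state(opaque, new_state) == 0) {
        return SET_STATE_OK;
    }
    /* Try to put the device in some good state */
    if (ops->set_state(opaque, recover_state) == 0) {
        return SET_STATE_RECOVERED;
    }
    /* New in this patch: reset the device before giving up on the VM */
    if (ops->reset(opaque) == 0) {
        fprintf(stderr, "device was reset, could not enter state %d\n",
                recover_state);
        return SET_STATE_RESET;
    }
    return SET_STATE_FATAL;
}

/* Fake device: rejects every state below reject_below, may fail reset. */
struct fake_dev {
    int reject_below;
    int reset_fails;
    int resets;
};

static int fake_set_state(void *opaque, int state)
{
    struct fake_dev *d = opaque;
    return state < d->reject_below ? -1 : 0;
}

static int fake_reset(void *opaque)
{
    struct fake_dev *d = opaque;
    if (d->reset_fails) {
        return -1;
    }
    d->resets++;
    return 0;
}

static const struct dev_ops fake_ops = { fake_set_state, fake_reset };
```

In the real function every outcome other than the first is reported to the
caller as a failure; the distinct codes here only make the ordering visible.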

With QEMU 7.1.50 built with this patch set, I hit the following error when
exiting the source QEMU after a live migration had failed because the
destination VM was disconnected during the migration:

[100337.287047] BUG: Bad page state in process qemu-system-aar  pfn:82199518
[100337.295815] page:00000000356de4da refcount:-2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x82199518
[100337.306403] flags: 0xbfff80000000000(node=0|zone=2|lastcpupid=0x7fff)
[100337.314091] raw: 0bfff80000000000 dead000000000100 dead000000000122 0000000000000000
[100337.322589] raw: 0000000000000000 0000000000000000 fffffffeffffffff 0000000000000000
[100337.330630] page dumped because: nonzero _refcount
[100337.335840] Modules linked in: hisi_acc_vfio_pci hisi_sec2 hisi_zip hisi_hpre hisi_qm uacce vfio_iommu_type1 vfio_pci vfio_pci_core vfio_virqfd vfio pv680_mii(O) [last unloaded: hisi_sec2]
[100337.354564] CPU: 1 PID: 786 Comm: qemu-system-aar Tainted: G    B      O   6.0.0-rc4+ #1
[100337.377378] Call trace:
[100337.380382]  dump_backtrace.part.0+0xc4/0xd0
[100337.385791]  show_stack+0x24/0x40
[100337.389478]  dump_stack_lvl+0x68/0x84
[100337.394155]  dump_stack+0x18/0x34
[100337.398006]  bad_page+0xf0/0x120
[100337.401796]  check_free_page_bad+0x84/0x90
[100337.406404]  free_pcppages_bulk+0x1bc/0x2b0
[100337.411126]  free_unref_page_commit+0x120/0x15c
[100337.416935]  free_unref_page+0x15c/0x254
[100337.421436]  free_compound_page+0x6c/0x100
[100337.425868]  free_transhuge_page+0xd4/0x140
[100337.430535]  destroy_large_folio+0x30/0x40
[100337.434953]  release_pages+0x1bc/0x4d0
[100337.439268]  free_pages_and_swap_cache+0x68/0x80
[100337.444224]  tlb_batch_pages_flush+0x5c/0x94
[100337.448976]  tlb_flush_mmu+0x4c/0xd4
[100337.453062]  unmap_page_range+0x8d0/0xbd0
[100337.457432]  unmap_single_vma+0x90/0x12c
[100337.461673]  unmap_vmas+0x84/0xfc
[100337.465354]  exit_mmap+0x88/0x1b0
[100337.469008]  __mmput+0x48/0x134
[100337.472637]  mmput+0x44/0x50
[100337.475857]  do_exit+0x2b8/0x970
[100337.479641]  do_group_exit+0x40/0xac
[100337.484079]  get_signal+0x8c0/0x934
[100337.488215]  do_notify_resume+0x1d0/0x1570
[100337.492795]  el0_svc+0xa8/0xc0
[100337.496452]  el0t_64_sync_handler+0x1ac/0x1b0
[100337.501187]  el0t_64_sync+0x19c/0x1a0

Can anyone see what is causing this error?

>          error_report("%s: Failed changing device state to %s", vbasedev->name,
> 
Thanks
Longfang.



Thread overview: 31+ messages
2022-05-30 17:07 [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
2022-05-30 17:07 ` [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug Avihai Horon
2022-05-30 17:07 ` [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported Avihai Horon
2022-05-30 17:12   ` Avihai Horon
2022-06-07 17:53     ` Avihai Horon
2022-05-30 17:07 ` [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd() Avihai Horon
2022-05-30 17:07 ` [PATCH v2 04/11] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one Avihai Horon
2022-05-30 17:07 ` [PATCH v2 05/11] vfio/migration: Move migration v1 logic to vfio_migration_init() Avihai Horon
2022-05-30 17:07 ` [PATCH v2 06/11] vfio/migration: Rename functions/structs related to v1 protocol Avihai Horon
2022-05-30 17:07 ` [PATCH v2 07/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
2022-06-14 11:08   ` Joao Martins
2022-06-14 16:34     ` Avihai Horon
2022-06-14 17:24       ` Joao Martins
2022-06-15  6:40         ` Avihai Horon
2022-07-18 15:12   ` Jason Gunthorpe
2022-07-27 15:45     ` Avihai Horon
2022-05-30 17:07 ` [PATCH v2 08/11] vfio/migration: Remove VFIO migration protocol v1 Avihai Horon
2022-09-19  8:35   ` liulongfang via
2022-09-19 11:50     ` Alex Williamson
2022-09-19 12:58       ` Philippe Mathieu-Daudé via
2022-09-19  9:41   ` Philippe Mathieu-Daudé via
2022-05-30 17:07 ` [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails Avihai Horon
2022-10-11  1:41   ` liulongfang via
2022-05-30 17:07 ` [PATCH v2 10/11] vfio: Alphabetize migration section of VFIO trace-events file Avihai Horon
2022-05-30 17:07 ` [PATCH v2 11/11] docs/devel: Align vfio-migration docs to VFIO migration v2 Avihai Horon
2022-06-07 17:44 ` [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Avihai Horon
2022-06-07 21:32   ` Alex Williamson
2022-06-13 11:21     ` Avihai Horon
2022-06-17 21:51       ` Alex Williamson
2022-06-23 14:56         ` Jason Gunthorpe
2022-06-27  7:36         ` Avihai Horon
