qemu-devel.nongnu.org archive mirror
* [PATCH for-8.2 v2 0/6] vfio/migration: Add P2P support for VFIO migration
@ 2023-07-31 10:25 Avihai Horon
  2023-07-31 10:25 ` [PATCH for-8.2 v2 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Avihai Horon @ 2023-07-31 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Joao Martins, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

Hi all,

The first patch in this series adds a small optimization to VFIO
migration by moving the STOP_COPY->STOP transition to
vfio_save_cleanup(). Testing with a ConnectX-7 VFIO device showed that
this can reduce downtime by up to 6%.

The rest of the series adds P2P support for VFIO migration.

VFIO migration uAPI defines an optional intermediate P2P quiescent
state. While in the P2P quiescent state, P2P DMA transactions cannot be
initiated by the device, but the device can respond to incoming ones.
Additionally, all outstanding P2P transactions are guaranteed to have
been completed by the time the device enters this state.

The purpose of this state is to support migration of multiple devices
that might do P2P transactions between themselves.

To implement P2P migration support, all the devices must be transitioned
to the P2P quiescent state before being stopped or started.

This behavior is achieved by adding an optional prepare callback to
VMChangeStateEntry. The prepare callbacks are invoked before the main VM
state change callbacks: first all the VFIO devices are transitioned to
the P2P quiescent state, and only then are the main callbacks invoked to
stop or start the devices.

This will allow migration of multiple VFIO devices if all of them
support P2P migration.
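
The two-phase dispatch described above can be sketched as follows. This is
a minimal standalone model, not QEMU code: `entry`, `add_entry` and
`notify` are hypothetical stand-ins for VMChangeStateEntry,
qemu_add_vm_change_state_handler_prio_full() and vm_state_notify().

```c
#include <assert.h>
#include <stdbool.h>

/* Miniature of the two-phase dispatch: every entry's optional prepare
 * callback runs first (quiescing the device), and only then do the main
 * callbacks stop or start the devices. */

#define MAX_ENTRIES 8

typedef void (*handler_fn)(void *opaque, bool running);

struct entry {
    handler_fn prepare_cb; /* optional, may be NULL */
    handler_fn cb;         /* main callback */
    void *opaque;
};

static struct entry entries[MAX_ENTRIES];
static int nentries;

void add_entry(handler_fn cb, handler_fn prepare_cb, void *opaque)
{
    entries[nentries++] = (struct entry){ prepare_cb, cb, opaque };
}

/* First pass: all prepare callbacks; second pass: all main callbacks. */
void notify(bool running)
{
    for (int i = 0; i < nentries; i++) {
        if (entries[i].prepare_cb) {
            entries[i].prepare_cb(entries[i].opaque, running);
        }
    }
    for (int i = 0; i < nentries; i++) {
        entries[i].cb(entries[i].opaque, running);
    }
}

/* A toy "device" that records the order in which its callbacks fire. */
struct dev { int prep_seq, main_seq; };
static int seq = 1;

static void dev_prepare(void *opaque, bool running)
{
    (void)running;
    ((struct dev *)opaque)->prep_seq = seq++;
}

static void dev_main(void *opaque, bool running)
{
    (void)running;
    ((struct dev *)opaque)->main_seq = seq++;
}
```

The point of the split is visible in the ordering: every prepare callback
completes before any main callback runs, so no device is stopped or started
while another is still able to initiate P2P DMA.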

Thanks.

Changes from v1 [1]:
* Rebased on latest master branch.
* Renamed pre_change_cb to prepare_cb and adjusted relevant code
  accordingly. (Cedric)
* Split VFIO vmstate change handler into two separate handlers, one for
  the prepare callback and another for the main callback. (Cedric)
* Renamed vfio_should_block_multiple_devices_migration() to
  vfio_multiple_devices_migration_is_supported() and inverted its logic
  accordingly. (Cedric)
* Rephrased "that are doing P2P" to "that might do P2P" in docs and
  commit message. (Jason)
* Added Cedric R-b tag to patch #4.

[1]
https://lore.kernel.org/qemu-devel/20230716081541.27900-1-avihaih@nvidia.com/

Avihai Horon (5):
  vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()
  sysemu: Add prepare callback to struct VMChangeStateEntry
  qdev: Add qdev_add_vm_change_state_handler_full()
  vfio/migration: Add P2P support for VFIO migration
  vfio/migration: Allow migration of multiple P2P supporting devices

Joao Martins (1):
  vfio/migration: Refactor PRE_COPY and RUNNING state checks

 docs/devel/vfio-migration.rst     | 93 +++++++++++++++++++------------
 include/hw/vfio/vfio-common.h     |  2 +
 include/sysemu/runstate.h         |  7 +++
 hw/core/vm-change-state-handler.c | 14 ++++-
 hw/vfio/common.c                  | 50 +++++++++++++----
 hw/vfio/migration.c               | 76 ++++++++++++++++++++-----
 softmmu/runstate.c                | 40 +++++++++++++
 hw/vfio/trace-events              |  1 +
 8 files changed, 219 insertions(+), 64 deletions(-)

-- 
2.26.3



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH for-8.2 v2 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()
  2023-07-31 10:25 [PATCH for-8.2 v2 0/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
@ 2023-07-31 10:25 ` Avihai Horon
  2023-07-31 10:25 ` [PATCH for-8.2 v2 2/6] sysemu: Add prepare callback to struct VMChangeStateEntry Avihai Horon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Avihai Horon @ 2023-07-31 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Joao Martins, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

Changing the device state from STOP_COPY to STOP can take time as the
device may need to free resources and do other operations as part of the
transition. Currently, this is done in vfio_save_complete_precopy() and
therefore it is counted in the migration downtime.

To avoid this, change the device state from STOP_COPY to STOP in
vfio_save_cleanup(), which is called after migration has completed and
thus is not part of migration downtime.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 2674f4bc47..8acd182a8b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque)
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
 
+    /*
+     * Changing device state from STOP_COPY to STOP can take time. Do it here,
+     * after migration has completed, so it won't increase downtime.
+     */
+    if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
+        /*
+         * If setting the device in STOP state fails, the device should be
+         * reset. To do so, use ERROR state as a recover state.
+         */
+        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+                                 VFIO_DEVICE_STATE_ERROR);
+    }
+
     g_free(migration->data_buffer);
     migration->data_buffer = NULL;
     migration->precopy_init_size = 0;
@@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    /*
-     * If setting the device in STOP state fails, the device should be reset.
-     * To do so, use ERROR state as a recover state.
-     */
-    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
-                                   VFIO_DEVICE_STATE_ERROR);
     trace_vfio_save_complete_precopy(vbasedev->name, ret);
 
     return ret;
-- 
2.26.3




* [PATCH for-8.2 v2 2/6] sysemu: Add prepare callback to struct VMChangeStateEntry
  2023-07-31 10:25 [PATCH for-8.2 v2 0/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
  2023-07-31 10:25 ` [PATCH for-8.2 v2 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
@ 2023-07-31 10:25 ` Avihai Horon
  2023-08-01 12:03   ` Cédric Le Goater
  2023-07-31 10:25 ` [PATCH for-8.2 v2 3/6] qdev: Add qdev_add_vm_change_state_handler_full() Avihai Horon
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Avihai Horon @ 2023-07-31 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Joao Martins, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

Add prepare callback to struct VMChangeStateEntry.

The prepare callback is optional and can be set by the new function
qemu_add_vm_change_state_handler_prio_full() that allows setting this
callback in addition to the main callback.

The prepare callbacks and main callbacks are called in two separate
phases: First all prepare callbacks are called and only then all main
callbacks are called.

The purpose of the new prepare callback is to allow all devices to run a
preliminary task before calling the devices' main callbacks.

This will facilitate adding P2P support for VFIO migration where all
VFIO devices need to be put in an intermediate P2P quiescent state
before being stopped or started by the main callback.
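
The ordering contract stated in the new doc comment (low priorities first
when the VM starts running, reversed when it stops) can be modelled as
below. This is a simplified sketch under stated assumptions: a sorted
array instead of QTAILQ, main callbacks only, and hypothetical names
(`add_handler`, `state_notify`) rather than the real QEMU API.

```c
#include <assert.h>
#include <stdbool.h>

/* Entries are kept sorted by ascending priority. When the VM starts
 * running they are walked forward; when it stops, in reverse. */

#define MAX 8

struct vmcse {
    int priority;
    void (*cb)(void *opaque, bool running);
    void *opaque;
};

static struct vmcse list[MAX];
static int n;

void add_handler(void (*cb)(void *, bool), void *opaque, int priority)
{
    int i = n++;
    /* Insertion sort keeps the array ordered by priority. */
    while (i > 0 && list[i - 1].priority > priority) {
        list[i] = list[i - 1];
        i--;
    }
    list[i] = (struct vmcse){ priority, cb, opaque };
}

void state_notify(bool running)
{
    if (running) {
        for (int i = 0; i < n; i++) {        /* low priority first */
            list[i].cb(list[i].opaque, running);
        }
    } else {
        for (int i = n - 1; i >= 0; i--) {   /* reverse when stopping */
            list[i].cb(list[i].opaque, running);
        }
    }
}

/* Record the firing order for inspection. */
static int log_buf[MAX], log_len;
static void record(void *opaque, bool running)
{
    (void)running;
    log_buf[log_len++] = *(int *)opaque;
}
```

In the real patch the same forward/reverse walks are done twice per
notification: once over the prepare callbacks, then over the main ones.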

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 include/sysemu/runstate.h |  4 ++++
 softmmu/runstate.c        | 40 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 7beb29c2e2..764a0fc6a4 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -16,6 +16,10 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
                                                      void *opaque);
 VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
         VMChangeStateHandler *cb, void *opaque, int priority);
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+                                           VMChangeStateHandler *prepare_cb,
+                                           void *opaque, int priority);
 VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
                                                      VMChangeStateHandler *cb,
                                                      void *opaque);
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index f3bd862818..1652ed0439 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -271,6 +271,7 @@ void qemu_system_vmstop_request(RunState state)
 }
 struct VMChangeStateEntry {
     VMChangeStateHandler *cb;
+    VMChangeStateHandler *prepare_cb;
     void *opaque;
     QTAILQ_ENTRY(VMChangeStateEntry) entries;
     int priority;
@@ -293,12 +294,39 @@ static QTAILQ_HEAD(, VMChangeStateEntry) vm_change_state_head =
  */
 VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
         VMChangeStateHandler *cb, void *opaque, int priority)
+{
+    return qemu_add_vm_change_state_handler_prio_full(cb, NULL, opaque,
+                                                      priority);
+}
+
+/**
+ * qemu_add_vm_change_state_handler_prio_full:
+ * @cb: the main callback to invoke
+ * @prepare_cb: a callback to invoke before the main callback
+ * @opaque: user data passed to the callbacks
+ * @priority: low priorities execute first when the vm runs and the reverse is
+ *            true when the vm stops
+ *
+ * Register a main callback function and an optional prepare callback function
+ * that are invoked when the vm starts or stops running. The main callback and
+ * the prepare callback are called in two separate phases: First all prepare
+ * callbacks are called and only then all main callbacks are called. As its
+ * name suggests, the prepare callback can be used to do some preparatory work
+ * before invoking the main callback.
+ *
+ * Returns: an entry to be freed using qemu_del_vm_change_state_handler()
+ */
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+                                           VMChangeStateHandler *prepare_cb,
+                                           void *opaque, int priority)
 {
     VMChangeStateEntry *e;
     VMChangeStateEntry *other;
 
     e = g_malloc0(sizeof(*e));
     e->cb = cb;
+    e->prepare_cb = prepare_cb;
     e->opaque = opaque;
     e->priority = priority;
 
@@ -333,10 +361,22 @@ void vm_state_notify(bool running, RunState state)
     trace_vm_state_notify(running, state, RunState_str(state));
 
     if (running) {
+        QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
+            if (e->prepare_cb) {
+                e->prepare_cb(e->opaque, running, state);
+            }
+        }
+
         QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
             e->cb(e->opaque, running, state);
         }
     } else {
+        QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
+            if (e->prepare_cb) {
+                e->prepare_cb(e->opaque, running, state);
+            }
+        }
+
         QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
             e->cb(e->opaque, running, state);
         }
-- 
2.26.3




* [PATCH for-8.2 v2 3/6] qdev: Add qdev_add_vm_change_state_handler_full()
  2023-07-31 10:25 [PATCH for-8.2 v2 0/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
  2023-07-31 10:25 ` [PATCH for-8.2 v2 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
  2023-07-31 10:25 ` [PATCH for-8.2 v2 2/6] sysemu: Add prepare callback to struct VMChangeStateEntry Avihai Horon
@ 2023-07-31 10:25 ` Avihai Horon
  2023-08-01 12:04   ` Cédric Le Goater
  2023-07-31 10:25 ` [PATCH for-8.2 v2 4/6] vfio/migration: Refactor PRE_COPY and RUNNING state checks Avihai Horon
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Avihai Horon @ 2023-07-31 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Joao Martins, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

Add qdev_add_vm_change_state_handler_full() variant that allows setting
a prepare callback in addition to the main callback.

This will facilitate adding P2P support for VFIO migration in the
following patches.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 include/sysemu/runstate.h         |  3 +++
 hw/core/vm-change-state-handler.c | 14 +++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 764a0fc6a4..08afb97695 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -23,6 +23,9 @@ qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
 VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
                                                      VMChangeStateHandler *cb,
                                                      void *opaque);
+VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
+    DeviceState *dev, VMChangeStateHandler *cb,
+    VMChangeStateHandler *prepare_cb, void *opaque);
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
 /**
  * vm_state_notify: Notify the state of the VM
diff --git a/hw/core/vm-change-state-handler.c b/hw/core/vm-change-state-handler.c
index 1f3630986d..8e2639224e 100644
--- a/hw/core/vm-change-state-handler.c
+++ b/hw/core/vm-change-state-handler.c
@@ -55,8 +55,20 @@ static int qdev_get_dev_tree_depth(DeviceState *dev)
 VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
                                                      VMChangeStateHandler *cb,
                                                      void *opaque)
+{
+    return qdev_add_vm_change_state_handler_full(dev, cb, NULL, opaque);
+}
+
+/*
+ * Exactly like qdev_add_vm_change_state_handler() but passes a prepare_cb
+ * argument too.
+ */
+VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
+    DeviceState *dev, VMChangeStateHandler *cb,
+    VMChangeStateHandler *prepare_cb, void *opaque)
 {
     int depth = qdev_get_dev_tree_depth(dev);
 
-    return qemu_add_vm_change_state_handler_prio(cb, opaque, depth);
+    return qemu_add_vm_change_state_handler_prio_full(cb, prepare_cb, opaque,
+                                                      depth);
 }
-- 
2.26.3




* [PATCH for-8.2 v2 4/6] vfio/migration: Refactor PRE_COPY and RUNNING state checks
  2023-07-31 10:25 [PATCH for-8.2 v2 0/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
                   ` (2 preceding siblings ...)
  2023-07-31 10:25 ` [PATCH for-8.2 v2 3/6] qdev: Add qdev_add_vm_change_state_handler_full() Avihai Horon
@ 2023-07-31 10:25 ` Avihai Horon
  2023-07-31 10:25 ` [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
  2023-07-31 10:25 ` [PATCH for-8.2 v2 6/6] vfio/migration: Allow migration of multiple P2P supporting devices Avihai Horon
  5 siblings, 0 replies; 12+ messages in thread
From: Avihai Horon @ 2023-07-31 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Joao Martins, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

From: Joao Martins <joao.m.martins@oracle.com>

Move the PRE_COPY and RUNNING state checks to helper functions.

This is in preparation for adding P2P VFIO migration support, where
these helpers will also test for PRE_COPY_P2P and RUNNING_P2P states.
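
The payoff of the refactor can be sketched in isolation: once every call
site goes through a predicate helper, widening the accepted state set is a
one-line change. The enum values and names below are illustrative, not the
VFIO uAPI; the helpers are shown as they end up after the P2P patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative device states; not the kernel uAPI numbering. */
enum mig_state {
    STATE_STOP,
    STATE_RUNNING,
    STATE_PRE_COPY,
    STATE_RUNNING_P2P,   /* added later in this series */
    STATE_PRE_COPY_P2P,  /* added later in this series */
};

struct device { enum mig_state device_state; };

bool device_state_is_running(const struct device *d)
{
    /* After the P2P patch, the helper also covers RUNNING_P2P,
     * without touching any of its callers. */
    return d->device_state == STATE_RUNNING ||
           d->device_state == STATE_RUNNING_P2P;
}

bool device_state_is_precopy(const struct device *d)
{
    return d->device_state == STATE_PRE_COPY ||
           d->device_state == STATE_PRE_COPY_P2P;
}
```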

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/common.c              | 22 ++++++++++++++++++----
 hw/vfio/migration.c           | 10 ++++------
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index da43d27352..e9b8954595 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -230,6 +230,8 @@ void vfio_unblock_multiple_devices_migration(void);
 bool vfio_viommu_preset(VFIODevice *vbasedev);
 int64_t vfio_mig_bytes_transferred(void);
 void vfio_reset_bytes_transferred(void);
+bool vfio_device_state_is_running(VFIODevice *vbasedev);
+bool vfio_device_state_is_precopy(VFIODevice *vbasedev);
 
 #ifdef CONFIG_LINUX
 int vfio_get_region_info(VFIODevice *vbasedev, int index,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9aac21abb7..16cf79a76c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -437,6 +437,20 @@ static void vfio_set_migration_error(int err)
     }
 }
 
+bool vfio_device_state_is_running(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    return migration->device_state == VFIO_DEVICE_STATE_RUNNING;
+}
+
+bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+
+    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
+}
+
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 {
     VFIOGroup *group;
@@ -457,8 +471,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
             }
 
             if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
-                (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
-                 migration->device_state == VFIO_DEVICE_STATE_PRE_COPY)) {
+                (vfio_device_state_is_running(vbasedev) ||
+                 vfio_device_state_is_precopy(vbasedev))) {
                 return false;
             }
         }
@@ -503,8 +517,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
                 return false;
             }
 
-            if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
-                migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
+            if (vfio_device_state_is_running(vbasedev) ||
+                vfio_device_state_is_precopy(vbasedev)) {
                 continue;
             } else {
                 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 8acd182a8b..48f9c23cbe 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -411,7 +411,7 @@ static void vfio_state_pending_estimate(void *opaque, uint64_t *must_precopy,
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
 
-    if (migration->device_state != VFIO_DEVICE_STATE_PRE_COPY) {
+    if (!vfio_device_state_is_precopy(vbasedev)) {
         return;
     }
 
@@ -444,7 +444,7 @@ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
     vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
     *must_precopy += stop_copy_size;
 
-    if (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
+    if (vfio_device_state_is_precopy(vbasedev)) {
         vfio_query_precopy_size(migration);
 
         *must_precopy +=
@@ -459,9 +459,8 @@ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
 static bool vfio_is_active_iterate(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
 
-    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
+    return vfio_device_state_is_precopy(vbasedev);
 }
 
 static int vfio_save_iterate(QEMUFile *f, void *opaque)
@@ -656,7 +655,6 @@ static const SaveVMHandlers savevm_vfio_handlers = {
 static void vfio_vmstate_change(void *opaque, bool running, RunState state)
 {
     VFIODevice *vbasedev = opaque;
-    VFIOMigration *migration = vbasedev->migration;
     enum vfio_device_mig_state new_state;
     int ret;
 
@@ -664,7 +662,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
         new_state = VFIO_DEVICE_STATE_RUNNING;
     } else {
         new_state =
-            (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY &&
+            (vfio_device_state_is_precopy(vbasedev) &&
              (state == RUN_STATE_FINISH_MIGRATE || state == RUN_STATE_PAUSED)) ?
                 VFIO_DEVICE_STATE_STOP_COPY :
                 VFIO_DEVICE_STATE_STOP;
-- 
2.26.3




* [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration
  2023-07-31 10:25 [PATCH for-8.2 v2 0/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
                   ` (3 preceding siblings ...)
  2023-07-31 10:25 ` [PATCH for-8.2 v2 4/6] vfio/migration: Refactor PRE_COPY and RUNNING state checks Avihai Horon
@ 2023-07-31 10:25 ` Avihai Horon
  2023-08-01 12:02   ` Cédric Le Goater
  2023-07-31 10:25 ` [PATCH for-8.2 v2 6/6] vfio/migration: Allow migration of multiple P2P supporting devices Avihai Horon
  5 siblings, 1 reply; 12+ messages in thread
From: Avihai Horon @ 2023-07-31 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Joao Martins, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

VFIO migration uAPI defines an optional intermediate P2P quiescent
state. While in the P2P quiescent state, P2P DMA transactions cannot be
initiated by the device, but the device can respond to incoming ones.
Additionally, all outstanding P2P transactions are guaranteed to have
been completed by the time the device enters this state.

The purpose of this state is to support migration of multiple devices
that might do P2P transactions between themselves.

Add support for P2P migration by transitioning all the devices to the
P2P quiescent state before stopping or starting the devices. Use the new
VMChangeStateHandler prepare_cb to achieve that behavior.

This will allow migration of multiple VFIO devices if all of them
support P2P migration.
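
The state choice made by the new prepare handler can be summarized as a
pure function. This is a hedged sketch with illustrative names and enum
values, not the uAPI: a device mid pre-copy quiesces into PRE_COPY_P2P,
any other running device into RUNNING_P2P, and a device whose driver did
not report P2P support is left in its current state.

```c
#include <assert.h>

enum mig_state { RUNNING, PRE_COPY, RUNNING_P2P, PRE_COPY_P2P, STOP };

#define MIG_FLAG_P2P (1u << 0)  /* stand-in for VFIO_MIGRATION_P2P */

/* Returns the quiescent state to enter from the prepare callback, or
 * the current state unchanged when the device does not support P2P. */
enum mig_state prepare_target_state(enum mig_state cur, unsigned mig_flags)
{
    if (!(mig_flags & MIG_FLAG_P2P)) {
        return cur;
    }
    return cur == PRE_COPY ? PRE_COPY_P2P : RUNNING_P2P;
}
```

The main vmstate handler then completes the transition from the P2P state
to STOP/STOP_COPY (or to RUNNING on the resume side).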

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 docs/devel/vfio-migration.rst | 93 +++++++++++++++++++++--------------
 hw/vfio/common.c              |  6 ++-
 hw/vfio/migration.c           | 47 ++++++++++++++++--
 hw/vfio/trace-events          |  1 +
 4 files changed, 106 insertions(+), 41 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index b433cb5bb2..605fe60e96 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -23,9 +23,21 @@ and recommends that the initial bytes are sent and loaded in the destination
 before stopping the source VM. Enabling this migration capability will
 guarantee that and thus, can potentially reduce downtime even further.
 
-Note that currently VFIO migration is supported only for a single device. This
-is due to VFIO migration's lack of P2P support. However, P2P support is planned
-to be added later on.
+To support migration of multiple devices that might do P2P transactions between
+themselves, VFIO migration uAPI defines an intermediate P2P quiescent state.
+While in the P2P quiescent state, P2P DMA transactions cannot be initiated by
+the device, but the device can respond to incoming ones. Additionally, all
+outstanding P2P transactions are guaranteed to have been completed by the time
+the device enters this state.
+
+All the devices that support P2P migration are first transitioned to the P2P
+quiescent state and only then are they stopped or started. This makes migration
+safe P2P-wise, since starting and stopping the devices is not done atomically
+for all the devices together.
+
+Thus, multiple VFIO devices migration is allowed only if all the devices
+support P2P migration. Single VFIO device migration is allowed regardless of
+P2P migration support.
 
 A detailed description of the UAPI for VFIO device migration can be found in
 the comment for the ``vfio_device_mig_state`` structure in the header file
@@ -132,54 +144,63 @@ will be blocked.
 Flow of state changes during Live migration
 ===========================================
 
-Below is the flow of state change during live migration.
+Below is the state change flow during live migration for a VFIO device that
+supports both precopy and P2P migration. The flow for devices that don't
+support it is similar, except that the relevant states for precopy and P2P are
+skipped.
 The values in the parentheses represent the VM state, the migration state, and
 the VFIO device state, respectively.
-The text in the square brackets represents the flow if the VFIO device supports
-pre-copy.
 
 Live migration save path
 ------------------------
 
 ::
 
-                        QEMU normal running state
-                        (RUNNING, _NONE, _RUNNING)
-                                  |
+                           QEMU normal running state
+                           (RUNNING, _NONE, _RUNNING)
+                                      |
                      migrate_init spawns migration_thread
-                Migration thread then calls each device's .save_setup()
-                  (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
-                                  |
-                  (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
-      If device is active, get pending_bytes by .state_pending_{estimate,exact}()
-          If total pending_bytes >= threshold_size, call .save_live_iterate()
-                  [Data of VFIO device for pre-copy phase is copied]
-        Iterate till total pending bytes converge and are less than threshold
-                                  |
-  On migration completion, vCPU stops and calls .save_live_complete_precopy for
-  each active device. The VFIO device is then transitioned into _STOP_COPY state
-                  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
-                                  |
-     For the VFIO device, iterate in .save_live_complete_precopy until
-                         pending data is 0
-                   (FINISH_MIGRATE, _DEVICE, _STOP)
-                                  |
-                 (FINISH_MIGRATE, _COMPLETED, _STOP)
-             Migraton thread schedules cleanup bottom half and exits
+            Migration thread then calls each device's .save_setup()
+                          (RUNNING, _SETUP, _PRE_COPY)
+                                      |
+                         (RUNNING, _ACTIVE, _PRE_COPY)
+  If device is active, get pending_bytes by .state_pending_{estimate,exact}()
+       If total pending_bytes >= threshold_size, call .save_live_iterate()
+                Data of VFIO device for pre-copy phase is copied
+      Iterate till total pending bytes converge and are less than threshold
+                                      |
+       On migration completion, the vCPUs and the VFIO device are stopped
+              The VFIO device is first put in P2P quiescent state
+                    (FINISH_MIGRATE, _ACTIVE, _PRE_COPY_P2P)
+                                      |
+                Then the VFIO device is put in _STOP_COPY state
+                     (FINISH_MIGRATE, _ACTIVE, _STOP_COPY)
+         .save_live_complete_precopy() is called for each active device
+      For the VFIO device, iterate in .save_live_complete_precopy() until
+                               pending data is 0
+                                      |
+                     (POSTMIGRATE, _COMPLETED, _STOP_COPY)
+            Migration thread schedules cleanup bottom half and exits
+                                      |
+                           .save_cleanup() is called
+                        (POSTMIGRATE, _COMPLETED, _STOP)
 
 Live migration resume path
 --------------------------
 
 ::
 
-              Incoming migration calls .load_setup for each device
-                       (RESTORE_VM, _ACTIVE, _STOP)
-                                 |
-       For each device, .load_state is called for that device section data
-                       (RESTORE_VM, _ACTIVE, _RESUMING)
-                                 |
-    At the end, .load_cleanup is called for each device and vCPUs are started
-                       (RUNNING, _NONE, _RUNNING)
+             Incoming migration calls .load_setup() for each device
+                          (RESTORE_VM, _ACTIVE, _STOP)
+                                      |
+     For each device, .load_state() is called for that device section data
+                        (RESTORE_VM, _ACTIVE, _RESUMING)
+                                      |
+  At the end, .load_cleanup() is called for each device and vCPUs are started
+              The VFIO device is first put in P2P quiescent state
+                        (RUNNING, _ACTIVE, _RUNNING_P2P)
+                                      |
+                           (RUNNING, _NONE, _RUNNING)
 
 Postcopy
 ========
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 16cf79a76c..7c3d636025 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -441,14 +441,16 @@ bool vfio_device_state_is_running(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
 
-    return migration->device_state == VFIO_DEVICE_STATE_RUNNING;
+    return migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+           migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P;
 }
 
 bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
 
-    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
+    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ||
+           migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P;
 }
 
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 48f9c23cbe..05b1dac2ed 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -71,8 +71,12 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
         return "STOP_COPY";
     case VFIO_DEVICE_STATE_RESUMING:
         return "RESUMING";
+    case VFIO_DEVICE_STATE_RUNNING_P2P:
+        return "RUNNING_P2P";
     case VFIO_DEVICE_STATE_PRE_COPY:
         return "PRE_COPY";
+    case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+        return "PRE_COPY_P2P";
     default:
         return "UNKNOWN STATE";
     }
@@ -652,6 +656,43 @@ static const SaveVMHandlers savevm_vfio_handlers = {
 
 /* ---------------------------------------------------------------------- */
 
+static void vfio_vmstate_change_prepare(void *opaque, bool running,
+                                        RunState state)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    enum vfio_device_mig_state new_state;
+    int ret;
+
+    if (!(migration->mig_flags & VFIO_MIGRATION_P2P)) {
+        return;
+    }
+
+    new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ?
+                    VFIO_DEVICE_STATE_PRE_COPY_P2P :
+                    VFIO_DEVICE_STATE_RUNNING_P2P;
+
+    /*
+     * If setting the device in new_state fails, the device should be reset.
+     * To do so, use ERROR state as a recover state.
+     */
+    ret = vfio_migration_set_state(vbasedev, new_state,
+                                   VFIO_DEVICE_STATE_ERROR);
+    if (ret) {
+        /*
+         * Migration should be aborted in this case, but vm_state_notify()
+         * currently does not support reporting failures.
+         */
+        if (migrate_get_current()->to_dst_file) {
+            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+        }
+    }
+
+    trace_vfio_vmstate_change_prepare(vbasedev->name, running,
+                                      RunState_str(state),
+                                      mig_state_to_str(new_state));
+}
+
 static void vfio_vmstate_change(void *opaque, bool running, RunState state)
 {
     VFIODevice *vbasedev = opaque;
@@ -798,9 +839,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
                          vbasedev);
 
-    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
-                                                           vfio_vmstate_change,
-                                                           vbasedev);
+    migration->vm_state = qdev_add_vm_change_state_handler_full(
+        vbasedev->dev, vfio_vmstate_change, vfio_vmstate_change_prepare,
+        vbasedev);
     migration->migration_state.notify = vfio_migration_state_notifier;
     add_migration_state_change_notifier(&migration->migration_state);
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index ee7509e68e..329736a738 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -167,3 +167,4 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
 vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
 vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
 vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
+vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH for-8.2 v2 6/6] vfio/migration: Allow migration of multiple P2P supporting devices
  2023-07-31 10:25 [PATCH for-8.2 v2 0/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
                   ` (4 preceding siblings ...)
  2023-07-31 10:25 ` [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
@ 2023-07-31 10:25 ` Avihai Horon
  2023-08-01 12:03   ` Cédric Le Goater
  5 siblings, 1 reply; 12+ messages in thread
From: Avihai Horon @ 2023-07-31 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Joao Martins, Yishai Hadas, Jason Gunthorpe, Maor Gottlieb,
	Kirti Wankhede, Tarun Gupta, Avihai Horon

Now that P2P support has been added to VFIO migration, allow migration
of multiple devices if all of them support P2P migration.

Single device migration is allowed regardless of P2P migration support.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 hw/vfio/common.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7c3d636025..8a8d074e18 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -363,21 +363,31 @@ bool vfio_mig_active(void)
 
 static Error *multiple_devices_migration_blocker;
 
-static unsigned int vfio_migratable_device_num(void)
+/*
+ * Multiple devices migration is allowed only if all devices support P2P
+ * migration. Single device migration is allowed regardless of P2P migration
+ * support.
+ */
+static bool vfio_multiple_devices_migration_is_supported(void)
 {
     VFIOGroup *group;
     VFIODevice *vbasedev;
     unsigned int device_num = 0;
+    bool all_support_p2p = true;
 
     QLIST_FOREACH(group, &vfio_group_list, next) {
         QLIST_FOREACH(vbasedev, &group->device_list, next) {
             if (vbasedev->migration) {
                 device_num++;
+
+                if (!(vbasedev->migration->mig_flags & VFIO_MIGRATION_P2P)) {
+                    all_support_p2p = false;
+                }
             }
         }
     }
 
-    return device_num;
+    return all_support_p2p || device_num <= 1;
 }
 
 int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
@@ -385,19 +395,19 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
     int ret;
 
     if (multiple_devices_migration_blocker ||
-        vfio_migratable_device_num() <= 1) {
+        vfio_multiple_devices_migration_is_supported()) {
         return 0;
     }
 
     if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
-        error_setg(errp, "Migration is currently not supported with multiple "
-                         "VFIO devices");
+        error_setg(errp, "Multiple VFIO devices migration is supported only if "
+                         "all of them support P2P migration");
         return -EINVAL;
     }
 
     error_setg(&multiple_devices_migration_blocker,
-               "Migration is currently not supported with multiple "
-               "VFIO devices");
+               "Multiple VFIO devices migration is supported only if all of "
+               "them support P2P migration");
     ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
     if (ret < 0) {
         error_free(multiple_devices_migration_blocker);
@@ -410,7 +420,7 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
 void vfio_unblock_multiple_devices_migration(void)
 {
     if (!multiple_devices_migration_blocker ||
-        vfio_migratable_device_num() > 1) {
+        !vfio_multiple_devices_migration_is_supported()) {
         return;
     }
 
-- 
2.26.3



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration
  2023-07-31 10:25 ` [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
@ 2023-08-01 12:02   ` Cédric Le Goater
  2023-08-01 13:24     ` Avihai Horon
  0 siblings, 1 reply; 12+ messages in thread
From: Cédric Le Goater @ 2023-08-01 12:02 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Paolo Bonzini, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 7/31/23 12:25, Avihai Horon wrote:
> VFIO migration uAPI defines an optional intermediate P2P quiescent
> state. While in the P2P quiescent state, P2P DMA transactions cannot be
> initiated by the device, but the device can respond to incoming ones.
> Additionally, all outstanding P2P transactions are guaranteed to have
> been completed by the time the device enters this state.
> 
> The purpose of this state is to support migration of multiple devices
> that might do P2P transactions between themselves.
> 
> Add support for P2P migration by transitioning all the devices to the
> P2P quiescent state before stopping or starting the devices. Use the new
> VMChangeStateHandler prepare_cb to achieve that behavior.
> 
> This will allow migration of multiple VFIO devices if all of them
> support P2P migration.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>   docs/devel/vfio-migration.rst | 93 +++++++++++++++++++++--------------
>   hw/vfio/common.c              |  6 ++-
>   hw/vfio/migration.c           | 47 ++++++++++++++++--
>   hw/vfio/trace-events          |  1 +
>   4 files changed, 106 insertions(+), 41 deletions(-)
> 
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index b433cb5bb2..605fe60e96 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -23,9 +23,21 @@ and recommends that the initial bytes are sent and loaded in the destination
>   before stopping the source VM. Enabling this migration capability will
>   guarantee that and thus, can potentially reduce downtime even further.
>   
> -Note that currently VFIO migration is supported only for a single device. This
> -is due to VFIO migration's lack of P2P support. However, P2P support is planned
> -to be added later on.
> +To support migration of multiple devices that might do P2P transactions between
> +themselves, VFIO migration uAPI defines an intermediate P2P quiescent state.
> +While in the P2P quiescent state, P2P DMA transactions cannot be initiated by
> +the device, but the device can respond to incoming ones. Additionally, all
> +outstanding P2P transactions are guaranteed to have been completed by the time
> +the device enters this state.
> +
> +All the devices that support P2P migration are first transitioned to the P2P
> +quiescent state and only then are they stopped or started. This makes migration
> +safe P2P-wise, since starting and stopping the devices is not done atomically
> +for all the devices together.
> +
> +Thus, multiple VFIO devices migration is allowed only if all the devices
> +support P2P migration. Single VFIO device migration is allowed regardless of
> +P2P migration support.
>   
>   A detailed description of the UAPI for VFIO device migration can be found in
>   the comment for the ``vfio_device_mig_state`` structure in the header file
> @@ -132,54 +144,63 @@ will be blocked.
>   Flow of state changes during Live migration
>   ===========================================
>   
> -Below is the flow of state change during live migration.
> +Below is the state change flow during live migration for a VFIO device that
> +supports both precopy and P2P migration. The flow for devices that don't
> +support it is similar, except that the relevant states for precopy and P2P are
> +skipped.
>   The values in the parentheses represent the VM state, the migration state, and
>   the VFIO device state, respectively.
> -The text in the square brackets represents the flow if the VFIO device supports
> -pre-copy.
>   
>   Live migration save path
>   ------------------------
>   
>   ::
>   
> -                        QEMU normal running state
> -                        (RUNNING, _NONE, _RUNNING)
> -                                  |
> +                           QEMU normal running state
> +                           (RUNNING, _NONE, _RUNNING)
> +                                      |
>                        migrate_init spawns migration_thread
> -                Migration thread then calls each device's .save_setup()
> -                  (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
> -                                  |
> -                  (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
> -      If device is active, get pending_bytes by .state_pending_{estimate,exact}()
> -          If total pending_bytes >= threshold_size, call .save_live_iterate()
> -                  [Data of VFIO device for pre-copy phase is copied]
> -        Iterate till total pending bytes converge and are less than threshold
> -                                  |
> -  On migration completion, vCPU stops and calls .save_live_complete_precopy for
> -  each active device. The VFIO device is then transitioned into _STOP_COPY state
> -                  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
> -                                  |
> -     For the VFIO device, iterate in .save_live_complete_precopy until
> -                         pending data is 0
> -                   (FINISH_MIGRATE, _DEVICE, _STOP)
> -                                  |
> -                 (FINISH_MIGRATE, _COMPLETED, _STOP)
> -             Migraton thread schedules cleanup bottom half and exits
> +            Migration thread then calls each device's .save_setup()
> +                          (RUNNING, _SETUP, _PRE_COPY)
> +                                      |
> +                         (RUNNING, _ACTIVE, _PRE_COPY)
> +  If device is active, get pending_bytes by .state_pending_{estimate,exact}()
> +       If total pending_bytes >= threshold_size, call .save_live_iterate()
> +                Data of VFIO device for pre-copy phase is copied
> +      Iterate till total pending bytes converge and are less than threshold
> +                                      |
> +       On migration completion, the vCPUs and the VFIO device are stopped
> +              The VFIO device is first put in P2P quiescent state
> +                    (FINISH_MIGRATE, _ACTIVE, _PRE_COPY_P2P)
> +                                      |
> +                Then the VFIO device is put in _STOP_COPY state
> +                     (FINISH_MIGRATE, _ACTIVE, _STOP_COPY)
> +         .save_live_complete_precopy() is called for each active device
> +      For the VFIO device, iterate in .save_live_complete_precopy() until
> +                               pending data is 0
> +                                      |
> +                     (POSTMIGRATE, _COMPLETED, _STOP_COPY)
> +            Migration thread schedules cleanup bottom half and exits
> +                                      |
> +                           .save_cleanup() is called
> +                        (POSTMIGRATE, _COMPLETED, _STOP)
>   
>   Live migration resume path
>   --------------------------
>   
>   ::
>   
> -              Incoming migration calls .load_setup for each device
> -                       (RESTORE_VM, _ACTIVE, _STOP)
> -                                 |
> -       For each device, .load_state is called for that device section data
> -                       (RESTORE_VM, _ACTIVE, _RESUMING)
> -                                 |
> -    At the end, .load_cleanup is called for each device and vCPUs are started
> -                       (RUNNING, _NONE, _RUNNING)
> +             Incoming migration calls .load_setup() for each device
> +                          (RESTORE_VM, _ACTIVE, _STOP)
> +                                      |
> +     For each device, .load_state() is called for that device section data
> +                        (RESTORE_VM, _ACTIVE, _RESUMING)
> +                                      |
> +  At the end, .load_cleanup() is called for each device and vCPUs are started
> +              The VFIO device is first put in P2P quiescent state
> +                        (RUNNING, _ACTIVE, _RUNNING_P2P)
> +                                      |
> +                           (RUNNING, _NONE, _RUNNING)
>   
>   Postcopy
>   ========
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 16cf79a76c..7c3d636025 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -441,14 +441,16 @@ bool vfio_device_state_is_running(VFIODevice *vbasedev)
>   {
>       VFIOMigration *migration = vbasedev->migration;
>   
> -    return migration->device_state == VFIO_DEVICE_STATE_RUNNING;
> +    return migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
> +           migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P;
>   }
>   
>   bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
>   {
>       VFIOMigration *migration = vbasedev->migration;
>   
> -    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
> +    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ||
> +           migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P;
>   }
>   
>   static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 48f9c23cbe..05b1dac2ed 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -71,8 +71,12 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
>           return "STOP_COPY";
>       case VFIO_DEVICE_STATE_RESUMING:
>           return "RESUMING";
> +    case VFIO_DEVICE_STATE_RUNNING_P2P:
> +        return "RUNNING_P2P";
>       case VFIO_DEVICE_STATE_PRE_COPY:
>           return "PRE_COPY";
> +    case VFIO_DEVICE_STATE_PRE_COPY_P2P:
> +        return "PRE_COPY_P2P";
>       default:
>           return "UNKNOWN STATE";
>       }
> @@ -652,6 +656,43 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>   
>   /* ---------------------------------------------------------------------- */
>   
> +static void vfio_vmstate_change_prepare(void *opaque, bool running,
> +                                        RunState state)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    VFIOMigration *migration = vbasedev->migration;
> +    enum vfio_device_mig_state new_state;
> +    int ret;
> +
> +    if (!(migration->mig_flags & VFIO_MIGRATION_P2P)) {
> +        return;
> +    }
> +
> +    new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ?
> +                    VFIO_DEVICE_STATE_PRE_COPY_P2P :
> +                    VFIO_DEVICE_STATE_RUNNING_P2P;
> +
> +    /*
> +     * If setting the device in new_state fails, the device should be reset.
> +     * To do so, use ERROR state as a recover state.
> +     */
> +    ret = vfio_migration_set_state(vbasedev, new_state,
> +                                   VFIO_DEVICE_STATE_ERROR);
> +    if (ret) {
> +        /*
> +         * Migration should be aborted in this case, but vm_state_notify()
> +         * currently does not support reporting failures.
> +         */
> +        if (migrate_get_current()->to_dst_file) {
> +            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> +        }
> +    }
> +
> +    trace_vfio_vmstate_change_prepare(vbasedev->name, running,
> +                                      RunState_str(state),
> +                                      mig_state_to_str(new_state));
> +}
> +
>   static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>   {
>       VFIODevice *vbasedev = opaque;
> @@ -798,9 +839,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>       register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
>                            vbasedev);
>   
> -    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
> -                                                           vfio_vmstate_change,
> -                                                           vbasedev);
> +    migration->vm_state = qdev_add_vm_change_state_handler_full(
> +        vbasedev->dev, vfio_vmstate_change, vfio_vmstate_change_prepare,
> +        vbasedev);


We could also avoid registering the prepare handler if !VFIO_MIGRATION_P2P.

Thanks,

C.



>       migration->migration_state.notify = vfio_migration_state_notifier;
>       add_migration_state_change_notifier(&migration->migration_state);
>   
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index ee7509e68e..329736a738 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -167,3 +167,4 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
>   vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
>   vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> +vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH for-8.2 v2 2/6] sysemu: Add prepare callback to struct VMChangeStateEntry
  2023-07-31 10:25 ` [PATCH for-8.2 v2 2/6] sysemu: Add prepare callback to struct VMChangeStateEntry Avihai Horon
@ 2023-08-01 12:03   ` Cédric Le Goater
  0 siblings, 0 replies; 12+ messages in thread
From: Cédric Le Goater @ 2023-08-01 12:03 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Paolo Bonzini, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 7/31/23 12:25, Avihai Horon wrote:
> Add prepare callback to struct VMChangeStateEntry.
> 
> The prepare callback is optional and can be set by the new function
> qemu_add_vm_change_state_handler_prio_full() that allows setting this
> callback in addition to the main callback.
> 
> The prepare callbacks and main callbacks are called in two separate
> phases: First all prepare callbacks are called and only then all main
> callbacks are called.
> 
> The purpose of the new prepare callback is to allow all devices to run a
> preliminary task before calling the devices' main callbacks.
> 
> This will facilitate adding P2P support for VFIO migration where all
> VFIO devices need to be put in an intermediate P2P quiescent state
> before being stopped or started by the main callback.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/sysemu/runstate.h |  4 ++++
>   softmmu/runstate.c        | 40 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 44 insertions(+)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 7beb29c2e2..764a0fc6a4 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -16,6 +16,10 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
>                                                        void *opaque);
>   VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
>           VMChangeStateHandler *cb, void *opaque, int priority);
> +VMChangeStateEntry *
> +qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
> +                                           VMChangeStateHandler *prepare_cb,
> +                                           void *opaque, int priority);
>   VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
>                                                        VMChangeStateHandler *cb,
>                                                        void *opaque);
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index f3bd862818..1652ed0439 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -271,6 +271,7 @@ void qemu_system_vmstop_request(RunState state)
>   }
>   struct VMChangeStateEntry {
>       VMChangeStateHandler *cb;
> +    VMChangeStateHandler *prepare_cb;
>       void *opaque;
>       QTAILQ_ENTRY(VMChangeStateEntry) entries;
>       int priority;
> @@ -293,12 +294,39 @@ static QTAILQ_HEAD(, VMChangeStateEntry) vm_change_state_head =
>    */
>   VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
>           VMChangeStateHandler *cb, void *opaque, int priority)
> +{
> +    return qemu_add_vm_change_state_handler_prio_full(cb, NULL, opaque,
> +                                                      priority);
> +}
> +
> +/**
> + * qemu_add_vm_change_state_handler_prio_full:
> + * @cb: the main callback to invoke
> + * @prepare_cb: a callback to invoke before the main callback
> + * @opaque: user data passed to the callbacks
> + * @priority: low priorities execute first when the vm runs and the reverse is
> + *            true when the vm stops
> + *
> + * Register a main callback function and an optional prepare callback function
> + * that are invoked when the vm starts or stops running. The main callback and
> + * the prepare callback are called in two separate phases: First all prepare
> + * callbacks are called and only then all main callbacks are called. As its
> + * name suggests, the prepare callback can be used to do some preparatory work
> + * before invoking the main callback.
> + *
> + * Returns: an entry to be freed using qemu_del_vm_change_state_handler()
> + */
> +VMChangeStateEntry *
> +qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
> +                                           VMChangeStateHandler *prepare_cb,
> +                                           void *opaque, int priority)
>   {
>       VMChangeStateEntry *e;
>       VMChangeStateEntry *other;
>   
>       e = g_malloc0(sizeof(*e));
>       e->cb = cb;
> +    e->prepare_cb = prepare_cb;
>       e->opaque = opaque;
>       e->priority = priority;
>   
> @@ -333,10 +361,22 @@ void vm_state_notify(bool running, RunState state)
>       trace_vm_state_notify(running, state, RunState_str(state));
>   
>       if (running) {
> +        QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
> +            if (e->prepare_cb) {
> +                e->prepare_cb(e->opaque, running, state);
> +            }
> +        }
> +
>           QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
>               e->cb(e->opaque, running, state);
>           }
>       } else {
> +        QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
> +            if (e->prepare_cb) {
> +                e->prepare_cb(e->opaque, running, state);
> +            }
> +        }
> +
>           QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
>               e->cb(e->opaque, running, state);
>           }



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH for-8.2 v2 6/6] vfio/migration: Allow migration of multiple P2P supporting devices
  2023-07-31 10:25 ` [PATCH for-8.2 v2 6/6] vfio/migration: Allow migration of multiple P2P supporting devices Avihai Horon
@ 2023-08-01 12:03   ` Cédric Le Goater
  0 siblings, 0 replies; 12+ messages in thread
From: Cédric Le Goater @ 2023-08-01 12:03 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Paolo Bonzini, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 7/31/23 12:25, Avihai Horon wrote:
> Now that P2P support has been added to VFIO migration, allow migration
> of multiple devices if all of them support P2P migration.
> 
> Single device migration is allowed regardless of P2P migration support.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/common.c | 26 ++++++++++++++++++--------
>   1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7c3d636025..8a8d074e18 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -363,21 +363,31 @@ bool vfio_mig_active(void)
>   
>   static Error *multiple_devices_migration_blocker;
>   
> -static unsigned int vfio_migratable_device_num(void)
> +/*
> + * Multiple devices migration is allowed only if all devices support P2P
> + * migration. Single device migration is allowed regardless of P2P migration
> + * support.
> + */
> +static bool vfio_multiple_devices_migration_is_supported(void)
>   {
>       VFIOGroup *group;
>       VFIODevice *vbasedev;
>       unsigned int device_num = 0;
> +    bool all_support_p2p = true;
>   
>       QLIST_FOREACH(group, &vfio_group_list, next) {
>           QLIST_FOREACH(vbasedev, &group->device_list, next) {
>               if (vbasedev->migration) {
>                   device_num++;
> +
> +                if (!(vbasedev->migration->mig_flags & VFIO_MIGRATION_P2P)) {
> +                    all_support_p2p = false;
> +                }
>               }
>           }
>       }
>   
> -    return device_num;
> +    return all_support_p2p || device_num <= 1;
>   }
>   
>   int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
> @@ -385,19 +395,19 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>       int ret;
>   
>       if (multiple_devices_migration_blocker ||
> -        vfio_migratable_device_num() <= 1) {
> +        vfio_multiple_devices_migration_is_supported()) {
>           return 0;
>       }
>   
>       if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
> -        error_setg(errp, "Migration is currently not supported with multiple "
> -                         "VFIO devices");
> +        error_setg(errp, "Multiple VFIO devices migration is supported only if "
> +                         "all of them support P2P migration");
>           return -EINVAL;
>       }
>   
>       error_setg(&multiple_devices_migration_blocker,
> -               "Migration is currently not supported with multiple "
> -               "VFIO devices");
> +               "Multiple VFIO devices migration is supported only if all of "
> +               "them support P2P migration");
>       ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
>       if (ret < 0) {
>           error_free(multiple_devices_migration_blocker);
> @@ -410,7 +420,7 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>   void vfio_unblock_multiple_devices_migration(void)
>   {
>       if (!multiple_devices_migration_blocker ||
> -        vfio_migratable_device_num() > 1) {
> +        !vfio_multiple_devices_migration_is_supported()) {
>           return;
>       }
>   



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH for-8.2 v2 3/6] qdev: Add qdev_add_vm_change_state_handler_full()
  2023-07-31 10:25 ` [PATCH for-8.2 v2 3/6] qdev: Add qdev_add_vm_change_state_handler_full() Avihai Horon
@ 2023-08-01 12:04   ` Cédric Le Goater
  0 siblings, 0 replies; 12+ messages in thread
From: Cédric Le Goater @ 2023-08-01 12:04 UTC (permalink / raw)
  To: Avihai Horon, qemu-devel
  Cc: Alex Williamson, Paolo Bonzini, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede, Tarun Gupta

On 7/31/23 12:25, Avihai Horon wrote:
> Add qdev_add_vm_change_state_handler_full() variant that allows setting
> a prepare callback in addition to the main callback.
> 
> This will facilitate adding P2P support for VFIO migration in the
> following patches.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/sysemu/runstate.h         |  3 +++
>   hw/core/vm-change-state-handler.c | 14 +++++++++++++-
>   2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 764a0fc6a4..08afb97695 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -23,6 +23,9 @@ qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
>   VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
>                                                        VMChangeStateHandler *cb,
>                                                        void *opaque);
> +VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
> +    DeviceState *dev, VMChangeStateHandler *cb,
> +    VMChangeStateHandler *prepare_cb, void *opaque);
>   void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
>   /**
>    * vm_state_notify: Notify the state of the VM
> diff --git a/hw/core/vm-change-state-handler.c b/hw/core/vm-change-state-handler.c
> index 1f3630986d..8e2639224e 100644
> --- a/hw/core/vm-change-state-handler.c
> +++ b/hw/core/vm-change-state-handler.c
> @@ -55,8 +55,20 @@ static int qdev_get_dev_tree_depth(DeviceState *dev)
>   VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
>                                                        VMChangeStateHandler *cb,
>                                                        void *opaque)
> +{
> +    return qdev_add_vm_change_state_handler_full(dev, cb, NULL, opaque);
> +}
> +
> +/*
> + * Exactly like qdev_add_vm_change_state_handler() but passes a prepare_cb
> + * argument too.
> + */
> +VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
> +    DeviceState *dev, VMChangeStateHandler *cb,
> +    VMChangeStateHandler *prepare_cb, void *opaque)
>   {
>       int depth = qdev_get_dev_tree_depth(dev);
>   
> -    return qemu_add_vm_change_state_handler_prio(cb, opaque, depth);
> +    return qemu_add_vm_change_state_handler_prio_full(cb, prepare_cb, opaque,
> +                                                      depth);
>   }



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration
  2023-08-01 12:02   ` Cédric Le Goater
@ 2023-08-01 13:24     ` Avihai Horon
  0 siblings, 0 replies; 12+ messages in thread
From: Avihai Horon @ 2023-08-01 13:24 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Alex Williamson, Paolo Bonzini, Joao Martins, Yishai Hadas,
	Jason Gunthorpe, Maor Gottlieb, Kirti Wankhede, Tarun Gupta


On 01/08/2023 15:02, Cédric Le Goater wrote:
>
> On 7/31/23 12:25, Avihai Horon wrote:
>> VFIO migration uAPI defines an optional intermediate P2P quiescent
>> state. While in the P2P quiescent state, P2P DMA transactions cannot be
>> initiated by the device, but the device can respond to incoming ones.
>> Additionally, all outstanding P2P transactions are guaranteed to have
>> been completed by the time the device enters this state.
>>
>> The purpose of this state is to support migration of multiple devices
>> that might do P2P transactions between themselves.
>>
>> Add support for P2P migration by transitioning all the devices to the
>> P2P quiescent state before stopping or starting the devices. Use the new
>> VMChangeStateHandler prepare_cb to achieve that behavior.
>>
>> This will allow migration of multiple VFIO devices if all of them
>> support P2P migration.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   docs/devel/vfio-migration.rst | 93 +++++++++++++++++++++--------------
>>   hw/vfio/common.c              |  6 ++-
>>   hw/vfio/migration.c           | 47 ++++++++++++++++--
>>   hw/vfio/trace-events          |  1 +
>>   4 files changed, 106 insertions(+), 41 deletions(-)
>>
>> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
>> index b433cb5bb2..605fe60e96 100644
>> --- a/docs/devel/vfio-migration.rst
>> +++ b/docs/devel/vfio-migration.rst
>> @@ -23,9 +23,21 @@ and recommends that the initial bytes are sent and loaded in the destination
>>   before stopping the source VM. Enabling this migration capability will
>>   guarantee that and thus, can potentially reduce downtime even further.
>>
>> -Note that currently VFIO migration is supported only for a single device. This
>> -is due to VFIO migration's lack of P2P support. However, P2P support is planned
>> -to be added later on.
>> +To support migration of multiple devices that might do P2P transactions between
>> +themselves, VFIO migration uAPI defines an intermediate P2P quiescent state.
>> +While in the P2P quiescent state, P2P DMA transactions cannot be initiated by
>> +the device, but the device can respond to incoming ones. Additionally, all
>> +outstanding P2P transactions are guaranteed to have been completed by the time
>> +the device enters this state.
>> +
>> +All the devices that support P2P migration are first transitioned to the P2P
>> +quiescent state and only then are they stopped or started. This makes migration
>> +safe P2P-wise, since starting and stopping the devices is not done atomically
>> +for all the devices together.
>> +
>> +Thus, multiple VFIO devices migration is allowed only if all the devices
>> +support P2P migration. Single VFIO device migration is allowed regardless of
>> +P2P migration support.
>>
>>   A detailed description of the UAPI for VFIO device migration can be found in
>>   the comment for the ``vfio_device_mig_state`` structure in the header file
>> @@ -132,54 +144,63 @@ will be blocked.
>>   Flow of state changes during Live migration
>>   ===========================================
>>
>> -Below is the flow of state change during live migration.
>> +Below is the state change flow during live migration for a VFIO device that
>> +supports both precopy and P2P migration. The flow for devices that don't
>> +support it is similar, except that the relevant states for precopy and P2P are
>> +skipped.
>>   The values in the parentheses represent the VM state, the migration state, and
>>   the VFIO device state, respectively.
>> -The text in the square brackets represents the flow if the VFIO device supports
>> -pre-copy.
>>
>>   Live migration save path
>>   ------------------------
>>
>>   ::
>>
>> -                        QEMU normal running state
>> -                        (RUNNING, _NONE, _RUNNING)
>> -                                  |
>> +                           QEMU normal running state
>> +                           (RUNNING, _NONE, _RUNNING)
>> +                                      |
>>                        migrate_init spawns migration_thread
>> -                Migration thread then calls each device's .save_setup()
>> -                  (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
>> -                                  |
>> -                  (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
>> -      If device is active, get pending_bytes by .state_pending_{estimate,exact}()
>> -          If total pending_bytes >= threshold_size, call .save_live_iterate()
>> -                  [Data of VFIO device for pre-copy phase is copied]
>> -        Iterate till total pending bytes converge and are less than threshold
>> -                                  |
>> -  On migration completion, vCPU stops and calls .save_live_complete_precopy for
>> -  each active device. The VFIO device is then transitioned into _STOP_COPY state
>> -                  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
>> -                                  |
>> -     For the VFIO device, iterate in .save_live_complete_precopy until
>> -                         pending data is 0
>> -                   (FINISH_MIGRATE, _DEVICE, _STOP)
>> -                                  |
>> -                 (FINISH_MIGRATE, _COMPLETED, _STOP)
>> -             Migraton thread schedules cleanup bottom half and exits
>> +            Migration thread then calls each device's .save_setup()
>> +                          (RUNNING, _SETUP, _PRE_COPY)
>> +                                      |
>> +                         (RUNNING, _ACTIVE, _PRE_COPY)
>> +  If device is active, get pending_bytes by .state_pending_{estimate,exact}()
>> +       If total pending_bytes >= threshold_size, call .save_live_iterate()
>> +                Data of VFIO device for pre-copy phase is copied
>> +      Iterate till total pending bytes converge and are less than threshold
>> +                                      |
>> +       On migration completion, the vCPUs and the VFIO device are stopped
>> +              The VFIO device is first put in P2P quiescent state
>> +                    (FINISH_MIGRATE, _ACTIVE, _PRE_COPY_P2P)
>> +                                      |
>> +                Then the VFIO device is put in _STOP_COPY state
>> +                     (FINISH_MIGRATE, _ACTIVE, _STOP_COPY)
>> +         .save_live_complete_precopy() is called for each active device
>> +      For the VFIO device, iterate in .save_live_complete_precopy() until
>> +                               pending data is 0
>> +                                      |
>> +                     (POSTMIGRATE, _COMPLETED, _STOP_COPY)
>> +            Migration thread schedules cleanup bottom half and exits
>> +                                      |
>> +                           .save_cleanup() is called
>> +                        (POSTMIGRATE, _COMPLETED, _STOP)
>>
>>   Live migration resume path
>>   --------------------------
>>
>>   ::
>>
>> -              Incoming migration calls .load_setup for each device
>> -                       (RESTORE_VM, _ACTIVE, _STOP)
>> -                                 |
>> -       For each device, .load_state is called for that device section data
>> -                       (RESTORE_VM, _ACTIVE, _RESUMING)
>> -                                 |
>> -    At the end, .load_cleanup is called for each device and vCPUs are started
>> -                       (RUNNING, _NONE, _RUNNING)
>> +             Incoming migration calls .load_setup() for each device
>> +                          (RESTORE_VM, _ACTIVE, _STOP)
>> +                                      |
>> +     For each device, .load_state() is called for that device section data
>> +                        (RESTORE_VM, _ACTIVE, _RESUMING)
>> +                                      |
>> +  At the end, .load_cleanup() is called for each device and vCPUs are started
>> +              The VFIO device is first put in P2P quiescent state
>> +                        (RUNNING, _ACTIVE, _RUNNING_P2P)
>> +                                      |
>> +                           (RUNNING, _NONE, _RUNNING)
>>
>>   Postcopy
>>   ========
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 16cf79a76c..7c3d636025 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -441,14 +441,16 @@ bool vfio_device_state_is_running(VFIODevice *vbasedev)
>>   {
>>       VFIOMigration *migration = vbasedev->migration;
>>
>> -    return migration->device_state == VFIO_DEVICE_STATE_RUNNING;
>> +    return migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
>> +           migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P;
>>   }
>>
>>   bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
>>   {
>>       VFIOMigration *migration = vbasedev->migration;
>>
>> -    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
>> +    return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ||
>> +           migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P;
>>   }
>>
>>   static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 48f9c23cbe..05b1dac2ed 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -71,8 +71,12 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
>>           return "STOP_COPY";
>>       case VFIO_DEVICE_STATE_RESUMING:
>>           return "RESUMING";
>> +    case VFIO_DEVICE_STATE_RUNNING_P2P:
>> +        return "RUNNING_P2P";
>>       case VFIO_DEVICE_STATE_PRE_COPY:
>>           return "PRE_COPY";
>> +    case VFIO_DEVICE_STATE_PRE_COPY_P2P:
>> +        return "PRE_COPY_P2P";
>>       default:
>>           return "UNKNOWN STATE";
>>       }
>> @@ -652,6 +656,43 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>>
>>   /* ---------------------------------------------------------------------- */
>>
>> +static void vfio_vmstate_change_prepare(void *opaque, bool running,
>> +                                        RunState state)
>> +{
>> +    VFIODevice *vbasedev = opaque;
>> +    VFIOMigration *migration = vbasedev->migration;
>> +    enum vfio_device_mig_state new_state;
>> +    int ret;
>> +
>> +    if (!(migration->mig_flags & VFIO_MIGRATION_P2P)) {
>> +        return;
>> +    }
>> +
>> +    new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ?
>> +                    VFIO_DEVICE_STATE_PRE_COPY_P2P :
>> +                    VFIO_DEVICE_STATE_RUNNING_P2P;
>> +
>> +    /*
>> +     * If setting the device in new_state fails, the device should be reset.
>> +     * To do so, use ERROR state as a recover state.
>> +     */
>> +    ret = vfio_migration_set_state(vbasedev, new_state,
>> +                                   VFIO_DEVICE_STATE_ERROR);
>> +    if (ret) {
>> +        /*
>> +         * Migration should be aborted in this case, but vm_state_notify()
>> +         * currently does not support reporting failures.
>> +         */
>> +        if (migrate_get_current()->to_dst_file) {
>> +            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
>> +        }
>> +    }
>> +
>> +    trace_vfio_vmstate_change_prepare(vbasedev->name, running,
>> +                                      RunState_str(state),
>> +                                      mig_state_to_str(new_state));
>> +}
>> +
>>   static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> @@ -798,9 +839,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
>>       register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
>>                            vbasedev);
>>
>> -    migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
>> -                                                           vfio_vmstate_change,
>> -                                                           vbasedev);
>> +    migration->vm_state = qdev_add_vm_change_state_handler_full(
>> +        vbasedev->dev, vfio_vmstate_change, vfio_vmstate_change_prepare,
>> +        vbasedev);
>
>
> We could also avoid registering the prepare handler if 
> !VFIO_MIGRATION_P2P.

Oh, right.
I will remove the check from vfio_vmstate_change_prepare() and add it 
here instead.

Thanks.

>
>>       migration->migration_state.notify = vfio_migration_state_notifier;
>>       add_migration_state_change_notifier(&migration->migration_state);
>>
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index ee7509e68e..329736a738 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -167,3 +167,4 @@ vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer
>>   vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
>>   vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
>>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>> +vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-08-01 13:29 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-31 10:25 [PATCH for-8.2 v2 0/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
2023-07-31 10:25 ` [PATCH for-8.2 v2 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup() Avihai Horon
2023-07-31 10:25 ` [PATCH for-8.2 v2 2/6] sysemu: Add prepare callback to struct VMChangeStateEntry Avihai Horon
2023-08-01 12:03   ` Cédric Le Goater
2023-07-31 10:25 ` [PATCH for-8.2 v2 3/6] qdev: Add qdev_add_vm_change_state_handler_full() Avihai Horon
2023-08-01 12:04   ` Cédric Le Goater
2023-07-31 10:25 ` [PATCH for-8.2 v2 4/6] vfio/migration: Refactor PRE_COPY and RUNNING state checks Avihai Horon
2023-07-31 10:25 ` [PATCH for-8.2 v2 5/6] vfio/migration: Add P2P support for VFIO migration Avihai Horon
2023-08-01 12:02   ` Cédric Le Goater
2023-08-01 13:24     ` Avihai Horon
2023-07-31 10:25 ` [PATCH for-8.2 v2 6/6] vfio/migration: Allow migration of multiple P2P supporting devices Avihai Horon
2023-08-01 12:03   ` Cédric Le Goater
