All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4 00/14] allow cpr-reboot for vfio
@ 2024-02-22 17:28 Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 01/14] notify: pass error to notifier with return Steve Sistare
                   ` (14 more replies)
  0 siblings, 15 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.  The
user is responsible for suspending the guest before initiating cpr, such as
by issuing guest-suspend-ram to the qemu guest agent.

Most of the patches in this series enhance migration notifiers so they can
return an error status and message.  The last few patches register a notifier
for vfio that returns an error if the guest is not suspended.

Changes in V3:
  * update to tip, add RB's
  * replace MigrationStatus with new enum MigrationEventType
  * simplify migrate_fd_connect error recovery
  * support vfio iommufd containers
  * add patches:
      migration: stop vm for cpr
      migration: update cpr-reboot description

Changes in V4:
  * rebase to tip, add RB's
  * add patch to prevent options such as precopy from being used with cpr.
      migration: options incompatible with cpr
  * refactor subroutines in "stop vm for cpr"
  * simplify "notifier error checking" patch by restricting that only
    notifiers for MIG_EVENT_PRECOPY_SETUP may return an error.
  * doc that a fail event may be sent without a prior setup event

Steve Sistare (14):
  notify: pass error to notifier with return
  migration: remove error from notifier data
  migration: convert to NotifierWithReturn
  migration: MigrationEvent for notifiers
  migration: remove postcopy_after_devices
  migration: MigrationNotifyFunc
  migration: per-mode notifiers
  migration: refactor migrate_fd_connect failures
  migration: notifier error checking
  migration: stop vm for cpr
  vfio: register container for cpr
  vfio: allow cpr-reboot migration if suspended
  migration: update cpr-reboot description
  migration: options incompatible with cpr

 hw/net/virtio-net.c                   |  13 +--
 hw/vfio/common.c                      |   2 +-
 hw/vfio/container.c                   |  11 ++-
 hw/vfio/cpr.c                         |  39 +++++++++
 hw/vfio/iommufd.c                     |   6 ++
 hw/vfio/meson.build                   |   1 +
 hw/vfio/migration.c                   |  15 ++--
 hw/vfio/trace-events                  |   2 +-
 hw/virtio/vhost-user.c                |  10 +--
 hw/virtio/virtio-balloon.c            |   3 +-
 include/hw/vfio/vfio-common.h         |   5 +-
 include/hw/vfio/vfio-container-base.h |   1 +
 include/hw/virtio/virtio-net.h        |   2 +-
 include/migration/misc.h              |  47 +++++++++--
 include/qemu/notify.h                 |   8 +-
 migration/migration.c                 | 148 +++++++++++++++++++++++-----------
 migration/migration.h                 |   4 -
 migration/postcopy-ram.c              |   3 +-
 migration/postcopy-ram.h              |   1 -
 migration/ram.c                       |   3 +-
 net/vhost-vdpa.c                      |  14 ++--
 qapi/migration.json                   |  37 ++++++---
 roms/seabios-hppa                     |   2 +-
 ui/spice-core.c                       |  17 ++--
 util/notify.c                         |   5 +-
 25 files changed, 275 insertions(+), 124 deletions(-)
 create mode 100644 hw/vfio/cpr.c

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH V4 01/14] notify: pass error to notifier with return
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 02/14] migration: remove error from notifier data Steve Sistare
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Pass an error object as the third parameter to "notifier with return"
notifiers, so clients no longer need to bundle an error object in the
opaque data.  The new parameter is used in a later patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/vhost-user.c     | 2 +-
 hw/virtio/virtio-balloon.c | 3 ++-
 include/qemu/notify.h      | 7 +++++--
 migration/postcopy-ram.c   | 2 +-
 migration/ram.c            | 2 +-
 util/notify.c              | 5 +++--
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index f214df8..f502345 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2084,7 +2084,7 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
 }
 
 static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
-                                        void *opaque)
+                                        void *opaque, Error **errp)
 {
     struct PostcopyNotifyData *pnd = opaque;
     struct vhost_user *u = container_of(notifier, struct vhost_user,
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 486fe3d..89f853f 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -633,7 +633,8 @@ static void virtio_balloon_free_page_done(VirtIOBalloon *s)
 }
 
 static int
-virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data,
+                                     Error **errp)
 {
     VirtIOBalloon *dev = container_of(n, VirtIOBalloon, free_page_hint_notify);
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
diff --git a/include/qemu/notify.h b/include/qemu/notify.h
index bcfa70f..9a85631 100644
--- a/include/qemu/notify.h
+++ b/include/qemu/notify.h
@@ -45,12 +45,15 @@ bool notifier_list_empty(NotifierList *list);
 /* Same as Notifier but allows .notify() to return errors */
 typedef struct NotifierWithReturn NotifierWithReturn;
 
+typedef int (*NotifierWithReturnFunc)(NotifierWithReturn *notifier, void *data,
+                                      Error **errp);
+
 struct NotifierWithReturn {
     /**
      * Return 0 on success (next notifier will be invoked), otherwise
      * notifier_with_return_list_notify() will stop and return the value.
      */
-    int (*notify)(NotifierWithReturn *notifier, void *data);
+    NotifierWithReturnFunc notify;
     QLIST_ENTRY(NotifierWithReturn) node;
 };
 
@@ -69,6 +72,6 @@ void notifier_with_return_list_add(NotifierWithReturnList *list,
 void notifier_with_return_remove(NotifierWithReturn *notifier);
 
 int notifier_with_return_list_notify(NotifierWithReturnList *list,
-                                     void *data);
+                                     void *data, Error **errp);
 
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 893ec8f..3ab2f6b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -80,7 +80,7 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
     pnd.errp = errp;
 
     return notifier_with_return_list_notify(&postcopy_notifier_list,
-                                            &pnd);
+                                            &pnd, errp);
 }
 
 /*
diff --git a/migration/ram.c b/migration/ram.c
index 4649a81..5b6b09e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -428,7 +428,7 @@ int precopy_notify(PrecopyNotifyReason reason, Error **errp)
     pnd.reason = reason;
     pnd.errp = errp;
 
-    return notifier_with_return_list_notify(&precopy_notifier_list, &pnd);
+    return notifier_with_return_list_notify(&precopy_notifier_list, &pnd, errp);
 }
 
 uint64_t ram_bytes_remaining(void)
diff --git a/util/notify.c b/util/notify.c
index 76bab21..c6e158f 100644
--- a/util/notify.c
+++ b/util/notify.c
@@ -61,13 +61,14 @@ void notifier_with_return_remove(NotifierWithReturn *notifier)
     QLIST_REMOVE(notifier, node);
 }
 
-int notifier_with_return_list_notify(NotifierWithReturnList *list, void *data)
+int notifier_with_return_list_notify(NotifierWithReturnList *list, void *data,
+                                     Error **errp)
 {
     NotifierWithReturn *notifier, *next;
     int ret = 0;
 
     QLIST_FOREACH_SAFE(notifier, &list->notifiers, node, next) {
-        ret = notifier->notify(notifier, data);
+        ret = notifier->notify(notifier, data, errp);
         if (ret != 0) {
             break;
         }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 02/14] migration: remove error from notifier data
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 01/14] notify: pass error to notifier with return Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 03/14] migration: convert to NotifierWithReturn Steve Sistare
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Remove the error object from opaque data passed to notifiers.
Use the new error parameter passed to the notifier instead.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/vhost-user.c   | 8 ++++----
 include/migration/misc.h | 1 -
 migration/postcopy-ram.c | 1 -
 migration/postcopy-ram.h | 1 -
 migration/ram.c          | 1 -
 5 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index f502345..a1eea85 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2096,20 +2096,20 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
         if (!virtio_has_feature(dev->protocol_features,
                                 VHOST_USER_PROTOCOL_F_PAGEFAULT)) {
             /* TODO: Get the device name into this error somehow */
-            error_setg(pnd->errp,
+            error_setg(errp,
                        "vhost-user backend not capable of postcopy");
             return -ENOENT;
         }
         break;
 
     case POSTCOPY_NOTIFY_INBOUND_ADVISE:
-        return vhost_user_postcopy_advise(dev, pnd->errp);
+        return vhost_user_postcopy_advise(dev, errp);
 
     case POSTCOPY_NOTIFY_INBOUND_LISTEN:
-        return vhost_user_postcopy_listen(dev, pnd->errp);
+        return vhost_user_postcopy_listen(dev, errp);
 
     case POSTCOPY_NOTIFY_INBOUND_END:
-        return vhost_user_postcopy_end(dev, pnd->errp);
+        return vhost_user_postcopy_end(dev, errp);
 
     default:
         /* We ignore notifications we don't know */
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 1bc8902..5e65c18 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -31,7 +31,6 @@ typedef enum PrecopyNotifyReason {
 
 typedef struct PrecopyNotifyData {
     enum PrecopyNotifyReason reason;
-    Error **errp;
 } PrecopyNotifyData;
 
 void precopy_infrastructure_init(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3ab2f6b..0273dc6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -77,7 +77,6 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
 {
     struct PostcopyNotifyData pnd;
     pnd.reason = reason;
-    pnd.errp = errp;
 
     return notifier_with_return_list_notify(&postcopy_notifier_list,
                                             &pnd, errp);
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 442ab89..ecae941 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -128,7 +128,6 @@ enum PostcopyNotifyReason {
 
 struct PostcopyNotifyData {
     enum PostcopyNotifyReason reason;
-    Error **errp;
 };
 
 void postcopy_add_notifier(NotifierWithReturn *nn);
diff --git a/migration/ram.c b/migration/ram.c
index 5b6b09e..45a00b4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -426,7 +426,6 @@ int precopy_notify(PrecopyNotifyReason reason, Error **errp)
 {
     PrecopyNotifyData pnd;
     pnd.reason = reason;
-    pnd.errp = errp;
 
     return notifier_with_return_list_notify(&precopy_notifier_list, &pnd, errp);
 }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 03/14] migration: convert to NotifierWithReturn
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 01/14] notify: pass error to notifier with return Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 02/14] migration: remove error from notifier data Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 04/14] migration: MigrationEvent for notifiers Steve Sistare
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Change all migration notifiers to type NotifierWithReturn, so notifiers
can return an error status in a future patch.  For now, pass NULL for the
notifier error parameter, and do not check the return value.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 hw/net/virtio-net.c            |  4 +++-
 hw/vfio/migration.c            |  4 +++-
 include/hw/vfio/vfio-common.h  |  2 +-
 include/hw/virtio/virtio-net.h |  2 +-
 include/migration/misc.h       |  6 +++---
 include/qemu/notify.h          |  1 +
 migration/migration.c          | 16 ++++++++--------
 net/vhost-vdpa.c               |  6 ++++--
 roms/seabios-hppa              |  2 +-
 ui/spice-core.c                |  8 +++++---
 10 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5a79bc3..75f4e86 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3534,11 +3534,13 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
     }
 }
 
-static void virtio_net_migration_state_notifier(Notifier *notifier, void *data)
+static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier,
+                                               void *data, Error **errp)
 {
     MigrationState *s = data;
     VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
     virtio_net_handle_migration_primary(n, s);
+    return 0;
 }
 
 static bool failover_hide_primary_device(DeviceListener *listener,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 70e6b1a..6b6acc4 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -754,7 +754,8 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
                               mig_state_to_str(new_state));
 }
 
-static void vfio_migration_state_notifier(Notifier *notifier, void *data)
+static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
+                                         void *data, Error **errp)
 {
     MigrationState *s = data;
     VFIOMigration *migration = container_of(notifier, VFIOMigration,
@@ -770,6 +771,7 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_FAILED:
         vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING);
     }
+    return 0;
 }
 
 static void vfio_migration_free(VFIODevice *vbasedev)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9b7ef7d..4a6c262 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -62,7 +62,7 @@ typedef struct VFIORegion {
 typedef struct VFIOMigration {
     struct VFIODevice *vbasedev;
     VMChangeStateEntry *vm_state;
-    Notifier migration_state;
+    NotifierWithReturn migration_state;
     uint32_t device_state;
     int data_fd;
     void *data_buffer;
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 55977f0..eaee8f4 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -221,7 +221,7 @@ struct VirtIONet {
     DeviceListener primary_listener;
     QDict *primary_opts;
     bool primary_opts_from_json;
-    Notifier migration_state;
+    NotifierWithReturn migration_state;
     VirtioNetRssData rss_data;
     struct NetRxPkt *rx_pkt;
     struct EBPFRSSContext ebpf_rss;
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 5e65c18..b62e351 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,9 +60,9 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
-void migration_add_notifier(Notifier *notify,
-                            void (*func)(Notifier *notifier, void *data));
-void migration_remove_notifier(Notifier *notify);
+void migration_add_notifier(NotifierWithReturn *notify,
+                            NotifierWithReturnFunc func);
+void migration_remove_notifier(NotifierWithReturn *notify);
 void migration_call_notifiers(MigrationState *s);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
diff --git a/include/qemu/notify.h b/include/qemu/notify.h
index 9a85631..abf18db 100644
--- a/include/qemu/notify.h
+++ b/include/qemu/notify.h
@@ -45,6 +45,7 @@ bool notifier_list_empty(NotifierList *list);
 /* Same as Notifier but allows .notify() to return errors */
 typedef struct NotifierWithReturn NotifierWithReturn;
 
+/* Return int to allow for different failure modes and recovery actions */
 typedef int (*NotifierWithReturnFunc)(NotifierWithReturn *notifier, void *data,
                                       Error **errp);
 
diff --git a/migration/migration.c b/migration/migration.c
index ab21de2..6d4072e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -69,8 +69,8 @@
 #include "qemu/sockets.h"
 #include "sysemu/kvm.h"
 
-static NotifierList migration_state_notifiers =
-    NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
+static NotifierWithReturnList migration_state_notifiers =
+    NOTIFIER_WITH_RETURN_LIST_INITIALIZER(migration_state_notifiers);
 
 /* Messages sent on the return path from destination to source */
 enum mig_rp_message_type {
@@ -1459,24 +1459,24 @@ static void migrate_fd_cancel(MigrationState *s)
     }
 }
 
-void migration_add_notifier(Notifier *notify,
-                            void (*func)(Notifier *notifier, void *data))
+void migration_add_notifier(NotifierWithReturn *notify,
+                            NotifierWithReturnFunc func)
 {
     notify->notify = func;
-    notifier_list_add(&migration_state_notifiers, notify);
+    notifier_with_return_list_add(&migration_state_notifiers, notify);
 }
 
-void migration_remove_notifier(Notifier *notify)
+void migration_remove_notifier(NotifierWithReturn *notify)
 {
     if (notify->notify) {
-        notifier_remove(notify);
+        notifier_with_return_remove(notify);
         notify->notify = NULL;
     }
 }
 
 void migration_call_notifiers(MigrationState *s)
 {
-    notifier_list_notify(&migration_state_notifiers, s);
+    notifier_with_return_list_notify(&migration_state_notifiers, s, 0);
 }
 
 bool migration_in_setup(MigrationState *s)
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 3726ee5..1c00519 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -34,7 +34,7 @@
 typedef struct VhostVDPAState {
     NetClientState nc;
     struct vhost_vdpa vhost_vdpa;
-    Notifier migration_state;
+    NotifierWithReturn migration_state;
     VHostNetState *vhost_net;
 
     /* Control commands shadow buffers */
@@ -322,7 +322,8 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
     }
 }
 
-static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
+static int vdpa_net_migration_state_notifier(NotifierWithReturn *notifier,
+                                             void *data, Error **errp)
 {
     MigrationState *migration = data;
     VhostVDPAState *s = container_of(notifier, VhostVDPAState,
@@ -333,6 +334,7 @@ static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
     } else if (migration_has_failed(migration)) {
         vhost_vdpa_net_log_global_enable(s, false);
     }
+    return 0;
 }
 
 static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
diff --git a/roms/seabios-hppa b/roms/seabios-hppa
index 03774ed..e4eac85 160000
--- a/roms/seabios-hppa
+++ b/roms/seabios-hppa
@@ -1 +1 @@
-Subproject commit 03774edaad3bfae090ac96ca5450353c641637d1
+Subproject commit e4eac85880e8677f96d8b9e94de9f2eec9c0751f
diff --git a/ui/spice-core.c b/ui/spice-core.c
index 37b277f..b3cd229 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -42,7 +42,7 @@
 /* core bits */
 
 static SpiceServer *spice_server;
-static Notifier migration_state;
+static NotifierWithReturn migration_state;
 static const char *auth = "spice";
 static char *auth_passwd;
 static time_t auth_expires = TIME_MAX;
@@ -568,12 +568,13 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
     return info;
 }
 
-static void migration_state_notifier(Notifier *notifier, void *data)
+static int migration_state_notifier(NotifierWithReturn *notifier,
+                                    void *data, Error **errp)
 {
     MigrationState *s = data;
 
     if (!spice_have_target_host) {
-        return;
+        return 0;
     }
 
     if (migration_in_setup(s)) {
@@ -586,6 +587,7 @@ static void migration_state_notifier(Notifier *notifier, void *data)
         spice_server_migrate_end(spice_server, false);
         spice_have_target_host = false;
     }
+    return 0;
 }
 
 int qemu_spice_migrate_info(const char *hostname, int port, int tls_port,
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 04/14] migration: MigrationEvent for notifiers
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (2 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 03/14] migration: convert to NotifierWithReturn Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 05/14] migration: remove postcopy_after_devices Steve Sistare
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Passing MigrationState to notifiers is unsound because they could access
unstable migration state internals or even modify the state.  Instead, pass
the minimal info needed in a new MigrationEvent struct, which could be
extended in the future if needed.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 hw/net/virtio-net.c      | 11 ++++++-----
 hw/vfio/migration.c      | 10 +++-------
 hw/vfio/trace-events     |  2 +-
 include/migration/misc.h | 23 ++++++++++++++++++++++-
 migration/migration.c    | 17 ++++++++++++-----
 net/vhost-vdpa.c         |  6 +++---
 ui/spice-core.c          |  9 ++++-----
 7 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 75f4e86..e803f98 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3504,7 +3504,7 @@ out:
     return !err;
 }
 
-static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
+static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationEvent *e)
 {
     bool should_be_hidden;
     Error *err = NULL;
@@ -3516,7 +3516,7 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
 
     should_be_hidden = qatomic_read(&n->failover_primary_hidden);
 
-    if (migration_in_setup(s) && !should_be_hidden) {
+    if (e->type == MIG_EVENT_PRECOPY_SETUP && !should_be_hidden) {
         if (failover_unplug_primary(n, dev)) {
             vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
             qapi_event_send_unplug_primary(dev->id);
@@ -3524,7 +3524,7 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
         } else {
             warn_report("couldn't unplug primary device");
         }
-    } else if (migration_has_failed(s)) {
+    } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
         /* We already unplugged the device let's plug it back */
         if (!failover_replug_primary(n, dev, &err)) {
             if (err) {
@@ -3537,9 +3537,10 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
 static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier,
                                                void *data, Error **errp)
 {
-    MigrationState *s = data;
+    MigrationEvent *e = data;
+
     VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
-    virtio_net_handle_migration_primary(n, s);
+    virtio_net_handle_migration_primary(n, e);
     return 0;
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 6b6acc4..869d841 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -757,18 +757,14 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
 static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
                                          void *data, Error **errp)
 {
-    MigrationState *s = data;
+    MigrationEvent *e = data;
     VFIOMigration *migration = container_of(notifier, VFIOMigration,
                                             migration_state);
     VFIODevice *vbasedev = migration->vbasedev;
 
-    trace_vfio_migration_state_notifier(vbasedev->name,
-                                        MigrationStatus_str(s->state));
+    trace_vfio_migration_state_notifier(vbasedev->name, e->type);
 
-    switch (s->state) {
-    case MIGRATION_STATUS_CANCELLING:
-    case MIGRATION_STATUS_CANCELLED:
-    case MIGRATION_STATUS_FAILED:
+    if (e->type == MIG_EVENT_PRECOPY_FAILED) {
         vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING);
     }
     return 0;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 8fdde54..f0474b2 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -153,7 +153,7 @@ vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
 vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
 vfio_migration_realize(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
-vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
+vfio_migration_state_notifier(const char *name, int state) " (%s) state %d"
 vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_cleanup(const char *name) " (%s)"
 vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
diff --git a/include/migration/misc.h b/include/migration/misc.h
index b62e351..9e4abae 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,10 +60,31 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+
+typedef enum MigrationEventType {
+    MIG_EVENT_PRECOPY_SETUP,
+    MIG_EVENT_PRECOPY_DONE,
+    MIG_EVENT_PRECOPY_FAILED,
+    MIG_EVENT_MAX
+} MigrationEventType;
+
+typedef struct MigrationEvent {
+    MigrationEventType type;
+} MigrationEvent;
+
+/*
+ * Register the notifier @notify to be called when a migration event occurs
+ * for MIG_MODE_NORMAL, as specified by the MigrationEvent passed to @func.
+ * Notifiers may receive events in any of the following orders:
+ *    - MIG_EVENT_PRECOPY_SETUP -> MIG_EVENT_PRECOPY_DONE
+ *    - MIG_EVENT_PRECOPY_SETUP -> MIG_EVENT_PRECOPY_FAILED
+ *    - MIG_EVENT_PRECOPY_FAILED
+ */
 void migration_add_notifier(NotifierWithReturn *notify,
                             NotifierWithReturnFunc func);
+
 void migration_remove_notifier(NotifierWithReturn *notify);
-void migration_call_notifiers(MigrationState *s);
+void migration_call_notifiers(MigrationState *s, MigrationEventType type);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index 6d4072e..4650c21 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1319,6 +1319,8 @@ void migrate_set_state(int *state, int old_state, int new_state)
 
 static void migrate_fd_cleanup(MigrationState *s)
 {
+    MigrationEventType type;
+
     g_free(s->hostname);
     s->hostname = NULL;
     json_writer_free(s->vmdesc);
@@ -1367,7 +1369,9 @@ static void migrate_fd_cleanup(MigrationState *s)
         /* It is used on info migrate.  We can't free it */
         error_report_err(error_copy(s->error));
     }
-    migration_call_notifiers(s);
+    type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
+                                     MIG_EVENT_PRECOPY_DONE;
+    migration_call_notifiers(s, type);
     block_cleanup_parameters();
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
@@ -1474,9 +1478,12 @@ void migration_remove_notifier(NotifierWithReturn *notify)
     }
 }
 
-void migration_call_notifiers(MigrationState *s)
+void migration_call_notifiers(MigrationState *s, MigrationEventType type)
 {
-    notifier_with_return_list_notify(&migration_state_notifiers, s, 0);
+    MigrationEvent e;
+
+    e.type = type;
+    notifier_with_return_list_notify(&migration_state_notifiers, &e, 0);
 }
 
 bool migration_in_setup(MigrationState *s)
@@ -2537,7 +2544,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
      * spice needs to trigger a transition now
      */
     ms->postcopy_after_devices = true;
-    migration_call_notifiers(ms);
+    migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE);
 
     migration_downtime_end(ms);
 
@@ -3601,7 +3608,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         rate_limit = migrate_max_bandwidth();
 
         /* Notify before starting migration thread */
-        migration_call_notifiers(s);
+        migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP);
     }
 
     migration_rate_set(rate_limit);
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 1c00519..a29d18a 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -325,13 +325,13 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
 static int vdpa_net_migration_state_notifier(NotifierWithReturn *notifier,
                                              void *data, Error **errp)
 {
-    MigrationState *migration = data;
+    MigrationEvent *e = data;
     VhostVDPAState *s = container_of(notifier, VhostVDPAState,
                                      migration_state);
 
-    if (migration_in_setup(migration)) {
+    if (e->type == MIG_EVENT_PRECOPY_SETUP) {
         vhost_vdpa_net_log_global_enable(s, true);
-    } else if (migration_has_failed(migration)) {
+    } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
         vhost_vdpa_net_log_global_enable(s, false);
     }
     return 0;
diff --git a/ui/spice-core.c b/ui/spice-core.c
index b3cd229..0a59876 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -571,19 +571,18 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
 static int migration_state_notifier(NotifierWithReturn *notifier,
                                     void *data, Error **errp)
 {
-    MigrationState *s = data;
+    MigrationEvent *e = data;
 
     if (!spice_have_target_host) {
         return 0;
     }
 
-    if (migration_in_setup(s)) {
+    if (e->type == MIG_EVENT_PRECOPY_SETUP) {
         spice_server_migrate_start(spice_server);
-    } else if (migration_has_finished(s) ||
-               migration_in_postcopy_after_devices(s)) {
+    } else if (e->type == MIG_EVENT_PRECOPY_DONE) {
         spice_server_migrate_end(spice_server, true);
         spice_have_target_host = false;
-    } else if (migration_has_failed(s)) {
+    } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
         spice_server_migrate_end(spice_server, false);
         spice_have_target_host = false;
     }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 05/14] migration: remove postcopy_after_devices
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (3 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 04/14] migration: MigrationEvent for notifiers Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 06/14] migration: MigrationNotifyFunc Steve Sistare
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

postcopy_after_devices and migration_in_postcopy_after_devices are no
longer used, so delete them.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/migration/misc.h | 1 -
 migration/migration.c    | 7 -------
 migration/migration.h    | 2 --
 3 files changed, 10 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 9e4abae..e615000 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -89,7 +89,6 @@ bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 /* ...and after the device transmission */
-bool migration_in_postcopy_after_devices(MigrationState *);
 /* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
 /* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
diff --git a/migration/migration.c b/migration/migration.c
index 4650c21..8f7f2d9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1527,11 +1527,6 @@ bool migration_postcopy_is_alive(int state)
     }
 }
 
-bool migration_in_postcopy_after_devices(MigrationState *s)
-{
-    return migration_in_postcopy() && s->postcopy_after_devices;
-}
-
 bool migration_in_incoming_postcopy(void)
 {
     PostcopyState ps = postcopy_state_get();
@@ -1613,7 +1608,6 @@ int migrate_init(MigrationState *s, Error **errp)
     s->expected_downtime = 0;
     s->setup_time = 0;
     s->start_postcopy = false;
-    s->postcopy_after_devices = false;
     s->migration_thread_running = false;
     error_free(s->error);
     s->error = NULL;
@@ -2543,7 +2537,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
      * at the transition to postcopy and after the device state; in particular
      * spice needs to trigger a transition now
      */
-    ms->postcopy_after_devices = true;
     migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE);
 
     migration_downtime_end(ms);
diff --git a/migration/migration.h b/migration/migration.h
index f2c8b8f..aef8afb 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -348,8 +348,6 @@ struct MigrationState {
 
     /* Flag set once the migration has been asked to enter postcopy */
     bool start_postcopy;
-    /* Flag set after postcopy has sent the device state */
-    bool postcopy_after_devices;
 
     /* Flag set once the migration thread is running (and needs joining) */
     bool migration_thread_running;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 06/14] migration: MigrationNotifyFunc
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (4 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 05/14] migration: remove postcopy_after_devices Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 07/14] migration: per-mode notifiers Steve Sistare
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Define MigrationNotifyFunc to improve type safety and simplify migration
notifiers.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 hw/net/virtio-net.c      | 4 +---
 hw/vfio/migration.c      | 3 +--
 include/migration/misc.h | 5 ++++-
 migration/migration.c    | 4 ++--
 net/vhost-vdpa.c         | 6 ++----
 ui/spice-core.c          | 4 +---
 6 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e803f98..a3c711b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3535,10 +3535,8 @@ static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationEvent *e)
 }
 
 static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier,
-                                               void *data, Error **errp)
+                                               MigrationEvent *e, Error **errp)
 {
-    MigrationEvent *e = data;
-
     VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
     virtio_net_handle_migration_primary(n, e);
     return 0;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 869d841..50140ed 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -755,9 +755,8 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
 }
 
 static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
-                                         void *data, Error **errp)
+                                         MigrationEvent *e, Error **errp)
 {
-    MigrationEvent *e = data;
     VFIOMigration *migration = container_of(notifier, VFIOMigration,
                                             migration_state);
     VFIODevice *vbasedev = migration->vbasedev;
diff --git a/include/migration/misc.h b/include/migration/misc.h
index e615000..e36a1f3 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -72,6 +72,9 @@ typedef struct MigrationEvent {
     MigrationEventType type;
 } MigrationEvent;
 
+typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
+                                   MigrationEvent *e, Error **errp);
+
 /*
  * Register the notifier @notify to be called when a migration event occurs
  * for MIG_MODE_NORMAL, as specified by the MigrationEvent passed to @func.
@@ -81,7 +84,7 @@ typedef struct MigrationEvent {
  *    - MIG_EVENT_PRECOPY_FAILED
  */
 void migration_add_notifier(NotifierWithReturn *notify,
-                            NotifierWithReturnFunc func);
+                            MigrationNotifyFunc func);
 
 void migration_remove_notifier(NotifierWithReturn *notify);
 void migration_call_notifiers(MigrationState *s, MigrationEventType type);
diff --git a/migration/migration.c b/migration/migration.c
index 8f7f2d9..33149c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1464,9 +1464,9 @@ static void migrate_fd_cancel(MigrationState *s)
 }
 
 void migration_add_notifier(NotifierWithReturn *notify,
-                            NotifierWithReturnFunc func)
+                            MigrationNotifyFunc func)
 {
-    notify->notify = func;
+    notify->notify = (NotifierWithReturnFunc)func;
     notifier_with_return_list_add(&migration_state_notifiers, notify);
 }
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a29d18a..e6bdb45 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -323,11 +323,9 @@ static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)
 }
 
 static int vdpa_net_migration_state_notifier(NotifierWithReturn *notifier,
-                                             void *data, Error **errp)
+                                             MigrationEvent *e, Error **errp)
 {
-    MigrationEvent *e = data;
-    VhostVDPAState *s = container_of(notifier, VhostVDPAState,
-                                     migration_state);
+    VhostVDPAState *s = container_of(notifier, VhostVDPAState, migration_state);
 
     if (e->type == MIG_EVENT_PRECOPY_SETUP) {
         vhost_vdpa_net_log_global_enable(s, true);
diff --git a/ui/spice-core.c b/ui/spice-core.c
index 0a59876..15be640 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -569,10 +569,8 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
 }
 
 static int migration_state_notifier(NotifierWithReturn *notifier,
-                                    void *data, Error **errp)
+                                    MigrationEvent *e, Error **errp)
 {
-    MigrationEvent *e = data;
-
     if (!spice_have_target_host) {
         return 0;
     }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 07/14] migration: per-mode notifiers
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (5 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 06/14] migration: MigrationNotifyFunc Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 08/14] migration: refactor migrate_fd_connect failures Steve Sistare
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Keep a separate list of migration notifiers for each migration mode.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 include/migration/misc.h |  6 ++++++
 migration/migration.c    | 22 +++++++++++++++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e36a1f3..4dc06a9 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -86,6 +86,12 @@ typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
 void migration_add_notifier(NotifierWithReturn *notify,
                             MigrationNotifyFunc func);
 
+/*
+ * Same as migration_add_notifier, but applies to be specified @mode.
+ */
+void migration_add_notifier_mode(NotifierWithReturn *notify,
+                                 MigrationNotifyFunc func, MigMode mode);
+
 void migration_remove_notifier(NotifierWithReturn *notify);
 void migration_call_notifiers(MigrationState *s, MigrationEventType type);
 bool migration_in_setup(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index 33149c4..925103b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -69,8 +69,13 @@
 #include "qemu/sockets.h"
 #include "sysemu/kvm.h"
 
-static NotifierWithReturnList migration_state_notifiers =
-    NOTIFIER_WITH_RETURN_LIST_INITIALIZER(migration_state_notifiers);
+#define NOTIFIER_ELEM_INIT(array, elem)    \
+    [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem])
+
+static NotifierWithReturnList migration_state_notifiers[] = {
+    NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
+    NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
+};
 
 /* Messages sent on the return path from destination to source */
 enum mig_rp_message_type {
@@ -1463,11 +1468,17 @@ static void migrate_fd_cancel(MigrationState *s)
     }
 }
 
+void migration_add_notifier_mode(NotifierWithReturn *notify,
+                                 MigrationNotifyFunc func, MigMode mode)
+{
+    notify->notify = (NotifierWithReturnFunc)func;
+    notifier_with_return_list_add(&migration_state_notifiers[mode], notify);
+}
+
 void migration_add_notifier(NotifierWithReturn *notify,
                             MigrationNotifyFunc func)
 {
-    notify->notify = (NotifierWithReturnFunc)func;
-    notifier_with_return_list_add(&migration_state_notifiers, notify);
+    migration_add_notifier_mode(notify, func, MIG_MODE_NORMAL);
 }
 
 void migration_remove_notifier(NotifierWithReturn *notify)
@@ -1480,10 +1491,11 @@ void migration_remove_notifier(NotifierWithReturn *notify)
 
 void migration_call_notifiers(MigrationState *s, MigrationEventType type)
 {
+    MigMode mode = s->parameters.mode;
     MigrationEvent e;
 
     e.type = type;
-    notifier_with_return_list_notify(&migration_state_notifiers, &e, 0);
+    notifier_with_return_list_notify(&migration_state_notifiers[mode], &e, 0);
 }
 
 bool migration_in_setup(MigrationState *s)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 08/14] migration: refactor migrate_fd_connect failures
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (6 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 07/14] migration: per-mode notifiers Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-22 17:28 ` [PATCH V4 09/14] migration: notifier error checking Steve Sistare
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Move common code for the error path in migrate_fd_connect to a shared
fail label.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 migration/migration.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 925103b..6a115d2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3627,11 +3627,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
     if (migrate_postcopy_ram() || migrate_return_path()) {
         if (open_return_path_on_source(s)) {
             error_setg(&local_err, "Unable to open return-path for postcopy");
-            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
-            migrate_set_error(s, local_err);
-            error_report_err(local_err);
-            migrate_fd_cleanup(s);
-            return;
+            goto fail;
         }
     }
 
@@ -3660,6 +3656,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
                 migration_thread, s, QEMU_THREAD_JOINABLE);
     }
     s->migration_thread_running = true;
+    return;
+
+fail:
+    migrate_set_error(s, local_err);
+    migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    error_report_err(local_err);
+    migrate_fd_cleanup(s);
 }
 
 static void migration_class_init(ObjectClass *klass, void *data)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 09/14] migration: notifier error checking
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (7 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 08/14] migration: refactor migrate_fd_connect failures Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-26  2:03   ` Peter Xu
  2024-02-22 17:28 ` [PATCH V4 10/14] migration: stop vm for cpr Steve Sistare
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Check the status returned by migration notifiers for event type
MIG_EVENT_PRECOPY_SETUP, and report errors.  None of the notifiers
return an error status at this time.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/migration/misc.h |  8 +++++++-
 migration/migration.c    | 25 ++++++++++++++++---------
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4dc06a9..e4933b8 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -72,6 +72,11 @@ typedef struct MigrationEvent {
     MigrationEventType type;
 } MigrationEvent;
 
+/*
+ * A MigrationNotifyFunc may return an error code and an Error object,
+ * but only when @e->type is MIG_EVENT_PRECOPY_SETUP.  The code is an int
+ * to allow for different failure modes and recovery actions.
+ */
 typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
                                    MigrationEvent *e, Error **errp);
 
@@ -93,7 +98,8 @@ void migration_add_notifier_mode(NotifierWithReturn *notify,
                                  MigrationNotifyFunc func, MigMode mode);
 
 void migration_remove_notifier(NotifierWithReturn *notify);
-void migration_call_notifiers(MigrationState *s, MigrationEventType type);
+int migration_call_notifiers(MigrationState *s, MigrationEventType type,
+                             Error **errp);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index 6a115d2..37c836b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1376,7 +1376,7 @@ static void migrate_fd_cleanup(MigrationState *s)
     }
     type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
                                      MIG_EVENT_PRECOPY_DONE;
-    migration_call_notifiers(s, type);
+    migration_call_notifiers(s, type, NULL);
     block_cleanup_parameters();
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
@@ -1489,13 +1489,18 @@ void migration_remove_notifier(NotifierWithReturn *notify)
     }
 }
 
-void migration_call_notifiers(MigrationState *s, MigrationEventType type)
+int migration_call_notifiers(MigrationState *s, MigrationEventType type,
+                             Error **errp)
 {
     MigMode mode = s->parameters.mode;
     MigrationEvent e;
+    int ret;
 
     e.type = type;
-    notifier_with_return_list_notify(&migration_state_notifiers[mode], &e, 0);
+    ret = notifier_with_return_list_notify(&migration_state_notifiers[mode],
+                                           &e, errp);
+    assert(!ret || type == MIG_EVENT_PRECOPY_SETUP);
+    return ret;
 }
 
 bool migration_in_setup(MigrationState *s)
@@ -2549,7 +2554,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
      * at the transition to postcopy and after the device state; in particular
      * spice needs to trigger a transition now
      */
-    migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE);
+    migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE, NULL);
 
     migration_downtime_end(ms);
 
@@ -2569,11 +2574,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     ret = qemu_file_get_error(ms->to_dst_file);
     if (ret) {
-        error_setg(errp, "postcopy_start: Migration stream errored");
-        migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                              MIGRATION_STATUS_FAILED);
+        error_setg_errno(errp, -ret, "postcopy_start: Migration stream error");
+        bql_lock();
+        goto fail;
     }
-
     trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
 
     return ret;
@@ -2594,6 +2598,7 @@ fail:
             error_report_err(local_err);
         }
     }
+    migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
     bql_unlock();
     return -1;
 }
@@ -3613,7 +3618,9 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         rate_limit = migrate_max_bandwidth();
 
         /* Notify before starting migration thread */
-        migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP);
+        if (migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) {
+            goto fail;
+        }
     }
 
     migration_rate_set(rate_limit);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 10/14] migration: stop vm for cpr
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (8 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 09/14] migration: notifier error checking Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-26  2:08   ` Peter Xu
  2024-02-22 17:28 ` [PATCH V4 11/14] vfio: register container " Steve Sistare
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/migration/misc.h |  1 +
 migration/migration.c    | 51 +++++++++++++++++++++++++++++-------------------
 migration/migration.h    |  2 --
 3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b8..5d1aa59 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
 
 typedef enum MigrationEventType {
     MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b..90a9094 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
     return (a > b) - (a < b);
 }
 
-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
 {
-    int ret = vm_stop_force_state(state);
+    int ret;
+
+    migration_downtime_start(s);
+
+    s->vm_old_state = runstate_get();
+    global_state_store();
+
+    ret = vm_stop_force_state(state);
 
     trace_vmstate_downtime_checkpoint("src-vm-stopped");
+    trace_migration_completion_vm_stop(ret);
 
     return ret;
 }
@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
             s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
+bool migrate_mode_is_cpr(MigrationState *s)
+{
+    return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+}
+
 int migrate_init(MigrationState *s, Error **errp)
 {
     int ret;
@@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     bql_lock();
     trace_postcopy_start_set_run();
 
-    migration_downtime_start(ms);
-
-    global_state_store();
-    ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
+    ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
     if (ret < 0) {
         goto fail;
     }
@@ -2652,15 +2662,12 @@ static int migration_completion_precopy(MigrationState *s,
     int ret;
 
     bql_lock();
-    migration_downtime_start(s);
 
-    s->vm_old_state = runstate_get();
-    global_state_store();
-
-    ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
-    trace_migration_completion_vm_stop(ret);
-    if (ret < 0) {
-        goto out_unlock;
+    if (!migrate_mode_is_cpr(s)) {
+        ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+        if (ret < 0) {
+            goto out_unlock;
+        }
     }
 
     ret = migration_maybe_pause(s, current_active_state,
@@ -3500,15 +3507,10 @@ static void *bg_migration_thread(void *opaque)
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
 
     trace_migration_thread_setup_complete();
-    migration_downtime_start(s);
 
     bql_lock();
 
-    s->vm_old_state = runstate_get();
-
-    global_state_store();
-    /* Forcibly stop VM before saving state of vCPUs and devices */
-    if (migration_stop_vm(RUN_STATE_PAUSED)) {
+    if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
         goto fail;
     }
     /*
@@ -3584,6 +3586,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
     Error *local_err = NULL;
     uint64_t rate_limit;
     bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
+    int ret;
 
     /*
      * If there's a previous error, free it and prepare for another one.
@@ -3655,6 +3658,14 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         return;
     }
 
+    if (migrate_mode_is_cpr(s)) {
+        ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+        if (ret < 0) {
+            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
+            goto fail;
+        }
+    }
+
     if (migrate_background_snapshot()) {
         qemu_thread_create(&s->thread, "bg_snapshot",
                 bg_migration_thread, s, QEMU_THREAD_JOINABLE);
diff --git a/migration/migration.h b/migration/migration.h
index aef8afb..65c0b61 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
  */
 void migration_rp_kick(MigrationState *s);
 
-int migration_stop_vm(RunState state);
-
 #endif
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 11/14] vfio: register container for cpr
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (9 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 10/14] migration: stop vm for cpr Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-29  8:35   ` Cédric Le Goater
  2024-02-22 17:28 ` [PATCH V4 12/14] vfio: allow cpr-reboot migration if suspended Steve Sistare
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Define entry points to perform per-container cpr-specific initialization
and teardown.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/vfio/container.c           | 11 ++++++++++-
 hw/vfio/cpr.c                 | 19 +++++++++++++++++++
 hw/vfio/iommufd.c             |  6 ++++++
 hw/vfio/meson.build           |  1 +
 include/hw/vfio/vfio-common.h |  3 +++
 5 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 hw/vfio/cpr.c

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index bd25b9f..096d77e 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -621,10 +621,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         goto free_container_exit;
     }
 
+    ret = vfio_cpr_register_container(bcontainer, errp);
+    if (ret) {
+        goto free_container_exit;
+    }
+
     ret = vfio_ram_block_discard_disable(container, true);
     if (ret) {
         error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
-        goto free_container_exit;
+        goto unregister_container_exit;
     }
 
     assert(bcontainer->ops->setup);
@@ -667,6 +672,9 @@ listener_release_exit:
 enable_discards_exit:
     vfio_ram_block_discard_disable(container, false);
 
+unregister_container_exit:
+    vfio_cpr_unregister_container(bcontainer);
+
 free_container_exit:
     g_free(container);
 
@@ -710,6 +718,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
         vfio_container_destroy(bcontainer);
 
         trace_vfio_disconnect_container(container->fd);
+        vfio_cpr_unregister_container(bcontainer);
         close(container->fd);
         g_free(container);
 
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
new file mode 100644
index 0000000..3bede54
--- /dev/null
+++ b/hw/vfio/cpr.c
@@ -0,0 +1,19 @@
+/*
+ * Copyright (c) 2021-2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/vfio/vfio-common.h"
+#include "qapi/error.h"
+
+int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp)
+{
+    return 0;
+}
+
+void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
+{
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 9bfddc1..e1be224 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -411,6 +411,11 @@ found_container:
         goto err_listener_register;
     }
 
+    ret = vfio_cpr_register_container(bcontainer, errp);
+    if (ret) {
+        goto err_listener_register;
+    }
+
     /*
      * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
      * for discarding incompatibility check as well?
@@ -461,6 +466,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
         iommufd_cdev_ram_block_discard_disable(false);
     }
 
+    vfio_cpr_unregister_container(bcontainer);
     iommufd_cdev_detach_container(vbasedev, container);
     iommufd_cdev_container_destroy(container);
     vfio_put_address_space(space);
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index bb98493..bba776f 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -5,6 +5,7 @@ vfio_ss.add(files(
   'container-base.c',
   'container.c',
   'migration.c',
+  'cpr.c',
 ))
 vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c'))
 vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 4a6c262..b9da6c0 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -205,6 +205,9 @@ void vfio_detach_device(VFIODevice *vbasedev);
 int vfio_kvm_device_add_fd(int fd, Error **errp);
 int vfio_kvm_device_del_fd(int fd, Error **errp);
 
+int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp);
+void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer);
+
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 12/14] vfio: allow cpr-reboot migration if suspended
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (10 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 11/14] vfio: register container " Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-29  8:44   ` Cédric Le Goater
  2024-02-22 17:28 ` [PATCH V4 13/14] migration: update cpr-reboot description Steve Sistare
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.  The
user is responsible for suspending the guest before initiating cpr, such as
by issuing guest-suspend-ram to the qemu guest agent.

Relax the vfio blocker so it does not apply to cpr, and add a notifier that
verifies the guest is suspended.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 hw/vfio/common.c                      |  2 +-
 hw/vfio/cpr.c                         | 20 ++++++++++++++++++++
 hw/vfio/migration.c                   |  2 +-
 include/hw/vfio/vfio-container-base.h |  1 +
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 059bfdc..ff88c3f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
     error_setg(&multiple_devices_migration_blocker,
                "Multiple VFIO devices migration is supported only if all of "
                "them support P2P migration");
-    ret = migrate_add_blocker(&multiple_devices_migration_blocker, errp);
+    ret = migrate_add_blocker_normal(&multiple_devices_migration_blocker, errp);
 
     return ret;
 }
diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
index 3bede54..392c2dd 100644
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -7,13 +7,33 @@
 
 #include "qemu/osdep.h"
 #include "hw/vfio/vfio-common.h"
+#include "migration/misc.h"
 #include "qapi/error.h"
+#include "sysemu/runstate.h"
+
+static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
+                                    MigrationEvent *e, Error **errp)
+{
+    if (e->type == MIG_EVENT_PRECOPY_SETUP &&
+        !runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
+
+        error_setg(errp,
+            "VFIO device only supports cpr-reboot for runstate suspended");
+
+        return -1;
+    }
+    return 0;
+}
 
 int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp)
 {
+    migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
+                                vfio_cpr_reboot_notifier,
+                                MIG_MODE_CPR_REBOOT);
     return 0;
 }
 
 void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
 {
+    migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
 }
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 50140ed..2050ac8 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -889,7 +889,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
     vbasedev->migration_blocker = error_copy(err);
     error_free(err);
 
-    return migrate_add_blocker(&vbasedev->migration_blocker, errp);
+    return migrate_add_blocker_normal(&vbasedev->migration_blocker, errp);
 }
 
 /* ---------------------------------------------------------------------- */
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index b2813b0..3582d5f 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -49,6 +49,7 @@ typedef struct VFIOContainerBase {
     QLIST_ENTRY(VFIOContainerBase) next;
     QLIST_HEAD(, VFIODevice) device_list;
     GList *iova_ranges;
+    NotifierWithReturn cpr_reboot_notifier;
 } VFIOContainerBase;
 
 typedef struct VFIOGuestIOMMU {
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 13/14] migration: update cpr-reboot description
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (11 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 12/14] vfio: allow cpr-reboot migration if suspended Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-26  2:10   ` Peter Xu
  2024-02-28  7:12   ` Markus Armbruster
  2024-02-22 17:28 ` [PATCH V4 14/14] migration: options incompatible with cpr Steve Sistare
  2024-02-22 17:33 ` [PATCH V4 00/14] allow cpr-reboot for vfio Steven Sistare
  14 siblings, 2 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Clarify qapi for cpr-reboot migration mode, and add vfio support.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 qapi/migration.json | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 5a565d9..0990297 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -636,19 +636,28 @@
 #
 # @normal: the original form of migration. (since 8.2)
 #
-# @cpr-reboot: The migrate command saves state to a file, allowing one to
-#              quit qemu, reboot to an updated kernel, and restart an updated
-#              version of qemu.  The caller must specify a migration URI
-#              that writes to and reads from a file.  Unlike normal mode,
-#              the use of certain local storage options does not block the
-#              migration, but the caller must not modify guest block devices
-#              between the quit and restart.  To avoid saving guest RAM to the
-#              file, the memory backend must be shared, and the @x-ignore-shared
-#              migration capability must be set.  Guest RAM must be non-volatile
-#              across reboot, such as by backing it with a dax device, but this
-#              is not enforced.  The restarted qemu arguments must match those
-#              used to initially start qemu, plus the -incoming option.
-#              (since 8.2)
+# @cpr-reboot: The migrate command stops the VM and saves state to the URI.
+#     After quitting qemu, the user resumes by running qemu -incoming.
+#
+#     This mode allows the user to quit qemu, and restart an updated version
+#     of qemu.  The user may even update and reboot the OS before restarting,
+#     as long as the URI persists across a reboot.
+#
+#     Unlike normal mode, the use of certain local storage options does not
+#     block the migration, but the user must not modify guest block devices
+#     between the quit and restart.
+#
+#     This mode supports vfio devices provided the user first puts the guest
+#     in the suspended runstate, such as by issuing guest-suspend-ram to the
+#     qemu guest agent.
+#
+#     Best performance is achieved when the memory backend is shared and the
+#     @x-ignore-shared migration capability is set, but this is not required.
+#     Further, if the user reboots before restarting such a configuration, the
+#     shared backend must be be non-volatile across reboot, such as by backing
+#     it with a dax device.
+#
+#     (since 8.2)
 ##
 { 'enum': 'MigMode',
   'data': [ 'normal', 'cpr-reboot' ] }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (12 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 13/14] migration: update cpr-reboot description Steve Sistare
@ 2024-02-22 17:28 ` Steve Sistare
  2024-02-26  2:10   ` Peter Xu
  2024-02-28  7:21   ` Markus Armbruster
  2024-02-22 17:33 ` [PATCH V4 00/14] allow cpr-reboot for vfio Steven Sistare
  14 siblings, 2 replies; 43+ messages in thread
From: Steve Sistare @ 2024-02-22 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand, Steve Sistare

Fail the migration request if options are set that are incompatible
with cpr.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/migration.c | 17 +++++++++++++++++
 qapi/migration.json   |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 90a9094..7652fd4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1953,6 +1953,23 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
         return false;
     }
 
+    if (migrate_mode_is_cpr(s)) {
+        const char *conflict = NULL;
+
+        if (migrate_postcopy()) {
+            conflict = "postcopy";
+        } else if (migrate_background_snapshot()) {
+            conflict = "background snapshot";
+        } else if (migrate_colo()) {
+            conflict = "COLO";
+        }
+
+        if (conflict) {
+            error_setg(errp, "Cannot use %s with CPR", conflict);
+            return false;
+        }
+    }
+
     if (blk || blk_inc) {
         if (migrate_colo()) {
             error_setg(errp, "No disk migration is required in COLO mode");
diff --git a/qapi/migration.json b/qapi/migration.json
index 0990297..c6bfe2e 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -657,6 +657,8 @@
 #     shared backend must be be non-volatile across reboot, such as by backing
 #     it with a dax device.
 #
+#     cpr-reboot may not be used with postcopy, colo, or background-snapshot.
+#
 #     (since 8.2)
 ##
 { 'enum': 'MigMode',
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 00/14] allow cpr-reboot for vfio
  2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
                   ` (13 preceding siblings ...)
  2024-02-22 17:28 ` [PATCH V4 14/14] migration: options incompatible with cpr Steve Sistare
@ 2024-02-22 17:33 ` Steven Sistare
  2024-02-26  2:14   ` Peter Xu
  2024-03-08 16:52   ` Cédric Le Goater
  14 siblings, 2 replies; 43+ messages in thread
From: Steven Sistare @ 2024-02-22 17:33 UTC (permalink / raw)
  To: qemu-devel, Peter Xu, David Hildenbrand, Alex Williamson
  Cc: Fabiano Rosas, Michael S. Tsirkin, Jason Wang, Cedric Le Goater,
	Gerd Hoffmann, Marc-Andre Lureau

Peter (and David if interested): these patches still need RB:
  migration: notifier error checking
  migration: stop vm for cpr
  migration: update cpr-reboot description
  migration: options incompatible with cpr

Alex, these patches still need RB:
  vfio: register container for cpr
  vfio: allow cpr-reboot migration if suspended

Thanks!

- Steve

On 2/22/2024 12:28 PM, Steve Sistare wrote:
> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.  The
> user is responsible for suspending the guest before initiating cpr, such as
> by issuing guest-suspend-ram to the qemu guest agent.
> 
> Most of the patches in this series enhance migration notifiers so they can
> return an error status and message.  The last few patches register a notifier
> for vfio that returns an error if the guest is not suspended.
> 
> Changes in V3:
>   * update to tip, add RB's
>   * replace MigrationStatus with new enum MigrationEventType
>   * simplify migrate_fd_connect error recovery
>   * support vfio iommufd containers
>   * add patches:
>       migration: stop vm for cpr
>       migration: update cpr-reboot description
> 
> Changes in V4:
>   * rebase to tip, add RB's
>   * add patch to prevent options such as precopy from being used with cpr.
>       migration: options incompatible with cpr
>   * refactor subroutines in "stop vm for cpr"
>   * simplify "notifier error checking" patch by restricting that only
>     notifiers for MIG_EVENT_PRECOPY_SETUP may return an error.
>   * doc that a fail event may be sent without a prior setup event
> 
> Steve Sistare (14):
>   notify: pass error to notifier with return
>   migration: remove error from notifier data
>   migration: convert to NotifierWithReturn
>   migration: MigrationEvent for notifiers
>   migration: remove postcopy_after_devices
>   migration: MigrationNotifyFunc
>   migration: per-mode notifiers
>   migration: refactor migrate_fd_connect failures
>   migration: notifier error checking
>   migration: stop vm for cpr
>   vfio: register container for cpr
>   vfio: allow cpr-reboot migration if suspended
>   migration: update cpr-reboot description
>   migration: options incompatible with cpr
> 
>  hw/net/virtio-net.c                   |  13 +--
>  hw/vfio/common.c                      |   2 +-
>  hw/vfio/container.c                   |  11 ++-
>  hw/vfio/cpr.c                         |  39 +++++++++
>  hw/vfio/iommufd.c                     |   6 ++
>  hw/vfio/meson.build                   |   1 +
>  hw/vfio/migration.c                   |  15 ++--
>  hw/vfio/trace-events                  |   2 +-
>  hw/virtio/vhost-user.c                |  10 +--
>  hw/virtio/virtio-balloon.c            |   3 +-
>  include/hw/vfio/vfio-common.h         |   5 +-
>  include/hw/vfio/vfio-container-base.h |   1 +
>  include/hw/virtio/virtio-net.h        |   2 +-
>  include/migration/misc.h              |  47 +++++++++--
>  include/qemu/notify.h                 |   8 +-
>  migration/migration.c                 | 148 +++++++++++++++++++++++-----------
>  migration/migration.h                 |   4 -
>  migration/postcopy-ram.c              |   3 +-
>  migration/postcopy-ram.h              |   1 -
>  migration/ram.c                       |   3 +-
>  net/vhost-vdpa.c                      |  14 ++--
>  qapi/migration.json                   |  37 ++++++---
>  roms/seabios-hppa                     |   2 +-
>  ui/spice-core.c                       |  17 ++--
>  util/notify.c                         |   5 +-
>  25 files changed, 275 insertions(+), 124 deletions(-)
>  create mode 100644 hw/vfio/cpr.c
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 09/14] migration: notifier error checking
  2024-02-22 17:28 ` [PATCH V4 09/14] migration: notifier error checking Steve Sistare
@ 2024-02-26  2:03   ` Peter Xu
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2024-02-26  2:03 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On Thu, Feb 22, 2024 at 09:28:35AM -0800, Steve Sistare wrote:
> Check the status returned by migration notifiers for event type
> MIG_EVENT_PRECOPY_SETUP, and report errors.  None of the notifiers
> return an error status at this time.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 10/14] migration: stop vm for cpr
  2024-02-22 17:28 ` [PATCH V4 10/14] migration: stop vm for cpr Steve Sistare
@ 2024-02-26  2:08   ` Peter Xu
  2024-02-29 15:21     ` Steven Sistare
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2024-02-26  2:08 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:
> When migration for cpr is initiated, stop the vm and set state
> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
> possibility of ram and device state being out of sync, and guarantees
> that a guest in the suspended state remains suspended, because qmp_cont
> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

cpr-reboot mode keeps changing behavior.

Could we declare it "experimental" until it's solid?  Maybe a patch to
document this?

Normally IMHO we shouldn't merge a feature if it's not complete, however
cpr-reboot is so special that the mode itself is already merged in 8.2
before I started to merge patches, and it keeps changing things.  I don't
know what else we can do here besides declaring it experimental and not
declare it a stable feature.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 13/14] migration: update cpr-reboot description
  2024-02-22 17:28 ` [PATCH V4 13/14] migration: update cpr-reboot description Steve Sistare
@ 2024-02-26  2:10   ` Peter Xu
  2024-02-28  7:12   ` Markus Armbruster
  1 sibling, 0 replies; 43+ messages in thread
From: Peter Xu @ 2024-02-26  2:10 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On Thu, Feb 22, 2024 at 09:28:39AM -0800, Steve Sistare wrote:
> Clarify qapi for cpr-reboot migration mode, and add vfio support.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-22 17:28 ` [PATCH V4 14/14] migration: options incompatible with cpr Steve Sistare
@ 2024-02-26  2:10   ` Peter Xu
  2024-02-28  7:21   ` Markus Armbruster
  1 sibling, 0 replies; 43+ messages in thread
From: Peter Xu @ 2024-02-26  2:10 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On Thu, Feb 22, 2024 at 09:28:40AM -0800, Steve Sistare wrote:
> Fail the migration request if options are set that are incompatible
> with cpr.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 00/14] allow cpr-reboot for vfio
  2024-02-22 17:33 ` [PATCH V4 00/14] allow cpr-reboot for vfio Steven Sistare
@ 2024-02-26  2:14   ` Peter Xu
  2024-02-26  8:49     ` Cédric Le Goater
  2024-03-08 16:52   ` Cédric Le Goater
  1 sibling, 1 reply; 43+ messages in thread
From: Peter Xu @ 2024-02-26  2:14 UTC (permalink / raw)
  To: Steven Sistare
  Cc: qemu-devel, David Hildenbrand, Alex Williamson, Fabiano Rosas,
	Michael S. Tsirkin, Jason Wang, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau

On Thu, Feb 22, 2024 at 12:33:42PM -0500, Steven Sistare wrote:
> Peter (and David if interested): these patches still need RB:
>   migration: notifier error checking
>   migration: stop vm for cpr
>   migration: update cpr-reboot description
>   migration: options incompatible with cpr

These all look fine to me.

> 
> Alex, these patches still need RB:
>   vfio: register container for cpr
>   vfio: allow cpr-reboot migration if suspended

I'll need to wait for comment from either Alex/Cedric on these.

As I asked in the other thread, afaict crp-reboot keeps changing behavior,
maybe I can merge migration patches first, then keep vfio patches
separately merged / discussed?  I always see cpr-reboot mode experimental
from this regard.  Please consider adding a patch to declare cpr-reboot
mode experimental if that matches your expectation, until all relevant
patches are merged, to make sure the ABI becomes stable.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 00/14] allow cpr-reboot for vfio
  2024-02-26  2:14   ` Peter Xu
@ 2024-02-26  8:49     ` Cédric Le Goater
  2024-02-26  9:01       ` Peter Xu
  0 siblings, 1 reply; 43+ messages in thread
From: Cédric Le Goater @ 2024-02-26  8:49 UTC (permalink / raw)
  To: Peter Xu, Steven Sistare
  Cc: qemu-devel, David Hildenbrand, Alex Williamson, Fabiano Rosas,
	Michael S. Tsirkin, Jason Wang, Gerd Hoffmann, Marc-Andre Lureau

On 2/26/24 03:14, Peter Xu wrote:
> On Thu, Feb 22, 2024 at 12:33:42PM -0500, Steven Sistare wrote:
>> Peter (and David if interested): these patches still need RB:
>>    migration: notifier error checking
>>    migration: stop vm for cpr
>>    migration: update cpr-reboot description
>>    migration: options incompatible with cpr
> 
> These all look fine to me.
> 
>>
>> Alex, these patches still need RB:
>>    vfio: register container for cpr
>>    vfio: allow cpr-reboot migration if suspended
> 
> I'll need to wait for comment from either Alex/Cedric on these.

Yes. It's on my list.

> As I asked in the other thread, afaict crp-reboot keeps changing behavior,
> maybe I can merge migration patches first, 

Go ahead. It will help me for the changes I am doing on error reporting
for VFIO migration. I will rebase on top.

> then keep vfio patches separately merged / discussed?  

Sure.

> I always see cpr-reboot mode experimental from this regard.  

This makes sense to me also.

Thanks,

C.



> Please consider adding a patch to declare cpr-reboot
> mode experimental if that matches your expectation, until all relevant
> patches are merged, to make sure the ABI becomes stable.
> 
> Thanks,
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 00/14] allow cpr-reboot for vfio
  2024-02-26  8:49     ` Cédric Le Goater
@ 2024-02-26  9:01       ` Peter Xu
  2024-02-26 20:21         ` Steven Sistare
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2024-02-26  9:01 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Steven Sistare, qemu-devel, David Hildenbrand, Alex Williamson,
	Fabiano Rosas, Michael S. Tsirkin, Jason Wang, Gerd Hoffmann,
	Marc-Andre Lureau

On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
> Go ahead. It will help me for the changes I am doing on error reporting
> for VFIO migration. I will rebase on top.

Thanks for confirming.  I queued the migration patches then, but leave the
two vfio one for further discussion.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 00/14] allow cpr-reboot for vfio
  2024-02-26  9:01       ` Peter Xu
@ 2024-02-26 20:21         ` Steven Sistare
  2024-02-26 22:08           ` Steven Sistare
  0 siblings, 1 reply; 43+ messages in thread
From: Steven Sistare @ 2024-02-26 20:21 UTC (permalink / raw)
  To: Peter Xu, Cédric Le Goater
  Cc: qemu-devel, David Hildenbrand, Alex Williamson, Fabiano Rosas,
	Michael S. Tsirkin, Jason Wang, Gerd Hoffmann, Marc-Andre Lureau

On 2/26/2024 4:01 AM, Peter Xu wrote:
> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
>> Go ahead. It will help me for the changes I am doing on error reporting
>> for VFIO migration. I will rebase on top.
> 
> Thanks for confirming.  I queued the migration patches then, but leave the
> two vfio one for further discussion.

Very good, thanks.  I am always happy to move the ball a few yards closer to
the goal line :)

- Steve



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 00/14] allow cpr-reboot for vfio
  2024-02-26 20:21         ` Steven Sistare
@ 2024-02-26 22:08           ` Steven Sistare
  2024-02-27  4:07             ` Peter Xu
  0 siblings, 1 reply; 43+ messages in thread
From: Steven Sistare @ 2024-02-26 22:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, David Hildenbrand, Alex Williamson, Fabiano Rosas,
	Michael S. Tsirkin, Jason Wang, Gerd Hoffmann, Marc-Andre Lureau,
	Cédric Le Goater

On 2/26/2024 3:21 PM, Steven Sistare wrote:
> On 2/26/2024 4:01 AM, Peter Xu wrote:
>> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
>>> Go ahead. It will help me for the changes I am doing on error reporting
>>> for VFIO migration. I will rebase on top.
>>
>> Thanks for confirming.  I queued the migration patches then, but leave the
>> two vfio one for further discussion.
> 
> Very good, thanks.  I am always happy to move the ball a few yards closer to
> the goal line :)

Peter, beware that patch 3 needs an edit before being queued.
This hunk snuck in and should be deleted:

[PATCH V4 03/14] migration: convert to NotifierWithReturn
diff --git a/roms/seabios-hppa b/roms/seabios-hppa
index 03774ed..e4eac85 160000
--- a/roms/seabios-hppa
+++ b/roms/seabios-hppa
@@ -1 +1 @@
-Subproject commit 03774edaad3bfae090ac96ca5450353c641637d1
+Subproject commit e4eac85880e8677f96d8b9e94de9f2eec9c0751f


- Steve


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 00/14] allow cpr-reboot for vfio
  2024-02-26 22:08           ` Steven Sistare
@ 2024-02-27  4:07             ` Peter Xu
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2024-02-27  4:07 UTC (permalink / raw)
  To: Steven Sistare
  Cc: qemu-devel, David Hildenbrand, Alex Williamson, Fabiano Rosas,
	Michael S. Tsirkin, Jason Wang, Gerd Hoffmann, Marc-Andre Lureau,
	Cédric Le Goater

On Mon, Feb 26, 2024 at 05:08:05PM -0500, Steven Sistare wrote:
> On 2/26/2024 3:21 PM, Steven Sistare wrote:
> > On 2/26/2024 4:01 AM, Peter Xu wrote:
> >> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
> >>> Go ahead. It will help me for the changes I am doing on error reporting
> >>> for VFIO migration. I will rebase on top.
> >>
> >> Thanks for confirming.  I queued the migration patches then, but leave the
> >> two vfio one for further discussion.
> > 
> > Very good, thanks.  I am always happy to move the ball a few yards closer to
> > the goal line :)
> 
> Peter, beware that patch 3 needs an edit before being queued.
> This hunk snuck in and should be deleted:
> 
> [PATCH V4 03/14] migration: convert to NotifierWithReturn
> diff --git a/roms/seabios-hppa b/roms/seabios-hppa
> index 03774ed..e4eac85 160000
> --- a/roms/seabios-hppa
> +++ b/roms/seabios-hppa
> @@ -1 +1 @@
> -Subproject commit 03774edaad3bfae090ac96ca5450353c641637d1
> +Subproject commit e4eac85880e8677f96d8b9e94de9f2eec9c0751f

I see, I dropped this change in the patch.

https://gitlab.com/peterx/qemu/-/commit/ccea71f8f222593c47670366d7d86554586e596e

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 13/14] migration: update cpr-reboot description
  2024-02-22 17:28 ` [PATCH V4 13/14] migration: update cpr-reboot description Steve Sistare
  2024-02-26  2:10   ` Peter Xu
@ 2024-02-28  7:12   ` Markus Armbruster
  1 sibling, 0 replies; 43+ messages in thread
From: Markus Armbruster @ 2024-02-28  7:12 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Michael S. Tsirkin,
	Jason Wang, Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

Steve Sistare <steven.sistare@oracle.com> writes:

> Clarify qapi for cpr-reboot migration mode, and add vfio support.

The patch only affects documentation, but that's less than clear from
the commit message.  Suggest

  Improve documentation for migration mode @cpr-reboot.  In particular,
  document VFIO support.

> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  qapi/migration.json | 35 ++++++++++++++++++++++-------------
>  1 file changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5a565d9..0990297 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -636,19 +636,28 @@
>  #
>  # @normal: the original form of migration. (since 8.2)
>  #
> -# @cpr-reboot: The migrate command saves state to a file, allowing one to
> -#              quit qemu, reboot to an updated kernel, and restart an updated
> -#              version of qemu.  The caller must specify a migration URI
> -#              that writes to and reads from a file.  Unlike normal mode,
> -#              the use of certain local storage options does not block the
> -#              migration, but the caller must not modify guest block devices
> -#              between the quit and restart.  To avoid saving guest RAM to the
> -#              file, the memory backend must be shared, and the @x-ignore-shared
> -#              migration capability must be set.  Guest RAM must be non-volatile
> -#              across reboot, such as by backing it with a dax device, but this
> -#              is not enforced.  The restarted qemu arguments must match those
> -#              used to initially start qemu, plus the -incoming option.
> -#              (since 8.2)
> +# @cpr-reboot: The migrate command stops the VM and saves state to the URI.
> +#     After quitting qemu, the user resumes by running qemu -incoming.

These two sentences apply to any migration mode, don't they?  Just
checking I understand.

> +#
> +#     This mode allows the user to quit qemu, and restart an updated version
> +#     of qemu.  The user may even update and reboot the OS before restarting,
> +#     as long as the URI persists across a reboot.

Hmm, doesn't normal migration also support migrating to a newer QEMU?

> +#
> +#     Unlike normal mode, the use of certain local storage options does not
> +#     block the migration, but the user must not modify guest block devices
> +#     between the quit and restart.

"Must not modify the contents of the guest block devices"?

> +#
> +#     This mode supports vfio devices provided the user first puts the guest

"VFIO devices"

> +#     in the suspended runstate, such as by issuing guest-suspend-ram to the
> +#     qemu guest agent.
> +#
> +#     Best performance is achieved when the memory backend is shared and the
> +#     @x-ignore-shared migration capability is set, but this is not required.
> +#     Further, if the user reboots before restarting such a configuration, the
> +#     shared backend must be be non-volatile across reboot, such as by backing

Typo: "be be non-volatile"

Suggest "the shared memory must persist"

> +#     it with a dax device.
> +#
> +#     (since 8.2)
>  ##
>  { 'enum': 'MigMode',
>    'data': [ 'normal', 'cpr-reboot' ] }

Thanks for adjusting indentation to conform to conventions.  Please
additionally reflow the text to limit line length to 70 characters.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-22 17:28 ` [PATCH V4 14/14] migration: options incompatible with cpr Steve Sistare
  2024-02-26  2:10   ` Peter Xu
@ 2024-02-28  7:21   ` Markus Armbruster
  2024-02-28 13:23     ` Steven Sistare
  1 sibling, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2024-02-28  7:21 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Michael S. Tsirkin,
	Jason Wang, Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

Steve Sistare <steven.sistare@oracle.com> writes:

> Fail the migration request if options are set that are incompatible
> with cpr.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  migration/migration.c | 17 +++++++++++++++++
>  qapi/migration.json   |  2 ++
>  2 files changed, 19 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 90a9094..7652fd4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1953,6 +1953,23 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>          return false;
>      }
>  
> +    if (migrate_mode_is_cpr(s)) {
> +        const char *conflict = NULL;
> +
> +        if (migrate_postcopy()) {
> +            conflict = "postcopy";
> +        } else if (migrate_background_snapshot()) {
> +            conflict = "background snapshot";
> +        } else if (migrate_colo()) {
> +            conflict = "COLO";
> +        }
> +
> +        if (conflict) {
> +            error_setg(errp, "Cannot use %s with CPR", conflict);
> +            return false;
> +        }
> +    }
> +
>      if (blk || blk_inc) {
>          if (migrate_colo()) {
>              error_setg(errp, "No disk migration is required in COLO mode");
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 0990297..c6bfe2e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -657,6 +657,8 @@
>  #     shared backend must be be non-volatile across reboot, such as by backing
>  #     it with a dax device.
>  #
> +#     cpr-reboot may not be used with postcopy, colo, or background-snapshot.
> +#

@cpr-reboot

COLO

Wrap the line:

   #     @cpr-reboot may not be used with postcopy, COLO, or
   #     background-snapshot.

This doesn't tell the reader what settings exactly do not work with
@cpr-reboot.

For instance "background-snapshot" is about enabling migration
capability @background-snapshot.  We could write something like "is
incompatible with enabling migration capability @background-snapshot".

Same for the other two.  Worthwhile?

>  #     (since 8.2)
>  ##
>  { 'enum': 'MigMode',



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-28  7:21   ` Markus Armbruster
@ 2024-02-28 13:23     ` Steven Sistare
  2024-02-28 16:05       ` Markus Armbruster
  0 siblings, 1 reply; 43+ messages in thread
From: Steven Sistare @ 2024-02-28 13:23 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Michael S. Tsirkin,
	Jason Wang, Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On 2/28/2024 2:21 AM, Markus Armbruster wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> Fail the migration request if options are set that are incompatible
>> with cpr.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  migration/migration.c | 17 +++++++++++++++++
>>  qapi/migration.json   |  2 ++
>>  2 files changed, 19 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 90a9094..7652fd4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1953,6 +1953,23 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>>          return false;
>>      }
>>  
>> +    if (migrate_mode_is_cpr(s)) {
>> +        const char *conflict = NULL;
>> +
>> +        if (migrate_postcopy()) {
>> +            conflict = "postcopy";
>> +        } else if (migrate_background_snapshot()) {
>> +            conflict = "background snapshot";
>> +        } else if (migrate_colo()) {
>> +            conflict = "COLO";
>> +        }
>> +
>> +        if (conflict) {
>> +            error_setg(errp, "Cannot use %s with CPR", conflict);
>> +            return false;
>> +        }
>> +    }
>> +
>>      if (blk || blk_inc) {
>>          if (migrate_colo()) {
>>              error_setg(errp, "No disk migration is required in COLO mode");
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 0990297..c6bfe2e 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -657,6 +657,8 @@
>>  #     shared backend must be be non-volatile across reboot, such as by backing
>>  #     it with a dax device.
>>  #
>> +#     cpr-reboot may not be used with postcopy, colo, or background-snapshot.
>> +#
> 
> @cpr-reboot
> 
> COLO
> 
> Wrap the line:
> 
>    #     @cpr-reboot may not be used with postcopy, COLO, or
>    #     background-snapshot.
> 
> This doesn't tell the reader what settings exactly do not work with
> @cpr-reboot.
> 
> For instance "background-snapshot" is about enabling migration
> capability @background-snapshot.  We could write something like "is
> incompatible with enabling migration capability @background-snapshot".
> 
> Same for the other two.  Worthwhile?

I initially was more precise exactly as you suggest, but I realized that
postcopy encompasses multiple capabilities, and I did not want to enumerate
them, nor require new capabilities related to these 3 to be listed here 
if/when they are created, so I generalized.  A keyword search in this file 
gives unambiguous matches.

Peter pushed the patch a few hours before you sent this.

- Steve


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-28 13:23     ` Steven Sistare
@ 2024-02-28 16:05       ` Markus Armbruster
  2024-02-28 16:31         ` Steven Sistare
  0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2024-02-28 16:05 UTC (permalink / raw)
  To: Steven Sistare
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Michael S. Tsirkin,
	Jason Wang, Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

Steven Sistare <steven.sistare@oracle.com> writes:

> On 2/28/2024 2:21 AM, Markus Armbruster wrote:
>> Steve Sistare <steven.sistare@oracle.com> writes:
>> 
>>> Fail the migration request if options are set that are incompatible
>>> with cpr.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

[...]

>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 0990297..c6bfe2e 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -657,6 +657,8 @@
>>>  #     shared backend must be be non-volatile across reboot, such as by backing
>>>  #     it with a dax device.
>>>  #
>>> +#     cpr-reboot may not be used with postcopy, colo, or background-snapshot.
>>> +#
>> 
>> @cpr-reboot
>> 
>> COLO
>> 
>> Wrap the line:
>> 
>>    #     @cpr-reboot may not be used with postcopy, COLO, or
>>    #     background-snapshot.
>> 
>> This doesn't tell the reader what settings exactly do not work with
>> @cpr-reboot.
>> 
>> For instance "background-snapshot" is about enabling migration
>> capability @background-snapshot.  We could write something like "is
>> incompatible with enabling migration capability @background-snapshot".
>> 
>> Same for the other two.  Worthwhile?
>
> I initially was more precise exactly as you suggest, but I realized that
> postcopy encompasses multiple capabilities, and I did not want to enumerate
> them, nor require new capabilities related to these 3 to be listed here 
> if/when they are created, so I generalized.  A keyword search in this file 
> gives unambiguous matches.
>
> Peter pushed the patch a few hours before you sent this.

Okay.

A followup patch to correct @cpr-reboot, COLO and line wrapping would be
nice.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-28 16:05       ` Markus Armbruster
@ 2024-02-28 16:31         ` Steven Sistare
  2024-02-29  5:31           ` Markus Armbruster
  0 siblings, 1 reply; 43+ messages in thread
From: Steven Sistare @ 2024-02-28 16:31 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Michael S. Tsirkin,
	Jason Wang, Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On 2/28/2024 11:05 AM, Markus Armbruster wrote:
> Steven Sistare <steven.sistare@oracle.com> writes:
> 
>> On 2/28/2024 2:21 AM, Markus Armbruster wrote:
>>> Steve Sistare <steven.sistare@oracle.com> writes:
>>>
>>>> Fail the migration request if options are set that are incompatible
>>>> with cpr.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> [...]
> 
>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>> index 0990297..c6bfe2e 100644
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -657,6 +657,8 @@
>>>>  #     shared backend must be be non-volatile across reboot, such as by backing
>>>>  #     it with a dax device.
>>>>  #
>>>> +#     cpr-reboot may not be used with postcopy, colo, or background-snapshot.
>>>> +#
>>>
>>> @cpr-reboot
>>>
>>> COLO
>>>
>>> Wrap the line:
>>>
>>>    #     @cpr-reboot may not be used with postcopy, COLO, or
>>>    #     background-snapshot.
>>>
>>> This doesn't tell the reader what settings exactly do not work with
>>> @cpr-reboot.
>>>
>>> For instance "background-snapshot" is about enabling migration
>>> capability @background-snapshot.  We could write something like "is
>>> incompatible with enabling migration capability @background-snapshot".
>>>
>>> Same for the other two.  Worthwhile?
>>
>> I initially was more precise exactly as you suggest, but I realized that
>> postcopy encompasses multiple capabilities, and I did not want to enumerate
>> them, nor require new capabilities related to these 3 to be listed here 
>> if/when they are created, so I generalized.  A keyword search in this file 
>> gives unambiguous matches.
>>
>> Peter pushed the patch a few hours before you sent this.
> 
> Okay.
> 
> A followup patch to correct @cpr-reboot, COLO and line wrapping would be
> nice.

Will do - steve 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-28 16:31         ` Steven Sistare
@ 2024-02-29  5:31           ` Markus Armbruster
  2024-02-29  5:40             ` Peter Xu
  0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2024-02-29  5:31 UTC (permalink / raw)
  To: Steven Sistare
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Michael S. Tsirkin,
	Jason Wang, Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

Steven Sistare <steven.sistare@oracle.com> writes:

> On 2/28/2024 11:05 AM, Markus Armbruster wrote:
>> Steven Sistare <steven.sistare@oracle.com> writes:
>> 
>>> On 2/28/2024 2:21 AM, Markus Armbruster wrote:
>>>> Steve Sistare <steven.sistare@oracle.com> writes:
>>>>
>>>>> Fail the migration request if options are set that are incompatible
>>>>> with cpr.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> 
>> [...]
>> 
>>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>>> index 0990297..c6bfe2e 100644
>>>>> --- a/qapi/migration.json
>>>>> +++ b/qapi/migration.json
>>>>> @@ -657,6 +657,8 @@
>>>>>  #     shared backend must be be non-volatile across reboot, such as by backing
>>>>>  #     it with a dax device.
>>>>>  #
>>>>> +#     cpr-reboot may not be used with postcopy, colo, or background-snapshot.
>>>>> +#
>>>>
>>>> @cpr-reboot
>>>>
>>>> COLO
>>>>
>>>> Wrap the line:
>>>>
>>>>    #     @cpr-reboot may not be used with postcopy, COLO, or
>>>>    #     background-snapshot.
>>>>
>>>> This doesn't tell the reader what settings exactly do not work with
>>>> @cpr-reboot.
>>>>
>>>> For instance "background-snapshot" is about enabling migration
>>>> capability @background-snapshot.  We could write something like "is
>>>> incompatible with enabling migration capability @background-snapshot".
>>>>
>>>> Same for the other two.  Worthwhile?
>>>
>>> I initially was more precise exactly as you suggest, but I realized that
>>> postcopy encompasses multiple capabilities, and I did not want to enumerate
>>> them, nor require new capabilities related to these 3 to be listed here 
>>> if/when they are created, so I generalized.  A keyword search in this file 
>>> gives unambiguous matches.
>>>
>>> Peter pushed the patch a few hours before you sent this.
>> 
>> Okay.
>> 
>> A followup patch to correct @cpr-reboot, COLO and line wrapping would be
>> nice.
>
> Will do - steve 

Hmm, perhaps Peter can still squash in the corrections before posting
his PR.  Peter?



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-29  5:31           ` Markus Armbruster
@ 2024-02-29  5:40             ` Peter Xu
  2024-02-29 14:59               ` Steven Sistare
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2024-02-29  5:40 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Steven Sistare, qemu-devel, Fabiano Rosas, Michael S. Tsirkin,
	Jason Wang, Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On Thu, Feb 29, 2024 at 06:31:26AM +0100, Markus Armbruster wrote:
> Hmm, perhaps Peter can still squash in the corrections before posting
> his PR.  Peter?

The PR was sent yesterday, it's already in PeterM's -staging.  I worry it's
a bit late to call a stop now.

https://lore.kernel.org/qemu-devel/20240228051315.400759-23-peterx@redhat.com

Steve, could you provide a standalone patch for the update?  Then I'll do
the best accordingly (e.g. if the PR failed to apply I'll squash when
resend v2; or I'll pick it up for the next).

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 11/14] vfio: register container for cpr
  2024-02-22 17:28 ` [PATCH V4 11/14] vfio: register container " Steve Sistare
@ 2024-02-29  8:35   ` Cédric Le Goater
  2024-02-29 13:40     ` Steven Sistare
  0 siblings, 1 reply; 43+ messages in thread
From: Cédric Le Goater @ 2024-02-29  8:35 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Gerd Hoffmann, Marc-Andre Lureau,
	David Hildenbrand

Hello Steve,

On 2/22/24 18:28, Steve Sistare wrote:
> Define entry points to perform per-container cpr-specific initialization
> and teardown.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>   hw/vfio/container.c           | 11 ++++++++++-
>   hw/vfio/cpr.c                 | 19 +++++++++++++++++++
>   hw/vfio/iommufd.c             |  6 ++++++
>   hw/vfio/meson.build           |  1 +
>   include/hw/vfio/vfio-common.h |  3 +++
>   5 files changed, 39 insertions(+), 1 deletion(-)
>   create mode 100644 hw/vfio/cpr.c
> 
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index bd25b9f..096d77e 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -621,10 +621,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>           goto free_container_exit;
>       }
>   
> +    ret = vfio_cpr_register_container(bcontainer, errp);
> +    if (ret) {
> +        goto free_container_exit;
> +    }
> +
>       ret = vfio_ram_block_discard_disable(container, true);
>       if (ret) {
>           error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
> -        goto free_container_exit;
> +        goto unregister_container_exit;
>       }
>   
>       assert(bcontainer->ops->setup);
> @@ -667,6 +672,9 @@ listener_release_exit:
>   enable_discards_exit:
>       vfio_ram_block_discard_disable(container, false);
>   
> +unregister_container_exit:
> +    vfio_cpr_unregister_container(bcontainer);
> +
>   free_container_exit:
>       g_free(container);
>   
> @@ -710,6 +718,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>           vfio_container_destroy(bcontainer);
>   
>           trace_vfio_disconnect_container(container->fd);
> +        vfio_cpr_unregister_container(bcontainer);
>           close(container->fd);
>           g_free(container);
>   
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> new file mode 100644
> index 0000000..3bede54
> --- /dev/null
> +++ b/hw/vfio/cpr.c
> @@ -0,0 +1,19 @@
> +/*
> + * Copyright (c) 2021-2024 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "qapi/error.h"
> +
> +int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp)
> +{
> +    return 0;
> +}
> +
> +void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
> +{
> +}
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 9bfddc1..e1be224 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -411,6 +411,11 @@ found_container:
>           goto err_listener_register;
>       }
>   
> +    ret = vfio_cpr_register_container(bcontainer, errp);
> +    if (ret) {
> +        goto err_listener_register;
> +    }
> +
>       /*
>        * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
>        * for discarding incompatibility check as well?
> @@ -461,6 +466,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
>           iommufd_cdev_ram_block_discard_disable(false);
>       }
>   
> +    vfio_cpr_unregister_container(bcontainer);
>       iommufd_cdev_detach_container(vbasedev, container);
>       iommufd_cdev_container_destroy(container);
>       vfio_put_address_space(space);
> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
> index bb98493..bba776f 100644
> --- a/hw/vfio/meson.build
> +++ b/hw/vfio/meson.build
> @@ -5,6 +5,7 @@ vfio_ss.add(files(
>     'container-base.c',
>     'container.c',
>     'migration.c',
> +  'cpr.c',
>   ))
>   vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c'))
>   vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 4a6c262..b9da6c0 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -205,6 +205,9 @@ void vfio_detach_device(VFIODevice *vbasedev);
>   int vfio_kvm_device_add_fd(int fd, Error **errp);
>   int vfio_kvm_device_del_fd(int fd, Error **errp);
>   
> +int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp);

Should we return bool since we have an errp ? the returned value
is not an errno AFAICT.

Anyhow,

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.



> +void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer);
> +
>   extern const MemoryRegionOps vfio_region_ops;
>   typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>   typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 12/14] vfio: allow cpr-reboot migration if suspended
  2024-02-22 17:28 ` [PATCH V4 12/14] vfio: allow cpr-reboot migration if suspended Steve Sistare
@ 2024-02-29  8:44   ` Cédric Le Goater
  0 siblings, 0 replies; 43+ messages in thread
From: Cédric Le Goater @ 2024-02-29  8:44 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Gerd Hoffmann, Marc-Andre Lureau,
	David Hildenbrand

On 2/22/24 18:28, Steve Sistare wrote:
> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.  The
> user is responsible for suspending the guest before initiating cpr, such as
> by issuing guest-suspend-ram to the qemu guest agent.
> 
> Relax the vfio blocker so it does not apply to cpr, and add a notifier that
> verifies the guest is suspended.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.




> ---
>   hw/vfio/common.c                      |  2 +-
>   hw/vfio/cpr.c                         | 20 ++++++++++++++++++++
>   hw/vfio/migration.c                   |  2 +-
>   include/hw/vfio/vfio-container-base.h |  1 +
>   4 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 059bfdc..ff88c3f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
>       error_setg(&multiple_devices_migration_blocker,
>                  "Multiple VFIO devices migration is supported only if all of "
>                  "them support P2P migration");
> -    ret = migrate_add_blocker(&multiple_devices_migration_blocker, errp);
> +    ret = migrate_add_blocker_normal(&multiple_devices_migration_blocker, errp);
>   
>       return ret;
>   }
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> index 3bede54..392c2dd 100644
> --- a/hw/vfio/cpr.c
> +++ b/hw/vfio/cpr.c
> @@ -7,13 +7,33 @@
>   
>   #include "qemu/osdep.h"
>   #include "hw/vfio/vfio-common.h"
> +#include "migration/misc.h"
>   #include "qapi/error.h"
> +#include "sysemu/runstate.h"
> +
> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
> +                                    MigrationEvent *e, Error **errp)
> +{
> +    if (e->type == MIG_EVENT_PRECOPY_SETUP &&
> +        !runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
> +
> +        error_setg(errp,
> +            "VFIO device only supports cpr-reboot for runstate suspended");
> +
> +        return -1;
> +    }
> +    return 0;
> +}
>   
>   int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp)
>   {
> +    migration_add_notifier_mode(&bcontainer->cpr_reboot_notifier,
> +                                vfio_cpr_reboot_notifier,
> +                                MIG_MODE_CPR_REBOOT);
>       return 0;
>   }
>   
>   void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
>   {
> +    migration_remove_notifier(&bcontainer->cpr_reboot_notifier);
>   }
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 50140ed..2050ac8 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -889,7 +889,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, Error *err, Error **errp)
>       vbasedev->migration_blocker = error_copy(err);
>       error_free(err);
>   
> -    return migrate_add_blocker(&vbasedev->migration_blocker, errp);
> +    return migrate_add_blocker_normal(&vbasedev->migration_blocker, errp);
>   }
>   
>   /* ---------------------------------------------------------------------- */
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index b2813b0..3582d5f 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -49,6 +49,7 @@ typedef struct VFIOContainerBase {
>       QLIST_ENTRY(VFIOContainerBase) next;
>       QLIST_HEAD(, VFIODevice) device_list;
>       GList *iova_ranges;
> +    NotifierWithReturn cpr_reboot_notifier;
>   } VFIOContainerBase;
>   
>   typedef struct VFIOGuestIOMMU {



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 11/14] vfio: register container for cpr
  2024-02-29  8:35   ` Cédric Le Goater
@ 2024-02-29 13:40     ` Steven Sistare
  0 siblings, 0 replies; 43+ messages in thread
From: Steven Sistare @ 2024-02-29 13:40 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Gerd Hoffmann, Marc-Andre Lureau,
	David Hildenbrand



On 2/29/2024 3:35 AM, Cédric Le Goater wrote:
> Hello Steve,
> 
> On 2/22/24 18:28, Steve Sistare wrote:
>> Define entry points to perform per-container cpr-specific initialization
>> and teardown.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>   hw/vfio/container.c           | 11 ++++++++++-
>>   hw/vfio/cpr.c                 | 19 +++++++++++++++++++
>>   hw/vfio/iommufd.c             |  6 ++++++
>>   hw/vfio/meson.build           |  1 +
>>   include/hw/vfio/vfio-common.h |  3 +++
>>   5 files changed, 39 insertions(+), 1 deletion(-)
>>   create mode 100644 hw/vfio/cpr.c
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index bd25b9f..096d77e 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -621,10 +621,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>>           goto free_container_exit;
>>       }
>>   +    ret = vfio_cpr_register_container(bcontainer, errp);
>> +    if (ret) {
>> +        goto free_container_exit;
>> +    }
>> +
>>       ret = vfio_ram_block_discard_disable(container, true);
>>       if (ret) {
>>           error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
>> -        goto free_container_exit;
>> +        goto unregister_container_exit;
>>       }
>>         assert(bcontainer->ops->setup);
>> @@ -667,6 +672,9 @@ listener_release_exit:
>>   enable_discards_exit:
>>       vfio_ram_block_discard_disable(container, false);
>>   +unregister_container_exit:
>> +    vfio_cpr_unregister_container(bcontainer);
>> +
>>   free_container_exit:
>>       g_free(container);
>>   @@ -710,6 +718,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>>           vfio_container_destroy(bcontainer);
>>             trace_vfio_disconnect_container(container->fd);
>> +        vfio_cpr_unregister_container(bcontainer);
>>           close(container->fd);
>>           g_free(container);
>>   diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>> new file mode 100644
>> index 0000000..3bede54
>> --- /dev/null
>> +++ b/hw/vfio/cpr.c
>> @@ -0,0 +1,19 @@
>> +/*
>> + * Copyright (c) 2021-2024 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "hw/vfio/vfio-common.h"
>> +#include "qapi/error.h"
>> +
>> +int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp)
>> +{
>> +    return 0;
>> +}
>> +
>> +void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
>> +{
>> +}
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 9bfddc1..e1be224 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -411,6 +411,11 @@ found_container:
>>           goto err_listener_register;
>>       }
>>   +    ret = vfio_cpr_register_container(bcontainer, errp);
>> +    if (ret) {
>> +        goto err_listener_register;
>> +    }
>> +
>>       /*
>>        * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
>>        * for discarding incompatibility check as well?
>> @@ -461,6 +466,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
>>           iommufd_cdev_ram_block_discard_disable(false);
>>       }
>>   +    vfio_cpr_unregister_container(bcontainer);
>>       iommufd_cdev_detach_container(vbasedev, container);
>>       iommufd_cdev_container_destroy(container);
>>       vfio_put_address_space(space);
>> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
>> index bb98493..bba776f 100644
>> --- a/hw/vfio/meson.build
>> +++ b/hw/vfio/meson.build
>> @@ -5,6 +5,7 @@ vfio_ss.add(files(
>>     'container-base.c',
>>     'container.c',
>>     'migration.c',
>> +  'cpr.c',
>>   ))
>>   vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c'))
>>   vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 4a6c262..b9da6c0 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -205,6 +205,9 @@ void vfio_detach_device(VFIODevice *vbasedev);
>>   int vfio_kvm_device_add_fd(int fd, Error **errp);
>>   int vfio_kvm_device_del_fd(int fd, Error **errp);
>>   +int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp);
> 
> Should we return bool since we have an errp ? the returned value
> is not an errno AFAICT.
> 
> Anyhow,
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>

Hi Cedric,  vfio_cpr_register_container sometimes returns the value returned from 
migrate_add_blocker_modes, which is an errno.

Thanks for reviewing these!

- Steve
 
>> +void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer);
>> +
>>   extern const MemoryRegionOps vfio_region_ops;
>>   typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>>   typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 14/14] migration: options incompatible with cpr
  2024-02-29  5:40             ` Peter Xu
@ 2024-02-29 14:59               ` Steven Sistare
  0 siblings, 0 replies; 43+ messages in thread
From: Steven Sistare @ 2024-02-29 14:59 UTC (permalink / raw)
  To: Peter Xu, Markus Armbruster; +Cc: qemu-devel, Fabiano Rosas

On 2/29/2024 12:40 AM, Peter Xu wrote:
> On Thu, Feb 29, 2024 at 06:31:26AM +0100, Markus Armbruster wrote:
>> Hmm, perhaps Peter can still squash in the corrections before posting
>> his PR.  Peter?
> 
> The PR was sent yesterday, it's already in PeterM's -staging.  I worry it's
> a bit late to call a stop now.
> 
> https://lore.kernel.org/qemu-devel/20240228051315.400759-23-peterx@redhat.com
> 
> Steve, could you provide a standalone patch for the update?  Then I'll do
> the best accordingly (e.g. if the PR failed to apply I'll squash when
> resend v2; or I'll pick it up for the next).

I sent the patch.  I also re-wrapped all cpr-reboot paragraphs to 70 columns
and addressed Markus' other comments on "migration: update cpr-reboot description".

Peter, if you squash it, the last sentence "cpr-reboot may not be used ..." 
squashes into
   migration: options incompatible with cpr
and everything else squashes into
   migration: update cpr-reboot description

- Steve


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 10/14] migration: stop vm for cpr
  2024-02-26  2:08   ` Peter Xu
@ 2024-02-29 15:21     ` Steven Sistare
  2024-03-01  1:28       ` Peter Xu
  0 siblings, 1 reply; 43+ messages in thread
From: Steven Sistare @ 2024-02-29 15:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On 2/25/2024 9:08 PM, Peter Xu wrote:
> On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:
>> When migration for cpr is initiated, stop the vm and set state
>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>> possibility of ram and device state being out of sync, and guarantees
>> that a guest in the suspended state remains suspended, because qmp_cont
>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> cpr-reboot mode keeps changing behavior.
> 
> Could we declare it "experimental" until it's solid?  Maybe a patch to
> document this?
> 
> Normally IMHO we shouldn't merge a feature if it's not complete, however
> cpr-reboot is so special that the mode itself is already merged in 8.2
> before I started to merge patches, and it keeps changing things.  I don't
> know what else we can do here besides declaring it experimental and not
> declare it a stable feature.

Hi Peter, the planned/committed functionality for cpr-reboot changed only once, in:
    migration: stop vm for cpr

Suspension to support vfio is an enhancement which adds to the basic functionality,
it does not change it.  This was planned all along, but submitted as a separate 
series to manage complexity, as I outlined in my qemu community presentation,
which I emailed you at the time.

Other "changes" that arose during review were just clarifications and explanations.

So, I don't think cpr-reboot deserves to be condemned to experimental limbo.

- Steve



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 10/14] migration: stop vm for cpr
  2024-02-29 15:21     ` Steven Sistare
@ 2024-03-01  1:28       ` Peter Xu
  2024-03-01 10:41         ` Cédric Le Goater
  2024-03-13 14:18         ` Steven Sistare
  0 siblings, 2 replies; 43+ messages in thread
From: Peter Xu @ 2024-03-01  1:28 UTC (permalink / raw)
  To: Steven Sistare
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On Thu, Feb 29, 2024 at 10:21:14AM -0500, Steven Sistare wrote:
> On 2/25/2024 9:08 PM, Peter Xu wrote:
> > On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:
> >> When migration for cpr is initiated, stop the vm and set state
> >> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
> >> possibility of ram and device state being out of sync, and guarantees
> >> that a guest in the suspended state remains suspended, because qmp_cont
> >> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > 
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > 
> > cpr-reboot mode keeps changing behavior.
> > 
> > Could we declare it "experimental" until it's solid?  Maybe a patch to
> > document this?
> > 
> > Normally IMHO we shouldn't merge a feature if it's not complete, however
> > cpr-reboot is so special that the mode itself is already merged in 8.2
> > before I started to merge patches, and it keeps changing things.  I don't
> > know what else we can do here besides declaring it experimental and not
> > declare it a stable feature.
> 
> Hi Peter, the planned/committed functionality for cpr-reboot changed only once, in:
>     migration: stop vm for cpr
> 
> Suspension to support vfio is an enhancement which adds to the basic functionality,
> it does not change it.  This was planned all along, but submitted as a separate 

If VFIO used to migrate without suspension and now it won't, it's a
behavior change?

> series to manage complexity, as I outlined in my qemu community presentation,
> which I emailed you at the time.
> 
> Other "changes" that arose during review were just clarifications and explanations.
> 
> So, I don't think cpr-reboot deserves to be condemned to experimental limbo.

IMHO it's not about a feature being condemned, it's about a kindful
heads-up to the users that one needs to take risk on using it until it
becomes stable, it also makes developers easier because of no limitation on
behavior change.  If all the changes are landing, then there's no need for
such a patch.

If so, please propose the planned complete document patch. I had a feeling
that cpr is still not fully understood by even many developers on the list.
It'll be great that such document will contain all the details one needs to
know on the feature, etc. meaning of the name cpr-reboot (what is "cpr"),
when to use it, how to use it, how it differs from "normal" mode
(etc. lifted limitations on some devices that used to block migration?),
what is enforced (vm stop, suspension, etc.) and what is optionally offered
(VFIO, shared-mem, etc.), the expected behaviors, etc.

When send it, please copy relevant people (Alex & Cedric for VFIO, while
Markus could also be a good candidate considering the QMP involvement).

Thanks a lot,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 10/14] migration: stop vm for cpr
  2024-03-01  1:28       ` Peter Xu
@ 2024-03-01 10:41         ` Cédric Le Goater
  2024-03-13 14:18         ` Steven Sistare
  1 sibling, 0 replies; 43+ messages in thread
From: Cédric Le Goater @ 2024-03-01 10:41 UTC (permalink / raw)
  To: Peter Xu, Steven Sistare
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Gerd Hoffmann, Marc-Andre Lureau,
	David Hildenbrand

On 3/1/24 02:28, Peter Xu wrote:
> On Thu, Feb 29, 2024 at 10:21:14AM -0500, Steven Sistare wrote:
>> On 2/25/2024 9:08 PM, Peter Xu wrote:
>>> On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:
>>>> When migration for cpr is initiated, stop the vm and set state
>>>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>>>> possibility of ram and device state being out of sync, and guarantees
>>>> that a guest in the suspended state remains suspended, because qmp_cont
>>>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>
>>> cpr-reboot mode keeps changing behavior.
>>>
>>> Could we declare it "experimental" until it's solid?  Maybe a patch to
>>> document this?
>>>
>>> Normally IMHO we shouldn't merge a feature if it's not complete, however
>>> cpr-reboot is so special that the mode itself is already merged in 8.2
>>> before I started to merge patches, and it keeps changing things.  I don't
>>> know what else we can do here besides declaring it experimental and not
>>> declare it a stable feature.
>>
>> Hi Peter, the planned/committed functionality for cpr-reboot changed only once, in:
>>      migration: stop vm for cpr
>>
>> Suspension to support vfio is an enhancement which adds to the basic functionality,
>> it does not change it.  This was planned all along, but submitted as a separate
> 
> If VFIO used to migrate without suspension and now it won't, it's a
> behavior change?
> 
>> series to manage complexity, as I outlined in my qemu community presentation,
>> which I emailed you at the time.
>>
>> Other "changes" that arose during review were just clarifications and explanations.
>>
>> So, I don't think cpr-reboot deserves to be condemned to experimental limbo.
> 
> IMHO it's not about a feature being condemned, it's about a kindful
> heads-up to the users that one needs to take risk on using it until it
> becomes stable, it also makes developers easier because of no limitation on
> behavior change.  If all the changes are landing, then there's no need for
> such a patch.
> 
> If so, please propose the planned complete document patch. I had a feeling
> that cpr is still not fully understood by even many developers on the list.
> It'll be great that such document will contain all the details one needs to
> know on the feature, etc. meaning of the name cpr-reboot (what is "cpr"),
> when to use it, how to use it, how it differs from "normal" mode
> (etc. lifted limitations on some devices that used to block migration?),
> what is enforced (vm stop, suspension, etc.) and what is optionally offered
> (VFIO, shared-mem, etc.), the expected behaviors, etc.
> 
> When send it, please copy relevant people (Alex & Cedric for VFIO, while
> Markus could also be a good candidate considering the QMP involvement).

I second that. If we could have a file under docs/, it would be great.

Thanks,

C.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 00/14] allow cpr-reboot for vfio
  2024-02-22 17:33 ` [PATCH V4 00/14] allow cpr-reboot for vfio Steven Sistare
  2024-02-26  2:14   ` Peter Xu
@ 2024-03-08 16:52   ` Cédric Le Goater
  1 sibling, 0 replies; 43+ messages in thread
From: Cédric Le Goater @ 2024-03-08 16:52 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel, Peter Xu, David Hildenbrand, Alex Williamson
  Cc: Fabiano Rosas, Michael S. Tsirkin, Jason Wang, Gerd Hoffmann,
	Marc-Andre Lureau

On 2/22/24 18:33, Steven Sistare wrote:
> Peter (and David if interested): these patches still need RB:
>    migration: notifier error checking
>    migration: stop vm for cpr
>    migration: update cpr-reboot description
>    migration: options incompatible with cpr
> 
> Alex, these patches still need RB:
>    vfio: register container for cpr
>    vfio: allow cpr-reboot migration if suspended

Applied to vfio-next.

Thanks,

C.




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 10/14] migration: stop vm for cpr
  2024-03-01  1:28       ` Peter Xu
  2024-03-01 10:41         ` Cédric Le Goater
@ 2024-03-13 14:18         ` Steven Sistare
  2024-03-13 21:22           ` Cédric Le Goater
  1 sibling, 1 reply; 43+ messages in thread
From: Steven Sistare @ 2024-03-13 14:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Cedric Le Goater, Gerd Hoffmann,
	Marc-Andre Lureau, David Hildenbrand

On 2/29/2024 8:28 PM, Peter Xu wrote:
> On Thu, Feb 29, 2024 at 10:21:14AM -0500, Steven Sistare wrote:
>> On 2/25/2024 9:08 PM, Peter Xu wrote:
>>> On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:
>>>> When migration for cpr is initiated, stop the vm and set state
>>>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>>>> possibility of ram and device state being out of sync, and guarantees
>>>> that a guest in the suspended state remains suspended, because qmp_cont
>>>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>
>>> cpr-reboot mode keeps changing behavior.
>>>
>>> Could we declare it "experimental" until it's solid?  Maybe a patch to
>>> document this?
>>>
>>> Normally IMHO we shouldn't merge a feature if it's not complete, however
>>> cpr-reboot is so special that the mode itself is already merged in 8.2
>>> before I started to merge patches, and it keeps changing things.  I don't
>>> know what else we can do here besides declaring it experimental and not
>>> declare it a stable feature.
>>
>> Hi Peter, the planned/committed functionality for cpr-reboot changed only once, in:
>>      migration: stop vm for cpr
>>
>> Suspension to support vfio is an enhancement which adds to the basic functionality,
>> it does not change it.  This was planned all along, but submitted as a separate
> 
> If VFIO used to migrate without suspension and now it won't, it's a
> behavior change?

VFIO could not cpr-reboot migrate without suspension.  The existing vfio 
migration blockers applied to all modes:
   Error: 0000:3a:10.0: VFIO migration is not supported in kernel

Now, with suspension, it will.  An addition, not a change.

>> series to manage complexity, as I outlined in my qemu community presentation,
>> which I emailed you at the time.
>>
>> Other "changes" that arose during review were just clarifications and explanations.
>>
>> So, I don't think cpr-reboot deserves to be condemned to experimental limbo.
> 
> IMHO it's not about a feature being condemned, it's about a kindful
> heads-up to the users that one needs to take risk on using it until it
> becomes stable, it also makes developers easier because of no limitation on
> behavior change.  If all the changes are landing, then there's no need for
> such a patch.
> 
> If so, please propose the planned complete document patch. I had a feeling
> that cpr is still not fully understood by even many developers on the list.
> It'll be great that such document will contain all the details one needs to
> know on the feature, etc. meaning of the name cpr-reboot (what is "cpr"),
> when to use it, how to use it, how it differs from "normal" mode
> (etc. lifted limitations on some devices that used to block migration?),
> what is enforced (vm stop, suspension, etc.) and what is optionally offered
> (VFIO, shared-mem, etc.), the expected behaviors, etc.
> 
> When send it, please copy relevant people (Alex & Cedric for VFIO, while
> Markus could also be a good candidate considering the QMP involvement).

Submitted.

- Steve


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH V4 10/14] migration: stop vm for cpr
  2024-03-13 14:18         ` Steven Sistare
@ 2024-03-13 21:22           ` Cédric Le Goater
  0 siblings, 0 replies; 43+ messages in thread
From: Cédric Le Goater @ 2024-03-13 21:22 UTC (permalink / raw)
  To: Steven Sistare, Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Michael S. Tsirkin, Jason Wang,
	Alex Williamson, Gerd Hoffmann, Marc-Andre Lureau,
	David Hildenbrand

On 3/13/24 15:18, Steven Sistare wrote:
> On 2/29/2024 8:28 PM, Peter Xu wrote:
>> On Thu, Feb 29, 2024 at 10:21:14AM -0500, Steven Sistare wrote:
>>> On 2/25/2024 9:08 PM, Peter Xu wrote:
>>>> On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:
>>>>> When migration for cpr is initiated, stop the vm and set state
>>>>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>>>>> possibility of ram and device state being out of sync, and guarantees
>>>>> that a guest in the suspended state remains suspended, because qmp_cont
>>>>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>
>>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>>
>>>> cpr-reboot mode keeps changing behavior.
>>>>
>>>> Could we declare it "experimental" until it's solid?  Maybe a patch to
>>>> document this?
>>>>
>>>> Normally IMHO we shouldn't merge a feature if it's not complete, however
>>>> cpr-reboot is so special that the mode itself is already merged in 8.2
>>>> before I started to merge patches, and it keeps changing things.  I don't
>>>> know what else we can do here besides declaring it experimental and not
>>>> declare it a stable feature.
>>>
>>> Hi Peter, the planned/committed functionality for cpr-reboot changed only once, in:
>>>      migration: stop vm for cpr
>>>
>>> Suspension to support vfio is an enhancement which adds to the basic functionality,
>>> it does not change it.  This was planned all along, but submitted as a separate
>>
>> If VFIO used to migrate without suspension and now it won't, it's a
>> behavior change?
> 
> VFIO could not cpr-reboot migrate without suspension.  The existing vfio migration blockers applied to all modes:
>    Error: 0000:3a:10.0: VFIO migration is not supported in kernel
> 
> Now, with suspension, it will.  An addition, not a change.

Still, I wonder if we should not have a per-device toggle to block
migration for CPR_REBOOT mode. This to maintain the pre-9.0 behavior
and to manage possible incompatibilities we haven't thought of yet.

A config option to deactivate CPR_REBOOT mode in downstream could be
useful too.

Thanks,

C.




> 
>>> series to manage complexity, as I outlined in my qemu community presentation,
>>> which I emailed you at the time.
>>>
>>> Other "changes" that arose during review were just clarifications and explanations.
>>>
>>> So, I don't think cpr-reboot deserves to be condemned to experimental limbo.
>>
>> IMHO it's not about a feature being condemned, it's about a kindful
>> heads-up to the users that one needs to take risk on using it until it
>> becomes stable, it also makes developers easier because of no limitation on
>> behavior change.  If all the changes are landing, then there's no need for
>> such a patch.
>>
>> If so, please propose the planned complete document patch. I had a feeling
>> that cpr is still not fully understood by even many developers on the list.
>> It'll be great that such document will contain all the details one needs to
>> know on the feature, etc. meaning of the name cpr-reboot (what is "cpr"),
>> when to use it, how to use it, how it differs from "normal" mode
>> (etc. lifted limitations on some devices that used to block migration?),
>> what is enforced (vm stop, suspension, etc.) and what is optionally offered
>> (VFIO, shared-mem, etc.), the expected behaviors, etc.
>>
>> When send it, please copy relevant people (Alex & Cedric for VFIO, while
>> Markus could also be a good candidate considering the QMP involvement).
> 
> Submitted.
> 
> - Steve
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2024-03-13 21:23 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-22 17:28 [PATCH V4 00/14] allow cpr-reboot for vfio Steve Sistare
2024-02-22 17:28 ` [PATCH V4 01/14] notify: pass error to notifier with return Steve Sistare
2024-02-22 17:28 ` [PATCH V4 02/14] migration: remove error from notifier data Steve Sistare
2024-02-22 17:28 ` [PATCH V4 03/14] migration: convert to NotifierWithReturn Steve Sistare
2024-02-22 17:28 ` [PATCH V4 04/14] migration: MigrationEvent for notifiers Steve Sistare
2024-02-22 17:28 ` [PATCH V4 05/14] migration: remove postcopy_after_devices Steve Sistare
2024-02-22 17:28 ` [PATCH V4 06/14] migration: MigrationNotifyFunc Steve Sistare
2024-02-22 17:28 ` [PATCH V4 07/14] migration: per-mode notifiers Steve Sistare
2024-02-22 17:28 ` [PATCH V4 08/14] migration: refactor migrate_fd_connect failures Steve Sistare
2024-02-22 17:28 ` [PATCH V4 09/14] migration: notifier error checking Steve Sistare
2024-02-26  2:03   ` Peter Xu
2024-02-22 17:28 ` [PATCH V4 10/14] migration: stop vm for cpr Steve Sistare
2024-02-26  2:08   ` Peter Xu
2024-02-29 15:21     ` Steven Sistare
2024-03-01  1:28       ` Peter Xu
2024-03-01 10:41         ` Cédric Le Goater
2024-03-13 14:18         ` Steven Sistare
2024-03-13 21:22           ` Cédric Le Goater
2024-02-22 17:28 ` [PATCH V4 11/14] vfio: register container " Steve Sistare
2024-02-29  8:35   ` Cédric Le Goater
2024-02-29 13:40     ` Steven Sistare
2024-02-22 17:28 ` [PATCH V4 12/14] vfio: allow cpr-reboot migration if suspended Steve Sistare
2024-02-29  8:44   ` Cédric Le Goater
2024-02-22 17:28 ` [PATCH V4 13/14] migration: update cpr-reboot description Steve Sistare
2024-02-26  2:10   ` Peter Xu
2024-02-28  7:12   ` Markus Armbruster
2024-02-22 17:28 ` [PATCH V4 14/14] migration: options incompatible with cpr Steve Sistare
2024-02-26  2:10   ` Peter Xu
2024-02-28  7:21   ` Markus Armbruster
2024-02-28 13:23     ` Steven Sistare
2024-02-28 16:05       ` Markus Armbruster
2024-02-28 16:31         ` Steven Sistare
2024-02-29  5:31           ` Markus Armbruster
2024-02-29  5:40             ` Peter Xu
2024-02-29 14:59               ` Steven Sistare
2024-02-22 17:33 ` [PATCH V4 00/14] allow cpr-reboot for vfio Steven Sistare
2024-02-26  2:14   ` Peter Xu
2024-02-26  8:49     ` Cédric Le Goater
2024-02-26  9:01       ` Peter Xu
2024-02-26 20:21         ` Steven Sistare
2024-02-26 22:08           ` Steven Sistare
2024-02-27  4:07             ` Peter Xu
2024-03-08 16:52   ` Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.