All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V7 00/12] fix migration of suspended runstate
@ 2023-12-06 17:23 Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 01/12] cpus: vm_was_suspended Steve Sistare
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Migration of a guest in the suspended runstate is broken.  The incoming
migration code automatically tries to wake the guest, which is wrong;
the guest should end migration in the same runstate it started.  Further,
after saving a snapshot in the suspended state and loading it, the vm_start
fails.  The runstate is RUNNING, but the guest is not.

See the commit messages for the details.

Changes in V2:
  * simplify "start on wakeup request"
  * fix postcopy, snapshot, and background migration
  * refactor fixes for each type of migration
  * explicitly handled suspended events and runstate in tests
  * add test for postcopy and background migration

Changes in V3:
  * rebase to tip
  * fix hang in new function migrate_wait_for_dirty_mem

Changes in V4:
  * rebase to tip
  * add patch for vm_prepare_start (thanks Peter)
  * add patch to preserve cpu ticks

Changes in V5:
  * rebase to tip
  * added patches to completely stop vm in suspended state:
      cpus: refactor vm_stop
      cpus: stop vm in suspended state
  * added patch to partially resume vm in suspended state:
      cpus: start vm in suspended state
  * modified "preserve suspended ..." patches to use the above.
  * deleted patch "preserve cpu ticks if suspended".  stop ticks in
    vm_stop_force_state instead.
  * deleted patch "add runstate function".  defined new helper function
    migrate_new_runstate in "preserve suspended runstate"
  * Added some RB's, but removed other RB's because the patches changed.

Changes in V6:
  * all vm_stop calls completely stop the suspended state
  * refactored and updated the "cpus" patches
  * simplified the "preserve suspended" patches
  * added patch "bootfile per vm"

Changes in V7:
  * rebase to tip, add RB-s
  * fix backwards compatibility for global_state.vm_was_suspended
  * delete vm_prepare_start state argument, and rename patch
    "pass runstate to vm_prepare_start" to
    "check running not RUN_STATE_RUNNING"
  * drop patches:
      tests/qtest: bootfile per vm
      tests/qtest: background migration with suspend
  * rename runstate_is_started to runstate_is_live
  * move wait_for_suspend in tests

Steve Sistare (12):
  cpus: vm_was_suspended
  cpus: stop vm in suspended runstate
  cpus: check running not RUN_STATE_RUNNING
  cpus: vm_resume
  migration: propagate suspended runstate
  migration: preserve suspended runstate
  migration: preserve suspended for snapshot
  migration: preserve suspended for bg_migration
  tests/qtest: migration events
  tests/qtest: option to suspend during migration
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend

 backends/tpm/tpm_emulator.c          |   2 +-
 hw/usb/hcd-ehci.c                    |   2 +-
 hw/usb/redirect.c                    |   2 +-
 hw/xen/xen-hvm-common.c              |   2 +-
 include/migration/snapshot.h         |   7 ++
 include/sysemu/runstate.h            |  16 ++++
 migration/global_state.c             |  35 ++++++-
 migration/migration-hmp-cmds.c       |   8 +-
 migration/migration.c                |  15 +--
 migration/savevm.c                   |  23 +++--
 qapi/misc.json                       |  10 +-
 system/cpus.c                        |  47 +++++++--
 system/runstate.c                    |   9 ++
 system/vl.c                          |   2 +
 tests/migration/i386/Makefile        |   5 +-
 tests/migration/i386/a-b-bootblock.S |  50 +++++++++-
 tests/migration/i386/a-b-bootblock.h |  26 +++--
 tests/qtest/migration-helpers.c      |  27 ++----
 tests/qtest/migration-helpers.h      |  11 ++-
 tests/qtest/migration-test.c         | 181 +++++++++++++++++++++++++----------
 20 files changed, 354 insertions(+), 126 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V7 01/12] cpus: vm_was_suspended
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 02/12] cpus: stop vm in suspended runstate Steve Sistare
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Add a state variable to remember if a vm previously transitioned into a
suspended state.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/sysemu/runstate.h |  2 ++
 system/cpus.c             | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index c8c2bd8..88a67e2 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -51,6 +51,8 @@ int vm_prepare_start(bool step_pending);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
+void vm_set_suspended(bool suspended);
+bool vm_get_suspended(void);
 
 typedef enum WakeupReason {
     /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */
diff --git a/system/cpus.c b/system/cpus.c
index a444a74..9f631ab 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -259,6 +259,21 @@ void cpu_interrupt(CPUState *cpu, int mask)
     }
 }
 
+/*
+ * True if the vm was previously suspended, and has not been woken or reset.
+ */
+static int vm_was_suspended;
+
+void vm_set_suspended(bool suspended)
+{
+    vm_was_suspended = suspended;
+}
+
+bool vm_get_suspended(void)
+{
+    return vm_was_suspended;
+}
+
 static int do_vm_stop(RunState state, bool send_stop)
 {
     int ret = 0;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 02/12] cpus: stop vm in suspended runstate
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 01/12] cpus: vm_was_suspended Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 18:45   ` Philippe Mathieu-Daudé
  2023-12-11 13:37   ` Steven Sistare
  2023-12-06 17:23 ` [PATCH V7 03/12] cpus: check running not RUN_STATE_RUNNING Steve Sistare
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Currently, a vm in the suspended state is not completely stopped.  The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called.  This causes problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

    (qemu) info status
    VM status: paused (suspended)

    (qemu) stop
    (qemu) info status
    VM status: paused

    (qemu) system_wakeup
    Error: Unable to wake up: guest is not in suspended state

    (qemu) cont
    (qemu) info status
    VM status: paused (suspended)

    (qemu) system_wakeup
    (qemu) info status
    VM status: running

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/sysemu/runstate.h |  5 +++++
 qapi/misc.json            | 10 ++++++++--
 system/cpus.c             | 23 +++++++++++++++--------
 system/runstate.c         |  3 +++
 4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 88a67e2..867e53c 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause)
     return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
 }
 
+static inline bool runstate_is_live(RunState state)
+{
+    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
+}
+
 void vm_start(void);
 
 /**
diff --git a/qapi/misc.json b/qapi/misc.json
index cda2eff..efb8d44 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -134,7 +134,7 @@
 ##
 # @stop:
 #
-# Stop all guest VCPU execution.
+# Stop all guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -143,6 +143,9 @@
 #     the guest remains paused once migration finishes, as if the -S
 #     option was passed on the command line.
 #
+#     In the "suspended" state, it will completely stop the VM and
+#     cause a transition to the "paused" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "stop" }
@@ -153,7 +156,7 @@
 ##
 # @cont:
 #
-# Resume guest VCPU execution.
+# Resume guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -165,6 +168,9 @@
 #     guest starts once migration finishes, removing the effect of the
 #     -S command line option if it was passed.
 #
+#     If the VM was previously suspended, and not been reset or woken,
+#     this command will transition back to the "suspended" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "cont" }
diff --git a/system/cpus.c b/system/cpus.c
index 9f631ab..f162435 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -277,11 +277,15 @@ bool vm_get_suspended(void)
 static int do_vm_stop(RunState state, bool send_stop)
 {
     int ret = 0;
+    RunState oldstate = runstate_get();
 
-    if (runstate_is_running()) {
+    if (runstate_is_live(oldstate)) {
+        vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
         runstate_set(state);
         cpu_disable_ticks();
-        pause_all_vcpus();
+        if (oldstate == RUN_STATE_RUNNING) {
+            pause_all_vcpus();
+        }
         vm_state_notify(0, state);
         if (send_stop) {
             qapi_event_send_stop();
@@ -694,11 +698,13 @@ int vm_stop(RunState state)
 
 /**
  * Prepare for (re)starting the VM.
- * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
- * running or in case of an error condition), 0 otherwise.
+ * Returns 0 if the vCPUs should be restarted, -1 on an error condition,
+ * and 1 otherwise.
  */
 int vm_prepare_start(bool step_pending)
 {
+    int ret = vm_was_suspended ? 1 : 0;
+    RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : RUN_STATE_RUNNING;
     RunState requested;
 
     qemu_vmstop_requested(&requested);
@@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending)
     qapi_event_send_resume();
 
     cpu_enable_ticks();
-    runstate_set(RUN_STATE_RUNNING);
-    vm_state_notify(1, RUN_STATE_RUNNING);
-    return 0;
+    runstate_set(state);
+    vm_state_notify(1, state);
+    vm_was_suspended = false;
+    return ret;
 }
 
 void vm_start(void)
@@ -745,7 +752,7 @@ void vm_start(void)
    current state is forgotten forever */
 int vm_stop_force_state(RunState state)
 {
-    if (runstate_is_running()) {
+    if (runstate_is_live(runstate_get())) {
         return vm_stop(state);
     } else {
         int ret;
diff --git a/system/runstate.c b/system/runstate.c
index ea9d6c2..e2fa204 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PAUSED, RUN_STATE_POSTMIGRATE },
     { RUN_STATE_PAUSED, RUN_STATE_PRELAUNCH },
     { RUN_STATE_PAUSED, RUN_STATE_COLO},
+    { RUN_STATE_PAUSED, RUN_STATE_SUSPENDED},
 
     { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -161,6 +162,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
     { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
+    { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED},
 
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
@@ -502,6 +504,7 @@ void qemu_system_reset(ShutdownCause reason)
         qapi_event_send_reset(shutdown_caused_by_guest(reason), reason);
     }
     cpu_synchronize_all_post_reset();
+    vm_set_suspended(false);
 }
 
 /*
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 03/12] cpus: check running not RUN_STATE_RUNNING
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 01/12] cpus: vm_was_suspended Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 02/12] cpus: stop vm in suspended runstate Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 04/12] cpus: vm_resume Steve Sistare
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

When a vm transitions from running to suspended, runstate notifiers are
not called, so the notifiers still think the vm is running.  Hence, when
we call vm_start to restore the suspended state, we call vm_state_notify
with running=1.  However, some notifiers check for RUN_STATE_RUNNING.
They must check the running boolean instead.

No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 backends/tpm/tpm_emulator.c | 2 +-
 hw/usb/hcd-ehci.c           | 2 +-
 hw/usb/redirect.c           | 2 +-
 hw/xen/xen-hvm-common.c     | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index f7f1b4a..254fce7 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -904,7 +904,7 @@ static void tpm_emulator_vm_state_change(void *opaque, bool running,
 
     trace_tpm_emulator_vm_state_change(running, state);
 
-    if (!running || state != RUN_STATE_RUNNING || !tpm_emu->relock_storage) {
+    if (!running || !tpm_emu->relock_storage) {
         return;
     }
 
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 19b4534..10c82ce 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2451,7 +2451,7 @@ static void usb_ehci_vm_state_change(void *opaque, bool running, RunState state)
      * USB-devices which have async handled packages have a packet in the
      * ep queue to match the completion with.
      */
-    if (state == RUN_STATE_RUNNING) {
+    if (running) {
         ehci_advance_async_state(ehci);
     }
 
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index c9893df..3785bb0 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1403,7 +1403,7 @@ static void usbredir_vm_state_change(void *priv, bool running, RunState state)
 {
     USBRedirDevice *dev = priv;
 
-    if (state == RUN_STATE_RUNNING && dev->parser != NULL) {
+    if (running && dev->parser != NULL) {
         usbredirparser_do_write(dev->parser); /* Flush any pending writes */
     }
 }
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 565dc39..47e6cb1 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -623,7 +623,7 @@ void xen_hvm_change_state_handler(void *opaque, bool running,
 
     xen_set_ioreq_server_state(xen_domid,
                                state->ioservid,
-                               (rstate == RUN_STATE_RUNNING));
+                               running);
 }
 
 void xen_exit_notifier(Notifier *n, void *data)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 04/12] cpus: vm_resume
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (2 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 03/12] cpus: check running not RUN_STATE_RUNNING Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 05/12] migration: propagate suspended runstate Steve Sistare
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Define the vm_resume helper, for use in subsequent patches.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/sysemu/runstate.h | 9 +++++++++
 system/cpus.c             | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 867e53c..315046b 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -53,6 +53,15 @@ void vm_start(void);
  * @step_pending: whether any of the CPUs is about to be single-stepped by gdb
  */
 int vm_prepare_start(bool step_pending);
+
+/**
+ * vm_resume: If @state is a live state, start the vm and set the state,
+ * else just set the state.
+ *
+ * @state: the state to restore
+ */
+void vm_resume(RunState state);
+
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/system/cpus.c b/system/cpus.c
index f162435..7d2c28b 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -748,6 +748,15 @@ void vm_start(void)
     }
 }
 
+void vm_resume(RunState state)
+{
+    if (runstate_is_live(state)) {
+        vm_start();
+    } else {
+        runstate_set(state);
+    }
+}
+
 /* does a state transition even if the VM is already stopped,
    current state is forgotten forever */
 int vm_stop_force_state(RunState state)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 05/12] migration: propagate suspended runstate
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (3 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 04/12] cpus: vm_resume Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-08 16:37   ` Fabiano Rosas
  2023-12-11  6:46   ` Peter Xu
  2023-12-06 17:23 ` [PATCH V7 06/12] migration: preserve " Steve Sistare
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state.  To maintain backward and forward compatibility, reclaim
some space from the runstate member.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 migration/global_state.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/migration/global_state.c b/migration/global_state.c
index 4e2a9d8..d4f61a1 100644
--- a/migration/global_state.c
+++ b/migration/global_state.c
@@ -22,7 +22,16 @@
 
 typedef struct {
     uint32_t size;
-    uint8_t runstate[100];
+
+    /*
+     * runstate was 100 bytes, zero padded, but we trimmed it to add a
+     * few fields and maintain backwards compatibility.
+     */
+    uint8_t runstate[32];
+    uint8_t has_vm_was_suspended;
+    uint8_t vm_was_suspended;
+    uint8_t unused[66];
+
     RunState state;
     bool received;
 } GlobalState;
@@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
     assert(strlen(state_str) < sizeof(global_state.runstate));
     strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
               state_str, '\0');
+    global_state.has_vm_was_suspended = true;
+    global_state.vm_was_suspended = vm_get_suspended();
+
+    memset(global_state.unused, 0, sizeof(global_state.unused));
 }
 
 void global_state_store(void)
@@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
         return true;
     }
 
+    /* If the suspended state must be remembered, it is needed */
+
+    if (vm_get_suspended()) {
+        return true;
+    }
+
     /* If state is running or paused, it is not needed */
 
     if (strcmp(runstate, "running") == 0 ||
@@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int version_id)
     Error *local_err = NULL;
     int r;
     char *runstate = (char *)s->runstate;
+    bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended;
 
     s->received = true;
     trace_migrate_global_state_post_load(runstate);
@@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int version_id)
                 sizeof(s->runstate)) == sizeof(s->runstate)) {
         /*
          * This condition should never happen during migration, because
-         * all runstate names are shorter than 100 bytes (the size of
+         * all runstate names are shorter than 32 bytes (the size of
          * s->runstate). However, a malicious stream could overflow
          * the qapi_enum_parse() call, so we force the last character
          * to a NUL byte.
@@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int version_id)
     }
     s->state = r;
 
+    /*
+     * global_state is saved on the outgoing side before forcing a stopped
+     * state, so it may have saved state=suspended and vm_was_suspended=0.
+     * Now we are in a paused state, and when we later call vm_start, it must
+     * restore the suspended state, so we must set vm_was_suspended=1 here.
+     */
+    vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);
+
     return 0;
 }
 
@@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = {
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(size, GlobalState),
         VMSTATE_BUFFER(runstate, GlobalState),
+        VMSTATE_UINT8(has_vm_was_suspended, GlobalState),
+        VMSTATE_UINT8(vm_was_suspended, GlobalState),
+        VMSTATE_BUFFER(unused, GlobalState),
         VMSTATE_END_OF_LIST()
     },
 };
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 06/12] migration: preserve suspended runstate
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (4 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 05/12] migration: propagate suspended runstate Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 07/12] migration: preserve suspended for snapshot Steve Sistare
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

A guest that is migrated in the suspended state automaticaly wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3ce04b2..8124811 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -604,7 +604,7 @@ static void process_incoming_migration_bh(void *opaque)
      */
     if (!migrate_late_block_activate() ||
          (autostart && (!global_state_received() ||
-            global_state_get_runstate() == RUN_STATE_RUNNING))) {
+            runstate_is_live(global_state_get_runstate())))) {
         /* Make sure all file formats throw away their mutable metadata.
          * If we get an error here, just don't restart the VM yet. */
         bdrv_activate_all(&local_err);
@@ -628,7 +628,7 @@ static void process_incoming_migration_bh(void *opaque)
     dirty_bitmap_mig_before_vm_start();
 
     if (!global_state_received() ||
-        global_state_get_runstate() == RUN_STATE_RUNNING) {
+        runstate_is_live(global_state_get_runstate())) {
         if (autostart) {
             vm_start();
         } else {
@@ -2416,7 +2416,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     migration_downtime_start(ms);
 
-    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
     global_state_store();
     ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
     if (ret < 0) {
@@ -2615,7 +2614,6 @@ static int migration_completion_precopy(MigrationState *s,
 
     qemu_mutex_lock_iothread();
     migration_downtime_start(s);
-    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
     s->vm_old_state = runstate_get();
     global_state_store();
@@ -3136,7 +3134,7 @@ static void migration_iteration_finish(MigrationState *s)
     case MIGRATION_STATUS_FAILED:
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_CANCELLING:
-        if (s->vm_old_state == RUN_STATE_RUNNING) {
+        if (runstate_is_live(s->vm_old_state)) {
             if (!runstate_check(RUN_STATE_SHUTDOWN)) {
                 vm_start();
             }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 07/12] migration: preserve suspended for snapshot
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (5 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 06/12] migration: preserve " Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 08/12] migration: preserve suspended for bg_migration Steve Sistare
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Restoring a snapshot can break a suspended guest.  Snapshots suffer from
the same suspended-state issues that affect live migration, plus they must
handle an additional problematic scenario, which is that a running vm must
remain running if it loads a suspended snapshot.

To save, the existing vm_stop call now completely stops the suspended
state.  Finish with vm_resume to leave the vm in the state it had prior
to the save, correctly restoring the suspended state.

To load, if the snapshot is not suspended, then vm_stop + vm_resume
correctly handles all states, and leaves the vm in the state it had prior
to the load.  However, if the snapshot is suspended, restoration is
trickier.  First, call vm_resume to restore the state to suspended so the
current state matches the saved state.  Then, if the pre-load state is
running, call wakeup to resume running.

Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and
RUN_STATE_RESTORE_VM did not change runstate if the current state was
suspended, but now it does, so allow these transitions.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 include/migration/snapshot.h   |  7 +++++++
 migration/migration-hmp-cmds.c |  8 +++++---
 migration/savevm.c             | 23 +++++++++++++----------
 system/runstate.c              |  5 +++++
 system/vl.c                    |  2 ++
 5 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index e72083b..9e4dcaa 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -16,6 +16,7 @@
 #define QEMU_MIGRATION_SNAPSHOT_H
 
 #include "qapi/qapi-builtin-types.h"
+#include "qapi/qapi-types-run-state.h"
 
 /**
  * save_snapshot: Save an internal snapshot.
@@ -61,4 +62,10 @@ bool delete_snapshot(const char *name,
                     bool has_devices, strList *devices,
                     Error **errp);
 
+/**
+ * load_snapshot_resume: Restore runstate after loading snapshot.
+ * @state: state to restore
+ */
+void load_snapshot_resume(RunState state);
+
 #endif
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 86ae832..c8d70bc 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -399,15 +399,17 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
 
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
-    int saved_vm_running  = runstate_is_running();
+    RunState saved_state = runstate_get();
+
     const char *name = qdict_get_str(qdict, "name");
     Error *err = NULL;
 
     vm_stop(RUN_STATE_RESTORE_VM);
 
-    if (load_snapshot(name, NULL, false, NULL, &err) && saved_vm_running) {
-        vm_start();
+    if (load_snapshot(name, NULL, false, NULL, &err)) {
+        load_snapshot_resume(saved_state);
     }
+
     hmp_handle_error(mon, err);
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index eec5503..26866f0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3046,7 +3046,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
     QEMUSnapshotInfo sn1, *sn = &sn1;
     int ret = -1, ret2;
     QEMUFile *f;
-    int saved_vm_running;
+    RunState saved_state = runstate_get();
     uint64_t vm_state_size;
     g_autoptr(GDateTime) now = g_date_time_new_now_local();
     AioContext *aio_context;
@@ -3094,8 +3094,6 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
     }
     aio_context = bdrv_get_aio_context(bs);
 
-    saved_vm_running = runstate_is_running();
-
     global_state_store();
     vm_stop(RUN_STATE_SAVE_VM);
 
@@ -3163,9 +3161,7 @@ bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
 
     bdrv_drain_all_end();
 
-    if (saved_vm_running) {
-        vm_start();
-    }
+    vm_resume(saved_state);
     return ret == 0;
 }
 
@@ -3339,6 +3335,14 @@ err_drain:
     return false;
 }
 
+void load_snapshot_resume(RunState state)
+{
+    vm_resume(state);
+    if (state == RUN_STATE_RUNNING && runstate_get() == RUN_STATE_SUSPENDED) {
+        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, &error_abort);
+    }
+}
+
 bool delete_snapshot(const char *name, bool has_devices,
                      strList *devices, Error **errp)
 {
@@ -3403,16 +3407,15 @@ static void snapshot_load_job_bh(void *opaque)
 {
     Job *job = opaque;
     SnapshotJob *s = container_of(job, SnapshotJob, common);
-    int orig_vm_running;
+    RunState orig_state = runstate_get();
 
     job_progress_set_remaining(&s->common, 1);
 
-    orig_vm_running = runstate_is_running();
     vm_stop(RUN_STATE_RESTORE_VM);
 
     s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
-    if (s->ret && orig_vm_running) {
-        vm_start();
+    if (s->ret) {
+        load_snapshot_resume(orig_state);
     }
 
     job_progress_update(&s->common, 1);
diff --git a/system/runstate.c b/system/runstate.c
index e2fa204..ca9eb54 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -77,6 +77,7 @@ typedef struct {
 
 static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
 
     { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
     { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
@@ -132,6 +133,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
     { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
     { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
+    { RUN_STATE_RESTORE_VM, RUN_STATE_SUSPENDED },
 
     { RUN_STATE_COLO, RUN_STATE_RUNNING },
     { RUN_STATE_COLO, RUN_STATE_PRELAUNCH },
@@ -150,6 +152,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RUNNING, RUN_STATE_COLO},
 
     { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
+    { RUN_STATE_SAVE_VM, RUN_STATE_SUSPENDED },
 
     { RUN_STATE_SHUTDOWN, RUN_STATE_PAUSED },
     { RUN_STATE_SHUTDOWN, RUN_STATE_FINISH_MIGRATE },
@@ -163,6 +166,8 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
     { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
     { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED},
+    { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM },
+    { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM },
 
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
diff --git a/system/vl.c b/system/vl.c
index 2bcd9ef..2fbbbba 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2706,7 +2706,9 @@ void qmp_x_exit_preconfig(Error **errp)
     qemu_machine_creation_done();
 
     if (loadvm) {
+        RunState state = autostart ? RUN_STATE_RUNNING : runstate_get();
         load_snapshot(loadvm, NULL, false, NULL, &error_fatal);
+        load_snapshot_resume(state);
     }
     if (replay_mode != REPLAY_MODE_NONE) {
         replay_vmstate_init();
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 08/12] migration: preserve suspended for bg_migration
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (6 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 07/12] migration: preserve suspended for snapshot Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 09/12] tests/qtest: migration events Steve Sistare
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running.  Allow the additional
state transitions that occur.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 7 +------
 system/runstate.c     | 1 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8124811..2cc7e2a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3390,7 +3390,7 @@ static void bg_migration_vm_start_bh(void *opaque)
     qemu_bh_delete(s->vm_start_bh);
     s->vm_start_bh = NULL;
 
-    vm_start();
+    vm_resume(s->vm_old_state);
     migration_downtime_end(s);
 }
 
@@ -3462,11 +3462,6 @@ static void *bg_migration_thread(void *opaque)
 
     qemu_mutex_lock_iothread();
 
-    /*
-     * If VM is currently in suspended state, then, to make a valid runstate
-     * transition in vm_stop_force_state() we need to wakeup it up.
-     */
-    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
     s->vm_old_state = runstate_get();
 
     global_state_store();
diff --git a/system/runstate.c b/system/runstate.c
index ca9eb54..621a023 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -168,6 +168,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED},
     { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM },
     { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM },
+    { RUN_STATE_SUSPENDED, RUN_STATE_SHUTDOWN },
 
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 09/12] tests/qtest: migration events
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (7 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 08/12] migration: preserve suspended for bg_migration Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 10/12] tests/qtest: option to suspend during migration Steve Sistare
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Define a state object to capture events seen by migration tests, to allow
more events to be captured in a subsequent patch, and simplify event
checking in wait_for_migration_pass.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-helpers.c | 24 ++++-------
 tests/qtest/migration-helpers.h |  9 ++--
 tests/qtest/migration-test.c    | 91 +++++++++++++++++++----------------------
 3 files changed, 56 insertions(+), 68 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 24fb7b3..fd3b94e 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -24,26 +24,16 @@
  */
 #define MIGRATION_STATUS_WAIT_TIMEOUT 120
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-                            QDict *event, void *opaque)
-{
-    bool *seen = opaque;
-
-    if (g_str_equal(name, "STOP")) {
-        *seen = true;
-        return true;
-    }
-
-    return false;
-}
-
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+bool migrate_watch_for_events(QTestState *who, const char *name,
                               QDict *event, void *opaque)
 {
-    bool *seen = opaque;
+    QTestMigrationState *state = opaque;
 
-    if (g_str_equal(name, "RESUME")) {
-        *seen = true;
+    if (g_str_equal(name, "STOP")) {
+        state->stop_seen = true;
+        return true;
+    } else if (g_str_equal(name, "RESUME")) {
+        state->resume_seen = true;
         return true;
     }
 
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index e31dc85..3d32699 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -15,9 +15,12 @@
 
 #include "libqtest.h"
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-                            QDict *event, void *opaque);
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+typedef struct QTestMigrationState {
+    bool stop_seen;
+    bool resume_seen;
+} QTestMigrationState;
+
+bool migrate_watch_for_events(QTestState *who, const char *name,
                               QDict *event, void *opaque);
 
 G_GNUC_PRINTF(3, 4)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0fbaa6a..05c0740 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -43,8 +43,8 @@
 unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
-static bool got_src_stop;
-static bool got_dst_resume;
+static QTestMigrationState src_state;
+static QTestMigrationState dst_state;
 
 /*
  * An initial 3 MB offset is used as that corresponds
@@ -230,6 +230,20 @@ static void wait_for_serial(const char *side)
     } while (true);
 }
 
+static void wait_for_stop(QTestState *who, QTestMigrationState *state)
+{
+    if (!state->stop_seen) {
+        qtest_qmp_eventwait(who, "STOP");
+    }
+}
+
+static void wait_for_resume(QTestState *who, QTestMigrationState *state)
+{
+    if (!state->resume_seen) {
+        qtest_qmp_eventwait(who, "RESUME");
+    }
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -277,21 +291,19 @@ static void read_blocktime(QTestState *who)
     qobject_unref(rsp_return);
 }
 
+/*
+ * Wait for two changes in the migration pass count, but bail if we stop.
+ */
 static void wait_for_migration_pass(QTestState *who)
 {
-    uint64_t initial_pass = get_migration_pass(who);
-    uint64_t pass;
+    uint64_t pass, prev_pass = 0, changes = 0;
 
-    /* Wait for the 1st sync */
-    while (!got_src_stop && !initial_pass) {
-        usleep(1000);
-        initial_pass = get_migration_pass(who);
-    }
-
-    do {
+    while (changes < 2 && !src_state.stop_seen) {
         usleep(1000);
         pass = get_migration_pass(who);
-    } while (pass == initial_pass && !got_src_stop);
+        changes += (pass != prev_pass);
+        prev_pass = pass;
+    }
 }
 
 static void check_guests_ram(QTestState *who)
@@ -617,10 +629,7 @@ static void migrate_postcopy_start(QTestState *from, QTestState *to)
 {
     qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }");
 
-    if (!got_src_stop) {
-        qtest_qmp_eventwait(from, "STOP");
-    }
-
+    wait_for_stop(from, &src_state);
     qtest_qmp_eventwait(to, "RESUME");
 }
 
@@ -756,8 +765,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
         }
     }
 
-    got_src_stop = false;
-    got_dst_resume = false;
+    dst_state = (QTestMigrationState) { };
+    src_state = (QTestMigrationState) { };
     if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
         memory_size = "150M";
 
@@ -848,8 +857,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
     if (!args->only_target) {
         *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
         qtest_qmp_set_event_callback(*from,
-                                     migrate_watch_for_stop,
-                                     &got_src_stop);
+                                     migrate_watch_for_events,
+                                     &src_state);
     }
 
     cmd_target = g_strdup_printf("-accel kvm%s -accel tcg "
@@ -869,8 +878,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
                                  ignore_stderr);
     *to = qtest_init_with_env(QEMU_ENV_DST, cmd_target);
     qtest_qmp_set_event_callback(*to,
-                                 migrate_watch_for_resume,
-                                 &got_dst_resume);
+                                 migrate_watch_for_events,
+                                 &dst_state);
 
     /*
      * Remove shmem file immediately to avoid memory leak in test failed case.
@@ -1717,9 +1726,7 @@ static void test_precopy_common(MigrateCommon *args)
          */
         if (args->result == MIG_TEST_SUCCEED) {
             qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
-            if (!got_src_stop) {
-                qtest_qmp_eventwait(from, "STOP");
-            }
+            wait_for_stop(from, &src_state);
             migrate_ensure_converge(from);
         }
     }
@@ -1765,9 +1772,8 @@ static void test_precopy_common(MigrateCommon *args)
              */
             wait_for_migration_complete(from);
 
-            if (!got_src_stop) {
-                qtest_qmp_eventwait(from, "STOP");
-            }
+            wait_for_stop(from, &src_state);
+
         } else {
             wait_for_migration_complete(from);
             /*
@@ -1780,9 +1786,7 @@ static void test_precopy_common(MigrateCommon *args)
             qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
         }
 
-        if (!got_dst_resume) {
-            qtest_qmp_eventwait(to, "RESUME");
-        }
+        wait_for_resume(to, &dst_state);
 
         wait_for_serial("dest_serial");
     }
@@ -1821,9 +1825,7 @@ static void test_file_common(MigrateCommon *args, bool stop_src)
 
     if (stop_src) {
         qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
-        if (!got_src_stop) {
-            qtest_qmp_eventwait(from, "STOP");
-        }
+        wait_for_stop(from, &src_state);
     }
 
     if (args->result == MIG_TEST_QMP_ERROR) {
@@ -1844,10 +1846,7 @@ static void test_file_common(MigrateCommon *args, bool stop_src)
     if (stop_src) {
         qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
     }
-
-    if (!got_dst_resume) {
-        qtest_qmp_eventwait(to, "RESUME");
-    }
+    wait_for_resume(to, &dst_state);
 
     wait_for_serial("dest_serial");
 
@@ -1966,9 +1965,7 @@ static void test_ignore_shared(void)
 
     migrate_wait_for_dirty_mem(from, to);
 
-    if (!got_src_stop) {
-        qtest_qmp_eventwait(from, "STOP");
-    }
+    wait_for_stop(from, &src_state);
 
     qtest_qmp_eventwait(to, "RESUME");
 
@@ -2503,7 +2500,7 @@ static void test_migrate_auto_converge(void)
             break;
         }
         usleep(20);
-        g_assert_false(got_src_stop);
+        g_assert_false(src_state.stop_seen);
     } while (true);
     /* The first percentage of throttling should be at least init_pct */
     g_assert_cmpint(percentage, >=, init_pct);
@@ -2842,9 +2839,7 @@ static void test_multifd_tcp_cancel(void)
 
     migrate_ensure_converge(from);
 
-    if (!got_src_stop) {
-        qtest_qmp_eventwait(from, "STOP");
-    }
+    wait_for_stop(from, &src_state);
     qtest_qmp_eventwait(to2, "RESUME");
 
     wait_for_serial("dest_serial");
@@ -3177,7 +3172,7 @@ static void test_migrate_dirty_limit(void)
         throttle_us_per_full =
         read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
         usleep(100);
-        g_assert_false(got_src_stop);
+        g_assert_false(src_state.stop_seen);
     }
 
     /* Now cancel migrate and wait for dirty limit throttle switch off */
@@ -3189,7 +3184,7 @@ static void test_migrate_dirty_limit(void)
         throttle_us_per_full =
         read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
         usleep(100);
-        g_assert_false(got_src_stop);
+        g_assert_false(src_state.stop_seen);
     } while (throttle_us_per_full != 0 && --max_try_count);
 
     /* Assert dirty limit is not in service */
@@ -3218,7 +3213,7 @@ static void test_migrate_dirty_limit(void)
         throttle_us_per_full =
         read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
         usleep(100);
-        g_assert_false(got_src_stop);
+        g_assert_false(src_state.stop_seen);
     }
 
     /*
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 10/12] tests/qtest: option to suspend during migration
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (8 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 09/12] tests/qtest: migration events Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-06 17:23 ` [PATCH V7 11/12] tests/qtest: precopy migration with suspend Steve Sistare
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Add an option to suspend the src in a-b-bootblock.S, which puts the guest
in S3 state after one round of writing to memory.  The option is enabled by
poking a 1 into the suspend_me word in the boot block prior to starting the
src vm.  Generate symbol offsets in a-b-bootblock.h so that the suspend_me
offset is known.  Generate the bootblock for each test, because suspend_me
may differ for each.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
 tests/migration/i386/Makefile        |  5 ++--
 tests/migration/i386/a-b-bootblock.S | 50 +++++++++++++++++++++++++++++++++---
 tests/migration/i386/a-b-bootblock.h | 26 +++++++++++++------
 tests/qtest/migration-test.c         | 12 ++++++---
 4 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
index 5c03241..37a72ae 100644
--- a/tests/migration/i386/Makefile
+++ b/tests/migration/i386/Makefile
@@ -4,9 +4,10 @@
 .PHONY: all clean
 all: a-b-bootblock.h
 
-a-b-bootblock.h: x86.bootsect
+a-b-bootblock.h: x86.bootsect x86.o
 	echo "$$__note" > header.tmp
 	xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
+	nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
 	mv header.tmp $@
 
 x86.bootsect: x86.boot
@@ -16,7 +17,7 @@ x86.boot: x86.o
 	$(CROSS_PREFIX)objcopy -O binary $< $@
 
 x86.o: a-b-bootblock.S
-	$(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
+	$(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
 
 clean:
 	@rm -rf *.boot *.o *.bootsect
diff --git a/tests/migration/i386/a-b-bootblock.S b/tests/migration/i386/a-b-bootblock.S
index 6bb9999..6f39eb6 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -9,6 +9,23 @@
 #
 # Author: dgilbert@redhat.com
 
+#include "migration-test.h"
+
+#define ACPI_ENABLE         0xf1
+#define ACPI_PORT_SMI_CMD   0xb2
+#define ACPI_PM_BASE        0x600
+#define PM1A_CNT_OFFSET     4
+
+#define ACPI_SCI_ENABLE     0x0001
+#define ACPI_SLEEP_TYPE     0x0400
+#define ACPI_SLEEP_ENABLE   0x2000
+#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
+
+#define LOW_ADDR            X86_TEST_MEM_START
+#define HIGH_ADDR           X86_TEST_MEM_END
+
+/* Save the suspended status at an address that is not written in the loop. */
+#define suspended           (X86_TEST_MEM_START + 4)
 
 .code16
 .org 0x7c00
@@ -35,8 +52,8 @@ start:             # at 0x7c00 ?
         mov %eax,%ds
 
 # Start from 1MB
-.set TEST_MEM_START, (1024*1024)
-.set TEST_MEM_END, (100*1024*1024)
+.set TEST_MEM_START, X86_TEST_MEM_START
+.set TEST_MEM_END, X86_TEST_MEM_END
 
         mov $65,%ax
         mov $0x3f8,%dx
@@ -69,7 +86,30 @@ innerloop:
         mov $0x3f8,%dx
         outb %al,%dx
 
-        jmp mainloop
+        # should this test suspend?
+        mov (suspend_me),%eax
+        cmp $0,%eax
+        je mainloop
+
+        # are we waking after suspend?  do not suspend again.
+        mov $suspended,%eax
+        mov (%eax),%eax
+        cmp $1,%eax
+        je mainloop
+
+        # enable acpi
+        mov $ACPI_ENABLE,%al
+        outb %al,$ACPI_PORT_SMI_CMD
+
+        # suspend to ram
+        mov $suspended,%eax
+        movl $1,(%eax)
+        mov $SLEEP,%ax
+        mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
+        outw %ax,%dx
+        # not reached.  The wakeup causes reset and restart at 0x7c00, and we
+        # do not save and restore registers as a real kernel would do.
+
 
         # GDT magic from old (GPLv2)  Grub startup.S
         .p2align        2       /* force 4-byte alignment */
@@ -95,6 +135,10 @@ gdtdesc:
         .word   0x27                    /* limit */
         .long   gdt                     /* addr */
 
+        /* test launcher can poke a 1 here to exercise suspend */
+suspend_me:
+        .int  0
+
 /* I'm a bootable disk */
 .org 0x7dfe
         .byte 0x55
diff --git a/tests/migration/i386/a-b-bootblock.h b/tests/migration/i386/a-b-bootblock.h
index 5b52391..c83f871 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,7 +4,7 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0xb8, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
@@ -13,13 +13,13 @@ unsigned char x86_bootsect[] = {
   0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05,
   0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe,
   0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba,
-  0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
-  0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x74, 0x7c,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0xf8, 0x03, 0xee, 0xa1, 0xbe, 0x7c, 0x00, 0x00, 0x83, 0xf8, 0x00, 0x74,
+  0xd3, 0xb8, 0x04, 0x00, 0x10, 0x00, 0x8b, 0x00, 0x83, 0xf8, 0x01, 0x74,
+  0xc7, 0xb0, 0xf1, 0xe6, 0xb2, 0xb8, 0x04, 0x00, 0x10, 0x00, 0xc7, 0x00,
+  0x01, 0x00, 0x00, 0x00, 0x66, 0xb8, 0x01, 0x24, 0x66, 0xba, 0x04, 0x06,
+  0x66, 0xef, 0x66, 0x90, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00,
+  0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0xa0, 0x7c, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
@@ -49,3 +49,13 @@ unsigned char x86_bootsect[] = {
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x55, 0xaa
 };
 
+#define SYM_do_zero 0x00007c3d
+#define SYM_gdt 0x00007ca0
+#define SYM_gdtdesc 0x00007cb8
+#define SYM_innerloop 0x00007c51
+#define SYM_mainloop 0x00007c4c
+#define SYM_pre_zero 0x00007c38
+#define SYM_start 0x00007c00
+#define SYM_suspend_me 0x00007cbe
+#define SYM_TEST_MEM_END 0x06400000
+#define SYM_TEST_MEM_START 0x00100000
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 05c0740..e10d5a4 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -133,7 +133,7 @@ static char *bootpath;
 #include "tests/migration/aarch64/a-b-kernel.h"
 #include "tests/migration/s390x/a-b-bios.h"
 
-static void bootfile_create(char *dir)
+static void bootfile_create(char *dir, bool suspend_me)
 {
     const char *arch = qtest_get_arch();
     unsigned char *content;
@@ -143,6 +143,7 @@ static void bootfile_create(char *dir)
     if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
         /* the assembled x86 boot sector should be exactly one sector large */
         g_assert(sizeof(x86_bootsect) == 512);
+        x86_bootsect[SYM_suspend_me - SYM_start] = suspend_me;
         content = x86_bootsect;
         len = sizeof(x86_bootsect);
     } else if (g_str_equal(arch, "s390x")) {
@@ -646,6 +647,8 @@ typedef struct {
     bool use_dirty_ring;
     const char *opts_source;
     const char *opts_target;
+    /* suspend the src before migrating to dest. */
+    bool suspend_me;
 } MigrateStart;
 
 /*
@@ -767,6 +770,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 
     dst_state = (QTestMigrationState) { };
     src_state = (QTestMigrationState) { };
+    bootfile_create(tmpfs, args->suspend_me);
+
     if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
         memory_size = "150M";
 
@@ -2980,7 +2985,9 @@ static int64_t get_limit_rate(QTestState *who)
 static QTestState *dirtylimit_start_vm(void)
 {
     QTestState *vm = NULL;
-    g_autofree gchar *
+    g_autofree gchar *cmd = NULL;
+
+    bootfile_create(tmpfs, false);
     cmd = g_strdup_printf("-accel kvm,dirty-ring-size=4096 "
                           "-name dirtylimit-test,debug-threads=on "
                           "-m 150M -smp 1 "
@@ -3329,7 +3336,6 @@ int main(int argc, char **argv)
                        g_get_tmp_dir(), err->message);
     }
     g_assert(tmpfs);
-    bootfile_create(tmpfs);
 
     module_call_init(MODULE_INIT_QOM);
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 11/12] tests/qtest: precopy migration with suspend
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (9 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 10/12] tests/qtest: option to suspend during migration Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-08 16:37   ` Fabiano Rosas
  2023-12-11  6:54   ` Peter Xu
  2023-12-06 17:23 ` [PATCH V7 12/12] tests/qtest: postcopy " Steve Sistare
  2023-12-06 17:30 ` [PATCH V7 00/12] fix migration of suspended runstate Steven Sistare
  12 siblings, 2 replies; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Add a test case to verify that the suspended state is handled correctly
during live migration precopy.  The test suspends the src, migrates, then
wakes the dest.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 tests/qtest/migration-helpers.c |  3 ++
 tests/qtest/migration-helpers.h |  2 ++
 tests/qtest/migration-test.c    | 62 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index fd3b94e..37e8e81 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char *name,
     if (g_str_equal(name, "STOP")) {
         state->stop_seen = true;
         return true;
+    } else if (g_str_equal(name, "SUSPEND")) {
+        state->suspend_seen = true;
+        return true;
     } else if (g_str_equal(name, "RESUME")) {
         state->resume_seen = true;
         return true;
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 3d32699..b478549 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -18,6 +18,8 @@
 typedef struct QTestMigrationState {
     bool stop_seen;
     bool resume_seen;
+    bool suspend_seen;
+    bool suspend_me;
 } QTestMigrationState;
 
 bool migrate_watch_for_events(QTestState *who, const char *name,
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e10d5a4..f57a978 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -178,7 +178,7 @@ static void bootfile_delete(void)
 /*
  * Wait for some output in the serial output file,
  * we get an 'A' followed by an endless string of 'B's
- * but on the destination we won't have the A.
+ * but on the destination we won't have the A (unless we enabled suspend/resume)
  */
 static void wait_for_serial(const char *side)
 {
@@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, QTestMigrationState *state)
     }
 }
 
+static void wait_for_suspend(QTestState *who, QTestMigrationState *state)
+{
+    if (state->suspend_me && !state->suspend_seen) {
+        qtest_qmp_eventwait(who, "SUSPEND");
+    }
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who)
 {
     uint64_t pass, prev_pass = 0, changes = 0;
 
-    while (changes < 2 && !src_state.stop_seen) {
+    while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) {
         usleep(1000);
         pass = get_migration_pass(who);
         changes += (pass != prev_pass);
@@ -584,6 +591,12 @@ static void migrate_wait_for_dirty_mem(QTestState *from,
         usleep(1000 * 10);
     } while (qtest_readq(to, marker_address) != MAGIC_MARKER);
 
+
+    /* If suspended, src only iterates once, and watch_byte may never change */
+    if (src_state.suspend_me) {
+        return;
+    }
+
     /*
      * Now ensure that already transferred bytes are
      * dirty again from the guest workload. Note the
@@ -771,6 +784,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
     dst_state = (QTestMigrationState) { };
     src_state = (QTestMigrationState) { };
     bootfile_create(tmpfs, args->suspend_me);
+    src_state.suspend_me = args->suspend_me;
 
     if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
         memory_size = "150M";
@@ -1717,6 +1731,7 @@ static void test_precopy_common(MigrateCommon *args)
     /* Wait for the first serial output from the source */
     if (args->result == MIG_TEST_SUCCEED) {
         wait_for_serial("src_serial");
+        wait_for_suspend(from, &src_state);
     }
 
     if (args->live) {
@@ -1793,6 +1808,11 @@ static void test_precopy_common(MigrateCommon *args)
 
         wait_for_resume(to, &dst_state);
 
+        if (args->start.suspend_me) {
+            /* wakeup succeeds only if guest is suspended */
+            qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+        }
+
         wait_for_serial("dest_serial");
     }
 
@@ -1879,6 +1899,34 @@ static void test_precopy_unix_plain(void)
     test_precopy_common(&args);
 }
 
+static void test_precopy_unix_suspend_live(void)
+{
+    g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+    MigrateCommon args = {
+        .listen_uri = uri,
+        .connect_uri = uri,
+        /*
+         * despite being live, the test is fast because the src
+         * suspends immediately.
+         */
+        .live = true,
+        .start.suspend_me = true,
+    };
+
+    test_precopy_common(&args);
+}
+
+static void test_precopy_unix_suspend_notlive(void)
+{
+    g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+    MigrateCommon args = {
+        .listen_uri = uri,
+        .connect_uri = uri,
+        .start.suspend_me = true,
+    };
+
+    test_precopy_common(&args);
+}
 
 static void test_precopy_unix_dirty_ring(void)
 {
@@ -3279,7 +3327,7 @@ static bool kvm_dirty_ring_supported(void)
 int main(int argc, char **argv)
 {
     bool has_kvm, has_tcg;
-    bool has_uffd;
+    bool has_uffd, is_x86;
     const char *arch;
     g_autoptr(GError) err = NULL;
     const char *qemu_src = getenv(QEMU_ENV_SRC);
@@ -3309,6 +3357,7 @@ int main(int argc, char **argv)
 
     has_uffd = ufd_version_check();
     arch = qtest_get_arch();
+    is_x86 = !strcmp(arch, "i386") || !strcmp(arch, "x86_64");
 
     /*
      * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
@@ -3339,6 +3388,13 @@ int main(int argc, char **argv)
 
     module_call_init(MODULE_INIT_QOM);
 
+    if (is_x86) {
+        qtest_add_func("/migration/precopy/unix/suspend/live",
+                       test_precopy_unix_suspend_live);
+        qtest_add_func("/migration/precopy/unix/suspend/notlive",
+                       test_precopy_unix_suspend_notlive);
+    }
+
     if (has_uffd) {
         qtest_add_func("/migration/postcopy/plain", test_postcopy);
         qtest_add_func("/migration/postcopy/recovery/plain",
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH V7 12/12] tests/qtest: postcopy migration with suspend
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (10 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 11/12] tests/qtest: precopy migration with suspend Steve Sistare
@ 2023-12-06 17:23 ` Steve Sistare
  2023-12-11  6:54   ` Peter Xu
  2023-12-06 17:30 ` [PATCH V7 00/12] fix migration of suspended runstate Steven Sistare
  12 siblings, 1 reply; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Add a test case to verify that the suspended state is handled correctly by
live migration postcopy.  The test suspends the src, migrates, then wakes
the dest.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 tests/qtest/migration-test.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index f57a978..b7c094e 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1347,6 +1347,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
+    wait_for_suspend(from, &src_state);
 
     g_autofree char *uri = migrate_get_socket_address(to, "socket-address");
     migrate_qmp(from, uri, "{}");
@@ -1364,6 +1365,11 @@ static void migrate_postcopy_complete(QTestState *from, QTestState *to,
 {
     wait_for_migration_complete(from);
 
+    if (args->start.suspend_me) {
+        /* wakeup succeeds only if guest is suspended */
+        qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+    }
+
     /* Make sure we get at least one "B" on destination */
     wait_for_serial("dest_serial");
 
@@ -1397,6 +1403,15 @@ static void test_postcopy(void)
     test_postcopy_common(&args);
 }
 
+static void test_postcopy_suspend(void)
+{
+    MigrateCommon args = {
+        .start.suspend_me = true,
+    };
+
+    test_postcopy_common(&args);
+}
+
 static void test_postcopy_compress(void)
 {
     MigrateCommon args = {
@@ -3412,7 +3427,10 @@ int main(int argc, char **argv)
         qtest_add_func("/migration/postcopy/recovery/double-failures",
                        test_postcopy_recovery_double_fail);
 #endif /* _WIN32 */
-
+        if (is_x86) {
+            qtest_add_func("/migration/postcopy/suspend",
+                           test_postcopy_suspend);
+        }
     }
 
     qtest_add_func("/migration/bad_dest", test_baddest);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 00/12] fix migration of suspended runstate
  2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
                   ` (11 preceding siblings ...)
  2023-12-06 17:23 ` [PATCH V7 12/12] tests/qtest: postcopy " Steve Sistare
@ 2023-12-06 17:30 ` Steven Sistare
  2023-12-11  6:56   ` Peter Xu
  12 siblings, 1 reply; 32+ messages in thread
From: Steven Sistare @ 2023-12-06 17:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

FYI, these patches still need RB:
  migration: propagate suspended runstate
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend

This has RB, but the interaction between vm_start and vm_prepare_start
changed, so needs another look.
    cpus: stop vm in suspended runstate

- Steve

On 12/6/2023 12:23 PM, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.
> 
> Changes in V2:
>   * simplify "start on wakeup request"
>   * fix postcopy, snapshot, and background migration
>   * refactor fixes for each type of migration
>   * explicitly handled suspended events and runstate in tests
>   * add test for postcopy and background migration
> 
> Changes in V3:
>   * rebase to tip
>   * fix hang in new function migrate_wait_for_dirty_mem
> 
> Changes in V4:
>   * rebase to tip
>   * add patch for vm_prepare_start (thanks Peter)
>   * add patch to preserve cpu ticks
> 
> Changes in V5:
>   * rebase to tip
>   * added patches to completely stop vm in suspended state:
>       cpus: refactor vm_stop
>       cpus: stop vm in suspended state
>   * added patch to partially resume vm in suspended state:
>       cpus: start vm in suspended state
>   * modified "preserve suspended ..." patches to use the above.
>   * deleted patch "preserve cpu ticks if suspended".  stop ticks in
>     vm_stop_force_state instead.
>   * deleted patch "add runstate function".  defined new helper function
>     migrate_new_runstate in "preserve suspended runstate"
>   * Added some RB's, but removed other RB's because the patches changed.
> 
> Changes in V6:
>   * all vm_stop calls completely stop the suspended state
>   * refactored and updated the "cpus" patches
>   * simplified the "preserve suspended" patches
>   * added patch "bootfile per vm"
> 
> Changes in V7:
>   * rebase to tip, add RB-s
>   * fix backwards compatibility for global_state.vm_was_suspended
>   * delete vm_prepare_start state argument, and rename patch
>     "pass runstate to vm_prepare_start" to
>     "check running not RUN_STATE_RUNNING"
>   * drop patches:
>       tests/qtest: bootfile per vm
>       tests/qtest: background migration with suspend
>   * rename runstate_is_started to runstate_is_live
>   * move wait_for_suspend in tests
> 
> Steve Sistare (12):
>   cpus: vm_was_suspended
>   cpus: stop vm in suspended runstate
>   cpus: check running not RUN_STATE_RUNNING
>   cpus: vm_resume
>   migration: propagate suspended runstate
>   migration: preserve suspended runstate
>   migration: preserve suspended for snapshot
>   migration: preserve suspended for bg_migration
>   tests/qtest: migration events
>   tests/qtest: option to suspend during migration
>   tests/qtest: precopy migration with suspend
>   tests/qtest: postcopy migration with suspend
> 
>  backends/tpm/tpm_emulator.c          |   2 +-
>  hw/usb/hcd-ehci.c                    |   2 +-
>  hw/usb/redirect.c                    |   2 +-
>  hw/xen/xen-hvm-common.c              |   2 +-
>  include/migration/snapshot.h         |   7 ++
>  include/sysemu/runstate.h            |  16 ++++
>  migration/global_state.c             |  35 ++++++-
>  migration/migration-hmp-cmds.c       |   8 +-
>  migration/migration.c                |  15 +--
>  migration/savevm.c                   |  23 +++--
>  qapi/misc.json                       |  10 +-
>  system/cpus.c                        |  47 +++++++--
>  system/runstate.c                    |   9 ++
>  system/vl.c                          |   2 +
>  tests/migration/i386/Makefile        |   5 +-
>  tests/migration/i386/a-b-bootblock.S |  50 +++++++++-
>  tests/migration/i386/a-b-bootblock.h |  26 +++--
>  tests/qtest/migration-helpers.c      |  27 ++----
>  tests/qtest/migration-helpers.h      |  11 ++-
>  tests/qtest/migration-test.c         | 181 +++++++++++++++++++++++++----------
>  20 files changed, 354 insertions(+), 126 deletions(-)
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate
  2023-12-06 17:23 ` [PATCH V7 02/12] cpus: stop vm in suspended runstate Steve Sistare
@ 2023-12-06 18:45   ` Philippe Mathieu-Daudé
  2023-12-06 19:19     ` Steven Sistare
  2023-12-11 13:37   ` Steven Sistare
  1 sibling, 1 reply; 32+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 18:45 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

Hi Steve,

On 6/12/23 18:23, Steve Sistare wrote:
> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
> have been paused, but the cpu clock still runs, and runstate notifiers for
> the transition to stopped have not been called.  This causes problems for
> live migration.  Stale cpu timers_state is saved to the migration stream,
> causing time errors in the guest when it wakes from suspend, and state that
> would have been modified by runstate notifiers is wrong.
> 
> Modify vm_stop to completely stop the vm if the current state is suspended,
> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
> Modify vm_start to restore the suspended state.
> 
> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
> cont commands.  For example:
> 
>      (qemu) info status
>      VM status: paused (suspended)
> 
>      (qemu) stop
>      (qemu) info status
>      VM status: paused
> 
>      (qemu) system_wakeup
>      Error: Unable to wake up: guest is not in suspended state
> 
>      (qemu) cont
>      (qemu) info status
>      VM status: paused (suspended)
> 
>      (qemu) system_wakeup
>      (qemu) info status
>      VM status: running
> 
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
>   include/sysemu/runstate.h |  5 +++++
>   qapi/misc.json            | 10 ++++++++--
>   system/cpus.c             | 23 +++++++++++++++--------
>   system/runstate.c         |  3 +++
>   4 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 88a67e2..867e53c 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause)
>       return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>   }
>   
> +static inline bool runstate_is_live(RunState state)
> +{
> +    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
> +}

Not being familiar with (live) migration, from a generic vCPU PoV
I don't get what "runstate_is_live" means. Can we add a comment
explaining what this helper is for?

Since this is a migration particular case, maybe we can be verbose
in vm_resume() and keep runstate_is_live() -- eventually undocumented
-- in migration/migration.c.

  void vm_resume(RunState state)
  {
      switch (state) {
      case RUN_STATE_RUNNING:
      case RUN_STATE_SUSPENDED:
          vm_start();
          break;
      default:
          runstate_set(state);
      }
  }

Regards,

Phil.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate
  2023-12-06 18:45   ` Philippe Mathieu-Daudé
@ 2023-12-06 19:19     ` Steven Sistare
  2023-12-06 20:48       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 32+ messages in thread
From: Steven Sistare @ 2023-12-06 19:19 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On 12/6/2023 1:45 PM, Philippe Mathieu-Daudé wrote:
> Hi Steve,
> 
> On 6/12/23 18:23, Steve Sistare wrote:
>> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
>> have been paused, but the cpu clock still runs, and runstate notifiers for
>> the transition to stopped have not been called.  This causes problems for
>> live migration.  Stale cpu timers_state is saved to the migration stream,
>> causing time errors in the guest when it wakes from suspend, and state that
>> would have been modified by runstate notifiers is wrong.
>>
>> Modify vm_stop to completely stop the vm if the current state is suspended,
>> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
>> Modify vm_start to restore the suspended state.
>>
>> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
>> cont commands.  For example:
>>
>>      (qemu) info status
>>      VM status: paused (suspended)
>>
>>      (qemu) stop
>>      (qemu) info status
>>      VM status: paused
>>
>>      (qemu) system_wakeup
>>      Error: Unable to wake up: guest is not in suspended state
>>
>>      (qemu) cont
>>      (qemu) info status
>>      VM status: paused (suspended)
>>
>>      (qemu) system_wakeup
>>      (qemu) info status
>>      VM status: running
>>
>> Suggested-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> ---
>>   include/sysemu/runstate.h |  5 +++++
>>   qapi/misc.json            | 10 ++++++++--
>>   system/cpus.c             | 23 +++++++++++++++--------
>>   system/runstate.c         |  3 +++
>>   4 files changed, 31 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>> index 88a67e2..867e53c 100644
>> --- a/include/sysemu/runstate.h
>> +++ b/include/sysemu/runstate.h
>> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause)
>>       return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>   }
>>   +static inline bool runstate_is_live(RunState state)
>> +{
>> +    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
>> +}
> 
> Not being familiar with (live) migration, from a generic vCPU PoV
> I don't get what "runstate_is_live" means. Can we add a comment
> explaining what this helper is for?

Sure.  "live" means the cpu clock is ticking, and the runstate notifiers think
we are running.  It is everything that is enabled in vm_start except for starting
the vcpus.

> Since this is a migration particular case, maybe we can be verbose
> in vm_resume() and keep runstate_is_live() -- eventually undocumented
> -- in migration/migration.c.

runstate_is_live is about cpu and vm state, not migration state (the "live" is not 
live migration), and is used in multiple places in cpus code and elsewhere, so I would 
like to keep it in runstate.h.  It has a specific meaning, and it is useful to search 
for it to see who handles "liveness", and distinguish it from code that checks the
running and suspended states for other reasons.

- Steve

>  void vm_resume(RunState state)
>  {
>      switch (state) {
>      case RUN_STATE_RUNNING:
>      case RUN_STATE_SUSPENDED:
>          vm_start();
>          break;
>      default:
>          runstate_set(state);
>      }
>  }



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate
  2023-12-06 19:19     ` Steven Sistare
@ 2023-12-06 20:48       ` Philippe Mathieu-Daudé
  2023-12-06 21:09         ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 32+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 20:48 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On 6/12/23 20:19, Steven Sistare wrote:
> On 12/6/2023 1:45 PM, Philippe Mathieu-Daudé wrote:
>> Hi Steve,
>>
>> On 6/12/23 18:23, Steve Sistare wrote:
>>> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
>>> have been paused, but the cpu clock still runs, and runstate notifiers for
>>> the transition to stopped have not been called.  This causes problems for
>>> live migration.  Stale cpu timers_state is saved to the migration stream,
>>> causing time errors in the guest when it wakes from suspend, and state that
>>> would have been modified by runstate notifiers is wrong.
>>>
>>> Modify vm_stop to completely stop the vm if the current state is suspended,
>>> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
>>> Modify vm_start to restore the suspended state.
>>>
>>> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
>>> cont commands.  For example:
>>>
>>>       (qemu) info status
>>>       VM status: paused (suspended)
>>>
>>>       (qemu) stop
>>>       (qemu) info status
>>>       VM status: paused
>>>
>>>       (qemu) system_wakeup
>>>       Error: Unable to wake up: guest is not in suspended state
>>>
>>>       (qemu) cont
>>>       (qemu) info status
>>>       VM status: paused (suspended)
>>>
>>>       (qemu) system_wakeup
>>>       (qemu) info status
>>>       VM status: running
>>>
>>> Suggested-by: Peter Xu <peterx@redhat.com>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>    include/sysemu/runstate.h |  5 +++++
>>>    qapi/misc.json            | 10 ++++++++--
>>>    system/cpus.c             | 23 +++++++++++++++--------
>>>    system/runstate.c         |  3 +++
>>>    4 files changed, 31 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>>> index 88a67e2..867e53c 100644
>>> --- a/include/sysemu/runstate.h
>>> +++ b/include/sysemu/runstate.h
>>> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause)
>>>        return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>>    }
>>>    +static inline bool runstate_is_live(RunState state)
>>> +{
>>> +    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
>>> +}
>>
>> Not being familiar with (live) migration, from a generic vCPU PoV
>> I don't get what "runstate_is_live" means. Can we add a comment
>> explaining what this helper is for?
> 
> Sure.  "live" means the cpu clock is ticking, and the runstate notifiers think
> we are running.  It is everything that is enabled in vm_start except for starting
> the vcpus.
> 
>> Since this is a migration particular case, maybe we can be verbose
>> in vm_resume() and keep runstate_is_live() -- eventually undocumented
>> -- in migration/migration.c.
> 
> runstate_is_live is about cpu and vm state, not migration state (the "live" is not
> live migration), and is used in multiple places in cpus code and elsewhere, so I would
> like to keep it in runstate.h.  It has a specific meaning, and it is useful to search
> for it to see who handles "liveness", and distinguish it from code that checks the
> running and suspended states for other reasons.

OK then, no objection, but please add a comment describing.

Thanks,

Phil.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate
  2023-12-06 20:48       ` Philippe Mathieu-Daudé
@ 2023-12-06 21:09         ` Philippe Mathieu-Daudé
  2023-12-11  6:25           ` Peter Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-12-06 21:09 UTC (permalink / raw)
  To: Steven Sistare, qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On 6/12/23 21:48, Philippe Mathieu-Daudé wrote:
> On 6/12/23 20:19, Steven Sistare wrote:
>> On 12/6/2023 1:45 PM, Philippe Mathieu-Daudé wrote:
>>> Hi Steve,
>>>
>>> On 6/12/23 18:23, Steve Sistare wrote:
>>>> Currently, a vm in the suspended state is not completely stopped.  
>>>> The VCPUs
>>>> have been paused, but the cpu clock still runs, and runstate 
>>>> notifiers for
>>>> the transition to stopped have not been called.  This causes 
>>>> problems for
>>>> live migration.  Stale cpu timers_state is saved to the migration 
>>>> stream,
>>>> causing time errors in the guest when it wakes from suspend, and 
>>>> state that
>>>> would have been modified by runstate notifiers is wrong.
>>>>
>>>> Modify vm_stop to completely stop the vm if the current state is 
>>>> suspended,
>>>> transition to RUN_STATE_PAUSED, and remember that the machine was 
>>>> suspended.
>>>> Modify vm_start to restore the suspended state.
>>>>
>>>> This affects all callers of vm_stop and vm_start, notably, the qapi 
>>>> stop and
>>>> cont commands.  For example:
>>>>
>>>>       (qemu) info status
>>>>       VM status: paused (suspended)
>>>>
>>>>       (qemu) stop
>>>>       (qemu) info status
>>>>       VM status: paused
>>>>
>>>>       (qemu) system_wakeup
>>>>       Error: Unable to wake up: guest is not in suspended state
>>>>
>>>>       (qemu) cont
>>>>       (qemu) info status
>>>>       VM status: paused (suspended)
>>>>
>>>>       (qemu) system_wakeup
>>>>       (qemu) info status
>>>>       VM status: running
>>>>
>>>> Suggested-by: Peter Xu <peterx@redhat.com>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>> ---
>>>>    include/sysemu/runstate.h |  5 +++++
>>>>    qapi/misc.json            | 10 ++++++++--
>>>>    system/cpus.c             | 23 +++++++++++++++--------
>>>>    system/runstate.c         |  3 +++
>>>>    4 files changed, 31 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>>>> index 88a67e2..867e53c 100644
>>>> --- a/include/sysemu/runstate.h
>>>> +++ b/include/sysemu/runstate.h
>>>> @@ -40,6 +40,11 @@ static inline bool 
>>>> shutdown_caused_by_guest(ShutdownCause cause)
>>>>        return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>>>    }
>>>>    +static inline bool runstate_is_live(RunState state)
>>>> +{
>>>> +    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
>>>> +}
>>>
>>> Not being familiar with (live) migration, from a generic vCPU PoV
>>> I don't get what "runstate_is_live" means. Can we add a comment
>>> explaining what this helper is for?
>>
>> Sure.  "live" means the cpu clock is ticking, and the runstate 
>> notifiers think
>> we are running.  It is everything that is enabled in vm_start except 
>> for starting
>> the vcpus.

What about runstate_is_vcpu_clock_ticking() then?

>>> Since this is a migration particular case, maybe we can be verbose
>>> in vm_resume() and keep runstate_is_live() -- eventually undocumented
>>> -- in migration/migration.c.
>>
>> runstate_is_live is about cpu and vm state, not migration state (the 
>> "live" is not
>> live migration), and is used in multiple places in cpus code and 
>> elsewhere, so I would
>> like to keep it in runstate.h.  It has a specific meaning, and it is 
>> useful to search
>> for it to see who handles "liveness", and distinguish it from code 
>> that checks the
>> running and suspended states for other reasons.
> 
> OK then, no objection, but please add a comment describing.
> 
> Thanks,
> 
> Phil.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 05/12] migration: propagate suspended runstate
  2023-12-06 17:23 ` [PATCH V7 05/12] migration: propagate suspended runstate Steve Sistare
@ 2023-12-08 16:37   ` Fabiano Rosas
  2023-12-08 17:28     ` Steven Sistare
  2023-12-11  6:46   ` Peter Xu
  1 sibling, 1 reply; 32+ messages in thread
From: Fabiano Rosas @ 2023-12-08 16:37 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Leonardo Bras, Steve Sistare

Steve Sistare <steven.sistare@oracle.com> writes:

> If the outgoing machine was previously suspended, propagate that to the
> incoming side via global_state, so a subsequent vm_start restores the
> suspended state.  To maintain backward and forward compatibility, reclaim
> some space from the runstate member.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  migration/global_state.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/migration/global_state.c b/migration/global_state.c
> index 4e2a9d8..d4f61a1 100644
> --- a/migration/global_state.c
> +++ b/migration/global_state.c
> @@ -22,7 +22,16 @@
>  
>  typedef struct {
>      uint32_t size;
> -    uint8_t runstate[100];
> +
> +    /*
> +     * runstate was 100 bytes, zero padded, but we trimmed it to add a
> +     * few fields and maintain backwards compatibility.
> +     */
> +    uint8_t runstate[32];
> +    uint8_t has_vm_was_suspended;
> +    uint8_t vm_was_suspended;
> +    uint8_t unused[66];
> +
>      RunState state;
>      bool received;
>  } GlobalState;
> @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
>      assert(strlen(state_str) < sizeof(global_state.runstate));
>      strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>                state_str, '\0');
> +    global_state.has_vm_was_suspended = true;
> +    global_state.vm_was_suspended = vm_get_suspended();
> +
> +    memset(global_state.unused, 0, sizeof(global_state.unused));
>  }
>  
>  void global_state_store(void)
> @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
>          return true;
>      }
>  
> +    /* If the suspended state must be remembered, it is needed */
> +
> +    if (vm_get_suspended()) {
> +        return true;
> +    }
> +
>      /* If state is running or paused, it is not needed */
>  
>      if (strcmp(runstate, "running") == 0 ||
> @@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int version_id)
>      Error *local_err = NULL;
>      int r;
>      char *runstate = (char *)s->runstate;
> +    bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended;

Why do you need has_vm_was_suspended? If they're always read like this,
then one flag should do, no?

>  
>      s->received = true;
>      trace_migrate_global_state_post_load(runstate);
> @@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int version_id)
>                  sizeof(s->runstate)) == sizeof(s->runstate)) {
>          /*
>           * This condition should never happen during migration, because
> -         * all runstate names are shorter than 100 bytes (the size of
> +         * all runstate names are shorter than 32 bytes (the size of
>           * s->runstate). However, a malicious stream could overflow
>           * the qapi_enum_parse() call, so we force the last character
>           * to a NUL byte.
> @@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int version_id)
>      }
>      s->state = r;
>  
> +    /*
> +     * global_state is saved on the outgoing side before forcing a stopped
> +     * state, so it may have saved state=suspended and vm_was_suspended=0.
> +     * Now we are in a paused state, and when we later call vm_start, it must
> +     * restore the suspended state, so we must set vm_was_suspended=1 here.
> +     */
> +    vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);
> +
>      return 0;
>  }
>  
> @@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = {
>      .fields = (VMStateField[]) {
>          VMSTATE_UINT32(size, GlobalState),
>          VMSTATE_BUFFER(runstate, GlobalState),
> +        VMSTATE_UINT8(has_vm_was_suspended, GlobalState),
> +        VMSTATE_UINT8(vm_was_suspended, GlobalState),
> +        VMSTATE_BUFFER(unused, GlobalState),
>          VMSTATE_END_OF_LIST()
>      },
>  };


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 11/12] tests/qtest: precopy migration with suspend
  2023-12-06 17:23 ` [PATCH V7 11/12] tests/qtest: precopy migration with suspend Steve Sistare
@ 2023-12-08 16:37   ` Fabiano Rosas
  2023-12-11  6:54   ` Peter Xu
  1 sibling, 0 replies; 32+ messages in thread
From: Fabiano Rosas @ 2023-12-08 16:37 UTC (permalink / raw)
  To: Steve Sistare, qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Leonardo Bras, Steve Sistare

Steve Sistare <steven.sistare@oracle.com> writes:

> Add a test case to verify that the suspended state is handled correctly
> during live migration precopy.  The test suspends the src, migrates, then
> wakes the dest.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 05/12] migration: propagate suspended runstate
  2023-12-08 16:37   ` Fabiano Rosas
@ 2023-12-08 17:28     ` Steven Sistare
  0 siblings, 0 replies; 32+ messages in thread
From: Steven Sistare @ 2023-12-08 17:28 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Leonardo Bras

On 12/8/2023 11:37 AM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> If the outgoing machine was previously suspended, propagate that to the
>> incoming side via global_state, so a subsequent vm_start restores the
>> suspended state.  To maintain backward and forward compatibility, reclaim
>> some space from the runstate member.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  migration/global_state.c | 35 +++++++++++++++++++++++++++++++++--
>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/global_state.c b/migration/global_state.c
>> index 4e2a9d8..d4f61a1 100644
>> --- a/migration/global_state.c
>> +++ b/migration/global_state.c
>> @@ -22,7 +22,16 @@
>>  
>>  typedef struct {
>>      uint32_t size;
>> -    uint8_t runstate[100];
>> +
>> +    /*
>> +     * runstate was 100 bytes, zero padded, but we trimmed it to add a
>> +     * few fields and maintain backwards compatibility.
>> +     */
>> +    uint8_t runstate[32];
>> +    uint8_t has_vm_was_suspended;
>> +    uint8_t vm_was_suspended;
>> +    uint8_t unused[66];
>> +
>>      RunState state;
>>      bool received;
>>  } GlobalState;
>> @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
>>      assert(strlen(state_str) < sizeof(global_state.runstate));
>>      strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>>                state_str, '\0');
>> +    global_state.has_vm_was_suspended = true;
>> +    global_state.vm_was_suspended = vm_get_suspended();
>> +
>> +    memset(global_state.unused, 0, sizeof(global_state.unused));
>>  }
>>  
>>  void global_state_store(void)
>> @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
>>          return true;
>>      }
>>  
>> +    /* If the suspended state must be remembered, it is needed */
>> +
>> +    if (vm_get_suspended()) {
>> +        return true;
>> +    }
>> +
>>      /* If state is running or paused, it is not needed */
>>  
>>      if (strcmp(runstate, "running") == 0 ||
>> @@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int version_id)
>>      Error *local_err = NULL;
>>      int r;
>>      char *runstate = (char *)s->runstate;
>> +    bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended;
> 
> Why do you need has_vm_was_suspended? If they're always read like this,
> then one flag should do, no?

has_vm_was_suspended=0 means a legacy qemu.  Currently the dest has no reason to care,
and we treat that as a vm_was_suspended=0 case, but I want to keep that field in
case we ever need to know.  But you are correct that this line can simply be:
    bool vm_was_suspended = s->vm_was_suspended;

- Steve
 
>>      s->received = true;
>>      trace_migrate_global_state_post_load(runstate);
>> @@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int version_id)
>>                  sizeof(s->runstate)) == sizeof(s->runstate)) {
>>          /*
>>           * This condition should never happen during migration, because
>> -         * all runstate names are shorter than 100 bytes (the size of
>> +         * all runstate names are shorter than 32 bytes (the size of
>>           * s->runstate). However, a malicious stream could overflow
>>           * the qapi_enum_parse() call, so we force the last character
>>           * to a NUL byte.
>> @@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int version_id)
>>      }
>>      s->state = r;
>>  
>> +    /*
>> +     * global_state is saved on the outgoing side before forcing a stopped
>> +     * state, so it may have saved state=suspended and vm_was_suspended=0.
>> +     * Now we are in a paused state, and when we later call vm_start, it must
>> +     * restore the suspended state, so we must set vm_was_suspended=1 here.
>> +     */
>> +    vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);
>> +
>>      return 0;
>>  }
>>  
>> @@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = {
>>      .fields = (VMStateField[]) {
>>          VMSTATE_UINT32(size, GlobalState),
>>          VMSTATE_BUFFER(runstate, GlobalState),
>> +        VMSTATE_UINT8(has_vm_was_suspended, GlobalState),
>> +        VMSTATE_UINT8(vm_was_suspended, GlobalState),
>> +        VMSTATE_BUFFER(unused, GlobalState),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate
  2023-12-06 21:09         ` Philippe Mathieu-Daudé
@ 2023-12-11  6:25           ` Peter Xu
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2023-12-11  6:25 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Steven Sistare, qemu-devel, Juan Quintela, Paolo Bonzini,
	Thomas Huth, Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On Wed, Dec 06, 2023 at 10:09:32PM +0100, Philippe Mathieu-Daudé wrote:
> What about runstate_is_vcpu_clock_ticking() then?

The problem is vCPU clock ticks can only be one part of "VM is running"
state (no matter whether vCPUs are running or not).  I'm even thinking
whether cpu_enable_ticks() should one day be put into a vm state change
notifier instead to be even clearer that it's not special (I hope I didn't
overlook its specialty..).

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 05/12] migration: propagate suspended runstate
  2023-12-06 17:23 ` [PATCH V7 05/12] migration: propagate suspended runstate Steve Sistare
  2023-12-08 16:37   ` Fabiano Rosas
@ 2023-12-11  6:46   ` Peter Xu
  2023-12-11 13:23     ` Steven Sistare
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Xu @ 2023-12-11  6:46 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On Wed, Dec 06, 2023 at 09:23:30AM -0800, Steve Sistare wrote:
> If the outgoing machine was previously suspended, propagate that to the
> incoming side via global_state, so a subsequent vm_start restores the
> suspended state.  To maintain backward and forward compatibility, reclaim
> some space from the runstate member.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

One nitpick below.

> ---
>  migration/global_state.c | 35 +++++++++++++++++++++++++++++++++--
>  1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/global_state.c b/migration/global_state.c
> index 4e2a9d8..d4f61a1 100644
> --- a/migration/global_state.c
> +++ b/migration/global_state.c
> @@ -22,7 +22,16 @@
>  
>  typedef struct {
>      uint32_t size;
> -    uint8_t runstate[100];
> +
> +    /*
> +     * runstate was 100 bytes, zero padded, but we trimmed it to add a
> +     * few fields and maintain backwards compatibility.
> +     */
> +    uint8_t runstate[32];
> +    uint8_t has_vm_was_suspended;
> +    uint8_t vm_was_suspended;
> +    uint8_t unused[66];
> +
>      RunState state;
>      bool received;
>  } GlobalState;
> @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
>      assert(strlen(state_str) < sizeof(global_state.runstate));
>      strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>                state_str, '\0');
> +    global_state.has_vm_was_suspended = true;
> +    global_state.vm_was_suspended = vm_get_suspended();
> +
> +    memset(global_state.unused, 0, sizeof(global_state.unused));
>  }
>  
>  void global_state_store(void)
> @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
>          return true;
>      }
>  
> +    /* If the suspended state must be remembered, it is needed */
> +
> +    if (vm_get_suspended()) {
> +        return true;
> +    }

Can we drop this section?

I felt unsafe when QEMU can overwrite the option even if user explicitly
specified store-global-state=off but we still send this..  Ideally I think
it's better if it's as simple as:

static bool global_state_needed(void *opaque)
{
    return migrate_get_current()->store_global_state;
}

I don't think we can remove the old trick due to compatibility reasons, but
maybe nice to not add new ones to make this section more unpredictable in
the migration stream?

IMHO it shouldn't matter in reality for the current use case even dropping
it, as I don't expect any non-Xen QEMU VMs migrates without having the
option turned on (store-global-state=on) after QEMU 2.4.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 11/12] tests/qtest: precopy migration with suspend
  2023-12-06 17:23 ` [PATCH V7 11/12] tests/qtest: precopy migration with suspend Steve Sistare
  2023-12-08 16:37   ` Fabiano Rosas
@ 2023-12-11  6:54   ` Peter Xu
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Xu @ 2023-12-11  6:54 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On Wed, Dec 06, 2023 at 09:23:36AM -0800, Steve Sistare wrote:
> Add a test case to verify that the suspended state is handled correctly
> during live migration precopy.  The test suspends the src, migrates, then
> wakes the dest.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 12/12] tests/qtest: postcopy migration with suspend
  2023-12-06 17:23 ` [PATCH V7 12/12] tests/qtest: postcopy " Steve Sistare
@ 2023-12-11  6:54   ` Peter Xu
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2023-12-11  6:54 UTC (permalink / raw)
  To: Steve Sistare
  Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On Wed, Dec 06, 2023 at 09:23:37AM -0800, Steve Sistare wrote:
> Add a test case to verify that the suspended state is handled correctly by
> live migration postcopy.  The test suspends the src, migrates, then wakes
> the dest.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 00/12] fix migration of suspended runstate
  2023-12-06 17:30 ` [PATCH V7 00/12] fix migration of suspended runstate Steven Sistare
@ 2023-12-11  6:56   ` Peter Xu
  2023-12-11 13:31     ` Steven Sistare
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2023-12-11  6:56 UTC (permalink / raw)
  To: Steven Sistare
  Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On Wed, Dec 06, 2023 at 12:30:02PM -0500, Steven Sistare wrote:
>     cpus: stop vm in suspended runstate

This patch still didn't copy the QAPI maintainers, please remember to do so
in a new post.

Maybe it would be easier to move the QAPI doc changes into a separate
patch?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 05/12] migration: propagate suspended runstate
  2023-12-11  6:46   ` Peter Xu
@ 2023-12-11 13:23     ` Steven Sistare
  0 siblings, 0 replies; 32+ messages in thread
From: Steven Sistare @ 2023-12-11 13:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On 12/11/2023 1:46 AM, Peter Xu wrote:
> On Wed, Dec 06, 2023 at 09:23:30AM -0800, Steve Sistare wrote:
>> If the outgoing machine was previously suspended, propagate that to the
>> incoming side via global_state, so a subsequent vm_start restores the
>> suspended state.  To maintain backward and forward compatibility, reclaim
>> some space from the runstate member.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> One nitpick below.
> 
>> ---
>>  migration/global_state.c | 35 +++++++++++++++++++++++++++++++++--
>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/global_state.c b/migration/global_state.c
>> index 4e2a9d8..d4f61a1 100644
>> --- a/migration/global_state.c
>> +++ b/migration/global_state.c
>> @@ -22,7 +22,16 @@
>>  
>>  typedef struct {
>>      uint32_t size;
>> -    uint8_t runstate[100];
>> +
>> +    /*
>> +     * runstate was 100 bytes, zero padded, but we trimmed it to add a
>> +     * few fields and maintain backwards compatibility.
>> +     */
>> +    uint8_t runstate[32];
>> +    uint8_t has_vm_was_suspended;
>> +    uint8_t vm_was_suspended;
>> +    uint8_t unused[66];
>> +
>>      RunState state;
>>      bool received;
>>  } GlobalState;
>> @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
>>      assert(strlen(state_str) < sizeof(global_state.runstate));
>>      strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>>                state_str, '\0');
>> +    global_state.has_vm_was_suspended = true;
>> +    global_state.vm_was_suspended = vm_get_suspended();
>> +
>> +    memset(global_state.unused, 0, sizeof(global_state.unused));
>>  }
>>  
>>  void global_state_store(void)
>> @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
>>          return true;
>>      }
>>  
>> +    /* If the suspended state must be remembered, it is needed */
>> +
>> +    if (vm_get_suspended()) {
>> +        return true;
>> +    }
> 
> Can we drop this section?
> 
> I felt unsafe when QEMU can overwrite the option even if user explicitly
> specified store-global-state=off but we still send this..  Ideally I think
> it's better if it's as simple as:
> 
> static bool global_state_needed(void *opaque)
> {
>     return migrate_get_current()->store_global_state;
> }

I agree, I also did not see the point of dropping global_state for some states.
I will simplify it to this. 

- Steve

> I don't think we can remove the old trick due to compatibility reasons, but
> maybe nice to not add new ones to make this section more unpredictable in
> the migration stream?
> 
> IMHO it shouldn't matter in reality for the current use case even dropping
> it, as I don't expect any non-Xen QEMU VMs migrates without having the
> option turned on (store-global-state=on) after QEMU 2.4.
> 
> Thanks,
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 00/12] fix migration of suspended runstate
  2023-12-11  6:56   ` Peter Xu
@ 2023-12-11 13:31     ` Steven Sistare
  2023-12-12  8:54       ` Peter Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Steven Sistare @ 2023-12-11 13:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On 12/11/2023 1:56 AM, Peter Xu wrote:
> On Wed, Dec 06, 2023 at 12:30:02PM -0500, Steven Sistare wrote:
>>     cpus: stop vm in suspended runstate
> 
> This patch still didn't copy the QAPI maintainers, please remember to do so
> in a new post.
> 
> Maybe it would be easier to move the QAPI doc changes into a separate
> patch?

This was intentional.  I did not cc them for the whole series to spare them from
excess email.  You cc'd them for "[PATCH V6 03/14] cpus: stop vm in suspended runstate" 
they had no comments, and there is no change for V7, so I assumed we are OK, but I will
cc them again for that patch to be sure.

- Steve


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate
  2023-12-06 17:23 ` [PATCH V7 02/12] cpus: stop vm in suspended runstate Steve Sistare
  2023-12-06 18:45   ` Philippe Mathieu-Daudé
@ 2023-12-11 13:37   ` Steven Sistare
  1 sibling, 0 replies; 32+ messages in thread
From: Steven Sistare @ 2023-12-11 13:37 UTC (permalink / raw)
  To: Markus Armbruster, Eric Blake
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, qemu-devel

Hi Markus, Eric, any comment about this change in stop/cont behavior before
it gets pulled?  There is no change since V6.

- Steve

On 12/6/2023 12:23 PM, Steve Sistare wrote:
> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
> have been paused, but the cpu clock still runs, and runstate notifiers for
> the transition to stopped have not been called.  This causes problems for
> live migration.  Stale cpu timers_state is saved to the migration stream,
> causing time errors in the guest when it wakes from suspend, and state that
> would have been modified by runstate notifiers is wrong.
> 
> Modify vm_stop to completely stop the vm if the current state is suspended,
> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
> Modify vm_start to restore the suspended state.
> 
> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
> cont commands.  For example:
> 
>     (qemu) info status
>     VM status: paused (suspended)
> 
>     (qemu) stop
>     (qemu) info status
>     VM status: paused
> 
>     (qemu) system_wakeup
>     Error: Unable to wake up: guest is not in suspended state
> 
>     (qemu) cont
>     (qemu) info status
>     VM status: paused (suspended)
> 
>     (qemu) system_wakeup
>     (qemu) info status
>     VM status: running
> 
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
>  include/sysemu/runstate.h |  5 +++++
>  qapi/misc.json            | 10 ++++++++--
>  system/cpus.c             | 23 +++++++++++++++--------
>  system/runstate.c         |  3 +++
>  4 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 88a67e2..867e53c 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause)
>      return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>  }
>  
> +static inline bool runstate_is_live(RunState state)
> +{
> +    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
> +}
> +
>  void vm_start(void);
>  
>  /**
> diff --git a/qapi/misc.json b/qapi/misc.json
> index cda2eff..efb8d44 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -134,7 +134,7 @@
>  ##
>  # @stop:
>  #
> -# Stop all guest VCPU execution.
> +# Stop all guest VCPU and VM execution.
>  #
>  # Since: 0.14
>  #
> @@ -143,6 +143,9 @@
>  #     the guest remains paused once migration finishes, as if the -S
>  #     option was passed on the command line.
>  #
> +#     In the "suspended" state, it will completely stop the VM and
> +#     cause a transition to the "paused" state. (Since 9.0)
> +#
>  # Example:
>  #
>  # -> { "execute": "stop" }
> @@ -153,7 +156,7 @@
>  ##
>  # @cont:
>  #
> -# Resume guest VCPU execution.
> +# Resume guest VCPU and VM execution.
>  #
>  # Since: 0.14
>  #
> @@ -165,6 +168,9 @@
>  #     guest starts once migration finishes, removing the effect of the
>  #     -S command line option if it was passed.
>  #
> +#     If the VM was previously suspended, and not been reset or woken,
> +#     this command will transition back to the "suspended" state. (Since 9.0)
> +#
>  # Example:
>  #
>  # -> { "execute": "cont" }
> diff --git a/system/cpus.c b/system/cpus.c
> index 9f631ab..f162435 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -277,11 +277,15 @@ bool vm_get_suspended(void)
>  static int do_vm_stop(RunState state, bool send_stop)
>  {
>      int ret = 0;
> +    RunState oldstate = runstate_get();
>  
> -    if (runstate_is_running()) {
> +    if (runstate_is_live(oldstate)) {
> +        vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
>          runstate_set(state);
>          cpu_disable_ticks();
> -        pause_all_vcpus();
> +        if (oldstate == RUN_STATE_RUNNING) {
> +            pause_all_vcpus();
> +        }
>          vm_state_notify(0, state);
>          if (send_stop) {
>              qapi_event_send_stop();
> @@ -694,11 +698,13 @@ int vm_stop(RunState state)
>  
>  /**
>   * Prepare for (re)starting the VM.
> - * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
> - * running or in case of an error condition), 0 otherwise.
> + * Returns 0 if the vCPUs should be restarted, -1 on an error condition,
> + * and 1 otherwise.
>   */
>  int vm_prepare_start(bool step_pending)
>  {
> +    int ret = vm_was_suspended ? 1 : 0;
> +    RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : RUN_STATE_RUNNING;
>      RunState requested;
>  
>      qemu_vmstop_requested(&requested);
> @@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending)
>      qapi_event_send_resume();
>  
>      cpu_enable_ticks();
> -    runstate_set(RUN_STATE_RUNNING);
> -    vm_state_notify(1, RUN_STATE_RUNNING);
> -    return 0;
> +    runstate_set(state);
> +    vm_state_notify(1, state);
> +    vm_was_suspended = false;
> +    return ret;
>  }
>  
>  void vm_start(void)
> @@ -745,7 +752,7 @@ void vm_start(void)
>     current state is forgotten forever */
>  int vm_stop_force_state(RunState state)
>  {
> -    if (runstate_is_running()) {
> +    if (runstate_is_live(runstate_get())) {
>          return vm_stop(state);
>      } else {
>          int ret;
> diff --git a/system/runstate.c b/system/runstate.c
> index ea9d6c2..e2fa204 100644
> --- a/system/runstate.c
> +++ b/system/runstate.c
> @@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_PAUSED, RUN_STATE_POSTMIGRATE },
>      { RUN_STATE_PAUSED, RUN_STATE_PRELAUNCH },
>      { RUN_STATE_PAUSED, RUN_STATE_COLO},
> +    { RUN_STATE_PAUSED, RUN_STATE_SUSPENDED},
>  
>      { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
>      { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
> @@ -161,6 +162,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
>      { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
>      { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
> +    { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED},
>  
>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
> @@ -502,6 +504,7 @@ void qemu_system_reset(ShutdownCause reason)
>          qapi_event_send_reset(shutdown_caused_by_guest(reason), reason);
>      }
>      cpu_synchronize_all_post_reset();
> +    vm_set_suspended(false);
>  }
>  
>  /*


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 00/12] fix migration of suspended runstate
  2023-12-11 13:31     ` Steven Sistare
@ 2023-12-12  8:54       ` Peter Xu
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2023-12-12  8:54 UTC (permalink / raw)
  To: Steven Sistare
  Cc: qemu-devel, Juan Quintela, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

On Mon, Dec 11, 2023 at 08:31:17AM -0500, Steven Sistare wrote:
> On 12/11/2023 1:56 AM, Peter Xu wrote:
> > On Wed, Dec 06, 2023 at 12:30:02PM -0500, Steven Sistare wrote:
> >>     cpus: stop vm in suspended runstate
> > 
> > This patch still didn't copy the QAPI maintainers, please remember to do so
> > in a new post.
> > 
> > Maybe it would be easier to move the QAPI doc changes into a separate
> > patch?
> 
> This was intentional.  I did not cc them for the whole series to spare them from
> excess email.  You cc'd them for "[PATCH V6 03/14] cpus: stop vm in suspended runstate" 
> they had no comments, and there is no change for V7, so I assumed we are OK, but I will
> cc them again for that patch to be sure.

Yes, please do so.  Per my experience on QEMU, we normally need an ACK from
all sides to merge a patch.  There can be outliers if the changes are very
trivial so one maintainer may just pull that in, but it'll always be good
to always copy relevant maintainers.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH V7 00/12] fix migration of suspended runstate
  2023-12-06 17:12 Steve Sistare
@ 2023-12-06 17:16 ` Steven Sistare
  0 siblings, 0 replies; 32+ messages in thread
From: Steven Sistare @ 2023-12-06 17:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras

Gack, there was cruft in my send mail directory.  Please ignore this threaded series,
and I will resend.  Sorry for the noise.

- Steve

On 12/6/2023 12:12 PM, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.
> 
> Changes in V2:
>   * simplify "start on wakeup request"
>   * fix postcopy, snapshot, and background migration
>   * refactor fixes for each type of migration
>   * explicitly handled suspended events and runstate in tests
>   * add test for postcopy and background migration
> 
> Changes in V3:
>   * rebase to tip
>   * fix hang in new function migrate_wait_for_dirty_mem
> 
> Changes in V4:
>   * rebase to tip
>   * add patch for vm_prepare_start (thanks Peter)
>   * add patch to preserve cpu ticks
> 
> Changes in V5:
>   * rebase to tip
>   * added patches to completely stop vm in suspended state:
>       cpus: refactor vm_stop
>       cpus: stop vm in suspended state
>   * added patch to partially resume vm in suspended state:
>       cpus: start vm in suspended state
>   * modified "preserve suspended ..." patches to use the above.
>   * deleted patch "preserve cpu ticks if suspended".  stop ticks in
>     vm_stop_force_state instead.
>   * deleted patch "add runstate function".  defined new helper function
>     migrate_new_runstate in "preserve suspended runstate"
>   * Added some RB's, but removed other RB's because the patches changed.
> 
> Changes in V6:
>   * all vm_stop calls completely stop the suspended state
>   * refactored and updated the "cpus" patches
>   * simplified the "preserve suspended" patches
>   * added patch "bootfile per vm"
> 
> Changes in V7:
>   * rebase to tip, add RB-s
>   * fix backwards compatibility for global_state.vm_was_suspended
>   * delete vm_prepare_start state argument, and rename patch
>     "pass runstate to vm_prepare_start" to
>     "check running not RUN_STATE_RUNNING"
>   * drop patches:
>       tests/qtest: bootfile per vm
>       tests/qtest: background migration with suspend
>   * rename runstate_is_started to runstate_is_live
>   * move wait_for_suspend in tests
> 
> Steve Sistare (12):
>   cpus: vm_was_suspended
>   cpus: stop vm in suspended runstate
>   cpus: check running not RUN_STATE_RUNNING
>   cpus: vm_resume
>   migration: propagate suspended runstate
>   migration: preserve suspended runstate
>   migration: preserve suspended for snapshot
>   migration: preserve suspended for bg_migration
>   tests/qtest: migration events
>   tests/qtest: option to suspend during migration
>   tests/qtest: precopy migration with suspend
>   tests/qtest: postcopy migration with suspend
> 
>  backends/tpm/tpm_emulator.c          |   2 +-
>  hw/usb/hcd-ehci.c                    |   2 +-
>  hw/usb/redirect.c                    |   2 +-
>  hw/xen/xen-hvm-common.c              |   2 +-
>  include/migration/snapshot.h         |   7 ++
>  include/sysemu/runstate.h            |  16 ++++
>  migration/global_state.c             |  35 ++++++-
>  migration/migration-hmp-cmds.c       |   8 +-
>  migration/migration.c                |  15 +--
>  migration/savevm.c                   |  23 +++--
>  qapi/misc.json                       |  10 +-
>  system/cpus.c                        |  47 +++++++--
>  system/runstate.c                    |   9 ++
>  system/vl.c                          |   2 +
>  tests/migration/i386/Makefile        |   5 +-
>  tests/migration/i386/a-b-bootblock.S |  50 +++++++++-
>  tests/migration/i386/a-b-bootblock.h |  26 +++--
>  tests/qtest/migration-helpers.c      |  27 ++----
>  tests/qtest/migration-helpers.h      |  11 ++-
>  tests/qtest/migration-test.c         | 181 +++++++++++++++++++++++++----------
>  20 files changed, 354 insertions(+), 126 deletions(-)
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH V7 00/12] fix migration of suspended runstate
@ 2023-12-06 17:12 Steve Sistare
  2023-12-06 17:16 ` Steven Sistare
  0 siblings, 1 reply; 32+ messages in thread
From: Steve Sistare @ 2023-12-06 17:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Peter Xu, Paolo Bonzini, Thomas Huth,
	Daniel P. Berrangé,
	Fabiano Rosas, Leonardo Bras, Steve Sistare

Migration of a guest in the suspended runstate is broken.  The incoming
migration code automatically tries to wake the guest, which is wrong;
the guest should end migration in the same runstate it started.  Further,
after saving a snapshot in the suspended state and loading it, the vm_start
fails.  The runstate is RUNNING, but the guest is not.

See the commit messages for the details.

Changes in V2:
  * simplify "start on wakeup request"
  * fix postcopy, snapshot, and background migration
  * refactor fixes for each type of migration
  * explicitly handled suspended events and runstate in tests
  * add test for postcopy and background migration

Changes in V3:
  * rebase to tip
  * fix hang in new function migrate_wait_for_dirty_mem

Changes in V4:
  * rebase to tip
  * add patch for vm_prepare_start (thanks Peter)
  * add patch to preserve cpu ticks

Changes in V5:
  * rebase to tip
  * added patches to completely stop vm in suspended state:
      cpus: refactor vm_stop
      cpus: stop vm in suspended state
  * added patch to partially resume vm in suspended state:
      cpus: start vm in suspended state
  * modified "preserve suspended ..." patches to use the above.
  * deleted patch "preserve cpu ticks if suspended".  stop ticks in
    vm_stop_force_state instead.
  * deleted patch "add runstate function".  defined new helper function
    migrate_new_runstate in "preserve suspended runstate"
  * Added some RB's, but removed other RB's because the patches changed.

Changes in V6:
  * all vm_stop calls completely stop the suspended state
  * refactored and updated the "cpus" patches
  * simplified the "preserve suspended" patches
  * added patch "bootfile per vm"

Changes in V7:
  * rebase to tip, add RB-s
  * fix backwards compatibility for global_state.vm_was_suspended
  * delete vm_prepare_start state argument, and rename patch
    "pass runstate to vm_prepare_start" to
    "check running not RUN_STATE_RUNNING"
  * drop patches:
      tests/qtest: bootfile per vm
      tests/qtest: background migration with suspend
  * rename runstate_is_started to runstate_is_live
  * move wait_for_suspend in tests

Steve Sistare (12):
  cpus: vm_was_suspended
  cpus: stop vm in suspended runstate
  cpus: check running not RUN_STATE_RUNNING
  cpus: vm_resume
  migration: propagate suspended runstate
  migration: preserve suspended runstate
  migration: preserve suspended for snapshot
  migration: preserve suspended for bg_migration
  tests/qtest: migration events
  tests/qtest: option to suspend during migration
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend

 backends/tpm/tpm_emulator.c          |   2 +-
 hw/usb/hcd-ehci.c                    |   2 +-
 hw/usb/redirect.c                    |   2 +-
 hw/xen/xen-hvm-common.c              |   2 +-
 include/migration/snapshot.h         |   7 ++
 include/sysemu/runstate.h            |  16 ++++
 migration/global_state.c             |  35 ++++++-
 migration/migration-hmp-cmds.c       |   8 +-
 migration/migration.c                |  15 +--
 migration/savevm.c                   |  23 +++--
 qapi/misc.json                       |  10 +-
 system/cpus.c                        |  47 +++++++--
 system/runstate.c                    |   9 ++
 system/vl.c                          |   2 +
 tests/migration/i386/Makefile        |   5 +-
 tests/migration/i386/a-b-bootblock.S |  50 +++++++++-
 tests/migration/i386/a-b-bootblock.h |  26 +++--
 tests/qtest/migration-helpers.c      |  27 ++----
 tests/qtest/migration-helpers.h      |  11 ++-
 tests/qtest/migration-test.c         | 181 +++++++++++++++++++++++++----------
 20 files changed, 354 insertions(+), 126 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-12-12  8:55 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-06 17:23 [PATCH V7 00/12] fix migration of suspended runstate Steve Sistare
2023-12-06 17:23 ` [PATCH V7 01/12] cpus: vm_was_suspended Steve Sistare
2023-12-06 17:23 ` [PATCH V7 02/12] cpus: stop vm in suspended runstate Steve Sistare
2023-12-06 18:45   ` Philippe Mathieu-Daudé
2023-12-06 19:19     ` Steven Sistare
2023-12-06 20:48       ` Philippe Mathieu-Daudé
2023-12-06 21:09         ` Philippe Mathieu-Daudé
2023-12-11  6:25           ` Peter Xu
2023-12-11 13:37   ` Steven Sistare
2023-12-06 17:23 ` [PATCH V7 03/12] cpus: check running not RUN_STATE_RUNNING Steve Sistare
2023-12-06 17:23 ` [PATCH V7 04/12] cpus: vm_resume Steve Sistare
2023-12-06 17:23 ` [PATCH V7 05/12] migration: propagate suspended runstate Steve Sistare
2023-12-08 16:37   ` Fabiano Rosas
2023-12-08 17:28     ` Steven Sistare
2023-12-11  6:46   ` Peter Xu
2023-12-11 13:23     ` Steven Sistare
2023-12-06 17:23 ` [PATCH V7 06/12] migration: preserve " Steve Sistare
2023-12-06 17:23 ` [PATCH V7 07/12] migration: preserve suspended for snapshot Steve Sistare
2023-12-06 17:23 ` [PATCH V7 08/12] migration: preserve suspended for bg_migration Steve Sistare
2023-12-06 17:23 ` [PATCH V7 09/12] tests/qtest: migration events Steve Sistare
2023-12-06 17:23 ` [PATCH V7 10/12] tests/qtest: option to suspend during migration Steve Sistare
2023-12-06 17:23 ` [PATCH V7 11/12] tests/qtest: precopy migration with suspend Steve Sistare
2023-12-08 16:37   ` Fabiano Rosas
2023-12-11  6:54   ` Peter Xu
2023-12-06 17:23 ` [PATCH V7 12/12] tests/qtest: postcopy " Steve Sistare
2023-12-11  6:54   ` Peter Xu
2023-12-06 17:30 ` [PATCH V7 00/12] fix migration of suspended runstate Steven Sistare
2023-12-11  6:56   ` Peter Xu
2023-12-11 13:31     ` Steven Sistare
2023-12-12  8:54       ` Peter Xu
  -- strict thread matches above, loose matches on Subject: below --
2023-12-06 17:12 Steve Sistare
2023-12-06 17:16 ` Steven Sistare

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.