All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] virtio-mem: Handle preallocation with migration
@ 2022-12-22 11:02 David Hildenbrand
  2022-12-22 11:02 ` [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: David Hildenbrand @ 2022-12-22 11:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

While playing with migration of virtio-mem with an ordinary file backing,
I realized that migration and prealloc doesn't currently work as expected
for virtio-mem. Further, Jing Qi reported that setup issues (insufficient
huge pages on the destination) result in QEMU getting killed with SIGBUS
instead of failing gracefully.

In contrast to ordinary memory backend preallocation, virtio-mem
preallocates memory before plugging blocks to the guest. Consequently,
when migrating we are not actually preallocating on the destination but
"only" migrate pages. Fix that be migrating the bitmap early, before any
RAM content, and use that information to preallocate memory early, before
migrating any RAM.

Postcopy needs some extra care, and I realized that prealloc+postcopy is
shaky in general. Let's at least try to mimic what ordinary
prealloc+postcopy does: temporarily allocate the memory, discard it, and
cross fingers that we'll still have sufficient memory when postcopy
actually tries placing pages.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michal Privoznik <mprivozn@redhat.com>

v2 -> v3:
- New approach/rewrite, drop RB and TB of last patch

v1 -> v2:
- Added RBs and Tested-bys
- "virtio-mem: Fail if a memory backend with "prealloc=on" is specified"
-- Fail instead of warn
-- Adjust subject/description

David Hildenbrand (6):
  migration: Allow immutable device state to be migrated early (i.e.,
    before RAM)
  migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and
    VMSTATE_BITMAP_TEST()
  migration: Factor out checks for advised and listening incomming
    postcopy
  virtio-mem: Fail if a memory backend with "prealloc=on" is specified
  virtio-mem: Migrate bitmap, size and sanity checks early
  virtio-mem: Proper support for preallocation with migration

 hw/core/machine.c              |   4 +-
 hw/virtio/virtio-mem.c         | 154 ++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-mem.h |   8 ++
 include/migration/misc.h       |   6 +-
 include/migration/vmstate.h    |  19 +++-
 migration/migration.c          |  27 ++++++
 migration/migration.h          |   4 +
 migration/ram.c                |  16 +---
 migration/savevm.c             | 104 ++++++++++++++++------
 migration/savevm.h             |   1 +
 10 files changed, 298 insertions(+), 45 deletions(-)

-- 
2.38.1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2022-12-22 11:02 [PATCH v3 0/6] virtio-mem: Handle preallocation with migration David Hildenbrand
@ 2022-12-22 11:02 ` David Hildenbrand
  2022-12-23  9:34   ` David Hildenbrand
  2023-01-04 17:23   ` Peter Xu
  2022-12-22 11:02 ` [PATCH v3 2/6] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() David Hildenbrand
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: David Hildenbrand @ 2022-12-22 11:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

For virtio-mem, we want to have the plugged/unplugged state of memory
blocks available before migrating any actual RAM content. This
information is immutable on the migration source while migration is active,

For example, we want to use this information for proper preallocation
support with migration: currently, we don't preallocate memory on the
migration target, and especially with hugetlb, we can easily run out of
hugetlb pages during RAM migration and will crash (SIGBUS) instead of
catching this gracefully via preallocation.

Migrating device state before we start iterating is currently impossible.
Introduce and use qemu_savevm_state_start_precopy(), and use
a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
state will be saved in qemu_savevm_state_start_precopy() or in
qemu_savevm_state_complete_precopy_*().

We have to take care of properly including the early device state in the
vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is
a bit sub-optimal, but we use that explicitly or implicitly all over the
place already, so this barely matters in practice.

Note that only very selected devices (i.e., ones seriously messing with
RAM setup) are supposed to make use of that.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/migration/vmstate.h |   7 +++
 migration/migration.c       |  13 +++++
 migration/migration.h       |   4 ++
 migration/savevm.c          | 104 +++++++++++++++++++++++++++---------
 migration/savevm.h          |   1 +
 5 files changed, 104 insertions(+), 25 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ad24aa1934..79eb2409a2 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -156,6 +156,13 @@ typedef enum {
     MIG_PRI_VIRTIO_MEM,         /* Must happen before IOMMU */
     MIG_PRI_GICV3_ITS,          /* Must happen before PCI devices */
     MIG_PRI_GICV3,              /* Must happen before the ITS */
+    /*
+     * Must happen before all other devices (iterable and non-iterable),
+     * especiall, before migrating RAM content. Such device state must be
+     * guaranteed to be immutable on the migration source until migration
+     * ends and must not depend on the CPU state to be synchronized.
+     */
+    MIG_PRI_POST_SETUP,
     MIG_PRI_MAX,
 } MigrationPriority;
 
diff --git a/migration/migration.c b/migration/migration.c
index 52b5d39244..78b6bb8765 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
     s->vm_was_running = false;
     s->iteration_initial_bytes = 0;
     s->threshold_size = 0;
+
+    json_writer_free(s->vmdesc);
+    s->vmdesc = NULL;
 }
 
 int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -3997,6 +4000,9 @@ static void *migration_thread(void *opaque)
 
     trace_migration_thread_setup_complete();
 
+    /* Process early data that has to get migrated before iterating. */
+    qemu_savevm_state_start_precopy(s->to_dst_file);
+
     while (migration_is_active(s)) {
         if (urgent || !qemu_file_rate_limit(s->to_dst_file)) {
             MigIterateState iter_state = migration_iteration_run(s);
@@ -4125,6 +4131,12 @@ static void *bg_migration_thread(void *opaque)
     if (vm_stop_force_state(RUN_STATE_PAUSED)) {
         goto fail;
     }
+
+    /* Migrate early data that would usually get migrated before iterating. */
+    if (qemu_savevm_state_start_precopy(fb)) {
+        goto fail;
+    }
+
     /*
      * Put vCPUs in sync with shadow context structures, then
      * save their state to channel-buffer along with devices.
@@ -4445,6 +4457,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rp_state.rp_sem);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
     error_free(ms->error);
+    json_writer_free(ms->vmdesc);
 }
 
 static void migration_instance_init(Object *obj)
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..66511ce532 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -17,6 +17,7 @@
 #include "exec/cpu-common.h"
 #include "hw/qdev-core.h"
 #include "qapi/qapi-types-migration.h"
+#include "qapi/qmp/json-writer.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine_int.h"
 #include "io/channel.h"
@@ -366,6 +367,9 @@ struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
+    JSONWriter *vmdesc;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..b810409574 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -42,7 +42,6 @@
 #include "postcopy-ram.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
-#include "qapi/qmp/json-writer.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-builtin-visit.h"
 #include "qapi/qmp/qerror.h"
@@ -1325,6 +1324,71 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     qemu_fflush(f);
 }
 
+static int qemu_savevm_state_precopy_one_non_iterable(SaveStateEntry *se,
+                                                      QEMUFile *f,
+                                                      JSONWriter *vmdesc)
+{
+    int ret;
+
+    if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
+        return 0;
+    }
+    if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
+        trace_savevm_section_skip(se->idstr, se->section_id);
+        return 0;
+    }
+
+    trace_savevm_section_start(se->idstr, se->section_id);
+
+    json_writer_start_object(vmdesc, NULL);
+    json_writer_str(vmdesc, "name", se->idstr);
+    json_writer_int64(vmdesc, "instance_id", se->instance_id);
+
+    save_section_header(f, se, QEMU_VM_SECTION_FULL);
+    ret = vmstate_save(f, se, vmdesc);
+    if (ret) {
+        qemu_file_set_error(f, ret);
+        return ret;
+    }
+    trace_savevm_section_end(se->idstr, se->section_id, 0);
+    save_section_footer(f, se);
+
+    json_writer_end_object(vmdesc);
+    return 0;
+}
+
+int qemu_savevm_state_start_precopy(QEMUFile *f)
+{
+    MigrationState *ms = migrate_get_current();
+    JSONWriter *vmdesc;
+    SaveStateEntry *se;
+    int ret;
+
+    assert(!ms->vmdesc);
+    ms->vmdesc = vmdesc = json_writer_new(false);
+    json_writer_start_object(vmdesc, NULL);
+    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
+    json_writer_start_array(vmdesc, "devices");
+
+    /*
+     * Only immutable non-iterable device state is expected to be saved this
+     * early. All remaining (ordinary) non-iterable device state will be saved
+     * in qemu_savevm_state_complete_precopy_non_iterable().
+     */
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (save_state_priority(se) < MIG_PRI_POST_SETUP) {
+            continue;
+        }
+
+        ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 static
 int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 {
@@ -1364,41 +1428,24 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                     bool in_postcopy,
                                                     bool inactivate_disks)
 {
-    g_autoptr(JSONWriter) vmdesc = NULL;
+    MigrationState *ms = migrate_get_current();
+    JSONWriter *vmdesc = ms->vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
 
-    vmdesc = json_writer_new(false);
-    json_writer_start_object(vmdesc, NULL);
-    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
-    json_writer_start_array(vmdesc, "devices");
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    /* qemu_savevm_state_start_precopy() is expected to be called first. */
+    assert(vmdesc);
 
-        if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
-            continue;
-        }
-        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
-            trace_savevm_section_skip(se->idstr, se->section_id);
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (save_state_priority(se) >= MIG_PRI_POST_SETUP) {
             continue;
         }
 
-        trace_savevm_section_start(se->idstr, se->section_id);
-
-        json_writer_start_object(vmdesc, NULL);
-        json_writer_str(vmdesc, "name", se->idstr);
-        json_writer_int64(vmdesc, "instance_id", se->instance_id);
-
-        save_section_header(f, se, QEMU_VM_SECTION_FULL);
-        ret = vmstate_save(f, se, vmdesc);
+        ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc);
         if (ret) {
-            qemu_file_set_error(f, ret);
             return ret;
         }
-        trace_savevm_section_end(se->idstr, se->section_id, 0);
-        save_section_footer(f, se);
-
-        json_writer_end_object(vmdesc);
     }
 
     if (inactivate_disks) {
@@ -1427,6 +1474,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
     }
 
+    /* Free it now to detect any inconsistencies. */
+    g_free(vmdesc);
+    ms->vmdesc = NULL;
+
     return 0;
 }
 
@@ -1541,6 +1592,9 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     qemu_savevm_state_setup(f);
     qemu_mutex_lock_iothread();
 
+    /* Process early data that has to get migrated before iterating. */
+    qemu_savevm_state_start_precopy(f);
+
     while (qemu_file_get_error(f) == 0) {
         if (qemu_savevm_state_iterate(f, false) > 0) {
             break;
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..323bd5ab3b 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -38,6 +38,7 @@ void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
+int qemu_savevm_state_start_precopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 2/6] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST()
  2022-12-22 11:02 [PATCH v3 0/6] virtio-mem: Handle preallocation with migration David Hildenbrand
  2022-12-22 11:02 ` [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
@ 2022-12-22 11:02 ` David Hildenbrand
  2023-01-05 17:33   ` Dr. David Alan Gilbert
  2022-12-22 11:02 ` [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy David Hildenbrand
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2022-12-22 11:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

We'll make use of both next in the context of virtio-mem.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/migration/vmstate.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 79eb2409a2..73ad1ae0eb 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -712,8 +712,9 @@ extern const VMStateInfo vmstate_info_qlist;
  *        '_state' type
  *    That the pointer is right at the start of _tmp_type.
  */
-#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) {                 \
+#define VMSTATE_WITH_TMP_TEST(_state, _test, _tmp_type, _vmsd) {     \
     .name         = "tmp",                                           \
+    .field_exists = (_test),                                         \
     .size         = sizeof(_tmp_type) +                              \
                     QEMU_BUILD_BUG_ON_ZERO(offsetof(_tmp_type, parent) != 0) + \
                     type_check_pointer(_state,                       \
@@ -722,6 +723,9 @@ extern const VMStateInfo vmstate_info_qlist;
     .info         = &vmstate_info_tmp,                               \
 }
 
+#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) \
+    VMSTATE_WITH_TMP_TEST(_state, NULL, _tmp_type, _vmsd)
+
 #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {              \
     .name         = "unused",                                        \
     .field_exists = (_test),                                         \
@@ -745,8 +749,9 @@ extern const VMStateInfo vmstate_info_qlist;
 /* _field_size should be a int32_t field in the _state struct giving the
  * size of the bitmap _field in bits.
  */
-#define VMSTATE_BITMAP(_field, _state, _version, _field_size) {      \
+#define VMSTATE_BITMAP_TEST(_field, _state, _test, _version, _field_size) { \
     .name         = (stringify(_field)),                             \
+    .field_exists = (_test),                                         \
     .version_id   = (_version),                                      \
     .size_offset  = vmstate_offset_value(_state, _field_size, int32_t),\
     .info         = &vmstate_info_bitmap,                            \
@@ -754,6 +759,9 @@ extern const VMStateInfo vmstate_info_qlist;
     .offset       = offsetof(_state, _field),                        \
 }
 
+#define VMSTATE_BITMAP(_field, _state, _version, _field_size) \
+    VMSTATE_BITMAP_TEST(_field, _state, NULL, _version, _field_size)
+
 /* For migrating a QTAILQ.
  * Target QTAILQ needs be properly initialized.
  * _type: type of QTAILQ element
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy
  2022-12-22 11:02 [PATCH v3 0/6] virtio-mem: Handle preallocation with migration David Hildenbrand
  2022-12-22 11:02 ` [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
  2022-12-22 11:02 ` [PATCH v3 2/6] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() David Hildenbrand
@ 2022-12-22 11:02 ` David Hildenbrand
  2023-01-05 17:18   ` Peter Xu
  2022-12-22 11:02 ` [PATCH v3 4/6] virtio-mem: Fail if a memory backend with "prealloc=on" is specified David Hildenbrand
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2022-12-22 11:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

Let's factor out both checks, to be used in virtio-mem context next.

While at it, fix a spelling error in a related comment.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/migration/misc.h |  6 +++++-
 migration/migration.c    | 14 ++++++++++++++
 migration/ram.c          | 16 ++--------------
 3 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 465906710d..c7e67a6804 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -67,8 +67,12 @@ bool migration_has_failed(MigrationState *);
 /* ...and after the device transmission */
 bool migration_in_postcopy_after_devices(MigrationState *);
 void migration_global_dump(Monitor *mon);
-/* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
+/* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
+/* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
+bool migration_incoming_postcopy_advised(void);
+/* True if incoming migration entered POSTCOPY_INCOMING_LISTENING */
+bool migration_incoming_postcopy_listening(void);
 /* True if background snapshot is active */
 bool migration_in_bg_snapshot(void);
 
diff --git a/migration/migration.c b/migration/migration.c
index 78b6bb8765..7a69bb93b0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2094,6 +2094,20 @@ bool migration_in_incoming_postcopy(void)
     return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END;
 }
 
+bool migration_incoming_postcopy_advised(void)
+{
+    PostcopyState ps = postcopy_state_get();
+
+    return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
+}
+
+bool migration_incoming_postcopy_listening(void)
+{
+    PostcopyState ps = postcopy_state_get();
+
+    return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
+}
+
 bool migration_in_bg_snapshot(void)
 {
     MigrationState *s = migrate_get_current();
diff --git a/migration/ram.c b/migration/ram.c
index 334309f1c6..44b063eccd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4091,18 +4091,6 @@ int ram_load_postcopy(QEMUFile *f, int channel)
     return ret;
 }
 
-static bool postcopy_is_advised(void)
-{
-    PostcopyState ps = postcopy_state_get();
-    return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
-}
-
-static bool postcopy_is_running(void)
-{
-    PostcopyState ps = postcopy_state_get();
-    return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
-}
-
 /*
  * Flush content of RAM cache into SVM's memory.
  * Only flush the pages that be dirtied by PVM or SVM or both.
@@ -4167,7 +4155,7 @@ static int ram_load_precopy(QEMUFile *f)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
     /* ADVISE is earlier, it shows the source has the postcopy capability on */
-    bool postcopy_advised = postcopy_is_advised();
+    bool postcopy_advised = migration_incoming_postcopy_advised();
     if (!migrate_use_compression()) {
         invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
     }
@@ -4365,7 +4353,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      * If system is running in postcopy mode, page inserts to host memory must
      * be atomic
      */
-    bool postcopy_running = postcopy_is_running();
+    bool postcopy_running = migration_incoming_postcopy_listening();
 
     seq_iter++;
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 4/6] virtio-mem: Fail if a memory backend with "prealloc=on" is specified
  2022-12-22 11:02 [PATCH v3 0/6] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (2 preceding siblings ...)
  2022-12-22 11:02 ` [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy David Hildenbrand
@ 2022-12-22 11:02 ` David Hildenbrand
  2022-12-22 11:02 ` [PATCH v3 5/6] virtio-mem: Migrate bitmap, size and sanity checks early David Hildenbrand
  2022-12-22 11:02 ` [PATCH v3 6/6] virtio-mem: Proper support for preallocation with migration David Hildenbrand
  5 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2022-12-22 11:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

"prealloc=on" for the memory backend does not work as expected, as
virtio-mem will simply discard all preallocated memory immediately again.
In the best case, it's an expensive NOP. In the worst case, it's an
unexpected allocation error.

Instead, "prealloc=on" should be specified for the virtio-mem device only,
such that virtio-mem will try preallocating memory before plugging
memory dynamically to the guest. Fail if such a memory backend is
provided.

Tested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 5c22c4b876..e0e908d392 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -772,6 +772,12 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "'%s' property specifies an unsupported memdev",
                    VIRTIO_MEM_MEMDEV_PROP);
         return;
+    } else if (vmem->memdev->prealloc) {
+        error_setg(errp, "'%s' property specifies a memdev with preallocation"
+                   " enabled: %s. Instead, specify 'prealloc=on' for the"
+                   " virtio-mem device. ", VIRTIO_MEM_MEMDEV_PROP,
+                   object_get_canonical_path_component(OBJECT(vmem->memdev)));
+        return;
     }
 
     if ((nb_numa_nodes && vmem->node >= nb_numa_nodes) ||
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 5/6] virtio-mem: Migrate bitmap, size and sanity checks early
  2022-12-22 11:02 [PATCH v3 0/6] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (3 preceding siblings ...)
  2022-12-22 11:02 ` [PATCH v3 4/6] virtio-mem: Fail if a memory backend with "prealloc=on" is specified David Hildenbrand
@ 2022-12-22 11:02 ` David Hildenbrand
  2022-12-22 11:02 ` [PATCH v3 6/6] virtio-mem: Proper support for preallocation with migration David Hildenbrand
  5 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2022-12-22 11:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

The bitmap and the size are immutable while migration is active: see
virtio_mem_is_busy(). We can migrate this information early, before
migrating any actual RAM content.

Having this information in place early will, for example, allow for
properly preallocating memory before touching these memory locations
during RAM migration: this way, we can make sure that all memory was
actually preallocated and that any user errors (e.g., insufficient
hugetlb pages) can be handled gracefully.

In contrast, usable_region_size and requested_size can theoretically
still be modified on the source while the VM is running. Keep migrating
these properties the usual, late, way.

Use a new device property to keep behavior of compat machines
unmodified.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/machine.c              |  4 ++-
 hw/virtio/virtio-mem.c         | 51 ++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio-mem.h |  8 ++++++
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index f589b92909..6532190412 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -40,7 +40,9 @@
 #include "hw/virtio/virtio-pci.h"
 #include "qom/object_interfaces.h"
 
-GlobalProperty hw_compat_7_2[] = {};
+GlobalProperty hw_compat_7_2[] = {
+    { "virtio-mem", "x-early-migration", "false" },
+};
 const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
 
 GlobalProperty hw_compat_7_1[] = {
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index e0e908d392..043b96f509 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -31,6 +31,8 @@
 #include CONFIG_DEVICES
 #include "trace.h"
 
+static const VMStateDescription vmstate_virtio_mem_device_early;
+
 /*
  * We only had legacy x86 guests that did not support
  * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy guests.
@@ -878,6 +880,10 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
 
     host_memory_backend_set_mapped(vmem->memdev, true);
     vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
+    if (vmem->early_migration) {
+        vmstate_register(VMSTATE_IF(vmem), VMSTATE_INSTANCE_ID_ANY,
+                         &vmstate_virtio_mem_device_early, vmem);
+    }
     qemu_register_reset(virtio_mem_system_reset, vmem);
 
     /*
@@ -899,6 +905,10 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
      */
     memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     qemu_unregister_reset(virtio_mem_system_reset, vmem);
+    if (vmem->early_migration) {
+        vmstate_unregister(VMSTATE_IF(vmem), &vmstate_virtio_mem_device_early,
+                           vmem);
+    }
     vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
     host_memory_backend_set_mapped(vmem->memdev, false);
     virtio_del_queue(vdev, 0);
@@ -1015,18 +1025,53 @@ static const VMStateDescription vmstate_virtio_mem_sanity_checks = {
     },
 };
 
+static bool virtio_mem_vmstate_field_exists(void *opaque, int version_id)
+{
+    const VirtIOMEM *vmem = VIRTIO_MEM(opaque);
+
+    /* With early migration, these fields were already migrated. */
+    return !vmem->early_migration;
+}
+
 static const VMStateDescription vmstate_virtio_mem_device = {
     .name = "virtio-mem-device",
     .minimum_version_id = 1,
     .version_id = 1,
     .priority = MIG_PRI_VIRTIO_MEM,
     .post_load = virtio_mem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_WITH_TMP_TEST(VirtIOMEM, virtio_mem_vmstate_field_exists,
+                              VirtIOMEMMigSanityChecks,
+                              vmstate_virtio_mem_sanity_checks),
+        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
+        VMSTATE_UINT64_TEST(size, VirtIOMEM, virtio_mem_vmstate_field_exists),
+        VMSTATE_UINT64(requested_size, VirtIOMEM),
+        VMSTATE_BITMAP_TEST(bitmap, VirtIOMEM, virtio_mem_vmstate_field_exists,
+                            0, bitmap_size),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+/*
+ * Transfer properties that are immutable while migration is active early,
+ * such that we have have this information around before migrating any RAM
+ * content.
+ *
+ * Note that virtio_mem_is_busy() makes sure these properties can no longer
+ * change on the migration source until migration completed.
+ *
+ * With QEMU compat machines, we transmit these properties later, via
+ * vmstate_virtio_mem_device instead -- see virtio_mem_vmstate_field_exists().
+ */
+static const VMStateDescription vmstate_virtio_mem_device_early = {
+    .name = "virtio-mem-device-early",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .priority = MIG_PRI_POST_SETUP,
     .fields = (VMStateField[]) {
         VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
                          vmstate_virtio_mem_sanity_checks),
-        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
         VMSTATE_UINT64(size, VirtIOMEM),
-        VMSTATE_UINT64(requested_size, VirtIOMEM),
         VMSTATE_BITMAP(bitmap, VirtIOMEM, 0, bitmap_size),
         VMSTATE_END_OF_LIST()
     },
@@ -1211,6 +1256,8 @@ static Property virtio_mem_properties[] = {
     DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, VirtIOMEM,
                             unplugged_inaccessible, ON_OFF_AUTO_AUTO),
 #endif
+    DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM,
+                     early_migration, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 7745cfc1a3..f15e561785 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass,
 #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
 #define VIRTIO_MEM_ADDR_PROP "memaddr"
 #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible"
+#define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration"
 #define VIRTIO_MEM_PREALLOC_PROP "prealloc"
 
 struct VirtIOMEM {
@@ -74,6 +75,13 @@ struct VirtIOMEM {
     /* whether to prealloc memory when plugging new blocks */
     bool prealloc;
 
+    /*
+     * Whether we migrate properties that are immutable while migration is
+     * active early, before state of other devices and especially, before
+     * migrating any RAM content.
+     */
+    bool early_migration;
+
     /* notifiers to notify when "size" changes */
     NotifierList size_change_notifiers;
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 6/6] virtio-mem: Proper support for preallocation with migration
  2022-12-22 11:02 [PATCH v3 0/6] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (4 preceding siblings ...)
  2022-12-22 11:02 ` [PATCH v3 5/6] virtio-mem: Migrate bitmap, size and sanity checks early David Hildenbrand
@ 2022-12-22 11:02 ` David Hildenbrand
  5 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2022-12-22 11:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik, Jing Qi

Ordinary memory preallocation runs when QEMU starts up and creates the
memory backends, before processing the incoming migration stream. With
virtio-mem, we don't know which memory blocks to preallocate before
migration started. Now that we migrate the virtio-mem bitmap early, before
migrating any RAM content, we can safely preallocate memory for all plugged
memory blocks before migrating any RAM content.

This is especially relevant for the following cases:

(1) User errors

With hugetlb/files, if we don't have sufficient backend memory available on
the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
when running out of backend memory. Preallocating memory before actual
RAM migration allows for failing gracefully and informing the user about
the setup problem.

(2) Excluded memory ranges during migration

For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we won't crash during RAM
migration, but later, when running the VM on the destination, which is
bad.

To fix this for new QEMU machines that migrate the bitmap early,
preallocate the memory early, before any RAM migration. Warn with old
QEMU machines.

Getting postcopy right is a bit tricky, but we essentially now implement
the same (problematic) preallocation logic as ordinary preallocation:
preallocate memory early and discard it again before precopy starts. During
ordinary preallocation, discarding of RAM happens when postcopy is advised.
As the our state (bitmap) is loaded after postcopy was advised but before
postcopy starts listening, we have to discard memory we preallocated
immediately again ourselves.

Note that nothing (not even hugetlb reservations) guarantees for postcopy
that backend memory (especially, hugetlb pages) are still free after they
were freed ones while discarding RAM. Still, allocating that memory at
least once helps catching some basic setup problems.

Before this change, trying to restore a VM when insufficient hugetlb
pages are around results in the process crashing to to a "Bus error"
(SIGBUS). With this change, QEMU fails gracefully:

  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
  qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
  qemu-system-x86_64: load of migration failed: Cannot allocate memory

Reported-by: Jing Qi <jinqi@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 97 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 043b96f509..c1cf448046 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -204,6 +204,30 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
+static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg,
+                                             virtio_mem_range_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size);
+    while (first_bit < vmem->bitmap_size) {
+        offset = first_bit * vmem->block_size;
+        last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * vmem->block_size;
+
+        ret = cb(vmem, arg, offset, size);
+        if (ret) {
+            break;
+        }
+        first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
+                                  last_bit + 2);
+    }
+    return ret;
+}
+
 /*
  * Adjust the memory section to cover the intersection with the given range.
  *
@@ -938,6 +962,10 @@ static int virtio_mem_post_load(void *opaque, int version_id)
     RamDiscardListener *rdl;
     int ret;
 
+    if (vmem->prealloc && !vmem->early_migration) {
+        warn_report("Proper preallocation with migration requires a newer QEMU machine");
+    }
+
     /*
      * We started out with all memory discarded and our memory region is mapped
      * into an address space. Replay, now that we updated the bitmap.
@@ -957,6 +985,74 @@ static int virtio_mem_post_load(void *opaque, int version_id)
     return virtio_mem_restore_unplugged(vmem);
 }
 
+static int virtio_mem_prealloc_range_cb(const VirtIOMEM *vmem, void *arg,
+                                        uint64_t offset, uint64_t size)
+{
+    void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset;
+    int fd = memory_region_get_fd(&vmem->memdev->mr);
+    Error *local_err = NULL;
+
+    qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -ENOMEM;
+    }
+    return 0;
+}
+
+static int virtio_mem_post_load_early(void *opaque, int version_id)
+{
+    VirtIOMEM *vmem = VIRTIO_MEM(opaque);
+    RAMBlock *rb = vmem->memdev->mr.ram_block;
+    int ret;
+
+    if (!vmem->prealloc) {
+        return 0;
+    }
+
+    if (migration_incoming_postcopy_listening()) {
+        /*
+         * This is unexpected, we're not supposed to be loaded after
+         * postcopy is listening because ram_block_enable_notify() already
+         * armed userfaultfd. Let's play safe and catch it.
+         */
+        warn_report("Postcopy is already listening, preallocation is impossible.");
+        return -EBUSY;
+    }
+
+    /*
+     * We restored the bitmap and verified that the basic properties
+     * match on source and destination, so we can go ahead and preallocate
+     * memory for all plugged memory blocks, before actual RAM migration starts
+     * touching this memory.
+     */
+    ret = virtio_mem_for_each_plugged_range(vmem, NULL,
+                                            virtio_mem_prealloc_range_cb);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * This is tricky: postcopy wants to start with a clean slate. On
+     * POSTCOPY_INCOMING_ADVISE, postcopy code discards all (ordinarily
+     * preallocated) RAM such that postcopy will work as expected later.
+     *
+     * However, we run after POSTCOPY_INCOMING_ADVISE -- but before actual
+     * RAM migration. So let's discard all memory again. This looks like an
+     * expensive NOP, but actually serves a purpose: we made sure that we
+     * were able to allocate all required backend memory once. We cannot
+     * guarantee that the backend memory we will free will remain free
+     * until we need it during postcopy, but at least we can catch the
+     * obvious setup issues this way.
+     */
+    if (migration_incoming_postcopy_advised()) {
+        if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
+            return -EBUSY;
+        }
+    }
+    return 0;
+}
+
 typedef struct VirtIOMEMMigSanityChecks {
     VirtIOMEM *parent;
     uint64_t addr;
@@ -1068,6 +1164,7 @@ static const VMStateDescription vmstate_virtio_mem_device_early = {
     .minimum_version_id = 1,
     .version_id = 1,
     .priority = MIG_PRI_POST_SETUP,
+    .post_load = virtio_mem_post_load_early,
     .fields = (VMStateField[]) {
         VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
                          vmstate_virtio_mem_sanity_checks),
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2022-12-22 11:02 ` [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
@ 2022-12-23  9:34   ` David Hildenbrand
  2023-01-05  1:27     ` Michael S. Tsirkin
  2023-01-04 17:23   ` Peter Xu
  1 sibling, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2022-12-23  9:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr . David Alan Gilbert, Juan Quintela, Peter Xu,
	Michael S . Tsirkin, Michal Privoznik

On 22.12.22 12:02, David Hildenbrand wrote:
> For virtio-mem, we want to have the plugged/unplugged state of memory
> blocks available before migrating any actual RAM content. This
> information is immutable on the migration source while migration is active,
> 
> For example, we want to use this information for proper preallocation
> support with migration: currently, we don't preallocate memory on the
> migration target, and especially with hugetlb, we can easily run out of
> hugetlb pages during RAM migration and will crash (SIGBUS) instead of
> catching this gracefully via preallocation.
> 
> Migrating device state before we start iterating is currently impossible.
> Introduce and use qemu_savevm_state_start_precopy(), and use
> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
> state will be saved in qemu_savevm_state_start_precopy() or in
> qemu_savevm_state_complete_precopy_*().
> 
> We have to take care of properly including the early device state in the
> vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is
> a bit sub-optimal, but we use that explicitly or implicitly all over the
> place already, so this barely matters in practice.
> 
> Note that only very selected devices (i.e., ones seriously messing with
> RAM setup) are supposed to make use of that.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

[...]

>   
>       if (inactivate_disks) {
> @@ -1427,6 +1474,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>           qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
>       }
>   
> +    /* Free it now to detect any inconsistencies. */
> +    g_free(vmdesc);

Missed to convert that to a json_writer_free().

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2022-12-22 11:02 ` [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
  2022-12-23  9:34   ` David Hildenbrand
@ 2023-01-04 17:23   ` Peter Xu
  2023-01-05  8:35     ` David Hildenbrand
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Xu @ 2023-01-04 17:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
> Migrating device state before we start iterating is currently impossible.
> Introduce and use qemu_savevm_state_start_precopy(), and use
> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
> state will be saved in qemu_savevm_state_start_precopy() or in
> qemu_savevm_state_complete_precopy_*().

Can something like this be done in qemu_savevm_state_setup()?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2022-12-23  9:34   ` David Hildenbrand
@ 2023-01-05  1:27     ` Michael S. Tsirkin
  2023-01-05  8:20       ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2023-01-05  1:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela, Peter Xu,
	Michal Privoznik

On Fri, Dec 23, 2022 at 10:34:36AM +0100, David Hildenbrand wrote:
> On 22.12.22 12:02, David Hildenbrand wrote:
> > For virtio-mem, we want to have the plugged/unplugged state of memory
> > blocks available before migrating any actual RAM content. This
> > information is immutable on the migration source while migration is active,
> > 
> > For example, we want to use this information for proper preallocation
> > support with migration: currently, we don't preallocate memory on the
> > migration target, and especially with hugetlb, we can easily run out of
> > hugetlb pages during RAM migration and will crash (SIGBUS) instead of
> > catching this gracefully via preallocation.
> > 
> > Migrating device state before we start iterating is currently impossible.
> > Introduce and use qemu_savevm_state_start_precopy(), and use
> > a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
> > state will be saved in qemu_savevm_state_start_precopy() or in
> > qemu_savevm_state_complete_precopy_*().
> > 
> > We have to take care of properly including the early device state in the
> > vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is
> > a bit sub-optimal, but we use that explicitly or implicitly all over the
> > place already, so this barely matters in practice.
> > 
> > Note that only very selected devices (i.e., ones seriously messing with
> > RAM setup) are supposed to make use of that.
> > 
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> [...]
> 
> >       if (inactivate_disks) {
> > @@ -1427,6 +1474,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
> >           qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
> >       }
> > +    /* Free it now to detect any inconsistencies. */
> > +    g_free(vmdesc);
> 
> Missed to convert that to a json_writer_free().


I get it you will post v4?

> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-05  1:27     ` Michael S. Tsirkin
@ 2023-01-05  8:20       ` David Hildenbrand
  0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-01-05  8:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela, Peter Xu,
	Michal Privoznik

On 05.01.23 02:27, Michael S. Tsirkin wrote:
> On Fri, Dec 23, 2022 at 10:34:36AM +0100, David Hildenbrand wrote:
>> On 22.12.22 12:02, David Hildenbrand wrote:
>>> For virtio-mem, we want to have the plugged/unplugged state of memory
>>> blocks available before migrating any actual RAM content. This
>>> information is immutable on the migration source while migration is active,
>>>
>>> For example, we want to use this information for proper preallocation
>>> support with migration: currently, we don't preallocate memory on the
>>> migration target, and especially with hugetlb, we can easily run out of
>>> hugetlb pages during RAM migration and will crash (SIGBUS) instead of
>>> catching this gracefully via preallocation.
>>>
>>> Migrating device state before we start iterating is currently impossible.
>>> Introduce and use qemu_savevm_state_start_precopy(), and use
>>> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
>>> state will be saved in qemu_savevm_state_start_precopy() or in
>>> qemu_savevm_state_complete_precopy_*().
>>>
>>> We have to take care of properly including the early device state in the
>>> vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is
>>> a bit sub-optimal, but we use that explicitly or implicitly all over the
>>> place already, so this barely matters in practice.
>>>
>>> Note that only very selected devices (i.e., ones seriously messing with
>>> RAM setup) are supposed to make use of that.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> [...]
>>
>>>        if (inactivate_disks) {
>>> @@ -1427,6 +1474,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>>>            qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
>>>        }
>>> +    /* Free it now to detect any inconsistencies. */
>>> +    g_free(vmdesc);
>>
>> Missed to convert that to a json_writer_free().
> 
> 
> I get it you will post v4?

Yes, once the discussions on this version are done.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-04 17:23   ` Peter Xu
@ 2023-01-05  8:35     ` David Hildenbrand
  2023-01-05 17:15       ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-01-05  8:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 04.01.23 18:23, Peter Xu wrote:
> On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
>> Migrating device state before we start iterating is currently impossible.
>> Introduce and use qemu_savevm_state_start_precopy(), and use
>> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
>> state will be saved in qemu_savevm_state_start_precopy() or in
>> qemu_savevm_state_complete_precopy_*().
> 
> Can something like this be done in qemu_savevm_state_setup()?

Hi Peter,

Do you mean

(a) Moving qemu_savevm_state_start_precopy() effectively into
     qemu_savevm_state_setup()

(b) Using se->ops->save_setup()

I first tried going via (b), but decided to go the current way of using 
a proper vmstate with properties (instead of e.g., filling the stream 
manually), which also made vmdesc handling possible (and significantly 
cleaner).

Regarding (a), I decided to not move logic of 
qemu_savevm_state_start_precopy() into qemu_savevm_state_setup(), 
because it looked cleaner to save device state with the BQL held and for 
background snapshots, the VM has been stopped. To decouple device state 
saving from the setup path, just like we do it right now for all vmstates.

Having that said, for virtio-mem, it would still work because that state 
is immutable once migration starts, but it felt cleaner to separate the 
setup() phase from actual device state saving.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-05  8:35     ` David Hildenbrand
@ 2023-01-05 17:15       ` Peter Xu
  2023-01-09 14:34         ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2023-01-05 17:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Thu, Jan 05, 2023 at 09:35:54AM +0100, David Hildenbrand wrote:
> On 04.01.23 18:23, Peter Xu wrote:
> > On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
> > > Migrating device state before we start iterating is currently impossible.
> > > Introduce and use qemu_savevm_state_start_precopy(), and use
> > > a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
> > > state will be saved in qemu_savevm_state_start_precopy() or in
> > > qemu_savevm_state_complete_precopy_*().
> > 
> > Can something like this be done in qemu_savevm_state_setup()?
> 
> Hi Peter,

Hi, David,

> 
> Do you mean
> 
> (a) Moving qemu_savevm_state_start_precopy() effectively into
>     qemu_savevm_state_setup()
> 
> (b) Using se->ops->save_setup()

I meant (b).

> 
> I first tried going via (b), but decided to go the current way of using a
> proper vmstate with properties (instead of e.g., filling the stream
> manually), which also made vmdesc handling possible (and significantly
> cleaner).
> 
> Regarding (a), I decided to not move logic of
> qemu_savevm_state_start_precopy() into qemu_savevm_state_setup(), because it
> looked cleaner to save device state with the BQL held and for background
> snapshots, the VM has been stopped. To decouple device state saving from the
> setup path, just like we do it right now for all vmstates.

Is BQL required or optional?  IIUC it's at least still not taken in the
migration thread path, only in savevm path.

> 
> Having that said, for virtio-mem, it would still work because that state is
> immutable once migration starts, but it felt cleaner to separate the setup()
> phase from actual device state saving.

I get the point.  My major concerns are:

  (1) The new migration priority is changing the semantic of original,
      making it over-complicated

  (2) The new precopy-start routine added one more step to the migration
      framework, while it's somehow overlapping (if not to say, mostly the
      same as..) save_setup().

For (1): the old priority was only deciding the order of save entries in
the global list, nothing more than that.  Even if we want to have a
precopy-start phase, I'd suggest we use something else and keep the
migration priority simple.  Otherwise we really need serious documentation
for MigrationPriority and if so I'd rather don't bother and not reuse the
priority field.

For (2), if you see there're a bunch of save_setup() that already does
things like transferring static data besides the device states.  Besides
the notorious ram_save_setup() there's also dirty_bitmap_save_setup() which
also sends a bitmap during save_setup() and some others.  It looks clean to
me to do it in the same way as we used to.

Reusing vmstate_save() and vmsd structures are useful too which I totally
agree.  So.. can we just call vmstate_save_state() in the save_setup() of
the other new vmsd of virtio-mem?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy
  2022-12-22 11:02 ` [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy David Hildenbrand
@ 2023-01-05 17:18   ` Peter Xu
  2023-01-09 14:39     ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2023-01-05 17:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Thu, Dec 22, 2022 at 12:02:12PM +0100, David Hildenbrand wrote:
> +bool migration_incoming_postcopy_listening(void)
> +{
> +    PostcopyState ps = postcopy_state_get();
> +
> +    return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
> +}

This name is misleading, IMHO.

The code means "we passed listening phase" but the name implies "we're
listening".  We can add the "incoming" into that if we want, though.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST()
  2022-12-22 11:02 ` [PATCH v3 2/6] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() David Hildenbrand
@ 2023-01-05 17:33   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-05 17:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> We'll make use of both next in the context of virtio-mem.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  include/migration/vmstate.h | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 79eb2409a2..73ad1ae0eb 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -712,8 +712,9 @@ extern const VMStateInfo vmstate_info_qlist;
>   *        '_state' type
>   *    That the pointer is right at the start of _tmp_type.
>   */
> -#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) {                 \
> +#define VMSTATE_WITH_TMP_TEST(_state, _test, _tmp_type, _vmsd) {     \
>      .name         = "tmp",                                           \
> +    .field_exists = (_test),                                         \
>      .size         = sizeof(_tmp_type) +                              \
>                      QEMU_BUILD_BUG_ON_ZERO(offsetof(_tmp_type, parent) != 0) + \
>                      type_check_pointer(_state,                       \
> @@ -722,6 +723,9 @@ extern const VMStateInfo vmstate_info_qlist;
>      .info         = &vmstate_info_tmp,                               \
>  }
>  
> +#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) \
> +    VMSTATE_WITH_TMP_TEST(_state, NULL, _tmp_type, _vmsd)
> +
>  #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {              \
>      .name         = "unused",                                        \
>      .field_exists = (_test),                                         \
> @@ -745,8 +749,9 @@ extern const VMStateInfo vmstate_info_qlist;
>  /* _field_size should be a int32_t field in the _state struct giving the
>   * size of the bitmap _field in bits.
>   */
> -#define VMSTATE_BITMAP(_field, _state, _version, _field_size) {      \
> +#define VMSTATE_BITMAP_TEST(_field, _state, _test, _version, _field_size) { \
>      .name         = (stringify(_field)),                             \
> +    .field_exists = (_test),                                         \
>      .version_id   = (_version),                                      \
>      .size_offset  = vmstate_offset_value(_state, _field_size, int32_t),\
>      .info         = &vmstate_info_bitmap,                            \
> @@ -754,6 +759,9 @@ extern const VMStateInfo vmstate_info_qlist;
>      .offset       = offsetof(_state, _field),                        \
>  }
>  
> +#define VMSTATE_BITMAP(_field, _state, _version, _field_size) \
> +    VMSTATE_BITMAP_TEST(_field, _state, NULL, _version, _field_size)
> +
>  /* For migrating a QTAILQ.
>   * Target QTAILQ needs be properly initialized.
>   * _type: type of QTAILQ element
> -- 
> 2.38.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-05 17:15       ` Peter Xu
@ 2023-01-09 14:34         ` David Hildenbrand
  2023-01-09 19:54           ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-01-09 14:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 05.01.23 18:15, Peter Xu wrote:
> On Thu, Jan 05, 2023 at 09:35:54AM +0100, David Hildenbrand wrote:
>> On 04.01.23 18:23, Peter Xu wrote:
>>> On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
>>>> Migrating device state before we start iterating is currently impossible.
>>>> Introduce and use qemu_savevm_state_start_precopy(), and use
>>>> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
>>>> state will be saved in qemu_savevm_state_start_precopy() or in
>>>> qemu_savevm_state_complete_precopy_*().
>>>
>>> Can something like this be done in qemu_savevm_state_setup()?
>>
>> Hi Peter,
> 
> Hi, David,
> 
>>
>> Do you mean
>>
>> (a) Moving qemu_savevm_state_start_precopy() effectively into
>>      qemu_savevm_state_setup()
>>
>> (b) Using se->ops->save_setup()
> 
> I meant (b).
> 
>>
>> I first tried going via (b), but decided to go the current way of using a
>> proper vmstate with properties (instead of e.g., filling the stream
>> manually), which also made vmdesc handling possible (and significantly
>> cleaner).
>>
>> Regarding (a), I decided to not move logic of
>> qemu_savevm_state_start_precopy() into qemu_savevm_state_setup(), because it
>> looked cleaner to save device state with the BQL held and for background
>> snapshots, the VM has been stopped. To decouple device state saving from the
>> setup path, just like we do it right now for all vmstates.
> 
> Is BQL required or optional?  IIUC it's at least still not taken in the
> migration thread path, only in savevm path.
> 
>>
>> Having that said, for virtio-mem, it would still work because that state is
>> immutable once migration starts, but it felt cleaner to separate the setup()
>> phase from actual device state saving.
> 
> I get the point.  My major concerns are:
> 
>    (1) The new migration priority is changing the semantic of original,
>        making it over-complicated
> 
>    (2) The new precopy-start routine added one more step to the migration
>        framework, while it's somehow overlapping (if not to say, mostly the
>        same as..) save_setup().
> 
> For (1): the old priority was only deciding the order of save entries in
> the global list, nothing more than that.  Even if we want to have a
> precopy-start phase, I'd suggest we use something else and keep the
> migration priority simple.  Otherwise we really need serious documentation
> for MigrationPriority and if so I'd rather don't bother and not reuse the
> priority field.
> 
> For (2), if you see there're a bunch of save_setup() that already does
> things like transferring static data besides the device states.  Besides
> the notorious ram_save_setup() there's also dirty_bitmap_save_setup() which
> also sends a bitmap during save_setup() and some others.  It looks clean to
> me to do it in the same way as we used to.
> 
> Reusing vmstate_save() and vmsd structures are useful too which I totally
> agree.  So.. can we just call vmstate_save_state() in the save_setup() of
> the other new vmsd of virtio-mem?


I went halfway that way, by moving stuff into qemu_savevm_state_setup()
and avoiding using a new migration priority. Seems to work:

I think we could go one step further and perform it from a save_setup() callback,
however, I'm not convinced that this gets particularly cleaner (vmdesc handling
eventually).

However, if there are hard feelings, I can look into that. Thanks.


 From e501f80dbbca1260445a6dac03053f426fbb572d Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 20 Dec 2022 18:14:33 +0100
Subject: [PATCH] migration: Allow immutable device state to be migrated early
  (i.e., before RAM)

For virtio-mem, we want to have the plugged/unplugged state of memory
blocks available before migrating any actual RAM content. This
information is immutable on the migration source while migration is active,

For example, we want to use this information for proper preallocation
support with migration: currently, we don't preallocate memory on the
migration target, and especially with hugetlb, we can easily run out of
hugetlb pages during RAM migration and will crash (SIGBUS) instead of
catching this gracefully via preallocation.

Migrating device state before we start iterating is currently impossible.
Let's allow for migrating such state during the setup state, indicating
applicable vmstate descriptors using a "immutable" flag.

We have to take care of properly including the early device state in the
vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is
a bit sub-optimal, but we use that explicitly or implicitly all over the
place already, so this barely matters in practice.

Note that only very selected devices (i.e., ones seriously messing with
RAM setup) are supposed to make use of that.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  include/migration/vmstate.h |  5 +++
  migration/migration.c       |  4 ++
  migration/migration.h       |  4 ++
  migration/savevm.c          | 85 +++++++++++++++++++++++++++----------
  4 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ad24aa1934..610e4c1e38 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -179,6 +179,11 @@ struct VMStateField {
  struct VMStateDescription {
      const char *name;
      int unmigratable;
+    /*
+     * The state is immutable while migration is active and the state can
+     * be migrated early, during the setup phase.
+     */
+    int immutable;
      int version_id;
      int minimum_version_id;
      MigrationPriority priority;
diff --git a/migration/migration.c b/migration/migration.c
index 52b5d39244..1d33a7efa0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
      s->vm_was_running = false;
      s->iteration_initial_bytes = 0;
      s->threshold_size = 0;
+
+    json_writer_free(s->vmdesc);
+    s->vmdesc = NULL;
  }
  
  int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -4445,6 +4448,7 @@ static void migration_instance_finalize(Object *obj)
      qemu_sem_destroy(&ms->rp_state.rp_sem);
      qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
      error_free(ms->error);
+    json_writer_free(ms->vmdesc);
  }
  
  static void migration_instance_init(Object *obj)
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..66511ce532 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -17,6 +17,7 @@
  #include "exec/cpu-common.h"
  #include "hw/qdev-core.h"
  #include "qapi/qapi-types-migration.h"
+#include "qapi/qmp/json-writer.h"
  #include "qemu/thread.h"
  #include "qemu/coroutine_int.h"
  #include "io/channel.h"
@@ -366,6 +367,9 @@ struct MigrationState {
       * This save hostname when out-going migration starts
       */
      char *hostname;
+
+    /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
+    JSONWriter *vmdesc;
  };
  
  void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..e77f643f52 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -42,7 +42,6 @@
  #include "postcopy-ram.h"
  #include "qapi/error.h"
  #include "qapi/qapi-commands-migration.h"
-#include "qapi/qmp/json-writer.h"
  #include "qapi/clone-visitor.h"
  #include "qapi/qapi-builtin-visit.h"
  #include "qapi/qmp/qerror.h"
@@ -1161,14 +1160,63 @@ bool qemu_savevm_state_guest_unplug_pending(void)
      return false;
  }
  
+static int qemu_savevm_state_precopy_one_non_iterable(SaveStateEntry *se,
+                                                      QEMUFile *f,
+                                                      JSONWriter *vmdesc)
+{
+    int ret;
+
+    if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
+        trace_savevm_section_skip(se->idstr, se->section_id);
+        return 0;
+    }
+
+    trace_savevm_section_start(se->idstr, se->section_id);
+
+    json_writer_start_object(vmdesc, NULL);
+    json_writer_str(vmdesc, "name", se->idstr);
+    json_writer_int64(vmdesc, "instance_id", se->instance_id);
+
+    save_section_header(f, se, QEMU_VM_SECTION_FULL);
+    ret = vmstate_save(f, se, vmdesc);
+    if (ret) {
+        qemu_file_set_error(f, ret);
+        return ret;
+    }
+    trace_savevm_section_end(se->idstr, se->section_id, 0);
+    save_section_footer(f, se);
+
+    json_writer_end_object(vmdesc);
+    return 0;
+}
+
  void qemu_savevm_state_setup(QEMUFile *f)
  {
-    SaveStateEntry *se;
+    MigrationState *ms = migrate_get_current();
      Error *local_err = NULL;
+    SaveStateEntry *se;
+    JSONWriter *vmdesc;
      int ret;
  
+    assert(!ms->vmdesc);
+    ms->vmdesc = vmdesc = json_writer_new(false);
+    json_writer_start_object(vmdesc, NULL);
+    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
+    json_writer_start_array(vmdesc, "devices");
+
      trace_savevm_state_setup();
      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->vmsd) {
+            if (!se->vmsd->immutable) {
+                continue;
+            }
+            ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc);
+            if (ret) {
+                break;
+            }
+            continue;
+        }
+
          if (!se->ops || !se->ops->save_setup) {
              continue;
          }
@@ -1364,41 +1412,28 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                      bool in_postcopy,
                                                      bool inactivate_disks)
  {
-    g_autoptr(JSONWriter) vmdesc = NULL;
+    MigrationState *ms = migrate_get_current();
+    JSONWriter *vmdesc = ms->vmdesc;
      int vmdesc_len;
      SaveStateEntry *se;
      int ret;
  
-    vmdesc = json_writer_new(false);
-    json_writer_start_object(vmdesc, NULL);
-    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
-    json_writer_start_array(vmdesc, "devices");
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    /* qemu_savevm_state_start_precopy() is expected to be called first. */
+    assert(vmdesc);
  
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
          if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
              continue;
          }
-        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
-            trace_savevm_section_skip(se->idstr, se->section_id);
+        if (se->vmsd && se->vmsd->immutable) {
+            /* Already saved during qemu_savevm_state_setup(). */
              continue;
          }
  
-        trace_savevm_section_start(se->idstr, se->section_id);
-
-        json_writer_start_object(vmdesc, NULL);
-        json_writer_str(vmdesc, "name", se->idstr);
-        json_writer_int64(vmdesc, "instance_id", se->instance_id);
-
-        save_section_header(f, se, QEMU_VM_SECTION_FULL);
-        ret = vmstate_save(f, se, vmdesc);
+        ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc);
          if (ret) {
-            qemu_file_set_error(f, ret);
              return ret;
          }
-        trace_savevm_section_end(se->idstr, se->section_id, 0);
-        save_section_footer(f, se);
-
-        json_writer_end_object(vmdesc);
      }
  
      if (inactivate_disks) {
@@ -1427,6 +1462,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
          qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
      }
  
+    /* Free it now to detect any inconsistencies. */
+    json_writer_free(vmdesc);
+    ms->vmdesc = NULL;
+
      return 0;
  }
  
-- 
2.39.0



-- 
Thanks,

David / dhildenb



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy
  2023-01-05 17:18   ` Peter Xu
@ 2023-01-09 14:39     ` David Hildenbrand
  2023-01-09 14:42       ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-01-09 14:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 05.01.23 18:18, Peter Xu wrote:
> On Thu, Dec 22, 2022 at 12:02:12PM +0100, David Hildenbrand wrote:
>> +bool migration_incoming_postcopy_listening(void)
>> +{
>> +    PostcopyState ps = postcopy_state_get();
>> +
>> +    return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
>> +}
> 
> This name is misleading, IMHO.
> 
> The code means "we passed listening phase" but the name implies "we're
> listening".  We can add the "incoming" into that if we want, though.
> 

Let me call that migration_incoming_postcopy_running(). Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy
  2023-01-09 14:39     ` David Hildenbrand
@ 2023-01-09 14:42       ` David Hildenbrand
  0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-01-09 14:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 09.01.23 15:39, David Hildenbrand wrote:
> On 05.01.23 18:18, Peter Xu wrote:
>> On Thu, Dec 22, 2022 at 12:02:12PM +0100, David Hildenbrand wrote:
>>> +bool migration_incoming_postcopy_listening(void)
>>> +{
>>> +    PostcopyState ps = postcopy_state_get();
>>> +
>>> +    return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
>>> +}
>>
>> This name is misleading, IMHO.
>>
>> The code means "we passed listening phase" but the name implies "we're
>> listening".  We can add the "incoming" into that if we want, though.
>>
> 
> Let me call that migration_incoming_postcopy_running(). Thanks!
> 

... which is also misleading. Let me just drop the sanity check and this 
function.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-09 14:34         ` David Hildenbrand
@ 2023-01-09 19:54           ` Peter Xu
  2023-01-10 10:18             ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2023-01-09 19:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Mon, Jan 09, 2023 at 03:34:48PM +0100, David Hildenbrand wrote:
> On 05.01.23 18:15, Peter Xu wrote:
> > On Thu, Jan 05, 2023 at 09:35:54AM +0100, David Hildenbrand wrote:
> > > On 04.01.23 18:23, Peter Xu wrote:
> > > > On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
> > > > > Migrating device state before we start iterating is currently impossible.
> > > > > Introduce and use qemu_savevm_state_start_precopy(), and use
> > > > > a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
> > > > > state will be saved in qemu_savevm_state_start_precopy() or in
> > > > > qemu_savevm_state_complete_precopy_*().
> > > > 
> > > > Can something like this be done in qemu_savevm_state_setup()?
> > > 
> > > Hi Peter,
> > 
> > Hi, David,
> > 
> > > 
> > > Do you mean
> > > 
> > > (a) Moving qemu_savevm_state_start_precopy() effectively into
> > >      qemu_savevm_state_setup()
> > > 
> > > (b) Using se->ops->save_setup()
> > 
> > I meant (b).
> > 
> > > 
> > > I first tried going via (b), but decided to go the current way of using a
> > > proper vmstate with properties (instead of e.g., filling the stream
> > > manually), which also made vmdesc handling possible (and significantly
> > > cleaner).
> > > 
> > > Regarding (a), I decided to not move logic of
> > > qemu_savevm_state_start_precopy() into qemu_savevm_state_setup(), because it
> > > looked cleaner to save device state with the BQL held and for background
> > > snapshots, the VM has been stopped. To decouple device state saving from the
> > > setup path, just like we do it right now for all vmstates.
> > 
> > Is BQL required or optional?  IIUC it's at least still not taken in the
> > migration thread path, only in savevm path.
> > 
> > > 
> > > Having that said, for virtio-mem, it would still work because that state is
> > > immutable once migration starts, but it felt cleaner to separate the setup()
> > > phase from actual device state saving.
> > 
> > I get the point.  My major concerns are:
> > 
> >    (1) The new migration priority is changing the semantic of original,
> >        making it over-complicated
> > 
> >    (2) The new precopy-start routine added one more step to the migration
> >        framework, while it's somehow overlapping (if not to say, mostly the
> >        same as..) save_setup().
> > 
> > For (1): the old priority was only deciding the order of save entries in
> > the global list, nothing more than that.  Even if we want to have a
> > precopy-start phase, I'd suggest we use something else and keep the
> > migration priority simple.  Otherwise we really need serious documentation
> > for MigrationPriority and if so I'd rather don't bother and not reuse the
> > priority field.
> > 
> > For (2), if you see there're a bunch of save_setup() that already does
> > things like transferring static data besides the device states.  Besides
> > the notorious ram_save_setup() there's also dirty_bitmap_save_setup() which
> > also sends a bitmap during save_setup() and some others.  It looks clean to
> > me to do it in the same way as we used to.
> > 
> > Reusing vmstate_save() and vmsd structures are useful too which I totally
> > agree.  So.. can we just call vmstate_save_state() in the save_setup() of
> > the other new vmsd of virtio-mem?
> 
> 
> I went halfway that way, by moving stuff into qemu_savevm_state_setup()
> and avoiding using a new migration priority. Seems to work:

The whole point of my suggestion is not moving things into
qemu_savevm_state_setup(), but avoid introducing more complexity to the
migration framework if unnecessary, so keep the generic framework as simple
as possible.

> 
> I think we could go one step further and perform it from a save_setup() callback,
> however, I'm not convinced that this gets particularly cleaner (vmdesc handling
> eventually).

What I wanted to suggest is exactly trying to avoid vmsd handling.  To be
more explicit, I mean: besides vmstate_virtio_mem_device_early, virtio-mem
can register with another new SaveVMHandlers with both save_setup and
load_setup registered, then e.g. in its save_setup(), one simply calls:

  vmstate_save_state(f, &vmstate_virtio_mem_device_early, virtio_mem_dev,
                     NULL);

I'm not sure whether the JSONWriter* is required in this case, maybe not
yet to at least make it work.

We'll need to impl the load part, but then IIUC we don't need to touch the
migration framework at all, and we keep all similar things (like other
devices I mentioned) to be inside save_setup().

Would that work?

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-09 19:54           ` Peter Xu
@ 2023-01-10 10:18             ` David Hildenbrand
  2023-01-10 11:52               ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-01-10 10:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 09.01.23 20:54, Peter Xu wrote:
> On Mon, Jan 09, 2023 at 03:34:48PM +0100, David Hildenbrand wrote:
>> On 05.01.23 18:15, Peter Xu wrote:
>>> On Thu, Jan 05, 2023 at 09:35:54AM +0100, David Hildenbrand wrote:
>>>> On 04.01.23 18:23, Peter Xu wrote:
>>>>> On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
>>>>>> Migrating device state before we start iterating is currently impossible.
>>>>>> Introduce and use qemu_savevm_state_start_precopy(), and use
>>>>>> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
>>>>>> state will be saved in qemu_savevm_state_start_precopy() or in
>>>>>> qemu_savevm_state_complete_precopy_*().
>>>>>
>>>>> Can something like this be done in qemu_savevm_state_setup()?
>>>>
>>>> Hi Peter,
>>>
>>> Hi, David,
>>>
>>>>
>>>> Do you mean
>>>>
>>>> (a) Moving qemu_savevm_state_start_precopy() effectively into
>>>>       qemu_savevm_state_setup()
>>>>
>>>> (b) Using se->ops->save_setup()
>>>
>>> I meant (b).
>>>
>>>>
>>>> I first tried going via (b), but decided to go the current way of using a
>>>> proper vmstate with properties (instead of e.g., filling the stream
>>>> manually), which also made vmdesc handling possible (and significantly
>>>> cleaner).
>>>>
>>>> Regarding (a), I decided to not move logic of
>>>> qemu_savevm_state_start_precopy() into qemu_savevm_state_setup(), because it
>>>> looked cleaner to save device state with the BQL held and for background
>>>> snapshots, the VM has been stopped. To decouple device state saving from the
>>>> setup path, just like we do it right now for all vmstates.
>>>
>>> Is BQL required or optional?  IIUC it's at least still not taken in the
>>> migration thread path, only in savevm path.
>>>
>>>>
>>>> Having that said, for virtio-mem, it would still work because that state is
>>>> immutable once migration starts, but it felt cleaner to separate the setup()
>>>> phase from actual device state saving.
>>>
>>> I get the point.  My major concerns are:
>>>
>>>     (1) The new migration priority is changing the semantic of original,
>>>         making it over-complicated
>>>
>>>     (2) The new precopy-start routine added one more step to the migration
>>>         framework, while it's somehow overlapping (if not to say, mostly the
>>>         same as..) save_setup().
>>>
>>> For (1): the old priority was only deciding the order of save entries in
>>> the global list, nothing more than that.  Even if we want to have a
>>> precopy-start phase, I'd suggest we use something else and keep the
>>> migration priority simple.  Otherwise we really need serious documentation
>>> for MigrationPriority and if so I'd rather don't bother and not reuse the
>>> priority field.
>>>
>>> For (2), if you see there're a bunch of save_setup() that already does
>>> things like transferring static data besides the device states.  Besides
>>> the notorious ram_save_setup() there's also dirty_bitmap_save_setup() which
>>> also sends a bitmap during save_setup() and some others.  It looks clean to
>>> me to do it in the same way as we used to.
>>>
>>> Reusing vmstate_save() and vmsd structures are useful too which I totally
>>> agree.  So.. can we just call vmstate_save_state() in the save_setup() of
>>> the other new vmsd of virtio-mem?
>>
>>
>> I went halfway that way, by moving stuff into qemu_savevm_state_setup()
>> and avoiding using a new migration priority. Seems to work:
> 
> The whole point of my suggestion is not moving things into
> qemu_savevm_state_setup(), but avoid introducing more complexity to the
> migration framework if unnecessary, so keep the generic framework as simple
> as possible.

IMHO, the current approach is actually quite simple and clean. But ...
> 
>>
>> I think we could go one step further and perform it from a save_setup() callback,
>> however, I'm not convinced that this gets particularly cleaner (vmdesc handling
>> eventually).
> 
> What I wanted to suggest is exactly trying to avoid vmsd handling.  To be
> more explicit, I mean: besides vmstate_virtio_mem_device_early, virtio-mem
> can register with another new SaveVMHandlers with both save_setup and
> load_setup registered, then e.g. in its save_setup(), one simply calls:

... I can see if it can be made working that way and how the result looks. I know
that we use vmstate_save_state() from virtio code, but I don't remember using
it in save_setup() from QEMU_VM_SECTION_START and not QEMU_VM_SECTION_FULL.


There is this interesting bit in register_savevm_live(), which sets "se->is_ram = 1".
qemu_save_device_state() will not include the state. As it's used by XEN, I don't
particularly care.


> 
>    vmstate_save_state(f, &vmstate_virtio_mem_device_early, virtio_mem_dev,
>                       NULL);
> 
> I'm not sure whether the JSONWriter* is required in this case, maybe not
> yet to at least make it work.

It was required when handling vmstates the current way to make
analyze-migration.py not bail out (which is a good thing because one can
actually inspect the migration content):

$ ./scripts/analyze-migration.py -f STATEFILE
{
     "ram (2)": {
         "section sizes": {
             "0000:00:03.0/mem0": "0x0000000f00000000",
             "pc.ram": "0x0000000100000000",
             "/rom@etc/acpi/tables": "0x0000000000020000",
             "pc.bios": "0x0000000000040000",
             "0000:00:02.0/e1000.rom": "0x0000000000040000",
             "pc.rom": "0x0000000000020000",
             "/rom@etc/table-loader": "0x0000000000001000",
             "/rom@etc/acpi/rsdp": "0x0000000000001000"
         }
     },
     "0000:00:03.0/virtio-mem-device-early (51)": {
         "tmp": "00 00 00 01 40 00 00 00 00 00 00 0f 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
         "size": "0x0000000000000000",
         "bitmap": "00 00 00 00 00 00 00 00 00 00
	...
     },
     "timer (0)": {
...

> 
> We'll need to impl the load part, but then IIUC we don't need to touch the
> migration framework at all, and we keep all similar things (like other
> devices I mentioned) to be inside save_setup().
> 
> Would that work?

Let me play with it.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-10 10:18             ` David Hildenbrand
@ 2023-01-10 11:52               ` David Hildenbrand
  2023-01-10 20:03                 ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-01-10 11:52 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 10.01.23 11:18, David Hildenbrand wrote:
> On 09.01.23 20:54, Peter Xu wrote:
>> On Mon, Jan 09, 2023 at 03:34:48PM +0100, David Hildenbrand wrote:
>>> On 05.01.23 18:15, Peter Xu wrote:
>>>> On Thu, Jan 05, 2023 at 09:35:54AM +0100, David Hildenbrand wrote:
>>>>> On 04.01.23 18:23, Peter Xu wrote:
>>>>>> On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
>>>>>>> Migrating device state before we start iterating is currently impossible.
>>>>>>> Introduce and use qemu_savevm_state_start_precopy(), and use
>>>>>>> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
>>>>>>> state will be saved in qemu_savevm_state_start_precopy() or in
>>>>>>> qemu_savevm_state_complete_precopy_*().
>>>>>>
>>>>>> Can something like this be done in qemu_savevm_state_setup()?
>>>>>
>>>>> Hi Peter,
>>>>
>>>> Hi, David,
>>>>
>>>>>
>>>>> Do you mean
>>>>>
>>>>> (a) Moving qemu_savevm_state_start_precopy() effectively into
>>>>>        qemu_savevm_state_setup()
>>>>>
>>>>> (b) Using se->ops->save_setup()
>>>>
>>>> I meant (b).
>>>>
>>>>>
>>>>> I first tried going via (b), but decided to go the current way of using a
>>>>> proper vmstate with properties (instead of e.g., filling the stream
>>>>> manually), which also made vmdesc handling possible (and significantly
>>>>> cleaner).
>>>>>
>>>>> Regarding (a), I decided to not move logic of
>>>>> qemu_savevm_state_start_precopy() into qemu_savevm_state_setup(), because it
>>>>> looked cleaner to save device state with the BQL held and for background
>>>>> snapshots, the VM has been stopped. To decouple device state saving from the
>>>>> setup path, just like we do it right now for all vmstates.
>>>>
>>>> Is BQL required or optional?  IIUC it's at least still not taken in the
>>>> migration thread path, only in savevm path.
>>>>
>>>>>
>>>>> Having that said, for virtio-mem, it would still work because that state is
>>>>> immutable once migration starts, but it felt cleaner to separate the setup()
>>>>> phase from actual device state saving.
>>>>
>>>> I get the point.  My major concerns are:
>>>>
>>>>      (1) The new migration priority is changing the semantic of original,
>>>>          making it over-complicated
>>>>
>>>>      (2) The new precopy-start routine added one more step to the migration
>>>>          framework, while it's somehow overlapping (if not to say, mostly the
>>>>          same as..) save_setup().
>>>>
>>>> For (1): the old priority was only deciding the order of save entries in
>>>> the global list, nothing more than that.  Even if we want to have a
>>>> precopy-start phase, I'd suggest we use something else and keep the
>>>> migration priority simple.  Otherwise we really need serious documentation
>>>> for MigrationPriority and if so I'd rather don't bother and not reuse the
>>>> priority field.
>>>>
>>>> For (2), if you see there're a bunch of save_setup() that already does
>>>> things like transferring static data besides the device states.  Besides
>>>> the notorious ram_save_setup() there's also dirty_bitmap_save_setup() which
>>>> also sends a bitmap during save_setup() and some others.  It looks clean to
>>>> me to do it in the same way as we used to.
>>>>
>>>> Reusing vmstate_save() and vmsd structures are useful too which I totally
>>>> agree.  So.. can we just call vmstate_save_state() in the save_setup() of
>>>> the other new vmsd of virtio-mem?
>>>
>>>
>>> I went halfway that way, by moving stuff into qemu_savevm_state_setup()
>>> and avoiding using a new migration priority. Seems to work:
>>
>> The whole point of my suggestion is not moving things into
>> qemu_savevm_state_setup(), but avoid introducing more complexity to the
>> migration framework if unnecessary, so keep the generic framework as simple
>> as possible.
> 
> IMHO, the current approach is actually quite simple and clean. But ...
>>
>>>
>>> I think we could go one step further and perform it from a save_setup() callback,
>>> however, I'm not convinced that this gets particularly cleaner (vmdesc handling
>>> eventually).
>>
>> What I wanted to suggest is exactly trying to avoid vmsd handling.  To be
>> more explicit, I mean: besides vmstate_virtio_mem_device_early, virtio-mem
>> can register with another new SaveVMHandlers with both save_setup and
>> load_setup registered, then e.g. in its save_setup(), one simply calls:
> 
> ... I can see if it can be made working that way and how the result looks. I know
> that we use vmstate_save_state() from virtio code, but I don't remember using
> it in save_setup() from QEMU_VM_SECTION_START and not QEMU_VM_SECTION_FULL.
> 
> 
> There is this interesting bit in register_savevm_live(), which sets "se->is_ram = 1".
> qemu_save_device_state() will not include the state. As it's used by XEN, I don't
> particularly care.
> 
> 
>>
>>     vmstate_save_state(f, &vmstate_virtio_mem_device_early, virtio_mem_dev,
>>                        NULL);
>>
>> I'm not sure whether the JSONWriter* is required in this case, maybe not
>> yet to at least make it work.
> 
> It was required when handling vmstates the current way to make
> analyze-migration.py not bail out (which is a good thing because one can
> actually inspect the migration content):
> 
> $ ./scripts/analyze-migration.py -f STATEFILE
> {
>       "ram (2)": {
>           "section sizes": {
>               "0000:00:03.0/mem0": "0x0000000f00000000",
>               "pc.ram": "0x0000000100000000",
>               "/rom@etc/acpi/tables": "0x0000000000020000",
>               "pc.bios": "0x0000000000040000",
>               "0000:00:02.0/e1000.rom": "0x0000000000040000",
>               "pc.rom": "0x0000000000020000",
>               "/rom@etc/table-loader": "0x0000000000001000",
>               "/rom@etc/acpi/rsdp": "0x0000000000001000"
>           }
>       },
>       "0000:00:03.0/virtio-mem-device-early (51)": {
>           "tmp": "00 00 00 01 40 00 00 00 00 00 00 0f 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
>           "size": "0x0000000000000000",
>           "bitmap": "00 00 00 00 00 00 00 00 00 00
> 	...
>       },
>       "timer (0)": {
> ...
> 
>>
>> We'll need to impl the load part, but then IIUC we don't need to touch the
>> migration framework at all, and we keep all similar things (like other
>> devices I mentioned) to be inside save_setup().
>>
>> Would that work?
> 
> Let me play with it.
> 

The following seems to work, but makes analyze-migration.py angry:

$ ./scripts/analyze-migration.py -f STATEFILE
Traceback (most recent call last):
   File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 605, in <module>
     dump.read(dump_memory = args.memory)
   File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 539, in read
     classdesc = self.section_classes[section_key]
                 ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: ('0000:00:03.0/virtio-mem-early', 0)


We need the vmdesc to create info for the device.


 From 052d8deaa5341d7abe9c8428b50b50613bfef408 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Wed, 21 Dec 2022 10:04:06 +0100
Subject: [PATCH] virtio-mem: Migrate bitmap, size and sanity checks early

The bitmap and the size are immutable while migration is active: see
virtio_mem_is_busy(). We can migrate this information early, before
migrating any actual RAM content.

Having this information in place early will, for example, allow for
properly preallocating memory before touching these memory locations
during RAM migration: this way, we can make sure that all memory was
actually preallocated and that any user errors (e.g., insufficient
hugetlb pages) can be handled gracefully.

In contrast, usable_region_size and requested_size can theoretically
still be modified on the source while the VM is running. Keep migrating
these properties the usual, late, way.

Use a new device property to keep behavior of compat machines
unmodified.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  hw/core/machine.c              |  4 +-
  hw/virtio/virtio-mem.c         | 81 +++++++++++++++++++++++++++++++++-
  include/hw/virtio/virtio-mem.h |  8 ++++
  3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 616f3a207c..29b57f6448 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -41,7 +41,9 @@
  #include "hw/virtio/virtio-pci.h"
  #include "qom/object_interfaces.h"
  
-GlobalProperty hw_compat_7_2[] = {};
+GlobalProperty hw_compat_7_2[] = {
+    { "virtio-mem", "x-early-migration", "false" },
+};
  const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
  
  GlobalProperty hw_compat_7_1[] = {
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 02f7b5469a..54e4f956c8 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -26,11 +26,14 @@
  #include "qapi/visitor.h"
  #include "exec/ram_addr.h"
  #include "migration/misc.h"
+#include "migration/register.h"
  #include "hw/boards.h"
  #include "hw/qdev-properties.h"
  #include CONFIG_DEVICES
  #include "trace.h"
  
+static const SaveVMHandlers vmstate_virtio_mem_device_early_ops;
+
  /*
   * We only had legacy x86 guests that did not support
   * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy guests.
@@ -878,6 +881,23 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
  
      host_memory_backend_set_mapped(vmem->memdev, true);
      vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
+    if (vmem->early_migration) {
+        char idstr[256] = "";
+        char *oid;
+
+        /* See unregister_savevm() */
+        oid = vmstate_if_get_id(VMSTATE_IF(vmem));
+        if (oid) {
+            pstrcpy(idstr, sizeof(idstr), oid);
+            pstrcat(idstr, sizeof(idstr), "/");
+            g_free(oid);
+        }
+        pstrcat(idstr, sizeof(idstr), "virtio-mem-early");
+
+        register_savevm_live(idstr, VMSTATE_INSTANCE_ID_ANY, 1,
+                             &vmstate_virtio_mem_device_early_ops,
+                             vmem);
+    }
      qemu_register_reset(virtio_mem_system_reset, vmem);
  
      /*
@@ -899,6 +919,9 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
       */
      memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
      qemu_unregister_reset(virtio_mem_system_reset, vmem);
+    if (vmem->early_migration) {
+        unregister_savevm(VMSTATE_IF(vmem), "virtio-mem-early", vmem);
+    }
      vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
      host_memory_backend_set_mapped(vmem->memdev, false);
      virtio_del_queue(vdev, 0);
@@ -1015,23 +1038,75 @@ static const VMStateDescription vmstate_virtio_mem_sanity_checks = {
      },
  };
  
+static bool virtio_mem_vmstate_field_exists(void *opaque, int version_id)
+{
+    const VirtIOMEM *vmem = VIRTIO_MEM(opaque);
+
+    /* With early migration, these fields were already migrated. */
+    return !vmem->early_migration;
+}
+
  static const VMStateDescription vmstate_virtio_mem_device = {
      .name = "virtio-mem-device",
      .minimum_version_id = 1,
      .version_id = 1,
      .priority = MIG_PRI_VIRTIO_MEM,
      .post_load = virtio_mem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_WITH_TMP_TEST(VirtIOMEM, virtio_mem_vmstate_field_exists,
+                              VirtIOMEMMigSanityChecks,
+                              vmstate_virtio_mem_sanity_checks),
+        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
+        VMSTATE_UINT64_TEST(size, VirtIOMEM, virtio_mem_vmstate_field_exists),
+        VMSTATE_UINT64(requested_size, VirtIOMEM),
+        VMSTATE_BITMAP_TEST(bitmap, VirtIOMEM, virtio_mem_vmstate_field_exists,
+                            0, bitmap_size),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+/*
+ * Transfer properties that are immutable while migration is active early,
+ * such that we have have this information around before migrating any RAM
+ * content.
+ *
+ * Note that virtio_mem_is_busy() makes sure these properties can no longer
+ * change on the migration source until migration completed.
+ *
+ * With QEMU compat machines, we transmit these properties later, via
+ * vmstate_virtio_mem_device instead -- see virtio_mem_vmstate_field_exists().
+ */
+static const VMStateDescription vmstate_virtio_mem_device_early = {
+    .name = "virtio-mem-device-early",
+    .minimum_version_id = 1,
+    .version_id = 1,
      .fields = (VMStateField[]) {
          VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
                           vmstate_virtio_mem_sanity_checks),
-        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
          VMSTATE_UINT64(size, VirtIOMEM),
-        VMSTATE_UINT64(requested_size, VirtIOMEM),
          VMSTATE_BITMAP(bitmap, VirtIOMEM, 0, bitmap_size),
          VMSTATE_END_OF_LIST()
      },
  };
  
+static int virtio_mem_save_setup_early(QEMUFile *f, void *opaque)
+{
+    return vmstate_save_state(f, &vmstate_virtio_mem_device_early, opaque,
+                              NULL);
+}
+
+static int virtio_mem_load_state_early(QEMUFile *f, void *opaque,
+                                       int version_id)
+{
+    return vmstate_load_state(f, &vmstate_virtio_mem_device_early, opaque,
+                              vmstate_virtio_mem_device_early.version_id);
+}
+
+static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
+    .save_setup = virtio_mem_save_setup_early,
+    .load_state = virtio_mem_load_state_early,
+};
+
  static const VMStateDescription vmstate_virtio_mem = {
      .name = "virtio-mem",
      .minimum_version_id = 1,
@@ -1211,6 +1286,8 @@ static Property virtio_mem_properties[] = {
      DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, VirtIOMEM,
                              unplugged_inaccessible, ON_OFF_AUTO_AUTO),
  #endif
+    DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM,
+                     early_migration, true),
      DEFINE_PROP_END_OF_LIST(),
  };
  
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 7745cfc1a3..f15e561785 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass,
  #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
  #define VIRTIO_MEM_ADDR_PROP "memaddr"
  #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible"
+#define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration"
  #define VIRTIO_MEM_PREALLOC_PROP "prealloc"
  
  struct VirtIOMEM {
@@ -74,6 +75,13 @@ struct VirtIOMEM {
      /* whether to prealloc memory when plugging new blocks */
      bool prealloc;
  
+    /*
+     * Whether we migrate properties that are immutable while migration is
+     * active early, before state of other devices and especially, before
+     * migrating any RAM content.
+     */
+    bool early_migration;
+
      /* notifiers to notify when "size" changes */
      NotifierList size_change_notifiers;
  
-- 
2.39.0



-- 
Thanks,

David / dhildenb



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-10 11:52               ` David Hildenbrand
@ 2023-01-10 20:03                 ` Peter Xu
  2023-01-11 13:48                   ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2023-01-10 20:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Tue, Jan 10, 2023 at 12:52:32PM +0100, David Hildenbrand wrote:
> The following seems to work,

That looks much better at least from the diffstat pov (comparing to the
existing patch 1+5 and the framework changes), thanks.

> but makes analyze-migration.py angry:
> 
> $ ./scripts/analyze-migration.py -f STATEFILE
> Traceback (most recent call last):
>   File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 605, in <module>
>     dump.read(dump_memory = args.memory)
>   File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 539, in read
>     classdesc = self.section_classes[section_key]
>                 ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
> KeyError: ('0000:00:03.0/virtio-mem-early', 0)
> 
> 
> We need the vmdesc to create info for the device.

Migration may ignore the save entry if save_state() not provided in the
"devices" section:

        if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
            continue;
        }

Could you try providing a shim save_state() for the new virtio-mem save
entry?

/*
 * Shim function to make sure the save entry will be dumped into "devices"
 * section, to make analyze-migration.py happy.
 */
static void virtio_mem_save_state_early(QEMUFile *file, void *opaque)
{
}

Then:

static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
    .save_setup = virtio_mem_save_setup_early,
    .save_state = virtio_mem_save_state_early,
    .load_state = virtio_mem_load_state_early,
};

I'm not 100% sure it'll work yet, but maybe worth trying.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-10 20:03                 ` Peter Xu
@ 2023-01-11 13:48                   ` David Hildenbrand
  2023-01-11 16:35                     ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-01-11 13:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 10.01.23 21:03, Peter Xu wrote:
> On Tue, Jan 10, 2023 at 12:52:32PM +0100, David Hildenbrand wrote:
>> The following seems to work,
> 
> That looks much better at least from the diffstat pov (comparing to the
> existing patch 1+5 and the framework changes), thanks.
> 
>> but makes analyze-migration.py angry:
>>
>> $ ./scripts/analyze-migration.py -f STATEFILE
>> Traceback (most recent call last):
>>    File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 605, in <module>
>>      dump.read(dump_memory = args.memory)
>>    File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 539, in read
>>      classdesc = self.section_classes[section_key]
>>                  ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
>> KeyError: ('0000:00:03.0/virtio-mem-early', 0)
>>
>>
>> We need the vmdesc to create info for the device.
> 
> Migration may ignore the save entry if save_state() not provided in the
> "devices" section:
> 
>          if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
>              continue;
>          }
> 
> Could you try providing a shim save_state() for the new virtio-mem save
> entry?
> 
> /*
>   * Shim function to make sure the save entry will be dumped into "devices"
>   * section, to make analyze-migration.py happy.
>   */
> static void virtio_mem_save_state_early(QEMUFile *file, void *opaque)
> {
> }
> 
> Then:
> 
> static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
>      .save_setup = virtio_mem_save_setup_early,
>      .save_state = virtio_mem_save_state_early,
>      .load_state = virtio_mem_load_state_early,
> };
> 
> I'm not 100% sure it'll work yet, but maybe worth trying.

It doesn't. virtio_mem_load_state_early() will get called twice (once 
with state saved during save_setup() and once with effectively nothing 
during save_state(), which breaks the whole migration).

vmdesc handling is also wrong, because analyze-migration.py gets 
confused because it receives data stored during save_setup() but vmdesc 
created during save_state() was told that there would be "nothing" to 
interpret ...

$ ./scripts/analyze-migration.py -f STATEFILE
{
     "ram (2)": {
         "section sizes": {
             "0000:00:03.0/mem0": "0x0000000f00000000",
             "pc.ram": "0x0000000100000000",
             "/rom@etc/acpi/tables": "0x0000000000020000",
             "pc.bios": "0x0000000000040000",
             "0000:00:02.0/e1000.rom": "0x0000000000040000",
             "pc.rom": "0x0000000000020000",
             "/rom@etc/table-loader": "0x0000000000001000",
             "/rom@etc/acpi/rsdp": "0x0000000000001000"
         }
     },
     "0000:00:03.0/virtio-mem-early (51)": {
         "data": ""
     }
}


Not sure if the whole thing becomes nicer when manually looking up the 
vmdesc ... because filling it will also requires manually storing the 
se->idstr and the se->instance_id, whereby both are migration-internal 
and not available inside save_setup().


Hm, I really prefer something like the simplified version that let's 
migration core deal with vmstate to be migrated during save_setup() 
phase. We could avoid the vmstate->immutable flag and simply use a 
separate function for registering the vmstate, like:

vmstate_register_immutable()
vmstate_register_early()
...

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-11 13:48                   ` David Hildenbrand
@ 2023-01-11 16:35                     ` Peter Xu
  2023-01-11 16:58                       ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2023-01-11 16:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Wed, Jan 11, 2023 at 02:48:09PM +0100, David Hildenbrand wrote:
> On 10.01.23 21:03, Peter Xu wrote:
> > On Tue, Jan 10, 2023 at 12:52:32PM +0100, David Hildenbrand wrote:
> > > The following seems to work,
> > 
> > That looks much better at least from the diffstat pov (comparing to the
> > existing patch 1+5 and the framework changes), thanks.
> > 
> > > but makes analyze-migration.py angry:
> > > 
> > > $ ./scripts/analyze-migration.py -f STATEFILE
> > > Traceback (most recent call last):
> > >    File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 605, in <module>
> > >      dump.read(dump_memory = args.memory)
> > >    File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 539, in read
> > >      classdesc = self.section_classes[section_key]
> > >                  ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
> > > KeyError: ('0000:00:03.0/virtio-mem-early', 0)
> > > 
> > > 
> > > We need the vmdesc to create info for the device.
> > 
> > Migration may ignore the save entry if save_state() not provided in the
> > "devices" section:
> > 
> >          if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
> >              continue;
> >          }
> > 
> > Could you try providing a shim save_state() for the new virtio-mem save
> > entry?
> > 
> > /*
> >   * Shim function to make sure the save entry will be dumped into "devices"
> >   * section, to make analyze-migration.py happy.
> >   */
> > static void virtio_mem_save_state_early(QEMUFile *file, void *opaque)
> > {
> > }
> > 
> > Then:
> > 
> > static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
> >      .save_setup = virtio_mem_save_setup_early,
> >      .save_state = virtio_mem_save_state_early,
> >      .load_state = virtio_mem_load_state_early,
> > };
> > 
> > I'm not 100% sure it'll work yet, but maybe worth trying.
> 
> It doesn't. virtio_mem_load_state_early() will get called twice (once with
> state saved during save_setup() and once with effectively nothing during
> save_state(), which breaks the whole migration).

Oh hold on.. so load_state() to pair save_setup()? Maybe you should pair it
with load_setup(), which I just noticed..  Then the load_state() needs to
be another shim like save_state().

> 
> vmdesc handling is also wrong, because analyze-migration.py gets confused
> because it receives data stored during save_setup() but vmdesc created
> during save_state() was told that there would be "nothing" to interpret ...
> 
> $ ./scripts/analyze-migration.py -f STATEFILE
> {
>     "ram (2)": {
>         "section sizes": {
>             "0000:00:03.0/mem0": "0x0000000f00000000",
>             "pc.ram": "0x0000000100000000",
>             "/rom@etc/acpi/tables": "0x0000000000020000",
>             "pc.bios": "0x0000000000040000",
>             "0000:00:02.0/e1000.rom": "0x0000000000040000",
>             "pc.rom": "0x0000000000020000",
>             "/rom@etc/table-loader": "0x0000000000001000",
>             "/rom@etc/acpi/rsdp": "0x0000000000001000"
>         }
>     },
>     "0000:00:03.0/virtio-mem-early (51)": {
>         "data": ""
>     }
> }

Yeah this is expected, because the whole data stream within the setup phase
is a black box and not described by anything.  "ram" can display correctly
only because it's hard coded in the python script, I think.  The trick
above can only make the script not break but not teaching the script to
also check for the early vmsd.

But that's one step forward and IMHO it's fine for special devices. For
example, even with your initial patch, I think the static analyzer (aka,
qemu -dump-vmstate) will also ignore your new vmsd anyway because it's not
directly bound to the DeviceState* but registered separately.  So no ideal
solution so far, afaict, without more work done on this aspect.

> 
> 
> Not sure if the whole thing becomes nicer when manually looking up the
> vmdesc ... because filling it will also requires manually storing the
> se->idstr and the se->instance_id, whereby both are migration-internal and
> not available inside save_setup().
> 
> 
> Hm, I really prefer something like the simplified version that let's
> migration core deal with vmstate to be migrated during save_setup() phase.
> We could avoid the vmstate->immutable flag and simply use a separate
> function for registering the vmstate, like:
> 
> vmstate_register_immutable()
> vmstate_register_early()
> ...

I agree, this looks useful.  It's just that the devices that need this will
be rare, and unfortunately some of them already implemented the stream in
other ways so maybe non-trivial to convert them.

Not against it if you want to keep exploring, but if so please avoid using
the priority field, also I'd hope the early vmsd will be saved within
qemu_savevm_state_setup() so maybe it can be another alternative to
save_setup().

Maybe it's still easier we keep going with the save_setup() and the shim
functions above, if it works for you.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-11 16:35                     ` Peter Xu
@ 2023-01-11 16:58                       ` David Hildenbrand
  2023-01-11 17:28                         ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-01-11 16:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 11.01.23 17:35, Peter Xu wrote:
> On Wed, Jan 11, 2023 at 02:48:09PM +0100, David Hildenbrand wrote:
>> On 10.01.23 21:03, Peter Xu wrote:
>>> On Tue, Jan 10, 2023 at 12:52:32PM +0100, David Hildenbrand wrote:
>>>> The following seems to work,
>>>
>>> That looks much better at least from the diffstat pov (comparing to the
>>> existing patch 1+5 and the framework changes), thanks.
>>>
>>>> but makes analyze-migration.py angry:
>>>>
>>>> $ ./scripts/analyze-migration.py -f STATEFILE
>>>> Traceback (most recent call last):
>>>>     File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 605, in <module>
>>>>       dump.read(dump_memory = args.memory)
>>>>     File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 539, in read
>>>>       classdesc = self.section_classes[section_key]
>>>>                   ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
>>>> KeyError: ('0000:00:03.0/virtio-mem-early', 0)
>>>>
>>>>
>>>> We need the vmdesc to create info for the device.
>>>
>>> Migration may ignore the save entry if save_state() not provided in the
>>> "devices" section:
>>>
>>>           if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
>>>               continue;
>>>           }
>>>
>>> Could you try providing a shim save_state() for the new virtio-mem save
>>> entry?
>>>
>>> /*
>>>    * Shim function to make sure the save entry will be dumped into "devices"
>>>    * section, to make analyze-migration.py happy.
>>>    */
>>> static void virtio_mem_save_state_early(QEMUFile *file, void *opaque)
>>> {
>>> }
>>>
>>> Then:
>>>
>>> static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
>>>       .save_setup = virtio_mem_save_setup_early,
>>>       .save_state = virtio_mem_save_state_early,
>>>       .load_state = virtio_mem_load_state_early,
>>> };
>>>
>>> I'm not 100% sure it'll work yet, but maybe worth trying.
>>
>> It doesn't. virtio_mem_load_state_early() will get called twice (once with
>> state saved during save_setup() and once with effectively nothing during
>> save_state(), which breaks the whole migration).
> 
> Oh hold on.. so load_state() to pair save_setup()? Maybe you should pair it
> with load_setup(), which I just noticed..  Then the load_state() needs to
> be another shim like save_state().

Let me see if that improves the situation. E.g., ram.c writes state in 
save_setup() but doesn't load any state in load_setup() -- it's all done 
in load_state().

... no, still not working. It will read garbage during load_setup() now.

qemu-system-x86_64: Property 'memaddr' changed from 0x100000002037261 to 
0x140000000
qemu-system-x86_64: Failed to load virtio-mem-device-early:tmp
qemu-system-x86_64: Load state of device 0000:00:03.0/virtio-mem-early 
failed
qemu-system-x86_64: load of migration failed: Invalid argument


I don't think that load_setup() is supposed to consume anything useful 
from the migration stream. I suspects it should actually not even 
consume a QEMU file right now, because they way it's used is just for 
initializing stuff.

qemu_loadvm_state_main() does the actual loading part, parsing sections 
etc. qemu_loadvm_state_setup() doesn't do any of that.

AFAIKS, at least qemu_loadvm_state_setup() would have to parse sections 
and the save_setup() users would have to be converted into using 
load_setup() as well. Not sure if more is missing.

Even with that, I doubt that it would make analyze-migration.py work, 
because what we store is something different than what we record in the 
vmdesc.

> 
>>
>> vmdesc handling is also wrong, because analyze-migration.py gets confused
>> because it receives data stored during save_setup() but vmdesc created
>> during save_state() was told that there would be "nothing" to interpret ...
>>
>> $ ./scripts/analyze-migration.py -f STATEFILE
>> {
>>      "ram (2)": {
>>          "section sizes": {
>>              "0000:00:03.0/mem0": "0x0000000f00000000",
>>              "pc.ram": "0x0000000100000000",
>>              "/rom@etc/acpi/tables": "0x0000000000020000",
>>              "pc.bios": "0x0000000000040000",
>>              "0000:00:02.0/e1000.rom": "0x0000000000040000",
>>              "pc.rom": "0x0000000000020000",
>>              "/rom@etc/table-loader": "0x0000000000001000",
>>              "/rom@etc/acpi/rsdp": "0x0000000000001000"
>>          }
>>      },
>>      "0000:00:03.0/virtio-mem-early (51)": {
>>          "data": ""
>>      }
>> }
> 
> Yeah this is expected, because the whole data stream within the setup phase
> is a black box and not described by anything.  "ram" can display correctly
> only because it's hard coded in the python script, I think.  The trick
> above can only make the script not break but not teaching the script to
> also check for the early vmsd.

Note that the issue here is that the scripts stops the output after the 
virtio-mem device. So I'm not complaining about the "data": "", but 
about the vmstate according to the vmdesc not having any other devices :)

> 
> But that's one step forward and IMHO it's fine for special devices. For
> example, even with your initial patch, I think the static analyzer (aka,
> qemu -dump-vmstate) will also ignore your new vmsd anyway because it's not
> directly bound to the DeviceState* but registered separately.  So no ideal
> solution so far, afaict, without more work done on this aspect.
> 
>>
>>
>> Not sure if the whole thing becomes nicer when manually looking up the
>> vmdesc ... because filling it will also requires manually storing the
>> se->idstr and the se->instance_id, whereby both are migration-internal and
>> not available inside save_setup().
>>
>>
>> Hm, I really prefer something like the simplified version that let's
>> migration core deal with vmstate to be migrated during save_setup() phase.
>> We could avoid the vmstate->immutable flag and simply use a separate
>> function for registering the vmstate, like:
>>
>> vmstate_register_immutable()
>> vmstate_register_early()
>> ...
> 
> I agree, this looks useful.  It's just that the devices that need this will
> be rare, and unfortunately some of them already implemented the stream in
> other ways so maybe non-trivial to convert them.

Right, I agree about the "rare" part and that conversion might be hard, 
because they are not using a vmstate descriptor.

The only way to avoid that is moving away from using a vmstate 
descriptor and storing everything manually in virtio-mem code. Not 
particularly happy about that, but it would be the only feasible option 
without touching migration code that I can see.

> 
> Not against it if you want to keep exploring, but if so please avoid using
> the priority field, also I'd hope the early vmsd will be saved within
> qemu_savevm_state_setup() so maybe it can be another alternative to
> save_setup().
> 
> Maybe it's still easier we keep going with the save_setup() and the shim
> functions above, if it works for you.

I'll happy to go the path you suggested if we can make it work. I'd be 
happy to have something "reasonable" for the virtio-mem device in the 
analyze-migration.py output. But I could live with just nothing useful 
for the device itself -- as long as at least the other devices still 
show up.

Thanks Peter!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-11 16:58                       ` David Hildenbrand
@ 2023-01-11 17:28                         ` Peter Xu
  2023-01-11 17:44                           ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2023-01-11 17:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Wed, Jan 11, 2023 at 05:58:36PM +0100, David Hildenbrand wrote:
> On 11.01.23 17:35, Peter Xu wrote:
> > On Wed, Jan 11, 2023 at 02:48:09PM +0100, David Hildenbrand wrote:
> > > On 10.01.23 21:03, Peter Xu wrote:
> > > > On Tue, Jan 10, 2023 at 12:52:32PM +0100, David Hildenbrand wrote:
> > > > > The following seems to work,
> > > > 
> > > > That looks much better at least from the diffstat pov (comparing to the
> > > > existing patch 1+5 and the framework changes), thanks.
> > > > 
> > > > > but makes analyze-migration.py angry:
> > > > > 
> > > > > $ ./scripts/analyze-migration.py -f STATEFILE
> > > > > Traceback (most recent call last):
> > > > >     File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 605, in <module>
> > > > >       dump.read(dump_memory = args.memory)
> > > > >     File "/home/dhildenb/git/qemu/./scripts/analyze-migration.py", line 539, in read
> > > > >       classdesc = self.section_classes[section_key]
> > > > >                   ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
> > > > > KeyError: ('0000:00:03.0/virtio-mem-early', 0)
> > > > > 
> > > > > 
> > > > > We need the vmdesc to create info for the device.
> > > > 
> > > > Migration may ignore the save entry if save_state() not provided in the
> > > > "devices" section:
> > > > 
> > > >           if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
> > > >               continue;
> > > >           }
> > > > 
> > > > Could you try providing a shim save_state() for the new virtio-mem save
> > > > entry?
> > > > 
> > > > /*
> > > >    * Shim function to make sure the save entry will be dumped into "devices"
> > > >    * section, to make analyze-migration.py happy.
> > > >    */
> > > > static void virtio_mem_save_state_early(QEMUFile *file, void *opaque)
> > > > {
> > > > }
> > > > 
> > > > Then:
> > > > 
> > > > static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
> > > >       .save_setup = virtio_mem_save_setup_early,
> > > >       .save_state = virtio_mem_save_state_early,
> > > >       .load_state = virtio_mem_load_state_early,
> > > > };
> > > > 
> > > > I'm not 100% sure it'll work yet, but maybe worth trying.
> > > 
> > > It doesn't. virtio_mem_load_state_early() will get called twice (once with
> > > state saved during save_setup() and once with effectively nothing during
> > > save_state(), which breaks the whole migration).
> > 
> > Oh hold on.. so load_state() to pair save_setup()? Maybe you should pair it
> > with load_setup(), which I just noticed..  Then the load_state() needs to
> > be another shim like save_state().
> 
> Let me see if that improves the situation. E.g., ram.c writes state in
> save_setup() but doesn't load any state in load_setup() -- it's all done in
> load_state().
> 
> ... no, still not working. It will read garbage during load_setup() now.
> 
> qemu-system-x86_64: Property 'memaddr' changed from 0x100000002037261 to
> 0x140000000
> qemu-system-x86_64: Failed to load virtio-mem-device-early:tmp
> qemu-system-x86_64: Load state of device 0000:00:03.0/virtio-mem-early
> failed
> qemu-system-x86_64: load of migration failed: Invalid argument
> 
> 
> I don't think that load_setup() is supposed to consume anything useful from
> the migration stream. I suspects it should actually not even consume a QEMU
> file right now, because they way it's used is just for initializing stuff.
> 
> qemu_loadvm_state_main() does the actual loading part, parsing sections etc.
> qemu_loadvm_state_setup() doesn't do any of that.
> 
> AFAIKS, at least qemu_loadvm_state_setup() would have to parse sections and
> the save_setup() users would have to be converted into using load_setup() as
> well. Not sure if more is missing.

Ouch, yeah, load_setup() is unfortunate. :(

I think load_setup() should be named load_init() without having the
qemufile at all.  But let's keep that aside for now, and see what other
option we have..

> 
> Even with that, I doubt that it would make analyze-migration.py work,
> because what we store is something different than what we record in the
> vmdesc.
> 
> > 
> > > 
> > > vmdesc handling is also wrong, because analyze-migration.py gets confused
> > > because it receives data stored during save_setup() but vmdesc created
> > > during save_state() was told that there would be "nothing" to interpret ...
> > > 
> > > $ ./scripts/analyze-migration.py -f STATEFILE
> > > {
> > >      "ram (2)": {
> > >          "section sizes": {
> > >              "0000:00:03.0/mem0": "0x0000000f00000000",
> > >              "pc.ram": "0x0000000100000000",
> > >              "/rom@etc/acpi/tables": "0x0000000000020000",
> > >              "pc.bios": "0x0000000000040000",
> > >              "0000:00:02.0/e1000.rom": "0x0000000000040000",
> > >              "pc.rom": "0x0000000000020000",
> > >              "/rom@etc/table-loader": "0x0000000000001000",
> > >              "/rom@etc/acpi/rsdp": "0x0000000000001000"
> > >          }
> > >      },
> > >      "0000:00:03.0/virtio-mem-early (51)": {
> > >          "data": ""
> > >      }
> > > }
> > 
> > Yeah this is expected, because the whole data stream within the setup phase
> > is a black box and not described by anything.  "ram" can display correctly
> > only because it's hard coded in the python script, I think.  The trick
> > above can only make the script not break but not teaching the script to
> > also check for the early vmsd.
> 
> Note that the issue here is that the scripts stops the output after the
> virtio-mem device. So I'm not complaining about the "data": "", but about
> the vmstate according to the vmdesc not having any other devices :)
> 
> > 
> > But that's one step forward and IMHO it's fine for special devices. For
> > example, even with your initial patch, I think the static analyzer (aka,
> > qemu -dump-vmstate) will also ignore your new vmsd anyway because it's not
> > directly bound to the DeviceState* but registered separately.  So no ideal
> > solution so far, afaict, without more work done on this aspect.
> > 
> > > 
> > > 
> > > Not sure if the whole thing becomes nicer when manually looking up the
> > > vmdesc ... because filling it will also requires manually storing the
> > > se->idstr and the se->instance_id, whereby both are migration-internal and
> > > not available inside save_setup().
> > > 
> > > 
> > > Hm, I really prefer something like the simplified version that let's
> > > migration core deal with vmstate to be migrated during save_setup() phase.
> > > We could avoid the vmstate->immutable flag and simply use a separate
> > > function for registering the vmstate, like:
> > > 
> > > vmstate_register_immutable()
> > > vmstate_register_early()
> > > ...
> > 
> > I agree, this looks useful.  It's just that the devices that need this will
> > be rare, and unfortunately some of them already implemented the stream in
> > other ways so maybe non-trivial to convert them.
> 
> Right, I agree about the "rare" part and that conversion might be hard,
> because they are not using a vmstate descriptor.
> 
> The only way to avoid that is moving away from using a vmstate descriptor
> and storing everything manually in virtio-mem code. Not particularly happy
> about that, but it would be the only feasible option without touching
> migration code that I can see.
> 
> > 
> > Not against it if you want to keep exploring, but if so please avoid using
> > the priority field, also I'd hope the early vmsd will be saved within
> > qemu_savevm_state_setup() so maybe it can be another alternative to
> > save_setup().
> > 
> > Maybe it's still easier we keep going with the save_setup() and the shim
> > functions above, if it works for you.
> 
> I'll happy to go the path you suggested if we can make it work. I'd be happy
> to have something "reasonable" for the virtio-mem device in the
> analyze-migration.py output. But I could live with just nothing useful for
> the device itself -- as long as at least the other devices still show up.

So, let's see whether we can go back to the load_state() approach..

static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
      .save_setup = virtio_mem_save_setup_early,
      .save_state = virtio_mem_save_state_early,
      .load_state = virtio_mem_load_state_early,
};

vfio handled it using a single header flag for either save_setup() or
save_state(), then load_state() identifies it:

    data = qemu_get_be64(f);
    ...
        switch (data) {
        case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
        ...
        case VFIO_MIG_FLAG_DEV_SETUP_STATE:
        ...

Maybe play the same trick here?  The flag needs to be hard coded but at
least not the vmsd.  Based on previous experience, I could have missed
something else, though. :)

Or if you like go the other way I'm fine too.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-11 17:28                         ` Peter Xu
@ 2023-01-11 17:44                           ` David Hildenbrand
  0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-01-11 17:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Dr . David Alan Gilbert, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

>>>
>>> Not against it if you want to keep exploring, but if so please avoid using
>>> the priority field, also I'd hope the early vmsd will be saved within
>>> qemu_savevm_state_setup() so maybe it can be another alternative to
>>> save_setup().
>>>
>>> Maybe it's still easier we keep going with the save_setup() and the shim
>>> functions above, if it works for you.
>>
>> I'll happy to go the path you suggested if we can make it work. I'd be happy
>> to have something "reasonable" for the virtio-mem device in the
>> analyze-migration.py output. But I could live with just nothing useful for
>> the device itself -- as long as at least the other devices still show up.
> 
> So, let's see whether we can go back to the load_state() approach..
> 
> static const SaveVMHandlers vmstate_virtio_mem_device_early_ops = {
>        .save_setup = virtio_mem_save_setup_early,
>        .save_state = virtio_mem_save_state_early,
>        .load_state = virtio_mem_load_state_early,
> };
> 
> vfio handled it using a single header flag for either save_setup() or
> save_state(), then load_state() identifies it:
> 
>      data = qemu_get_be64(f);
>      ...
>          switch (data) {
>          case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
>          ...
>          case VFIO_MIG_FLAG_DEV_SETUP_STATE:
>          ...
> 
> Maybe play the same trick here?  The flag needs to be hard coded but at
> least not the vmsd.  Based on previous experience, I could have missed
> something else, though. :)

I could also remember "internally" if load_state() was already called 
throughout the migartion I think.

But, IIUC, it will still make analyze-migration.py produce wrong 
results, because of the vmdesc mismatch.

> 
> Or if you like go the other way I'm fine too.

IMHO my approach will be cleaner on the device side but will require 
migration code changes. I'll try getting that as clean as possible for 
now and resend. If there are more ideas on how to get the other approach 
running, I'll be happy to try.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-01-11 17:45 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-22 11:02 [PATCH v3 0/6] virtio-mem: Handle preallocation with migration David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
2022-12-23  9:34   ` David Hildenbrand
2023-01-05  1:27     ` Michael S. Tsirkin
2023-01-05  8:20       ` David Hildenbrand
2023-01-04 17:23   ` Peter Xu
2023-01-05  8:35     ` David Hildenbrand
2023-01-05 17:15       ` Peter Xu
2023-01-09 14:34         ` David Hildenbrand
2023-01-09 19:54           ` Peter Xu
2023-01-10 10:18             ` David Hildenbrand
2023-01-10 11:52               ` David Hildenbrand
2023-01-10 20:03                 ` Peter Xu
2023-01-11 13:48                   ` David Hildenbrand
2023-01-11 16:35                     ` Peter Xu
2023-01-11 16:58                       ` David Hildenbrand
2023-01-11 17:28                         ` Peter Xu
2023-01-11 17:44                           ` David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 2/6] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() David Hildenbrand
2023-01-05 17:33   ` Dr. David Alan Gilbert
2022-12-22 11:02 ` [PATCH v3 3/6] migration: Factor out checks for advised and listening incomming postcopy David Hildenbrand
2023-01-05 17:18   ` Peter Xu
2023-01-09 14:39     ` David Hildenbrand
2023-01-09 14:42       ` David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 4/6] virtio-mem: Fail if a memory backend with "prealloc=on" is specified David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 5/6] virtio-mem: Migrate bitmap, size and sanity checks early David Hildenbrand
2022-12-22 11:02 ` [PATCH v3 6/6] virtio-mem: Proper support for preallocation with migration David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.