qemu-devel.nongnu.org archive mirror
* [PATCH v3 0/8] virtio-mem: Handle preallocation with migration
@ 2023-01-12 16:43 David Hildenbrand
  2023-01-12 16:43 ` [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save() David Hildenbrand
                   ` (8 more replies)
  0 siblings, 9 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

While playing with migration of virtio-mem with an ordinary file backing,
I realized that migration and prealloc don't currently work as expected
for virtio-mem. Further, Jing Qi reported that setup issues (insufficient
huge pages on the destination) result in QEMU getting killed with SIGBUS
instead of failing gracefully.

In contrast to ordinary memory backend preallocation, virtio-mem
preallocates memory before plugging blocks to the guest. Consequently,
when migrating we are not actually preallocating on the destination but
"only" migrate pages. Fix that by migrating the bitmap early, before any
RAM content, and use that information to preallocate memory early, before
migrating any RAM.
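
As a rough, standalone illustration of that idea (not code from this
series): once the plugged-block bitmap is available on the destination,
contiguous runs of plugged blocks can be turned into ranges to preallocate
before any RAM content arrives. Everything named below
(for_each_plugged_range(), range_cb, the one-bit-per-block byte array) is
an assumption made for the sketch; QEMU's own bitmap helpers are not used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: "preallocate" [offset, offset + size). */
typedef int (*range_cb)(uint64_t offset, uint64_t size, void *opaque);

static bool block_is_plugged(const uint8_t *bitmap, size_t block)
{
    return bitmap[block / 8] & (1u << (block % 8));
}

/*
 * Walk the bitmap and invoke cb once per contiguous run of plugged blocks.
 * Returns 0 on success, or the first non-zero value returned by cb.
 */
static int for_each_plugged_range(const uint8_t *bitmap, size_t nblocks,
                                  uint64_t block_size, range_cb cb,
                                  void *opaque)
{
    size_t i = 0;

    assert(block_size > 0);
    while (i < nblocks) {
        size_t start;
        int ret;

        if (!block_is_plugged(bitmap, i)) {
            i++;
            continue;
        }
        start = i;
        while (i < nblocks && block_is_plugged(bitmap, i)) {
            i++;
        }
        ret = cb(start * block_size, (i - start) * block_size, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: only counts ranges and bytes; a real one would allocate. */
struct prealloc_stats {
    unsigned ranges;
    uint64_t bytes;
};

static int count_range(uint64_t offset, uint64_t size, void *opaque)
{
    struct prealloc_stats *st = opaque;

    (void)offset;
    st->ranges++;
    st->bytes += size;
    return 0;
}
```

Returning the callback's error instead of aborting is the point of the
exercise: a failed preallocation can then fail the migration cleanly
before any page has been placed.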

Postcopy needs some extra care, and I realized that prealloc+postcopy is
shaky in general. Let's at least try to mimic what ordinary
prealloc+postcopy does: temporarily allocate the memory, discard it, and
cross fingers that we'll still have sufficient memory when postcopy
actually tries placing pages.
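
The "allocate, then discard" step could look roughly like the following
Linux-only sketch. It is an illustration under assumptions, not what the
series implements: it uses an anonymous private mapping, touches one byte
per page to force allocation, and hands the pages back with
madvise(MADV_DONTNEED). The name probe_then_discard() is made up, and the
sketch does not cover hugetlb-backed memory, where touching a page that
cannot be allocated raises SIGBUS instead of returning an error.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Temporarily populate 'size' bytes of anonymous memory, then discard it.
 * Returns 0 if the memory could be allocated and discarded, -1 otherwise.
 */
static int probe_then_discard(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p;
    void *area;
    size_t off;
    int ret;

    assert(page > 0);
    if (size == 0) {
        return -1;
    }
    area = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED) {
        return -1;
    }

    /* Touch one byte per page so the kernel actually backs it. */
    p = area;
    for (off = 0; off < size; off += (size_t)page) {
        p[off] = 0;
    }

    /* Give the pages back; later accesses fault in fresh zero pages. */
    ret = madvise(area, size, MADV_DONTNEED) ? -1 : 0;

    /* A real user would keep the mapping; the sketch unmaps to clean up. */
    munmap(area, size);
    return ret;
}
```

Nothing reserves the memory between the discard and the later page
placement, which is why this only improves the odds rather than giving a
guarantee.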

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michal Privoznik <mprivozn@redhat.com>

v3 -> v4:
- First 3 patches:
-- Minimize code changes and simplify
-- Save immutable device state during qemu_savevm_state_setup()
-- Don't use vmsd priorities, use a new flag
-- Split it logically up
- "migration/ram: Factor out check for advised postcopy"
-- Don't factor out postcopy_is_running()
- "virtio-mem: Migrate immutable properties early"
-- Adjust to changed vmsd interface
- "virtio-mem: Proper support for preallocation with migration"
-- Drop sanity check in virtio_mem_post_load_early()

v2 -> v3:
- New approach/rewrite, drop RB and TB of last patch

v1 -> v2:
- Added RBs and Tested-bys
- "virtio-mem: Fail if a memory backend with "prealloc=on" is specified"
-- Fail instead of warn
-- Adjust subject/description


David Hildenbrand (8):
  migration/savevm: Move more savevm handling into vmstate_save()
  migration/savevm: Prepare vmdesc json writer in
    qemu_savevm_state_setup()
  migration/savevm: Allow immutable device state to be migrated early
    (i.e., before RAM)
  migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and
    VMSTATE_BITMAP_TEST()
  migration/ram: Factor out check for advised postcopy
  virtio-mem: Fail if a memory backend with "prealloc=on" is specified
  virtio-mem: Migrate immutable properties early
  virtio-mem: Proper support for preallocation with migration

 hw/core/machine.c              |   4 +-
 hw/virtio/virtio-mem.c         | 144 ++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-mem.h |   8 ++
 include/migration/misc.h       |   4 +-
 include/migration/vmstate.h    |  17 +++-
 migration/migration.c          |  11 +++
 migration/migration.h          |   4 +
 migration/ram.c                |   8 +-
 migration/savevm.c             | 105 +++++++++++++-----------
 9 files changed, 247 insertions(+), 58 deletions(-)

-- 
2.39.0



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save()
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
@ 2023-01-12 16:43 ` David Hildenbrand
  2023-01-12 16:58   ` Dr. David Alan Gilbert
  2023-01-12 16:43 ` [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() David Hildenbrand
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

Let's move more code into vmstate_save(), reducing code duplication and
preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We
have to move vmstate_save() to make the compiler happy.

We'll now also trace from qemu_save_device_state().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/savevm.c | 79 ++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 42 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..d8830297e4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -897,17 +897,6 @@ static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
     }
 }
 
-static int vmstate_save(QEMUFile *f, SaveStateEntry *se,
-                        JSONWriter *vmdesc)
-{
-    trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
-    if (!se->vmsd) {
-        vmstate_save_old_style(f, se, vmdesc);
-        return 0;
-    }
-    return vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
-}
-
 /*
  * Write the header for device section (QEMU_VM_SECTION START/END/PART/FULL)
  */
@@ -941,6 +930,43 @@ static void save_section_footer(QEMUFile *f, SaveStateEntry *se)
     }
 }
 
+static int vmstate_save(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc)
+{
+    int ret;
+
+    if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
+        return 0;
+    }
+    if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
+        trace_savevm_section_skip(se->idstr, se->section_id);
+        return 0;
+    }
+
+    trace_savevm_section_start(se->idstr, se->section_id);
+    save_section_header(f, se, QEMU_VM_SECTION_FULL);
+    if (vmdesc) {
+        json_writer_start_object(vmdesc, NULL);
+        json_writer_str(vmdesc, "name", se->idstr);
+        json_writer_int64(vmdesc, "instance_id", se->instance_id);
+    }
+
+    trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
+    if (!se->vmsd) {
+        vmstate_save_old_style(f, se, vmdesc);
+    } else {
+        ret = vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    trace_savevm_section_end(se->idstr, se->section_id, 0);
+    save_section_footer(f, se);
+    if (vmdesc) {
+        json_writer_end_object(vmdesc);
+    }
+    return 0;
+}
 /**
  * qemu_savevm_command_send: Send a 'QEMU_VM_COMMAND' type element with the
  *                           command and associated data.
@@ -1374,31 +1400,11 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
     json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
     json_writer_start_array(vmdesc, "devices");
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-
-        if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
-            continue;
-        }
-        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
-            trace_savevm_section_skip(se->idstr, se->section_id);
-            continue;
-        }
-
-        trace_savevm_section_start(se->idstr, se->section_id);
-
-        json_writer_start_object(vmdesc, NULL);
-        json_writer_str(vmdesc, "name", se->idstr);
-        json_writer_int64(vmdesc, "instance_id", se->instance_id);
-
-        save_section_header(f, se, QEMU_VM_SECTION_FULL);
         ret = vmstate_save(f, se, vmdesc);
         if (ret) {
             qemu_file_set_error(f, ret);
             return ret;
         }
-        trace_savevm_section_end(se->idstr, se->section_id, 0);
-        save_section_footer(f, se);
-
-        json_writer_end_object(vmdesc);
     }
 
     if (inactivate_disks) {
@@ -1594,21 +1600,10 @@ int qemu_save_device_state(QEMUFile *f)
         if (se->is_ram) {
             continue;
         }
-        if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
-            continue;
-        }
-        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
-            continue;
-        }
-
-        save_section_header(f, se, QEMU_VM_SECTION_FULL);
-
         ret = vmstate_save(f, se, NULL);
         if (ret) {
             return ret;
         }
-
-        save_section_footer(f, se);
     }
 
     qemu_put_byte(f, QEMU_VM_EOF);
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
  2023-01-12 16:43 ` [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save() David Hildenbrand
@ 2023-01-12 16:43 ` David Hildenbrand
  2023-01-12 17:43   ` Dr. David Alan Gilbert
  2023-01-12 16:43 ` [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

... and store it in the migration state. This is a preparation for
storing selected vmsd's already in qemu_savevm_state_setup().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/migration.c |  4 ++++
 migration/migration.h |  4 ++++
 migration/savevm.c    | 18 ++++++++++++------
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 52b5d39244..1d33a7efa0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
     s->vm_was_running = false;
     s->iteration_initial_bytes = 0;
     s->threshold_size = 0;
+
+    json_writer_free(s->vmdesc);
+    s->vmdesc = NULL;
 }
 
 int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -4445,6 +4448,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rp_state.rp_sem);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
     error_free(ms->error);
+    json_writer_free(ms->vmdesc);
 }
 
 static void migration_instance_init(Object *obj)
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..66511ce532 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -17,6 +17,7 @@
 #include "exec/cpu-common.h"
 #include "hw/qdev-core.h"
 #include "qapi/qapi-types-migration.h"
+#include "qapi/qmp/json-writer.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine_int.h"
 #include "io/channel.h"
@@ -366,6 +367,9 @@ struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
+    JSONWriter *vmdesc;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/savevm.c b/migration/savevm.c
index d8830297e4..ff2b8d0064 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -42,7 +42,6 @@
 #include "postcopy-ram.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
-#include "qapi/qmp/json-writer.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-builtin-visit.h"
 #include "qapi/qmp/qerror.h"
@@ -1189,10 +1188,16 @@ bool qemu_savevm_state_guest_unplug_pending(void)
 
 void qemu_savevm_state_setup(QEMUFile *f)
 {
+    MigrationState *ms = migrate_get_current();
     SaveStateEntry *se;
     Error *local_err = NULL;
     int ret;
 
+    ms->vmdesc = json_writer_new(false);
+    json_writer_start_object(ms->vmdesc, NULL);
+    json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
+    json_writer_start_array(ms->vmdesc, "devices");
+
     trace_savevm_state_setup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops || !se->ops->save_setup) {
@@ -1390,15 +1395,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                     bool in_postcopy,
                                                     bool inactivate_disks)
 {
-    g_autoptr(JSONWriter) vmdesc = NULL;
+    MigrationState *ms = migrate_get_current();
+    JSONWriter *vmdesc = ms->vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
 
-    vmdesc = json_writer_new(false);
-    json_writer_start_object(vmdesc, NULL);
-    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
-    json_writer_start_array(vmdesc, "devices");
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         ret = vmstate_save(f, se, vmdesc);
         if (ret) {
@@ -1433,6 +1435,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
     }
 
+    /* Free it now to detect any inconsistencies. */
+    json_writer_free(vmdesc);
+    ms->vmdesc = NULL;
+
     return 0;
 }
 
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
  2023-01-12 16:43 ` [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save() David Hildenbrand
  2023-01-12 16:43 ` [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() David Hildenbrand
@ 2023-01-12 16:43 ` David Hildenbrand
  2023-01-12 17:56   ` Dr. David Alan Gilbert
  2023-01-12 16:43 ` [PATCH v3 4/8] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() David Hildenbrand
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

For virtio-mem, we want to have the plugged/unplugged state of memory
blocks available before migrating any actual RAM content, and perform
sanity checks before touching anything on the destination. This
information is immutable on the migration source while migration is active.

We want to use this information for proper preallocation support with
migration: currently, we don't preallocate memory on the migration target,
and especially with hugetlb, we can easily run out of hugetlb pages during
RAM migration and will crash (SIGBUS) instead of catching this gracefully
via preallocation.

Migrating device state via a vmsd before we start iterating is currently
impossible: the only approach that would be possible is avoiding a vmsd
and migrating state manually during save_setup(), to be restored during
load_state().

Let's allow for migrating device state via a vmsd early, during the
setup phase in qemu_savevm_state_setup(). To keep it simple, we
indicate applicable vmsd's using an "immutable" flag.

Note that only very selected devices (i.e., ones seriously messing with
RAM setup) are supposed to make use of such early state migration.
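
The resulting ordering can be modeled with a small standalone toy (not
QEMU code; struct toy_entry, struct toy_stream and the toy_*() helpers are
hypothetical stand-ins for SaveStateEntry, the QEMUFile and the two savevm
phases): entries flagged immutable are written during setup, and the
completion phase skips them so nothing is sent twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a SaveStateEntry with a vmsd; not QEMU's real types. */
struct toy_entry {
    const char *name;
    bool immutable;
};

/* Toy "stream": records the order in which entries were saved. */
struct toy_stream {
    const char *saved[8];
    size_t count;
};

static void toy_save(struct toy_stream *s, const struct toy_entry *e)
{
    assert(s->count < 8);
    s->saved[s->count++] = e->name;
}

/* Setup phase: runs before any RAM is sent; saves only immutable state. */
static void toy_setup(struct toy_stream *s, const struct toy_entry *e,
                      size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].immutable) {
            toy_save(s, &e[i]);
        }
    }
}

/* Completion phase: saves everything not already sent during setup. */
static void toy_complete(struct toy_stream *s, const struct toy_entry *e,
                         size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!e[i].immutable) {
            toy_save(s, &e[i]);
        }
    }
}
```

Calling toy_setup() and then toy_complete() on the same list emits each
entry exactly once, with all immutable entries first.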

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/migration/vmstate.h |  5 +++++
 migration/savevm.c          | 14 ++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ad24aa1934..dd06c3abad 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -179,6 +179,11 @@ struct VMStateField {
 struct VMStateDescription {
     const char *name;
     int unmigratable;
+    /*
+     * The state is immutable while migration is active and is saved
+     * during the setup phase, to be restored early on the destination.
+     */
+    int immutable;
     int version_id;
     int minimum_version_id;
     MigrationPriority priority;
diff --git a/migration/savevm.c b/migration/savevm.c
index ff2b8d0064..536d6f662b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1200,6 +1200,15 @@ void qemu_savevm_state_setup(QEMUFile *f)
 
     trace_savevm_state_setup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->vmsd && se->vmsd->immutable) {
+            ret = vmstate_save(f, se, ms->vmdesc);
+            if (ret) {
+                qemu_file_set_error(f, ret);
+                break;
+            }
+            continue;
+        }
+
         if (!se->ops || !se->ops->save_setup) {
             continue;
         }
@@ -1402,6 +1411,11 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
     int ret;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->vmsd && se->vmsd->immutable) {
+            /* Already saved during qemu_savevm_state_setup(). */
+            continue;
+        }
+
         ret = vmstate_save(f, se, vmdesc);
         if (ret) {
             qemu_file_set_error(f, ret);
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 4/8] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST()
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (2 preceding siblings ...)
  2023-01-12 16:43 ` [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
@ 2023-01-12 16:43 ` David Hildenbrand
  2023-01-12 16:44 ` [PATCH v3 5/8] migration/ram: Factor out check for advised postcopy David Hildenbrand
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

We'll make use of both next in the context of virtio-mem.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/migration/vmstate.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index dd06c3abad..e4cd21397d 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -710,8 +710,9 @@ extern const VMStateInfo vmstate_info_qlist;
  *        '_state' type
  *    That the pointer is right at the start of _tmp_type.
  */
-#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) {                 \
+#define VMSTATE_WITH_TMP_TEST(_state, _test, _tmp_type, _vmsd) {     \
     .name         = "tmp",                                           \
+    .field_exists = (_test),                                         \
     .size         = sizeof(_tmp_type) +                              \
                     QEMU_BUILD_BUG_ON_ZERO(offsetof(_tmp_type, parent) != 0) + \
                     type_check_pointer(_state,                       \
@@ -720,6 +721,9 @@ extern const VMStateInfo vmstate_info_qlist;
     .info         = &vmstate_info_tmp,                               \
 }
 
+#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) \
+    VMSTATE_WITH_TMP_TEST(_state, NULL, _tmp_type, _vmsd)
+
 #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {              \
     .name         = "unused",                                        \
     .field_exists = (_test),                                         \
@@ -743,8 +747,9 @@ extern const VMStateInfo vmstate_info_qlist;
 /* _field_size should be a int32_t field in the _state struct giving the
  * size of the bitmap _field in bits.
  */
-#define VMSTATE_BITMAP(_field, _state, _version, _field_size) {      \
+#define VMSTATE_BITMAP_TEST(_field, _state, _test, _version, _field_size) { \
     .name         = (stringify(_field)),                             \
+    .field_exists = (_test),                                         \
     .version_id   = (_version),                                      \
     .size_offset  = vmstate_offset_value(_state, _field_size, int32_t),\
     .info         = &vmstate_info_bitmap,                            \
@@ -752,6 +757,9 @@ extern const VMStateInfo vmstate_info_qlist;
     .offset       = offsetof(_state, _field),                        \
 }
 
+#define VMSTATE_BITMAP(_field, _state, _version, _field_size) \
+    VMSTATE_BITMAP_TEST(_field, _state, NULL, _version, _field_size)
+
 /* For migrating a QTAILQ.
  * Target QTAILQ needs be properly initialized.
  * _type: type of QTAILQ element
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 5/8] migration/ram: Factor out check for advised postcopy
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (3 preceding siblings ...)
  2023-01-12 16:43 ` [PATCH v3 4/8] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() David Hildenbrand
@ 2023-01-12 16:44 ` David Hildenbrand
  2023-01-12 18:23   ` Dr. David Alan Gilbert
  2023-01-12 16:44 ` [PATCH v3 6/8] virtio-mem: Fail if a memory backend with "prealloc=on" is specified David Hildenbrand
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

Let's factor out this check, to be used in virtio-mem context next.

While at it, fix a spelling error in a related comment.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/migration/misc.h | 4 +++-
 migration/migration.c    | 7 +++++++
 migration/ram.c          | 8 +-------
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 465906710d..8b49841016 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -67,8 +67,10 @@ bool migration_has_failed(MigrationState *);
 /* ...and after the device transmission */
 bool migration_in_postcopy_after_devices(MigrationState *);
 void migration_global_dump(Monitor *mon);
-/* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
+/* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
+/* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
+bool migration_incoming_postcopy_advised(void);
 /* True if background snapshot is active */
 bool migration_in_bg_snapshot(void);
 
diff --git a/migration/migration.c b/migration/migration.c
index 1d33a7efa0..b7677c14a9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2094,6 +2094,13 @@ bool migration_in_incoming_postcopy(void)
     return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END;
 }
 
+bool migration_incoming_postcopy_advised(void)
+{
+    PostcopyState ps = postcopy_state_get();
+
+    return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
+}
+
 bool migration_in_bg_snapshot(void)
 {
     MigrationState *s = migrate_get_current();
diff --git a/migration/ram.c b/migration/ram.c
index 334309f1c6..e51a7ee0ce 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4091,12 +4091,6 @@ int ram_load_postcopy(QEMUFile *f, int channel)
     return ret;
 }
 
-static bool postcopy_is_advised(void)
-{
-    PostcopyState ps = postcopy_state_get();
-    return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
-}
-
 static bool postcopy_is_running(void)
 {
     PostcopyState ps = postcopy_state_get();
@@ -4167,7 +4161,7 @@ static int ram_load_precopy(QEMUFile *f)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
     /* ADVISE is earlier, it shows the source has the postcopy capability on */
-    bool postcopy_advised = postcopy_is_advised();
+    bool postcopy_advised = migration_incoming_postcopy_advised();
     if (!migrate_use_compression()) {
         invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
     }
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 6/8] virtio-mem: Fail if a memory backend with "prealloc=on" is specified
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (4 preceding siblings ...)
  2023-01-12 16:44 ` [PATCH v3 5/8] migration/ram: Factor out check for advised postcopy David Hildenbrand
@ 2023-01-12 16:44 ` David Hildenbrand
  2023-01-12 18:33   ` Dr. David Alan Gilbert
  2023-01-12 16:44 ` [PATCH v3 7/8] virtio-mem: Migrate immutable properties early David Hildenbrand
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

"prealloc=on" for the memory backend does not work as expected, as
virtio-mem will simply discard all preallocated memory immediately again.
In the best case, it's an expensive NOP. In the worst case, it's an
unexpected allocation error.

Instead, "prealloc=on" should be specified for the virtio-mem device only,
such that virtio-mem will try preallocating memory before plugging
memory dynamically to the guest. Fail if such a memory backend is
provided.

Tested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 1ed1f5a4af..02f7b5469a 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -772,6 +772,12 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "'%s' property specifies an unsupported memdev",
                    VIRTIO_MEM_MEMDEV_PROP);
         return;
+    } else if (vmem->memdev->prealloc) {
+        error_setg(errp, "'%s' property specifies a memdev with preallocation"
+                   " enabled: %s. Instead, specify 'prealloc=on' for the"
+                   " virtio-mem device. ", VIRTIO_MEM_MEMDEV_PROP,
+                   object_get_canonical_path_component(OBJECT(vmem->memdev)));
+        return;
     }
 
     if ((nb_numa_nodes && vmem->node >= nb_numa_nodes) ||
-- 
2.39.0



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 7/8] virtio-mem: Migrate immutable properties early
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (5 preceding siblings ...)
  2023-01-12 16:44 ` [PATCH v3 6/8] virtio-mem: Fail if a memory backend with "prealloc=on" is specified David Hildenbrand
@ 2023-01-12 16:44 ` David Hildenbrand
  2023-01-12 19:44   ` Dr. David Alan Gilbert
  2023-01-12 16:44 ` [PATCH v3 8/8] virtio-mem: Proper support for preallocation with migration David Hildenbrand
  2023-01-12 16:45 ` [PATCH v3 0/8] virtio-mem: Handle " David Hildenbrand
  8 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik

The bitmap and the size are immutable while migration is active: see
virtio_mem_is_busy(). We can migrate this information early, before
migrating any actual RAM content. Further, all information we need for
sanity checks is immutable as well.

Having this information in place early will, for example, allow for
properly preallocating memory before touching these memory locations
during RAM migration: this way, we can make sure that all memory was
actually preallocated and that any user errors (e.g., insufficient
hugetlb pages) can be handled gracefully.

In contrast, usable_region_size and requested_size can theoretically
still be modified on the source while the VM is running. Keep migrating
these properties the usual, late, way.

Use a new device property to keep behavior of compat machines
unmodified.
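
How the property gates the late fields can be shown with a simplified,
standalone model (not the QEMU vmstate machinery: struct toy_vmem,
struct toy_field and toy_save_late() are hypothetical, and the predicate
takes the state directly instead of the real "void *opaque, int
version_id" pair). Fields without a test are always written; the late copy
of "size" is written only when early migration is off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device state; field names mirror the patch, the struct does not. */
struct toy_vmem {
    uint64_t usable_region_size;
    uint64_t size;
    uint64_t requested_size;
    bool early_migration;
};

/* Toy field descriptor with an optional "does this field exist?" test. */
struct toy_field {
    size_t offset;
    bool (*field_exists)(const struct toy_vmem *vmem);
};

/* Late copies of early-migrated fields exist only without early migration. */
static bool late_copy_exists(const struct toy_vmem *vmem)
{
    return !vmem->early_migration;
}

static const struct toy_field late_fields[] = {
    { offsetof(struct toy_vmem, usable_region_size), NULL },
    { offsetof(struct toy_vmem, size), late_copy_exists },
    { offsetof(struct toy_vmem, requested_size), NULL },
};

/* Write all existing fields to out[]; returns how many were written. */
static size_t toy_save_late(const struct toy_vmem *vmem, uint64_t *out,
                            size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(late_fields) / sizeof(late_fields[0]); i++) {
        const struct toy_field *f = &late_fields[i];

        if (f->field_exists && !f->field_exists(vmem)) {
            continue;
        }
        assert(n < max);
        out[n++] = *(const uint64_t *)((const char *)vmem + f->offset);
    }
    return n;
}
```

Because source and destination evaluate the same predicate against the
same property value, both sides agree on the late stream layout, which is
what keeps compat machines unchanged.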

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/machine.c              |  4 ++-
 hw/virtio/virtio-mem.c         | 51 ++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio-mem.h |  8 ++++++
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 616f3a207c..29b57f6448 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -41,7 +41,9 @@
 #include "hw/virtio/virtio-pci.h"
 #include "qom/object_interfaces.h"
 
-GlobalProperty hw_compat_7_2[] = {};
+GlobalProperty hw_compat_7_2[] = {
+    { "virtio-mem", "x-early-migration", "false" },
+};
 const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
 
 GlobalProperty hw_compat_7_1[] = {
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 02f7b5469a..51666baa01 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -31,6 +31,8 @@
 #include CONFIG_DEVICES
 #include "trace.h"
 
+static const VMStateDescription vmstate_virtio_mem_device_early;
+
 /*
  * We only had legacy x86 guests that did not support
  * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy guests.
@@ -878,6 +880,10 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
 
     host_memory_backend_set_mapped(vmem->memdev, true);
     vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
+    if (vmem->early_migration) {
+        vmstate_register(VMSTATE_IF(vmem), VMSTATE_INSTANCE_ID_ANY,
+                         &vmstate_virtio_mem_device_early, vmem);
+    }
     qemu_register_reset(virtio_mem_system_reset, vmem);
 
     /*
@@ -899,6 +905,10 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
      */
     memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     qemu_unregister_reset(virtio_mem_system_reset, vmem);
+    if (vmem->early_migration) {
+        vmstate_unregister(VMSTATE_IF(vmem), &vmstate_virtio_mem_device_early,
+                           vmem);
+    }
     vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
     host_memory_backend_set_mapped(vmem->memdev, false);
     virtio_del_queue(vdev, 0);
@@ -1015,18 +1025,53 @@ static const VMStateDescription vmstate_virtio_mem_sanity_checks = {
     },
 };
 
+static bool virtio_mem_vmstate_field_exists(void *opaque, int version_id)
+{
+    const VirtIOMEM *vmem = VIRTIO_MEM(opaque);
+
+    /* With early migration, these fields were already migrated. */
+    return !vmem->early_migration;
+}
+
 static const VMStateDescription vmstate_virtio_mem_device = {
     .name = "virtio-mem-device",
     .minimum_version_id = 1,
     .version_id = 1,
     .priority = MIG_PRI_VIRTIO_MEM,
     .post_load = virtio_mem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_WITH_TMP_TEST(VirtIOMEM, virtio_mem_vmstate_field_exists,
+                              VirtIOMEMMigSanityChecks,
+                              vmstate_virtio_mem_sanity_checks),
+        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
+        VMSTATE_UINT64_TEST(size, VirtIOMEM, virtio_mem_vmstate_field_exists),
+        VMSTATE_UINT64(requested_size, VirtIOMEM),
+        VMSTATE_BITMAP_TEST(bitmap, VirtIOMEM, virtio_mem_vmstate_field_exists,
+                            0, bitmap_size),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+/*
+ * Transfer properties that are immutable while migration is active early,
+ * such that we have this information around before migrating any RAM
+ * content.
+ *
+ * Note that virtio_mem_is_busy() makes sure these properties can no longer
+ * change on the migration source until migration completed.
+ *
+ * With QEMU compat machines, we transmit these properties later, via
+ * vmstate_virtio_mem_device instead -- see virtio_mem_vmstate_field_exists().
+ */
+static const VMStateDescription vmstate_virtio_mem_device_early = {
+    .name = "virtio-mem-device-early",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .immutable = 1,
     .fields = (VMStateField[]) {
         VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
                          vmstate_virtio_mem_sanity_checks),
-        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
         VMSTATE_UINT64(size, VirtIOMEM),
-        VMSTATE_UINT64(requested_size, VirtIOMEM),
         VMSTATE_BITMAP(bitmap, VirtIOMEM, 0, bitmap_size),
         VMSTATE_END_OF_LIST()
     },
@@ -1211,6 +1256,8 @@ static Property virtio_mem_properties[] = {
     DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, VirtIOMEM,
                             unplugged_inaccessible, ON_OFF_AUTO_AUTO),
 #endif
+    DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM,
+                     early_migration, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 7745cfc1a3..f15e561785 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass,
 #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
 #define VIRTIO_MEM_ADDR_PROP "memaddr"
 #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible"
+#define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration"
 #define VIRTIO_MEM_PREALLOC_PROP "prealloc"
 
 struct VirtIOMEM {
@@ -74,6 +75,13 @@ struct VirtIOMEM {
     /* whether to prealloc memory when plugging new blocks */
     bool prealloc;
 
+    /*
+     * Whether we migrate properties that are immutable while migration is
+     * active early, before state of other devices and especially, before
+     * migrating any RAM content.
+     */
+    bool early_migration;
+
     /* notifiers to notify when "size" changes */
     NotifierList size_change_notifiers;
 
-- 
2.39.0




* [PATCH v3 8/8] virtio-mem: Proper support for preallocation with migration
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (6 preceding siblings ...)
  2023-01-12 16:44 ` [PATCH v3 7/8] virtio-mem: Migrate immutable properties early David Hildenbrand
@ 2023-01-12 16:44 ` David Hildenbrand
  2023-01-12 19:50   ` Dr. David Alan Gilbert
  2023-01-12 16:45 ` [PATCH v3 0/8] virtio-mem: Handle " David Hildenbrand
  8 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: David Hildenbrand, Dr . David Alan Gilbert, Juan Quintela,
	Peter Xu, Michael S . Tsirkin, Michal Privoznik, Jing Qi

Ordinary memory preallocation runs when QEMU starts up and creates the
memory backends, before processing the incoming migration stream. With
virtio-mem, we don't know which memory blocks to preallocate before
migration started. Now that we migrate the virtio-mem bitmap early, before
migrating any RAM content, we can safely preallocate memory for all plugged
memory blocks before migrating any RAM content.

This is especially relevant for the following cases:

(1) User errors

With hugetlb/files, if we don't have sufficient backend memory available on
the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
when running out of backend memory. Preallocating memory before actual
RAM migration allows for failing gracefully and informing the user about
the setup problem.

(2) Excluded memory ranges during migration

For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we won't crash during RAM
migration, but later, when running the VM on the destination, which is
bad.

To fix this for new QEMU machines that migrate the bitmap early,
preallocate the memory early, before any RAM migration. Warn with old
QEMU machines.

Getting postcopy right is a bit tricky, but we essentially now implement
the same (problematic) preallocation logic as ordinary preallocation:
preallocate memory early and discard it again before precopy starts. During
ordinary preallocation, discarding of RAM happens when postcopy is advised.
As the state (bitmap) is loaded after postcopy was advised but before
postcopy starts listening, we have to discard memory we preallocated
immediately again ourselves.

Note that nothing (not even hugetlb reservations) guarantees for postcopy
that backend memory (especially, hugetlb pages) is still free after it
was freed once while discarding RAM. Still, allocating that memory at
least once helps catch some basic setup problems.

Before this change, trying to restore a VM when insufficient hugetlb
pages are around results in the process crashing due to a "Bus error"
(SIGBUS). With this change, QEMU fails gracefully:

  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
  qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
  qemu-system-x86_64: load of migration failed: Cannot allocate memory

And we can even introspect the early migration data, including the
bitmap:
  $ ./scripts/analyze-migration.py -f STATEFILE
  {
  "ram (2)": {
      "section sizes": {
          "0000:00:03.0/mem0": "0x0000000780000000",
          "0000:00:04.0/mem1": "0x0000000780000000",
          "pc.ram": "0x0000000100000000",
          "/rom@etc/acpi/tables": "0x0000000000020000",
          "pc.bios": "0x0000000000040000",
          "0000:00:02.0/e1000.rom": "0x0000000000040000",
          "pc.rom": "0x0000000000020000",
          "/rom@etc/table-loader": "0x0000000000001000",
          "/rom@etc/acpi/rsdp": "0x0000000000001000"
      }
  },
  "0000:00:03.0/virtio-mem-device-early (51)": {
      "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
      "size": "0x0000000040000000",
      "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
  },
  "0000:00:04.0/virtio-mem-device-early (53)": {
      "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
      "size": "0x00000001fa400000",
      "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
  },
  [...]

Reported-by: Jing Qi <jinqi@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 87 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
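
Illustration only, not part of the patch: a minimal, self-contained sketch
of the flow described above. It assumes anonymous memory and uses
hypothetical demo_* names (none of them exist in QEMU). It walks runs of
plugged blocks, touches each run once and, if postcopy was advised, discards
everything again. Real code additionally has to cope with SIGBUS on
file/hugetlb backends, which this sketch ignores.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the device state; this is not QEMU's VirtIOMEM. */
typedef struct {
    uint8_t *area;        /* start of the backing mapping */
    size_t block_size;    /* bytes per block, a multiple of the page size */
    size_t nr_blocks;     /* number of entries in "plugged" */
    const bool *plugged;  /* true == block is plugged */
} DemoMem;

/* Touch a range once so the kernel has to allocate backing pages for it. */
static void demo_prealloc_range(DemoMem *m, size_t first, size_t nr)
{
    memset(m->area + first * m->block_size, 0, nr * m->block_size);
}

/* Preallocate each run of consecutive plugged blocks; returns the run count. */
static int demo_prealloc_plugged(DemoMem *m)
{
    size_t first = 0;
    int runs = 0;

    while (first < m->nr_blocks) {
        size_t last = first;

        if (!m->plugged[first]) {
            first++;
            continue;
        }
        while (last + 1 < m->nr_blocks && m->plugged[last + 1]) {
            last++;
        }
        demo_prealloc_range(m, first, last - first + 1);
        runs++;
        first = last + 1;
    }
    return runs;
}

/*
 * Map anonymous memory, preallocate the plugged runs and, if postcopy was
 * advised, discard everything again. Returns the run count or -1 on error.
 */
int demo_post_load_early(const bool *plugged, size_t nr_blocks,
                         bool postcopy_advised)
{
    DemoMem m = { .nr_blocks = nr_blocks, .plugged = plugged };
    long page_size = sysconf(_SC_PAGESIZE);
    size_t len;
    int runs;

    assert(nr_blocks > 0 && page_size > 0);
    m.block_size = (size_t)page_size;
    len = nr_blocks * m.block_size;

    m.area = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m.area == MAP_FAILED) {
        return -1;
    }

    runs = demo_prealloc_plugged(&m);
    if (postcopy_advised && madvise(m.area, len, MADV_DONTNEED)) {
        runs = -1;
    }
    munmap(m.area, len);
    return runs;
}
```

For the block pattern 1 1 0 1 0 0 1 1 this touches three runs, with or
without the final discard.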

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 51666baa01..4c3720249c 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -204,6 +204,30 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
+static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg,
+                                             virtio_mem_range_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size);
+    while (first_bit < vmem->bitmap_size) {
+        offset = first_bit * vmem->block_size;
+        last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * vmem->block_size;
+
+        ret = cb(vmem, arg, offset, size);
+        if (ret) {
+            break;
+        }
+        first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
+                                  last_bit + 2);
+    }
+    return ret;
+}
+
 /*
  * Adjust the memory section to cover the intersection with the given range.
  *
@@ -938,6 +962,10 @@ static int virtio_mem_post_load(void *opaque, int version_id)
     RamDiscardListener *rdl;
     int ret;
 
+    if (vmem->prealloc && !vmem->early_migration) {
+        warn_report("Proper preallocation with migration requires a newer QEMU machine");
+    }
+
     /*
      * We started out with all memory discarded and our memory region is mapped
      * into an address space. Replay, now that we updated the bitmap.
@@ -957,6 +985,64 @@ static int virtio_mem_post_load(void *opaque, int version_id)
     return virtio_mem_restore_unplugged(vmem);
 }
 
+static int virtio_mem_prealloc_range_cb(const VirtIOMEM *vmem, void *arg,
+                                        uint64_t offset, uint64_t size)
+{
+    void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset;
+    int fd = memory_region_get_fd(&vmem->memdev->mr);
+    Error *local_err = NULL;
+
+    qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -ENOMEM;
+    }
+    return 0;
+}
+
+static int virtio_mem_post_load_early(void *opaque, int version_id)
+{
+    VirtIOMEM *vmem = VIRTIO_MEM(opaque);
+    RAMBlock *rb = vmem->memdev->mr.ram_block;
+    int ret;
+
+    if (!vmem->prealloc) {
+        return 0;
+    }
+
+    /*
+     * We restored the bitmap and verified that the basic properties
+     * match on source and destination, so we can go ahead and preallocate
+     * memory for all plugged memory blocks, before actual RAM migration starts
+     * touching this memory.
+     */
+    ret = virtio_mem_for_each_plugged_range(vmem, NULL,
+                                            virtio_mem_prealloc_range_cb);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * This is tricky: postcopy wants to start with a clean slate. On
+     * POSTCOPY_INCOMING_ADVISE, postcopy code discards all (ordinarily
+     * preallocated) RAM such that postcopy will work as expected later.
+     *
+     * However, we run after POSTCOPY_INCOMING_ADVISE -- but before actual
+     * RAM migration. So let's discard all memory again. This looks like an
+     * expensive NOP, but actually serves a purpose: we made sure that we
+     * were able to allocate all required backend memory once. We cannot
+     * guarantee that the backend memory we will free will remain free
+     * until we need it during postcopy, but at least we can catch the
+     * obvious setup issues this way.
+     */
+    if (migration_incoming_postcopy_advised()) {
+        if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
+            return -EBUSY;
+        }
+    }
+    return 0;
+}
+
 typedef struct VirtIOMEMMigSanityChecks {
     VirtIOMEM *parent;
     uint64_t addr;
@@ -1068,6 +1154,7 @@ static const VMStateDescription vmstate_virtio_mem_device_early = {
     .minimum_version_id = 1,
     .version_id = 1,
     .immutable = 1,
+    .post_load = virtio_mem_post_load_early,
     .fields = (VMStateField[]) {
         VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
                          vmstate_virtio_mem_sanity_checks),
-- 
2.39.0




* Re: [PATCH v3 0/8] virtio-mem: Handle preallocation with migration
  2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
                   ` (7 preceding siblings ...)
  2023-01-12 16:44 ` [PATCH v3 8/8] virtio-mem: Proper support for preallocation with migration David Hildenbrand
@ 2023-01-12 16:45 ` David Hildenbrand
  8 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 16:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr . David Alan Gilbert, Juan Quintela, Peter Xu,
	Michael S . Tsirkin, Michal Privoznik

Subject for this series should be "v4" ...

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save()
  2023-01-12 16:43 ` [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save() David Hildenbrand
@ 2023-01-12 16:58   ` Dr. David Alan Gilbert
  2023-01-12 17:49     ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 16:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> Let's move more code into vmstate_save(), reducing code duplication and
> preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We
> have to move vmstate_save() to make the compiler happy.
> 
> We'll now also trace from qemu_save_device_state().

Mostly OK, but..

> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  migration/savevm.c | 79 ++++++++++++++++++++++------------------------

Doesn't this also need to update trace-events?

Dave

>  1 file changed, 37 insertions(+), 42 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index a0cdb714f7..d8830297e4 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -897,17 +897,6 @@ static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
>      }
>  }
>  
> -static int vmstate_save(QEMUFile *f, SaveStateEntry *se,
> -                        JSONWriter *vmdesc)
> -{
> -    trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
> -    if (!se->vmsd) {
> -        vmstate_save_old_style(f, se, vmdesc);
> -        return 0;
> -    }
> -    return vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
> -}
> -
>  /*
>   * Write the header for device section (QEMU_VM_SECTION START/END/PART/FULL)
>   */
> @@ -941,6 +930,43 @@ static void save_section_footer(QEMUFile *f, SaveStateEntry *se)
>      }
>  }
>  
> +static int vmstate_save(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc)
> +{
> +    int ret;
> +
> +    if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
> +        return 0;
> +    }
> +    if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
> +        trace_savevm_section_skip(se->idstr, se->section_id);
> +        return 0;
> +    }
> +
> +    trace_savevm_section_start(se->idstr, se->section_id);
> +    save_section_header(f, se, QEMU_VM_SECTION_FULL);
> +    if (vmdesc) {
> +        json_writer_start_object(vmdesc, NULL);
> +        json_writer_str(vmdesc, "name", se->idstr);
> +        json_writer_int64(vmdesc, "instance_id", se->instance_id);
> +    }
> +
> +    trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
> +    if (!se->vmsd) {
> +        vmstate_save_old_style(f, se, vmdesc);
> +    } else {
> +        ret = vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
> +        if (ret) {
> +            return ret;
> +        }
> +    }
> +
> +    trace_savevm_section_end(se->idstr, se->section_id, 0);
> +    save_section_footer(f, se);
> +    if (vmdesc) {
> +        json_writer_end_object(vmdesc);
> +    }
> +    return 0;
> +}
>  /**
>   * qemu_savevm_command_send: Send a 'QEMU_VM_COMMAND' type element with the
>   *                           command and associated data.
> @@ -1374,31 +1400,11 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>      json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
>      json_writer_start_array(vmdesc, "devices");
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> -
> -        if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
> -            continue;
> -        }
> -        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
> -            trace_savevm_section_skip(se->idstr, se->section_id);
> -            continue;
> -        }
> -
> -        trace_savevm_section_start(se->idstr, se->section_id);
> -
> -        json_writer_start_object(vmdesc, NULL);
> -        json_writer_str(vmdesc, "name", se->idstr);
> -        json_writer_int64(vmdesc, "instance_id", se->instance_id);
> -
> -        save_section_header(f, se, QEMU_VM_SECTION_FULL);
>          ret = vmstate_save(f, se, vmdesc);
>          if (ret) {
>              qemu_file_set_error(f, ret);
>              return ret;
>          }
> -        trace_savevm_section_end(se->idstr, se->section_id, 0);
> -        save_section_footer(f, se);
> -
> -        json_writer_end_object(vmdesc);
>      }
>  
>      if (inactivate_disks) {
> @@ -1594,21 +1600,10 @@ int qemu_save_device_state(QEMUFile *f)
>          if (se->is_ram) {
>              continue;
>          }
> -        if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
> -            continue;
> -        }
> -        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
> -            continue;
> -        }
> -
> -        save_section_header(f, se, QEMU_VM_SECTION_FULL);
> -
>          ret = vmstate_save(f, se, NULL);
>          if (ret) {
>              return ret;
>          }
> -
> -        save_section_footer(f, se);
>      }
>  
>      qemu_put_byte(f, QEMU_VM_EOF);
> -- 
> 2.39.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
  2023-01-12 16:43 ` [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() David Hildenbrand
@ 2023-01-12 17:43   ` Dr. David Alan Gilbert
  2023-01-12 17:47     ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 17:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> ... and store it in the migration state. This is a preparation for
> storing selected vmsd's already in qemu_savevm_state_setup().
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  migration/migration.c |  4 ++++
>  migration/migration.h |  4 ++++
>  migration/savevm.c    | 18 ++++++++++++------
>  3 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 52b5d39244..1d33a7efa0 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
>      s->vm_was_running = false;
>      s->iteration_initial_bytes = 0;
>      s->threshold_size = 0;
> +
> +    json_writer_free(s->vmdesc);
> +    s->vmdesc = NULL;
>  }
>  
>  int migrate_add_blocker_internal(Error *reason, Error **errp)
> @@ -4445,6 +4448,7 @@ static void migration_instance_finalize(Object *obj)
>      qemu_sem_destroy(&ms->rp_state.rp_sem);
>      qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
>      error_free(ms->error);
> +    json_writer_free(ms->vmdesc);

I'm not sure this is happening when you think it is.
I *think* this only happens when qemu quits....

>  }
>  
>  static void migration_instance_init(Object *obj)
> diff --git a/migration/migration.h b/migration/migration.h
> index ae4ffd3454..66511ce532 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -17,6 +17,7 @@
>  #include "exec/cpu-common.h"
>  #include "hw/qdev-core.h"
>  #include "qapi/qapi-types-migration.h"
> +#include "qapi/qmp/json-writer.h"
>  #include "qemu/thread.h"
>  #include "qemu/coroutine_int.h"
>  #include "io/channel.h"
> @@ -366,6 +367,9 @@ struct MigrationState {
>       * This save hostname when out-going migration starts
>       */
>      char *hostname;
> +
> +    /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
> +    JSONWriter *vmdesc;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d8830297e4..ff2b8d0064 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -42,7 +42,6 @@
>  #include "postcopy-ram.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-migration.h"
> -#include "qapi/qmp/json-writer.h"
>  #include "qapi/clone-visitor.h"
>  #include "qapi/qapi-builtin-visit.h"
>  #include "qapi/qmp/qerror.h"
> @@ -1189,10 +1188,16 @@ bool qemu_savevm_state_guest_unplug_pending(void)
>  
>  void qemu_savevm_state_setup(QEMUFile *f)
>  {
> +    MigrationState *ms = migrate_get_current();
>      SaveStateEntry *se;
>      Error *local_err = NULL;
>      int ret;
>  
> +    ms->vmdesc = json_writer_new(false);
> +    json_writer_start_object(ms->vmdesc, NULL);
> +    json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
> +    json_writer_start_array(ms->vmdesc, "devices");
> +
>      trace_savevm_state_setup();
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          if (!se->ops || !se->ops->save_setup) {
> @@ -1390,15 +1395,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>                                                      bool in_postcopy,
>                                                      bool inactivate_disks)
>  {
> -    g_autoptr(JSONWriter) vmdesc = NULL;
> +    MigrationState *ms = migrate_get_current();
> +    JSONWriter *vmdesc = ms->vmdesc;
>      int vmdesc_len;
>      SaveStateEntry *se;
>      int ret;
>  
> -    vmdesc = json_writer_new(false);
> -    json_writer_start_object(vmdesc, NULL);
> -    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
> -    json_writer_start_array(vmdesc, "devices");
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          ret = vmstate_save(f, se, vmdesc);
>          if (ret) {
> @@ -1433,6 +1435,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>          qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
>      }
>  
> +    /* Free it now to detect any inconsistencies. */
> +    json_writer_free(vmdesc);
> +    ms->vmdesc = NULL;

and this only happens when this successfully exits;  so if this errors
out, and then you retry an outwards migration, I think you've leaked a
writer.

Dave

>      return 0;
>  }
>  
> -- 
> 2.39.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
  2023-01-12 17:43   ` Dr. David Alan Gilbert
@ 2023-01-12 17:47     ` David Hildenbrand
  2023-01-12 18:40       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 17:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

On 12.01.23 18:43, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> ... and store it in the migration state. This is a preparation for
>> storing selected vmsd's already in qemu_savevm_state_setup().
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   migration/migration.c |  4 ++++
>>   migration/migration.h |  4 ++++
>>   migration/savevm.c    | 18 ++++++++++++------
>>   3 files changed, 20 insertions(+), 6 deletions(-)
>>

[1]

>> diff --git a/migration/migration.c b/migration/migration.c
>> index 52b5d39244..1d33a7efa0 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
>>       s->vm_was_running = false;
>>       s->iteration_initial_bytes = 0;
>>       s->threshold_size = 0;
>> +
>> +    json_writer_free(s->vmdesc);
>> +    s->vmdesc = NULL;
>>   }

[...]

>>       trace_savevm_state_setup();
>>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>           if (!se->ops || !se->ops->save_setup) {
>> @@ -1390,15 +1395,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>>                                                       bool in_postcopy,
>>                                                       bool inactivate_disks)
>>   {
>> -    g_autoptr(JSONWriter) vmdesc = NULL;
>> +    MigrationState *ms = migrate_get_current();
>> +    JSONWriter *vmdesc = ms->vmdesc;
>>       int vmdesc_len;
>>       SaveStateEntry *se;
>>       int ret;
>>   
>> -    vmdesc = json_writer_new(false);
>> -    json_writer_start_object(vmdesc, NULL);
>> -    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
>> -    json_writer_start_array(vmdesc, "devices");
>>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>           ret = vmstate_save(f, se, vmdesc);
>>           if (ret) {
>> @@ -1433,6 +1435,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>>           qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
>>       }
>>   
>> +    /* Free it now to detect any inconsistencies. */
>> +    json_writer_free(vmdesc);
>> +    ms->vmdesc = NULL;
> 
> and this only happens when this successfully exits;  so if this errors
> out, and then you retry an outwards migration, I think you've leaked a
> writer.

Shouldn't the change [1] to migrate_init() cover that?
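
To spell out what I have in mind, here is a minimal sketch of the pattern
(hypothetical demo_* names, not QEMU code). It assumes the init step runs
before every new attempt: a writer left behind by a failed attempt is then
freed on the next init, so the retry does not leak it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for JSONWriter. */
typedef struct {
    int unused;
} DemoWriter;

/* Number of writers currently allocated, to make leaks visible. */
int demo_live_writers;

static DemoWriter *demo_writer_new(void)
{
    DemoWriter *w = calloc(1, sizeof(*w));

    assert(w);
    demo_live_writers++;
    return w;
}

/* Like free(), accepts NULL. */
static void demo_writer_free(DemoWriter *w)
{
    if (!w) {
        return;
    }
    demo_live_writers--;
    free(w);
}

/* Hypothetical stand-in for the migration state. */
typedef struct {
    DemoWriter *vmdesc;
} DemoState;

/* Runs before every attempt: drop whatever an earlier attempt left behind. */
static void demo_init(DemoState *s)
{
    demo_writer_free(s->vmdesc);
    s->vmdesc = NULL;
}

/* Setup phase: create the writer for this attempt. */
static void demo_setup(DemoState *s)
{
    assert(!s->vmdesc);
    s->vmdesc = demo_writer_new();
}

/* Completion phase: only frees the writer on success. */
static int demo_complete(DemoState *s, int fail)
{
    if (fail) {
        return -1;
    }
    demo_writer_free(s->vmdesc);
    s->vmdesc = NULL;
    return 0;
}

/* One full attempt; "fail" forces an error in the completion phase. */
int demo_migrate(DemoState *s, int fail)
{
    demo_init(s);
    demo_setup(s);
    return demo_complete(s, fail);
}
```

After a failed attempt exactly one writer stays around until the next
attempt (or until the state is finalized); after a successful retry none is
left.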

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save()
  2023-01-12 16:58   ` Dr. David Alan Gilbert
@ 2023-01-12 17:49     ` David Hildenbrand
  2023-01-12 18:36       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 17:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

On 12.01.23 17:58, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> Let's move more code into vmstate_save(), reducing code duplication and
>> preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We
>> have to move vmstate_save() to make the compiler happy.
>>
>> We'll now also trace from qemu_save_device_state().
> 
> Mostly OK, but..
> 
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   migration/savevm.c | 79 ++++++++++++++++++++++------------------------
> 
> Doesn't this also need to update trace-events?

The existing trace events from 
qemu_savevm_state_complete_precopy_non_iterable() are simply moved to 
vmstate_save(), so qemu_save_device_state() will implicitly use them.

So no update should be needed (no new events), or am I missing something?

Thanks!


-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-12 16:43 ` [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
@ 2023-01-12 17:56   ` Dr. David Alan Gilbert
  2023-01-12 18:21     ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 17:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> For virtio-mem, we want to have the plugged/unplugged state of memory
> blocks available before migrating any actual RAM content, and perform
> sanity checks before touching anything on the destination. This
> information is immutable on the migration source while migration is active.
> 
> We want to use this information for proper preallocation support with
> migration: currently, we don't preallocate memory on the migration target,
> and especially with hugetlb, we can easily run out of hugetlb pages during
> RAM migration and will crash (SIGBUS) instead of catching this gracefully
> via preallocation.
> 
> Migrating device state via a vmsd before we start iterating is currently
> impossible: the only approach that would be possible is avoiding a vmsd
> and migrating state manually during save_setup(), to be restored during
> load_state().
> 
> Let's allow for migrating device state via a vmsd early, during the
> setup phase in qemu_savevm_state_setup(). To keep it simple, we
> indicate applicable vmsd's using an "immutable" flag.
> 
> Note that only very selected devices (i.e., ones seriously messing with
> RAM setup) are supposed to make use of such early state migration.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/migration/vmstate.h |  5 +++++
>  migration/savevm.c          | 14 ++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index ad24aa1934..dd06c3abad 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -179,6 +179,11 @@ struct VMStateField {
>  struct VMStateDescription {
>      const char *name;
>      int unmigratable;
> +    /*
> +     * The state is immutable while migration is active and is saved
> +     * during the setup phase, to be restored early on the destination.
> +     */
> +    int immutable;

A bool would be nicer (as it would for unmigratable above)

>      int version_id;
>      int minimum_version_id;
>      MigrationPriority priority;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index ff2b8d0064..536d6f662b 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1200,6 +1200,15 @@ void qemu_savevm_state_setup(QEMUFile *f)
>  
>      trace_savevm_state_setup();
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (se->vmsd && se->vmsd->immutable) {
> +            ret = vmstate_save(f, se, ms->vmdesc);
> +            if (ret) {
> +                qemu_file_set_error(f, ret);
> +                break;
> +            }
> +            continue;
> +        }
> +

Does this give you the ordering you want? i.e. there's no guarantee here
that immutables come first?
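
To make the question concrete, a small sketch (hypothetical demo_* names,
not QEMU code) of a single pass over a handler list that mixes early saves
with setup calls. What ends up in the stream simply follows list order, so
an immutable entry registered after an entry with a setup hook is written
after that hook ran. Whether that matters depends on what the setup hooks
write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered save handler. */
typedef struct {
    const char *name;
    bool immutable;   /* saved completely during the setup phase */
    bool has_setup;   /* has a setup hook that writes to the stream */
} DemoEntry;

/*
 * Single pass in list order, mirroring the shape of the loop above.
 * Records the name of each entry that writes something during setup and
 * returns how many names were recorded. "log" needs room for n names.
 */
size_t demo_setup_order(const DemoEntry *entries, size_t n, const char **log)
{
    size_t written = 0;
    size_t i;

    assert(entries && log);
    for (i = 0; i < n; i++) {
        if (entries[i].immutable) {
            log[written++] = entries[i].name;   /* early device state */
            continue;
        }
        if (!entries[i].has_setup) {
            continue;
        }
        log[written++] = entries[i].name;       /* setup hook output */
    }
    return written;
}
```

With a list of "ram" (setup hook), "e1000" (neither) and "virtio-mem-early"
(immutable), the recorded order is "ram" followed by "virtio-mem-early".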

Dave


>          if (!se->ops || !se->ops->save_setup) {
>              continue;
>          }
> @@ -1402,6 +1411,11 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>      int ret;
>  
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (se->vmsd && se->vmsd->immutable) {
> +            /* Already saved during qemu_savevm_state_setup(). */
> +            continue;
> +        }
> +
>          ret = vmstate_save(f, se, vmdesc);
>          if (ret) {
>              qemu_file_set_error(f, ret);
> -- 
> 2.39.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-12 17:56   ` Dr. David Alan Gilbert
@ 2023-01-12 18:21     ` David Hildenbrand
  2023-01-12 19:52       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-12 18:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

On 12.01.23 18:56, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> For virtio-mem, we want to have the plugged/unplugged state of memory
>> blocks available before migrating any actual RAM content, and perform
>> sanity checks before touching anything on the destination. This
>> information is immutable on the migration source while migration is active.
>>
>> We want to use this information for proper preallocation support with
>> migration: currently, we don't preallocate memory on the migration target,
>> and especially with hugetlb, we can easily run out of hugetlb pages during
>> RAM migration and will crash (SIGBUS) instead of catching this gracefully
>> via preallocation.
>>
>> Migrating device state via a vmsd before we start iterating is currently
>> impossible: the only approach that would be possible is avoiding a vmsd
>> and migrating state manually during save_setup(), to be restored during
>> load_state().
>>
>> Let's allow for migrating device state via a vmsd early, during the
>> setup phase in qemu_savevm_state_setup(). To keep it simple, we
>> indicate applicable vmsd's using an "immutable" flag.
>>
>> Note that only very selected devices (i.e., ones seriously messing with
>> RAM setup) are supposed to make use of such early state migration.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   include/migration/vmstate.h |  5 +++++
>>   migration/savevm.c          | 14 ++++++++++++++
>>   2 files changed, 19 insertions(+)
>>
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index ad24aa1934..dd06c3abad 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -179,6 +179,11 @@ struct VMStateField {
>>   struct VMStateDescription {
>>       const char *name;
>>       int unmigratable;
>> +    /*
>> +     * The state is immutable while migration is active and is saved
>> +     * during the setup phase, to be restored early on the destination.
>> +     */
>> +    int immutable;
> 
> A bool would be nicer (as it would for unmigratable above)

Yes, I chose an int for consistency with "unmigratable". I can turn that 
into a bool.

I'd even include a cleanup patch for unmigratable if it wouldn't be ...

$ git grep "unmigratable \=" | wc -l
29

> 
>>       int version_id;
>>       int minimum_version_id;
>>       MigrationPriority priority;
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index ff2b8d0064..536d6f662b 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1200,6 +1200,15 @@ void qemu_savevm_state_setup(QEMUFile *f)
>>   
>>       trace_savevm_state_setup();
>>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> +        if (se->vmsd && se->vmsd->immutable) {
>> +            ret = vmstate_save(f, se, ms->vmdesc);
>> +            if (ret) {
>> +                qemu_file_set_error(f, ret);
>> +                break;
>> +            }
>> +            continue;
>> +        }
>> +
> 
> Does this give you the ordering you want? i.e. there's no guarantee here
> that immutables come first?

Yes, for virtio-mem at least this is fine. There are no real ordering 
requirements in regard to save_setup().

I guess one could use vmstate priorities to affect the ordering, if 
required.
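
The split discussed here can be sketched in isolation. This is only a
minimal model, not QEMU code: Entry, save_in_setup() and
save_in_completion() are hypothetical names, and "saving" is reduced to
bumping a counter. It shows that immutable entries are written in plain
list order during setup and skipped later, so each entry is written
exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a savevm handler entry; not QEMU's type. */
typedef struct {
    const char *name;
    bool immutable;   /* state cannot change while migration is active */
    int times_saved;  /* how often this entry was written to the stream */
} Entry;

/* Setup phase: write only the immutable entries, in list order. */
static size_t save_in_setup(Entry *entries, size_t n)
{
    size_t saved = 0;

    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!entries[i].immutable) {
            continue;
        }
        entries[i].times_saved++;
        saved++;
    }
    return saved;
}

/* Completion phase: skip whatever the setup phase already wrote. */
static size_t save_in_completion(Entry *entries, size_t n)
{
    size_t saved = 0;

    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (entries[i].immutable) {
            continue;
        }
        entries[i].times_saved++;
        saved++;
    }
    return saved;
}
```

Nothing in this model reorders the list, which matches the point above:
ordering among setup-phase entries would have to come from somewhere
else, such as priorities.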

So for my use case this is good enough. Any suggestions? Thanks.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 5/8] migration/ram: Factor out check for advised postcopy
  2023-01-12 16:44 ` [PATCH v3 5/8] migration/ram: Factor out check for advised postcopy David Hildenbrand
@ 2023-01-12 18:23   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 18:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> Let's factor out this check, to be used in virtio-mem context next.
> 
> While at it, fix a spelling error in a related comment.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  include/migration/misc.h | 4 +++-
>  migration/migration.c    | 7 +++++++
>  migration/ram.c          | 8 +-------
>  3 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 465906710d..8b49841016 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -67,8 +67,10 @@ bool migration_has_failed(MigrationState *);
>  /* ...and after the device transmission */
>  bool migration_in_postcopy_after_devices(MigrationState *);
>  void migration_global_dump(Monitor *mon);
> -/* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
> +/* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
>  bool migration_in_incoming_postcopy(void);
> +/* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
> +bool migration_incoming_postcopy_advised(void);
>  /* True if background snapshot is active */
>  bool migration_in_bg_snapshot(void);
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index 1d33a7efa0..b7677c14a9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2094,6 +2094,13 @@ bool migration_in_incoming_postcopy(void)
>      return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END;
>  }
>  
> +bool migration_incoming_postcopy_advised(void)
> +{
> +    PostcopyState ps = postcopy_state_get();
> +
> +    return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
> +}
> +
>  bool migration_in_bg_snapshot(void)
>  {
>      MigrationState *s = migrate_get_current();
> diff --git a/migration/ram.c b/migration/ram.c
> index 334309f1c6..e51a7ee0ce 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -4091,12 +4091,6 @@ int ram_load_postcopy(QEMUFile *f, int channel)
>      return ret;
>  }
>  
> -static bool postcopy_is_advised(void)
> -{
> -    PostcopyState ps = postcopy_state_get();
> -    return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
> -}
> -
>  static bool postcopy_is_running(void)
>  {
>      PostcopyState ps = postcopy_state_get();
> @@ -4167,7 +4161,7 @@ static int ram_load_precopy(QEMUFile *f)
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
>      /* ADVISE is earlier, it shows the source has the postcopy capability on */
> -    bool postcopy_advised = postcopy_is_advised();
> +    bool postcopy_advised = migration_incoming_postcopy_advised();
>      if (!migrate_use_compression()) {
>          invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
>      }
> -- 
> 2.39.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 6/8] virtio-mem: Fail if a memory backend with "prealloc=on" is specified
  2023-01-12 16:44 ` [PATCH v3 6/8] virtio-mem: Fail if a memory backend with "prealloc=on" is specified David Hildenbrand
@ 2023-01-12 18:33   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 18:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> "prealloc=on" for the memory backend does not work as expected, as
> virtio-mem will simply discard all preallocated memory immediately again.
> In the best case, it's an expensive NOP. In the worst case, it's an
> unexpected allocation error.
> 
> Instead, "prealloc=on" should be specified for the virtio-mem device only,
> such that virtio-mem will try preallocating memory before plugging
> memory dynamically to the guest. Fail if such a memory backend is
> provided.
> 
> Tested-by: Michal Privoznik <mprivozn@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  hw/virtio/virtio-mem.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 1ed1f5a4af..02f7b5469a 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -772,6 +772,12 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>          error_setg(errp, "'%s' property specifies an unsupported memdev",
>                     VIRTIO_MEM_MEMDEV_PROP);
>          return;
> +    } else if (vmem->memdev->prealloc) {
> +        error_setg(errp, "'%s' property specifies a memdev with preallocation"
> +                   " enabled: %s. Instead, specify 'prealloc=on' for the"
> +                   " virtio-mem device. ", VIRTIO_MEM_MEMDEV_PROP,
> +                   object_get_canonical_path_component(OBJECT(vmem->memdev)));
> +        return;
>      }
>  
>      if ((nb_numa_nodes && vmem->node >= nb_numa_nodes) ||
> -- 
> 2.39.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save()
  2023-01-12 17:49     ` David Hildenbrand
@ 2023-01-12 18:36       ` Dr. David Alan Gilbert
  2023-01-13 12:59         ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 18:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> On 12.01.23 17:58, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> > > Let's move more code into vmstate_save(), reducing code duplication and
> > > preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We
> > > have to move vmstate_save() to make the compiler happy.
> > > 
> > > We'll now also trace from qemu_save_device_state().
> > 
> > Mostly OK, but..
> > 
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > >   migration/savevm.c | 79 ++++++++++++++++++++++------------------------
> > 
> > > Doesn't this also need to update trace-events?
> 
> The existing trace events from
> qemu_savevm_state_complete_precopy_non_iterable() are simply moved to
> vmstate_save(), so qemu_save_device_state() will implicitly use them.
> 
> So no update should be needed (no new events), or am I missing something?

Aren't you losing the trace_savevm_state_setup() trace?

Dave

> Thanks!
> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
  2023-01-12 17:47     ` David Hildenbrand
@ 2023-01-12 18:40       ` Dr. David Alan Gilbert
  2023-01-12 22:06         ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 18:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> On 12.01.23 18:43, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> > > ... and store it in the migration state. This is a preparation for
> > > > storing selected vmsd's already in qemu_savevm_state_setup().
> > > 
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > >   migration/migration.c |  4 ++++
> > >   migration/migration.h |  4 ++++
> > >   migration/savevm.c    | 18 ++++++++++++------
> > >   3 files changed, 20 insertions(+), 6 deletions(-)
> > > 
> 
> [1]
> 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 52b5d39244..1d33a7efa0 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
> > >       s->vm_was_running = false;
> > >       s->iteration_initial_bytes = 0;
> > >       s->threshold_size = 0;
> > > +
> > > +    json_writer_free(s->vmdesc);
> > > +    s->vmdesc = NULL;
> > >   }
> 
> [...]
> 
> > >       trace_savevm_state_setup();
> > >       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > >           if (!se->ops || !se->ops->save_setup) {
> > > @@ -1390,15 +1395,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
> > >                                                       bool in_postcopy,
> > >                                                       bool inactivate_disks)
> > >   {
> > > -    g_autoptr(JSONWriter) vmdesc = NULL;
> > > +    MigrationState *ms = migrate_get_current();
> > > +    JSONWriter *vmdesc = ms->vmdesc;
> > >       int vmdesc_len;
> > >       SaveStateEntry *se;
> > >       int ret;
> > > -    vmdesc = json_writer_new(false);
> > > -    json_writer_start_object(vmdesc, NULL);
> > > -    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
> > > -    json_writer_start_array(vmdesc, "devices");
> > >       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > >           ret = vmstate_save(f, se, vmdesc);
> > >           if (ret) {
> > > @@ -1433,6 +1435,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
> > >           qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
> > >       }
> > > +    /* Free it now to detect any inconsistencies. */
> > > +    json_writer_free(vmdesc);
> > > +    ms->vmdesc = NULL;
> > 
> > > and this only happens when this successfully exits; so if this errors
> > out, and then you retry an outwards migration, I think you've leaked a
> > writer.
> 
> Shouldn't the change [1] to migrate_init() cover that?

Hmm OK, yes it does - I guess it does mean you keep the allocation
around for a bit longer, but that's OK in practice since normally you'll
be quitting soon.
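
A minimal sketch of that lifetime argument, assuming a stand-in for the
writer: MigState, mig_init(), mig_setup(), mig_complete() and the
live_writers counter are all hypothetical and only model the
free-on-init pattern; none of them is the QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts writers that are currently allocated (illustration only). */
static int live_writers;

/* Hypothetical stand-in for the migration state holding the writer. */
typedef struct {
    void *vmdesc;
} MigState;

/* Init for a new outgoing migration: drop any writer left behind. */
static void mig_init(MigState *s)
{
    assert(s != NULL);
    if (s->vmdesc) {
        free(s->vmdesc);
        live_writers--;
    }
    s->vmdesc = NULL;
}

/* Setup phase: allocate the writer and keep it in the state. */
static void mig_setup(MigState *s)
{
    assert(s != NULL && s->vmdesc == NULL);
    s->vmdesc = malloc(1);
    assert(s->vmdesc != NULL);
    live_writers++;
}

/* Successful completion: free the writer right away. */
static void mig_complete(MigState *s)
{
    assert(s != NULL && s->vmdesc != NULL);
    free(s->vmdesc);
    s->vmdesc = NULL;
    live_writers--;
}
```

If an attempt fails between setup and completion, the writer stays
allocated until the next init, which is the "kept around a bit longer"
case described above.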

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> -- 
> Thanks,
> 
> David / dhildenb
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 7/8] virtio-mem: Migrate immutable properties early
  2023-01-12 16:44 ` [PATCH v3 7/8] virtio-mem: Migrate immutable properties early David Hildenbrand
@ 2023-01-12 19:44   ` Dr. David Alan Gilbert
  2023-01-13 13:59     ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 19:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> The bitmap and the size are immutable while migration is active: see
> virtio_mem_is_busy(). We can migrate this information early, before
> migrating any actual RAM content. Further, all information we need for
> sanity checks is immutable as well.
> 
> Having this information in place early will, for example, allow for
> properly preallocating memory before touching these memory locations
> during RAM migration: this way, we can make sure that all memory was
> actually preallocated and that any user errors (e.g., insufficient
> hugetlb pages) can be handled gracefully.
> 
> In contrast, usable_region_size and requested_size can theoretically
> still be modified on the source while the VM is running. Keep migrating
> these properties the usual, late, way.
> 
> Use a new device property to keep behavior of compat machines
> unmodified.

Can you get me a migration file from this? I want to try and understand
what happens when you have the vmstate_register together with the ->vmsd -
I'm not quite sure what ends up in the output.  Preferably for a VM with
two virtio-mem's.

Dave


> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/core/machine.c              |  4 ++-
>  hw/virtio/virtio-mem.c         | 51 ++++++++++++++++++++++++++++++++--
>  include/hw/virtio/virtio-mem.h |  8 ++++++
>  3 files changed, 60 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 616f3a207c..29b57f6448 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -41,7 +41,9 @@
>  #include "hw/virtio/virtio-pci.h"
>  #include "qom/object_interfaces.h"
>  
> -GlobalProperty hw_compat_7_2[] = {};
> +GlobalProperty hw_compat_7_2[] = {
> +    { "virtio-mem", "x-early-migration", "false" },
> +};
>  const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
>  
>  GlobalProperty hw_compat_7_1[] = {
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 02f7b5469a..51666baa01 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -31,6 +31,8 @@
>  #include CONFIG_DEVICES
>  #include "trace.h"
>  
> +static const VMStateDescription vmstate_virtio_mem_device_early;
> +
>  /*
>   * We only had legacy x86 guests that did not support
>   * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy guests.
> @@ -878,6 +880,10 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>  
>      host_memory_backend_set_mapped(vmem->memdev, true);
>      vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
> +    if (vmem->early_migration) {
> +        vmstate_register(VMSTATE_IF(vmem), VMSTATE_INSTANCE_ID_ANY,
> +                         &vmstate_virtio_mem_device_early, vmem);
> +    }
>      qemu_register_reset(virtio_mem_system_reset, vmem);
>  
>      /*
> @@ -899,6 +905,10 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>       */
>      memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
>      qemu_unregister_reset(virtio_mem_system_reset, vmem);
> +    if (vmem->early_migration) {
> +        vmstate_unregister(VMSTATE_IF(vmem), &vmstate_virtio_mem_device_early,
> +                           vmem);
> +    }
>      vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
>      host_memory_backend_set_mapped(vmem->memdev, false);
>      virtio_del_queue(vdev, 0);
> @@ -1015,18 +1025,53 @@ static const VMStateDescription vmstate_virtio_mem_sanity_checks = {
>      },
>  };
>  
> +static bool virtio_mem_vmstate_field_exists(void *opaque, int version_id)
> +{
> +    const VirtIOMEM *vmem = VIRTIO_MEM(opaque);
> +
> +    /* With early migration, these fields were already migrated. */
> +    return !vmem->early_migration;
> +}
> +
>  static const VMStateDescription vmstate_virtio_mem_device = {
>      .name = "virtio-mem-device",
>      .minimum_version_id = 1,
>      .version_id = 1,
>      .priority = MIG_PRI_VIRTIO_MEM,
>      .post_load = virtio_mem_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_WITH_TMP_TEST(VirtIOMEM, virtio_mem_vmstate_field_exists,
> +                              VirtIOMEMMigSanityChecks,
> +                              vmstate_virtio_mem_sanity_checks),
> +        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
> +        VMSTATE_UINT64_TEST(size, VirtIOMEM, virtio_mem_vmstate_field_exists),
> +        VMSTATE_UINT64(requested_size, VirtIOMEM),
> +        VMSTATE_BITMAP_TEST(bitmap, VirtIOMEM, virtio_mem_vmstate_field_exists,
> +                            0, bitmap_size),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +/*
> + * Transfer properties that are immutable while migration is active early,
> + * such that we have this information around before migrating any RAM
> + * content.
> + *
> + * Note that virtio_mem_is_busy() makes sure these properties can no longer
> + * change on the migration source until migration completed.
> + *
> + * With QEMU compat machines, we transmit these properties later, via
> + * vmstate_virtio_mem_device instead -- see virtio_mem_vmstate_field_exists().
> + */
> +static const VMStateDescription vmstate_virtio_mem_device_early = {
> +    .name = "virtio-mem-device-early",
> +    .minimum_version_id = 1,
> +    .version_id = 1,
> +    .immutable = 1,
>      .fields = (VMStateField[]) {
>          VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
>                           vmstate_virtio_mem_sanity_checks),
> -        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
>          VMSTATE_UINT64(size, VirtIOMEM),
> -        VMSTATE_UINT64(requested_size, VirtIOMEM),
>          VMSTATE_BITMAP(bitmap, VirtIOMEM, 0, bitmap_size),
>          VMSTATE_END_OF_LIST()
>      },
> @@ -1211,6 +1256,8 @@ static Property virtio_mem_properties[] = {
>      DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, VirtIOMEM,
>                              unplugged_inaccessible, ON_OFF_AUTO_AUTO),
>  #endif
> +    DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM,
> +                     early_migration, true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
> index 7745cfc1a3..f15e561785 100644
> --- a/include/hw/virtio/virtio-mem.h
> +++ b/include/hw/virtio/virtio-mem.h
> @@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass,
>  #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
>  #define VIRTIO_MEM_ADDR_PROP "memaddr"
>  #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible"
> +#define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration"
>  #define VIRTIO_MEM_PREALLOC_PROP "prealloc"
>  
>  struct VirtIOMEM {
> @@ -74,6 +75,13 @@ struct VirtIOMEM {
>      /* whether to prealloc memory when plugging new blocks */
>      bool prealloc;
>  
> +    /*
> +     * Whether we migrate properties that are immutable while migration is
> +     * active early, before state of other devices and especially, before
> +     * migrating any RAM content.
> +     */
> +    bool early_migration;
> +
>      /* notifiers to notify when "size" changes */
>      NotifierList size_change_notifiers;
>  
> -- 
> 2.39.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 8/8] virtio-mem: Proper support for preallocation with migration
  2023-01-12 16:44 ` [PATCH v3 8/8] virtio-mem: Proper support for preallocation with migration David Hildenbrand
@ 2023-01-12 19:50   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 19:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik, Jing Qi

* David Hildenbrand (david@redhat.com) wrote:
> Ordinary memory preallocation runs when QEMU starts up and creates the
> memory backends, before processing the incoming migration stream. With
> virtio-mem, we don't know which memory blocks to preallocate before
> migration started. Now that we migrate the virtio-mem bitmap early, before
> migrating any RAM content, we can safely preallocate memory for all plugged
> memory blocks before migrating any RAM content.
> 
> This is especially relevant for the following cases:
> 
> (1) User errors
> 
> With hugetlb/files, if we don't have sufficient backend memory available on
> the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
> when running out of backend memory. Preallocating memory before actual
> RAM migration allows for failing gracefully and informing the user about
> the setup problem.
> 
> (2) Excluded memory ranges during migration
> 
> For example, virtio-balloon free page hinting will exclude some pages
> from getting migrated. In that case, we won't crash during RAM
> migration, but later, when running the VM on the destination, which is
> bad.
> 
> To fix this for new QEMU machines that migrate the bitmap early,
> preallocate the memory early, before any RAM migration. Warn with old
> QEMU machines.
> 
> Getting postcopy right is a bit tricky, but we essentially now implement
> the same (problematic) preallocation logic as ordinary preallocation:
> preallocate memory early and discard it again before precopy starts. During
> ordinary preallocation, discarding of RAM happens when postcopy is advised.
> As the state (bitmap) is loaded after postcopy was advised but before
> postcopy starts listening, we have to discard memory we preallocated
> immediately again ourselves.
> 
> Note that nothing (not even hugetlb reservations) guarantees for postcopy
> that backend memory (especially hugetlb pages) is still free after it
> was freed once while discarding RAM. Still, allocating that memory at
> least once helps catch some basic setup problems.
> 
> Before this change, trying to restore a VM when insufficient hugetlb
> pages are around results in the process crashing due to a "Bus error"
> (SIGBUS). With this change, QEMU fails gracefully:
> 
>   qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
>   qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
>   qemu-system-x86_64: load of migration failed: Cannot allocate memory
> 
> And we can even introspect the early migration data, including the
> bitmap:
>   $ ./scripts/analyze-migration.py -f STATEFILE
>   {
>   "ram (2)": {
>       "section sizes": {
>           "0000:00:03.0/mem0": "0x0000000780000000",
>           "0000:00:04.0/mem1": "0x0000000780000000",
>           "pc.ram": "0x0000000100000000",
>           "/rom@etc/acpi/tables": "0x0000000000020000",
>           "pc.bios": "0x0000000000040000",
>           "0000:00:02.0/e1000.rom": "0x0000000000040000",
>           "pc.rom": "0x0000000000020000",
>           "/rom@etc/table-loader": "0x0000000000001000",
>           "/rom@etc/acpi/rsdp": "0x0000000000001000"
>       }
>   },
>   "0000:00:03.0/virtio-mem-device-early (51)": {
>       "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
>       "size": "0x0000000040000000",
>       "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
>   },
>   "0000:00:04.0/virtio-mem-device-early (53)": {
>       "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
>       "size": "0x00000001fa400000",
>       "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
>   },
>   [...]
> 
> Reported-by: Jing Qi <jinqi@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  hw/virtio/virtio-mem.c | 87 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 51666baa01..4c3720249c 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -204,6 +204,30 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
>      return ret;
>  }
>  
> +static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg,
> +                                             virtio_mem_range_cb cb)
> +{
> +    unsigned long first_bit, last_bit;
> +    uint64_t offset, size;
> +    int ret = 0;
> +
> +    first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size);
> +    while (first_bit < vmem->bitmap_size) {
> +        offset = first_bit * vmem->block_size;
> +        last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
> +                                      first_bit + 1) - 1;
> +        size = (last_bit - first_bit + 1) * vmem->block_size;
> +
> +        ret = cb(vmem, arg, offset, size);
> +        if (ret) {
> +            break;
> +        }
> +        first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
> +                                  last_bit + 2);
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Adjust the memory section to cover the intersection with the given range.
>   *
> @@ -938,6 +962,10 @@ static int virtio_mem_post_load(void *opaque, int version_id)
>      RamDiscardListener *rdl;
>      int ret;
>  
> +    if (vmem->prealloc && !vmem->early_migration) {
> +        warn_report("Proper preallocation with migration requires a newer QEMU machine");
> +    }
> +
>      /*
>       * We started out with all memory discarded and our memory region is mapped
>       * into an address space. Replay, now that we updated the bitmap.
> @@ -957,6 +985,64 @@ static int virtio_mem_post_load(void *opaque, int version_id)
>      return virtio_mem_restore_unplugged(vmem);
>  }
>  
> +static int virtio_mem_prealloc_range_cb(const VirtIOMEM *vmem, void *arg,
> +                                        uint64_t offset, uint64_t size)
> +{
> +    void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset;
> +    int fd = memory_region_get_fd(&vmem->memdev->mr);
> +    Error *local_err = NULL;
> +
> +    qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return -ENOMEM;
> +    }
> +    return 0;
> +}
> +
> +static int virtio_mem_post_load_early(void *opaque, int version_id)
> +{
> +    VirtIOMEM *vmem = VIRTIO_MEM(opaque);
> +    RAMBlock *rb = vmem->memdev->mr.ram_block;
> +    int ret;
> +
> +    if (!vmem->prealloc) {
> +        return 0;
> +    }
> +
> +    /*
> +     * We restored the bitmap and verified that the basic properties
> +     * match on source and destination, so we can go ahead and preallocate
> +     * memory for all plugged memory blocks, before actual RAM migration starts
> +     * touching this memory.
> +     */
> +    ret = virtio_mem_for_each_plugged_range(vmem, NULL,
> +                                            virtio_mem_prealloc_range_cb);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    /*
> +     * This is tricky: postcopy wants to start with a clean slate. On
> +     * POSTCOPY_INCOMING_ADVISE, postcopy code discards all (ordinarily
> +     * preallocated) RAM such that postcopy will work as expected later.
> +     *
> +     * However, we run after POSTCOPY_INCOMING_ADVISE -- but before actual
> +     * RAM migration. So let's discard all memory again. This looks like an
> +     * expensive NOP, but actually serves a purpose: we made sure that we
> +     * were able to allocate all required backend memory once. We cannot
> +     * guarantee that the backend memory we will free will remain free
> +     * until we need it during postcopy, but at least we can catch the
> +     * obvious setup issues this way.
> +     */
> +    if (migration_incoming_postcopy_advised()) {
> +        if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
> +            return -EBUSY;
> +        }
> +    }
> +    return 0;
> +}
> +
>  typedef struct VirtIOMEMMigSanityChecks {
>      VirtIOMEM *parent;
>      uint64_t addr;
> @@ -1068,6 +1154,7 @@ static const VMStateDescription vmstate_virtio_mem_device_early = {
>      .minimum_version_id = 1,
>      .version_id = 1,
>      .immutable = 1,
> +    .post_load = virtio_mem_post_load_early,
>      .fields = (VMStateField[]) {
>          VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
>                           vmstate_virtio_mem_sanity_checks),
> -- 
> 2.39.0
> 
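
The plugged-range walk in the quoted patch can be sketched on its own.
This is a simplified model under stated assumptions: the bitmap is a
plain bool array rather than QEMU's bitmap helpers, and
for_each_plugged_range(), range_cb, Tally and tally_cb() are
hypothetical names. It reports each maximal run of plugged blocks as one
(offset, size) pair and stops at the first callback error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback invoked once per contiguous run of plugged blocks. */
typedef int (*range_cb)(void *arg, uint64_t offset, uint64_t size);

static int for_each_plugged_range(const bool *plugged, size_t nblocks,
                                  uint64_t block_size, void *arg,
                                  range_cb cb)
{
    size_t first = 0;
    int ret = 0;

    assert(cb != NULL);
    while (first < nblocks) {
        size_t end;

        if (!plugged[first]) {
            first++;
            continue;
        }
        /* Extend the run until the first unplugged block or the end. */
        end = first;
        while (end < nblocks && plugged[end]) {
            end++;
        }
        ret = cb(arg, first * block_size, (end - first) * block_size);
        if (ret) {
            break;
        }
        first = end;
    }
    return ret;
}

/* Example callback: count the ranges and sum their sizes. */
typedef struct {
    uint64_t total;
    unsigned ranges;
} Tally;

static int tally_cb(void *arg, uint64_t offset, uint64_t size)
{
    Tally *t = arg;

    (void)offset;
    t->total += size;
    t->ranges++;
    return 0;
}
```

In the patch, the callback preallocates each range; the tally callback
here only stands in for it so the walk can be checked without any
memory backend.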
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-12 18:21     ` David Hildenbrand
@ 2023-01-12 19:52       ` Dr. David Alan Gilbert
  2023-01-12 22:14         ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Dr. David Alan Gilbert @ 2023-01-12 19:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

* David Hildenbrand (david@redhat.com) wrote:
> On 12.01.23 18:56, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> > > For virtio-mem, we want to have the plugged/unplugged state of memory
> > > blocks available before migrating any actual RAM content, and perform
> > > sanity checks before touching anything on the destination. This
> > > information is immutable on the migration source while migration is active.
> > > 
> > > We want to use this information for proper preallocation support with
> > > migration: currently, we don't preallocate memory on the migration target,
> > > and especially with hugetlb, we can easily run out of hugetlb pages during
> > > RAM migration and will crash (SIGBUS) instead of catching this gracefully
> > > via preallocation.
> > > 
> > > Migrating device state via a vmsd before we start iterating is currently
> > > impossible: the only approach that would be possible is avoiding a vmsd
> > > and migrating state manually during save_setup(), to be restored during
> > > load_state().
> > > 
> > > Let's allow for migrating device state via a vmsd early, during the
> > > setup phase in qemu_savevm_state_setup(). To keep it simple, we
> > > indicate applicable vmsd's using an "immutable" flag.
> > > 
> > > Note that only very selected devices (i.e., ones seriously messing with
> > > RAM setup) are supposed to make use of such early state migration.
> > > 
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > >   include/migration/vmstate.h |  5 +++++
> > >   migration/savevm.c          | 14 ++++++++++++++
> > >   2 files changed, 19 insertions(+)
> > > 
> > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > > index ad24aa1934..dd06c3abad 100644
> > > --- a/include/migration/vmstate.h
> > > +++ b/include/migration/vmstate.h
> > > @@ -179,6 +179,11 @@ struct VMStateField {
> > >   struct VMStateDescription {
> > >       const char *name;
> > >       int unmigratable;
> > > +    /*
> > > +     * The state is immutable while migration is active and is saved
> > > +     * during the setup phase, to be restored early on the destination.
> > > +     */
> > > +    int immutable;
> > 
> > A bool would be nicer (as it would for unmigratable above)
> 
> Yes, I chose an int for consistency with "unmigratable". I can turn that
> into a bool.
> 
> I'd even include a cleanup patch for unmigratable if it wouldn't be ...
> 
> $ git grep "unmigratable \=" | wc -l
> 29

It might be OK if you just change the declaration; I mean '1' is pretty
close to true? (I think...)
Anyway, at least make the new one a bool.
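
To illustrate why changing only the declaration should be harmless, here is
a minimal standalone sketch. It is not QEMU code: DemoDescription and its
fields are hypothetical stand-ins for VMStateDescription. An initializer
that still assigns 1 keeps working, because C converts any nonzero value to
true when storing into a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VMStateDescription; not the real QEMU struct. */
typedef struct DemoDescription {
    const char *name;
    bool unmigratable;   /* previously declared as "int unmigratable" */
    bool early;          /* new flag, declared as bool from the start */
} DemoDescription;

/* An old-style initializer that still assigns the integer constant 1. */
static const DemoDescription demo_legacy = {
    .name = "demo-legacy",
    .unmigratable = 1,
};

/* Returns true if the description blocks migration. */
static bool demo_is_unmigratable(const DemoDescription *d)
{
    assert(d != NULL);
    return d->unmigratable;
}
```

Fields left out of the designated initializer (here "early") are
zero-initialized, so they read as false.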

> > >       int version_id;
> > >       int minimum_version_id;
> > >       MigrationPriority priority;
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index ff2b8d0064..536d6f662b 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -1200,6 +1200,15 @@ void qemu_savevm_state_setup(QEMUFile *f)
> > >       trace_savevm_state_setup();
> > >       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > > +        if (se->vmsd && se->vmsd->immutable) {
> > > +            ret = vmstate_save(f, se, ms->vmdesc);
> > > +            if (ret) {
> > > +                qemu_file_set_error(f, ret);
> > > +                break;
> > > +            }
> > > +            continue;
> > > +        }
> > > +
> > 
> > Does this give you the ordering you want? i.e. there's no guarantee here
> > that immutables come first?
> 
> Yes, for virtio-mem at least this is fine. There are no real ordering
> requirements in regard to save_setup().
> 
> I guess one could use vmstate priorities to affect the ordering, if
> required.
> 
> So for my use case this is good enough, any suggestions? Thanks.

OK, but consider whether it might be better just to have a separate
QTAILQ_FOREACH loop in savevm_state_setup that first does all the
immutables, and then all the setups.
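
Roughly what I mean, as a standalone sketch rather than a patch: DemoEntry
and demo_setup_order() are made-up names, and a plain array stands in for
the QTAILQ of SaveStateEntry. The first pass handles every immutable entry,
the second pass handles every entry with a save_setup() handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for SaveStateEntry. */
typedef struct DemoEntry {
    const char *name;
    bool immutable;      /* vmsd flagged for early migration */
    bool has_save_setup; /* entry provides a save_setup() handler */
} DemoEntry;

/*
 * Record the order in which entries would be processed: first every
 * immutable entry, then every other entry with a save_setup() handler.
 * Writes entry indices into "order" and returns how many were written.
 */
static size_t demo_setup_order(const DemoEntry *entries, size_t n,
                               size_t *order)
{
    size_t count = 0;

    assert(entries != NULL && order != NULL);

    /* Pass 1: all immutable entries. */
    for (size_t i = 0; i < n; i++) {
        if (entries[i].immutable) {
            order[count++] = i;
        }
    }
    /* Pass 2: all remaining entries that have a setup handler. */
    for (size_t i = 0; i < n; i++) {
        if (!entries[i].immutable && entries[i].has_save_setup) {
            order[count++] = i;
        }
    }
    return count;
}
```

With two passes the immutable state always precedes any save_setup()
output, independent of the registration order of the handlers.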

Dave

> -- 
> Thanks,
> 
> David / dhildenb
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
  2023-01-12 18:40       ` Dr. David Alan Gilbert
@ 2023-01-12 22:06         ` Peter Xu
  2023-01-13 13:01           ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2023-01-12 22:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: David Hildenbrand, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Thu, Jan 12, 2023 at 06:40:00PM +0000, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
> > On 12.01.23 18:43, Dr. David Alan Gilbert wrote:
> > > * David Hildenbrand (david@redhat.com) wrote:
> > > > ... and store it in the migration state. This is a preparation for
> > > > storing selected vmds's already in qemu_savevm_state_setup().
> > > > 
> > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > ---
> > > >   migration/migration.c |  4 ++++
> > > >   migration/migration.h |  4 ++++
> > > >   migration/savevm.c    | 18 ++++++++++++------
> > > >   3 files changed, 20 insertions(+), 6 deletions(-)
> > > > 
> > 
> > [1]
> > 
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index 52b5d39244..1d33a7efa0 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
> > > >       s->vm_was_running = false;
> > > >       s->iteration_initial_bytes = 0;
> > > >       s->threshold_size = 0;
> > > > +
> > > > +    json_writer_free(s->vmdesc);
> > > > +    s->vmdesc = NULL;
> > > >   }
> > 
> > [...]
> > 
> > > >       trace_savevm_state_setup();
> > > >       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > > >           if (!se->ops || !se->ops->save_setup) {
> > > > @@ -1390,15 +1395,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
> > > >                                                       bool in_postcopy,
> > > >                                                       bool inactivate_disks)
> > > >   {
> > > > -    g_autoptr(JSONWriter) vmdesc = NULL;
> > > > +    MigrationState *ms = migrate_get_current();
> > > > +    JSONWriter *vmdesc = ms->vmdesc;
> > > >       int vmdesc_len;
> > > >       SaveStateEntry *se;
> > > >       int ret;
> > > > -    vmdesc = json_writer_new(false);
> > > > -    json_writer_start_object(vmdesc, NULL);
> > > > -    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
> > > > -    json_writer_start_array(vmdesc, "devices");
> > > >       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > > >           ret = vmstate_save(f, se, vmdesc);
> > > >           if (ret) {
> > > > @@ -1433,6 +1435,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
> > > >           qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
> > > >       }
> > > > +    /* Free it now to detect any inconsistencies. */
> > > > +    json_writer_free(vmdesc);
> > > > +    ms->vmdesc = NULL;
> > > 
> > > and this only happens when this succesfully exits;  so if this errors
> > > out, and then you retry an outwards migration, I think you've leaked a
> > > writer.
> > 
> > Shouldn't the change [1] to migrate_init() cover that?
> 
> Hmm OK, yes it does - I guess it does mean you keep the allocation
> around for a bit longer, but that's OK in practice since normally you'll
> be quitting soon.

Instead of json_writer_free() here and there, how about freeing it in
migrate_fd_cleanup() once and for all?
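
The pattern I have in mind, as a standalone sketch under assumptions:
DemoWriter, DemoState, demo_setup() and demo_cleanup() are hypothetical
names, and malloc/free stand in for the json writer allocation. Setup
allocates, and exactly one cleanup function frees and resets the pointer,
whether or not the migration succeeded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real JSONWriter or MigrationState. */
typedef struct DemoWriter {
    int unused;
} DemoWriter;

typedef struct DemoState {
    DemoWriter *vmdesc;
} DemoState;

/* Setup allocates the writer; a leftover from an earlier run is a bug. */
static int demo_setup(DemoState *s)
{
    assert(s->vmdesc == NULL);
    s->vmdesc = calloc(1, sizeof(*s->vmdesc));
    return s->vmdesc ? 0 : -1;
}

/*
 * Single cleanup point, reached on both success and failure paths.
 * free(NULL) is a no-op, so calling this twice is harmless.
 */
static void demo_cleanup(DemoState *s)
{
    free(s->vmdesc);
    s->vmdesc = NULL;
}
```

Because the pointer is reset in the same place it is freed, a retried
migration starts from a clean state without any extra free in the init path.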

-- 
Peter Xu




* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-12 19:52       ` Dr. David Alan Gilbert
@ 2023-01-12 22:14         ` Peter Xu
  2023-01-12 22:28           ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2023-01-12 22:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: David Hildenbrand, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Thu, Jan 12, 2023 at 07:52:41PM +0000, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
> > On 12.01.23 18:56, Dr. David Alan Gilbert wrote:
> > > * David Hildenbrand (david@redhat.com) wrote:
> > > > For virtio-mem, we want to have the plugged/unplugged state of memory
> > > > blocks available before migrating any actual RAM content, and perform
> > > > sanity checks before touching anything on the destination. This
> > > > information is immutable on the migration source while migration is active,
> > > > 
> > > > We want to use this information for proper preallocation support with
> > > > migration: currently, we don't preallocate memory on the migration target,
> > > > and especially with hugetlb, we can easily run out of hugetlb pages during
> > > > RAM migration and will crash (SIGBUS) instead of catching this gracefully
> > > > via preallocation.
> > > > 
> > > > Migrating device state via a vmsd before we start iterating is currently
> > > > impossible: the only approach that would be possible is avoiding a vmsd
> > > > and migrating state manually during save_setup(), to be restored during
> > > > load_state().
> > > > 
> > > > Let's allow for migrating device state via a vmsd early, during the
> > > > setup phase in qemu_savevm_state_setup(). To keep it simple, we
> > > > indicate applicable vmds's using an "immutable" flag.
> > > > 
> > > > Note that only very selected devices (i.e., ones seriously messing with
> > > > RAM setup) are supposed to make use of such early state migration.
> > > > 
> > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > ---
> > > >   include/migration/vmstate.h |  5 +++++
> > > >   migration/savevm.c          | 14 ++++++++++++++
> > > >   2 files changed, 19 insertions(+)
> > > > 
> > > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > > > index ad24aa1934..dd06c3abad 100644
> > > > --- a/include/migration/vmstate.h
> > > > +++ b/include/migration/vmstate.h
> > > > @@ -179,6 +179,11 @@ struct VMStateField {
> > > >   struct VMStateDescription {
> > > >       const char *name;
> > > >       int unmigratable;
> > > > +    /*
> > > > +     * The state is immutable while migration is active and is saved
> > > > +     * during the setup phase, to be restored early on the destination.
> > > > +     */
> > > > +    int immutable;
> > > 
> > > A bool would be nicer (as it would for unmigratable above)
> > 
> > Yes, I chose an int for consistency with "unmigratable". I can turn that
> > into a bool.
> > 
> > I'd even include a cleanup patch for unmigratable if it wouldn't be ...
> > 
> > $ git grep "unmigratable \=" | wc -l
> > 29
> 
> It might be OK if you just change the declaration; I mean '1' is pretty
> close to true? (I think...)
> Anyway, at least make the new one a bool.

Agreed bool is better.  Can we rename it to something like "early_setup"?
"immutable" isn't clear on its most important attribute (on when it'll be
migrated).  Meanwhile I'd hope we can comment that explicitly.  I'd go with:

  /*
   * This VMSD describes something that should be sent during setup phase
   * of migration.  It plays similar role as save_setup() for explicitly
   * registered vmstate entries, the only difference is the vmsd will be
   * sent right at the start of migration.
   */
  bool early_setup;

> 
> > > >       int version_id;
> > > >       int minimum_version_id;
> > > >       MigrationPriority priority;
> > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > index ff2b8d0064..536d6f662b 100644
> > > > --- a/migration/savevm.c
> > > > +++ b/migration/savevm.c
> > > > @@ -1200,6 +1200,15 @@ void qemu_savevm_state_setup(QEMUFile *f)
> > > >       trace_savevm_state_setup();
> > > >       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > > > +        if (se->vmsd && se->vmsd->immutable) {
> > > > +            ret = vmstate_save(f, se, ms->vmdesc);
> > > > +            if (ret) {
> > > > +                qemu_file_set_error(f, ret);
> > > > +                break;
> > > > +            }
> > > > +            continue;
> > > > +        }
> > > > +
> > > 
> > > Does this give you the ordering you want? i.e. there's no guarantee here
> > > that immutables come first?
> > 
> > Yes, for virtio-mem at least this is fine. There are no real ordering
> > requirements in regard to save_setup().
> > 
> > I guess one could use vmstate priorities to affect the ordering, if
> > required.
> > 
> > So for my use case this is good enough, any suggestions? Thanks.
> 
> OK, but consider whether it might be better just to have a separate
> QTAILQ_FOREACH look in savevm_state_setup that first does all the
> immutables, and then all the setups.

After patch 1 the order may not matter iiuc, because each immutable vmsd
goes through the new vmstate_save(), which always sends a
QEMU_VM_SECTION_FULL header and a footer along with the vmsd.

Thanks,

-- 
Peter Xu




* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-12 22:14         ` Peter Xu
@ 2023-01-12 22:28           ` Peter Xu
  2023-01-13 13:47             ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2023-01-12 22:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: David Hildenbrand, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Thu, Jan 12, 2023 at 05:14:57PM -0500, Peter Xu wrote:
> On Thu, Jan 12, 2023 at 07:52:41PM +0000, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> > > On 12.01.23 18:56, Dr. David Alan Gilbert wrote:
> > > > * David Hildenbrand (david@redhat.com) wrote:
> > > > > For virtio-mem, we want to have the plugged/unplugged state of memory
> > > > > blocks available before migrating any actual RAM content, and perform
> > > > > sanity checks before touching anything on the destination. This
> > > > > information is immutable on the migration source while migration is active,
> > > > > 
> > > > > We want to use this information for proper preallocation support with
> > > > > migration: currently, we don't preallocate memory on the migration target,
> > > > > and especially with hugetlb, we can easily run out of hugetlb pages during
> > > > > RAM migration and will crash (SIGBUS) instead of catching this gracefully
> > > > > via preallocation.
> > > > > 
> > > > > Migrating device state via a vmsd before we start iterating is currently
> > > > > impossible: the only approach that would be possible is avoiding a vmsd
> > > > > and migrating state manually during save_setup(), to be restored during
> > > > > load_state().
> > > > > 
> > > > > Let's allow for migrating device state via a vmsd early, during the
> > > > > setup phase in qemu_savevm_state_setup(). To keep it simple, we
> > > > > indicate applicable vmds's using an "immutable" flag.
> > > > > 
> > > > > Note that only very selected devices (i.e., ones seriously messing with
> > > > > RAM setup) are supposed to make use of such early state migration.
> > > > > 
> > > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > > ---
> > > > >   include/migration/vmstate.h |  5 +++++
> > > > >   migration/savevm.c          | 14 ++++++++++++++
> > > > >   2 files changed, 19 insertions(+)
> > > > > 
> > > > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > > > > index ad24aa1934..dd06c3abad 100644
> > > > > --- a/include/migration/vmstate.h
> > > > > +++ b/include/migration/vmstate.h
> > > > > @@ -179,6 +179,11 @@ struct VMStateField {
> > > > >   struct VMStateDescription {
> > > > >       const char *name;
> > > > >       int unmigratable;
> > > > > +    /*
> > > > > +     * The state is immutable while migration is active and is saved
> > > > > +     * during the setup phase, to be restored early on the destination.
> > > > > +     */
> > > > > +    int immutable;
> > > > 
> > > > A bool would be nicer (as it would for unmigratable above)
> > > 
> > > Yes, I chose an int for consistency with "unmigratable". I can turn that
> > > into a bool.
> > > 
> > > I'd even include a cleanup patch for unmigratable if it wouldn't be ...
> > > 
> > > $ git grep "unmigratable \=" | wc -l
> > > 29
> > 
> > It might be OK if you just change the declaration; I mean '1' is pretty
> > close to true? (I think...)
> > Anyway, at least make the new one a bool.
> 
> Agreed bool is better.  Can we rename it to something like "early_setup"?
> "immutable" isn't clear on its most important attribute (on when it'll be
> migrated).  Meanwhile I'd hope we can comment that explicitly.  I'd go with:
> 
>   /*
>    * This VMSD describes something that should be sent during setup phase
>    * of migration.  It plays similar role as save_setup() for explicitly
>    * registered vmstate entries, the only difference is the vmsd will be
>    * sent right at the start of migration.
>    */
>   bool early_setup;

Let me try some even better wording..

    /*
     * This VMSD describes something that should be sent during setup phase
     * of migration.  It plays similar role as save_setup() for explicitly
     * registered vmstate entries, so it can be seen as a way to describe
     * save_setup() in vmsd structures.
     *
     * One SaveStateEntry should either have the save_setup() specified or
     * the vmsd with early_setup set to true.  It should never have both
     * things set.
     */
    bool early_setup;

There's one tricky thing: we'll send QEMU_VM_SECTION_START for
save_setup() entries but QEMU_VM_SECTION_FULL for vmsd early_setup
entries.
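
The "never both" rule from the comment above could be checked along these
lines. This is only a standalone sketch: DemoVMSD, DemoOps, DemoEntry and
the demo_*() helpers are hypothetical, trimmed-down stand-ins, not the real
QEMU structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the real QEMU structures. */
typedef struct DemoVMSD {
    const char *name;
    bool early_setup;
} DemoVMSD;

typedef struct DemoOps {
    int (*save_setup)(void *opaque);
} DemoOps;

typedef struct DemoEntry {
    const DemoOps *ops;
    const DemoVMSD *vmsd;
} DemoEntry;

/* Example handler so that an ops table can be populated. */
static int demo_noop_setup(void *opaque)
{
    (void)opaque;
    return 0;
}

/* True if the entry sends something during the setup phase via ops. */
static bool demo_has_save_setup(const DemoEntry *se)
{
    return se->ops && se->ops->save_setup;
}

/* True if the entry sends its vmsd during the setup phase. */
static bool demo_has_early_vmsd(const DemoEntry *se)
{
    return se->vmsd && se->vmsd->early_setup;
}

/* An entry may use one mechanism or neither, but never both. */
static bool demo_entry_is_valid(const DemoEntry *se)
{
    assert(se != NULL);
    return !(demo_has_save_setup(se) && demo_has_early_vmsd(se));
}
```

Such a check would naturally sit wherever entries get registered, so that a
misconfigured device is caught before any migration starts.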

David, do you think we can slightly modify your new version of
vmstate_save() so as to pass in the section_type?  I think it'll be even
cleaner to send QEMU_VM_SECTION_START for the early vmsds too.  I assume
this shouldn't affect your goal and anything else.

> 
> > 
> > > > >       int version_id;
> > > > >       int minimum_version_id;
> > > > >       MigrationPriority priority;
> > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > index ff2b8d0064..536d6f662b 100644
> > > > > --- a/migration/savevm.c
> > > > > +++ b/migration/savevm.c
> > > > > @@ -1200,6 +1200,15 @@ void qemu_savevm_state_setup(QEMUFile *f)
> > > > >       trace_savevm_state_setup();
> > > > >       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > > > > +        if (se->vmsd && se->vmsd->immutable) {
> > > > > +            ret = vmstate_save(f, se, ms->vmdesc);
> > > > > +            if (ret) {
> > > > > +                qemu_file_set_error(f, ret);
> > > > > +                break;
> > > > > +            }
> > > > > +            continue;
> > > > > +        }
> > > > > +
> > > > 
> > > > Does this give you the ordering you want? i.e. there's no guarantee here
> > > > that immutables come first?
> > > 
> > > Yes, for virtio-mem at least this is fine. There are no real ordering
> > > requirements in regard to save_setup().
> > > 
> > > I guess one could use vmstate priorities to affect the ordering, if
> > > required.
> > > 
> > > So for my use case this is good enough, any suggestions? Thanks.
> > 
> > OK, but consider whether it might be better just to have a separate
> > QTAILQ_FOREACH look in savevm_state_setup that first does all the
> > immutables, and then all the setups.
> 
> After patch 1 the order may not matter iiuc, because each call to the
> immutable vmsds calls the new vmstate_save() which will always send
> QEMU_VM_SECTION_FULL and footers along the vmsd.
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu




* Re: [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save()
  2023-01-12 18:36       ` Dr. David Alan Gilbert
@ 2023-01-13 12:59         ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-01-13 12:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

On 12.01.23 19:36, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> On 12.01.23 17:58, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand (david@redhat.com) wrote:
>>>> Let's move more code into vmstate_save(), reducing code duplication and
>>>> preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We
>>>> have to move vmstate_save() to make the compiler happy.
>>>>
>>>> We'll now also trace from qemu_save_device_state().
>>>
>>> Mostly OK, but..
>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>    migration/savevm.c | 79 ++++++++++++++++++++++------------------------
>>>
>>> Doesn't this also need to upate trace-events?
>>
>> The existing trace events from
>> qemu_savevm_state_complete_precopy_non_iterable() are simply moved to
>> vmstate_save(), so qemu_save_device_state() will implicitly use them.
>>
>> So no update should be needed (no new events), or am I missing something?
> 
> Aren't you losing the trace_savevm_state_setup() trace?

trace_savevm_state_setup() is called from qemu_savevm_state_setup() both
before and after this change.

Calling it from qemu_save_device_state() would be wrong: that function skips
the setup phase and doesn't call any save_setup() -- it skips all
"se->is_ram" entries.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
  2023-01-12 22:06         ` Peter Xu
@ 2023-01-13 13:01           ` David Hildenbrand
  2023-01-13 13:05             ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-13 13:01 UTC (permalink / raw)
  To: Peter Xu, Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Michael S . Tsirkin, Michal Privoznik

On 12.01.23 23:06, Peter Xu wrote:
> On Thu, Jan 12, 2023 at 06:40:00PM +0000, Dr. David Alan Gilbert wrote:
>> * David Hildenbrand (david@redhat.com) wrote:
>>> On 12.01.23 18:43, Dr. David Alan Gilbert wrote:
>>>> * David Hildenbrand (david@redhat.com) wrote:
>>>>> ... and store it in the migration state. This is a preparation for
>>>>> storing selected vmds's already in qemu_savevm_state_setup().
>>>>>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>> ---
>>>>>    migration/migration.c |  4 ++++
>>>>>    migration/migration.h |  4 ++++
>>>>>    migration/savevm.c    | 18 ++++++++++++------
>>>>>    3 files changed, 20 insertions(+), 6 deletions(-)
>>>>>
>>>
>>> [1]
>>>
>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>> index 52b5d39244..1d33a7efa0 100644
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
>>>>>        s->vm_was_running = false;
>>>>>        s->iteration_initial_bytes = 0;
>>>>>        s->threshold_size = 0;
>>>>> +
>>>>> +    json_writer_free(s->vmdesc);
>>>>> +    s->vmdesc = NULL;
>>>>>    }
>>>
>>> [...]
>>>
>>>>>        trace_savevm_state_setup();
>>>>>        QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>>>>            if (!se->ops || !se->ops->save_setup) {
>>>>> @@ -1390,15 +1395,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>>>>>                                                        bool in_postcopy,
>>>>>                                                        bool inactivate_disks)
>>>>>    {
>>>>> -    g_autoptr(JSONWriter) vmdesc = NULL;
>>>>> +    MigrationState *ms = migrate_get_current();
>>>>> +    JSONWriter *vmdesc = ms->vmdesc;
>>>>>        int vmdesc_len;
>>>>>        SaveStateEntry *se;
>>>>>        int ret;
>>>>> -    vmdesc = json_writer_new(false);
>>>>> -    json_writer_start_object(vmdesc, NULL);
>>>>> -    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
>>>>> -    json_writer_start_array(vmdesc, "devices");
>>>>>        QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>>>>            ret = vmstate_save(f, se, vmdesc);
>>>>>            if (ret) {
>>>>> @@ -1433,6 +1435,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>>>>>            qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
>>>>>        }
>>>>> +    /* Free it now to detect any inconsistencies. */
>>>>> +    json_writer_free(vmdesc);
>>>>> +    ms->vmdesc = NULL;
>>>>
>>>> and this only happens when this succesfully exits;  so if this errors
>>>> out, and then you retry an outwards migration, I think you've leaked a
>>>> writer.
>>>
>>> Shouldn't the change [1] to migrate_init() cover that?
>>
>> Hmm OK, yes it does - I guess it does mean you keep the allocation
>> around for a bit longer, but that's OK in practice since normally you'll
>> be quitting soon.
> 
> Instead of json_writer_free() here and there, how about free it in
> migrate_fd_cleanup() once and for all?
> 

Sure, if that works. I assume I can get rid of the migrate_init() and 
migration_instance_finalize() change then, correct?

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
  2023-01-13 13:01           ` David Hildenbrand
@ 2023-01-13 13:05             ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-01-13 13:05 UTC (permalink / raw)
  To: Peter Xu, Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Michael S . Tsirkin, Michal Privoznik

On 13.01.23 14:01, David Hildenbrand wrote:
> On 12.01.23 23:06, Peter Xu wrote:
>> On Thu, Jan 12, 2023 at 06:40:00PM +0000, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand (david@redhat.com) wrote:
>>>> On 12.01.23 18:43, Dr. David Alan Gilbert wrote:
>>>>> * David Hildenbrand (david@redhat.com) wrote:
>>>>>> ... and store it in the migration state. This is a preparation for
>>>>>> storing selected vmds's already in qemu_savevm_state_setup().
>>>>>>
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>>     migration/migration.c |  4 ++++
>>>>>>     migration/migration.h |  4 ++++
>>>>>>     migration/savevm.c    | 18 ++++++++++++------
>>>>>>     3 files changed, 20 insertions(+), 6 deletions(-)
>>>>>>
>>>>
>>>> [1]
>>>>
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index 52b5d39244..1d33a7efa0 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
>>>>>>         s->vm_was_running = false;
>>>>>>         s->iteration_initial_bytes = 0;
>>>>>>         s->threshold_size = 0;
>>>>>> +
>>>>>> +    json_writer_free(s->vmdesc);
>>>>>> +    s->vmdesc = NULL;
>>>>>>     }
>>>>
>>>> [...]
>>>>
>>>>>>         trace_savevm_state_setup();
>>>>>>         QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>>>>>             if (!se->ops || !se->ops->save_setup) {
>>>>>> @@ -1390,15 +1395,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>>>>>>                                                         bool in_postcopy,
>>>>>>                                                         bool inactivate_disks)
>>>>>>     {
>>>>>> -    g_autoptr(JSONWriter) vmdesc = NULL;
>>>>>> +    MigrationState *ms = migrate_get_current();
>>>>>> +    JSONWriter *vmdesc = ms->vmdesc;
>>>>>>         int vmdesc_len;
>>>>>>         SaveStateEntry *se;
>>>>>>         int ret;
>>>>>> -    vmdesc = json_writer_new(false);
>>>>>> -    json_writer_start_object(vmdesc, NULL);
>>>>>> -    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
>>>>>> -    json_writer_start_array(vmdesc, "devices");
>>>>>>         QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>>>>>             ret = vmstate_save(f, se, vmdesc);
>>>>>>             if (ret) {
>>>>>> @@ -1433,6 +1435,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>>>>>>             qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
>>>>>>         }
>>>>>> +    /* Free it now to detect any inconsistencies. */
>>>>>> +    json_writer_free(vmdesc);
>>>>>> +    ms->vmdesc = NULL;
>>>>>
>>>>> and this only happens when this succesfully exits;  so if this errors
>>>>> out, and then you retry an outwards migration, I think you've leaked a
>>>>> writer.
>>>>
>>>> Shouldn't the change [1] to migrate_init() cover that?
>>>
>>> Hmm OK, yes it does - I guess it does mean you keep the allocation
>>> around for a bit longer, but that's OK in practice since normally you'll
>>> be quitting soon.
>>
>> Instead of json_writer_free() here and there, how about free it in
>> migrate_fd_cleanup() once and for all?
>>
> 
> Sure, if that works. I assume I can get rid of the migrate_init() and
> migration_instance_finalize() change then, correct?
> 

Yeah, that should be much better and matches how we handle the other 
members:

diff --git a/migration/migration.c b/migration/migration.c
index 1d33a7efa0..fcd2f20d7c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1903,6 +1903,8 @@ static void migrate_fd_cleanup(MigrationState *s)

      g_free(s->hostname);
      s->hostname = NULL;
+    json_writer_free(s->vmdesc);
+    s->vmdesc = NULL;

      qemu_savevm_state_cleanup();

@@ -2170,9 +2172,6 @@ void migrate_init(MigrationState *s)
      s->vm_was_running = false;
      s->iteration_initial_bytes = 0;
      s->threshold_size = 0;
-
-    json_writer_free(s->vmdesc);
-    s->vmdesc = NULL;
  }

  int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -4448,7 +4447,6 @@ static void migration_instance_finalize(Object *obj)
      qemu_sem_destroy(&ms->rp_state.rp_sem);
      qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
      error_free(ms->error);
-    json_writer_free(ms->vmdesc);
  }

  static void migration_instance_init(Object *obj)


-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-12 22:28           ` Peter Xu
@ 2023-01-13 13:47             ` David Hildenbrand
  2023-01-13 15:20               ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-13 13:47 UTC (permalink / raw)
  To: Peter Xu, Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Michael S . Tsirkin, Michal Privoznik

[...]

>>> It might be OK if you just change the declaration; I mean '1' is pretty
>>> close to true? (I think...)
>>> Anyway, at least make the new one a bool.
>>
>> Agreed bool is better.  Can we rename it to something like "early_setup"?
>> "immutable" isn't clear on its most important attribute (on when it'll be
>> migrated).  Meanwhile I'd hope we can comment that explicitly.  I'd go with:
>>
>>    /*
>>     * This VMSD describes something that should be sent during setup phase
>>     * of migration.  It plays similar role as save_setup() for explicitly
>>     * registered vmstate entries, the only difference is the vmsd will be
>>     * sent right at the start of migration.
>>     */
>>    bool early_setup;
> 
> Let me try some even better wording..
> 
>      /*
>       * This VMSD describes something that should be sent during setup phase
>       * of migration.  It plays similar role as save_setup() for explicitly
>       * registered vmstate entries, so it can be seen as a way to describe
>       * save_setup() in vmsd structures.
>       *
>       * One SaveStateEntry should either have the save_setup() specified or
>       * the vmsd with early_setup set to true.  It should never have both
>       * things set.
>       */
>      bool early_setup;
> 

Thanks, I'll use that.

> There's one tricky thing that we'll send QEMU_VM_SECTION_START for
> save_setup() entries but QEMU_VM_SECTION_FULL for vmsd early_setup
> entries.

I think that makes sense for now, though: we only transmit a VMSD and 
VMSDs are transmitted once and are not iterable.

In comparison, for iterable things we expect a

QEMU_VM_SECTION_START
0..X QEMU_VM_SECTION_PART
QEMU_VM_SECTION_END
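
To make the expectation concrete, here is a standalone sketch of a checker
for the section types emitted for a single entry. The DemoSectionType enum
and demo_sequence_is_valid() are hypothetical names that only mirror the
QEMU_VM_SECTION_* markers; nothing like this exists in the tree as far as I
know.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local names for illustration only; they mirror QEMU_VM_SECTION_*. */
typedef enum DemoSectionType {
    DEMO_SECTION_START,
    DEMO_SECTION_PART,
    DEMO_SECTION_END,
    DEMO_SECTION_FULL,
} DemoSectionType;

/*
 * Check the section types emitted for ONE entry:
 *   - non-iterable (e.g., a vmsd): exactly one FULL
 *   - iterable: START, then zero or more PART, then END
 */
static bool demo_sequence_is_valid(const DemoSectionType *seq, size_t n)
{
    assert(seq != NULL || n == 0);

    if (n == 1) {
        return seq[0] == DEMO_SECTION_FULL;
    }
    if (n < 2 || seq[0] != DEMO_SECTION_START ||
        seq[n - 1] != DEMO_SECTION_END) {
        return false;
    }
    /* Everything between START and END must be a PART. */
    for (size_t i = 1; i + 1 < n; i++) {
        if (seq[i] != DEMO_SECTION_PART) {
            return false;
        }
    }
    return true;
}
```

A lone START, which is what an early vmsd would produce if we switched the
section type, is rejected by this check.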


I assume you're thinking about "mixing" save_state() with an early vmsd
in a SaveStateEntry. I don't think something like that would currently
work (I'm pretty sure the core would have a hard time figuring out whether
to restore a vmsd or to pass the input to load_state()), nor can it be
configured: we either have se->ops or se->vmsd.

> 
> David, do you think we can slightly modify your new version of
> vmstate_save() so as to pass in the section_type?  I think it'll be even
> cleaner to send QEMU_VM_SECTION_START for the early vmsds too.  I assume
> this shouldn't affect your goal and anything else.

I'd prefer to not go down that path for now. QEMU_VM_SECTION_START 
without QEMU_VM_SECTION_PART and QEMU_VM_SECTION_END feels pretty 
incomplete and wrong to me.

If we want to do that in the future, we should conditionally send 
QEMU_VM_SECTION_START only if we have se->ops I assume?

> 
>>
>>>
>>>>>>        int version_id;
>>>>>>        int minimum_version_id;
>>>>>>        MigrationPriority priority;
>>>>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>>>>> index ff2b8d0064..536d6f662b 100644
>>>>>> --- a/migration/savevm.c
>>>>>> +++ b/migration/savevm.c
>>>>>> @@ -1200,6 +1200,15 @@ void qemu_savevm_state_setup(QEMUFile *f)
>>>>>>        trace_savevm_state_setup();
>>>>>>        QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>>>>> +        if (se->vmsd && se->vmsd->immutable) {
>>>>>> +            ret = vmstate_save(f, se, ms->vmdesc);
>>>>>> +            if (ret) {
>>>>>> +                qemu_file_set_error(f, ret);
>>>>>> +                break;
>>>>>> +            }
>>>>>> +            continue;
>>>>>> +        }
>>>>>> +
>>>>>
>>>>> Does this give you the ordering you want? i.e. there's no guarantee here
>>>>> that immutables come first?
>>>>
>>>> Yes, for virtio-mem at least this is fine. There are no real ordering
>>>> requirements in regard to save_setup().
>>>>
>>>> I guess one could use vmstate priorities to affect the ordering, if
>>>> required.
>>>>
>>>> So for my use case this is good enough, any suggestions? Thanks.
>>>
>>> OK, but consider whether it might be better just to have a separate
>>> QTAILQ_FOREACH loop in savevm_state_setup that first does all the
>>> immutables, and then all the setups.
>>
>> After patch 1 the order may not matter iiuc, because each call to the
>> immutable vmsds calls the new vmstate_save() which will always send
>> QEMU_VM_SECTION_FULL and footers along the vmsd.

Agreed. I'll leave it like that for now.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 7/8] virtio-mem: Migrate immutable properties early
  2023-01-12 19:44   ` Dr. David Alan Gilbert
@ 2023-01-13 13:59     ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-01-13 13:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Peter Xu, Michael S . Tsirkin,
	Michal Privoznik

On 12.01.23 20:44, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> The bitmap and the size are immutable while migration is active: see
>> virtio_mem_is_busy(). We can migrate this information early, before
>> migrating any actual RAM content. Further, all information we need for
>> sanity checks is immutable as well.
>>
>> Having this information in place early will, for example, allow for
>> properly preallocating memory before touching these memory locations
>> during RAM migration: this way, we can make sure that all memory was
>> actually preallocated and that any user errors (e.g., insufficient
>> hugetlb pages) can be handled gracefully.
>>
>> In contrast, usable_region_size and requested_size can theoretically
>> still be modified on the source while the VM is running. Keep migrating
>> these properties the usual, late, way.
>>
>> Use a new device property to keep behavior of compat machines
>> unmodified.
> 
> Can you get me a migration file from this? I want to try and understand
> what happens when you have the vmstate_register together with the ->vmsd -
> I'm not quite sure what ends up in the output.  Preferably for a VM with
> two virtio-mem's.

Sure, here is the stripped output from analyze-migration.py:

     "ram (2)": {
         "section sizes": {
             "0000:00:03.0/mem0": "0x0000000780000000",
             "0000:00:04.0/mem1": "0x0000000780000000",
             "pc.ram": "0x0000000100000000",
             "/rom@etc/acpi/tables": "0x0000000000020000",
             "pc.bios": "0x0000000000040000",
             "0000:00:02.0/e1000.rom": "0x0000000000040000",
             "pc.rom": "0x0000000000020000",
             "/rom@etc/table-loader": "0x0000000000001000",
             "/rom@etc/acpi/rsdp": "0x0000000000001000"
         }
     },
     "0000:00:03.0/virtio-mem-device-early (51)": {
         "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
         "size": "0x0000000040000000",
         "bitmap": "ff ff ff ff [...] "
     },
     "0000:00:04.0/virtio-mem-device-early (53)": {
         "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
         "size": "0x00000001fa400000",
         "bitmap": "ff ff ff ff [...] "
     },
     "timer (0)": {
         "cpu_ticks_offset": "0x00000073f5ba3d28",
         "unused": "00 00 00 00 00 00 00 00",
         "cpu_clock_offset": "0x00000026b744e29c"
     },
[...]
     "serial (50)": {
         "state": {
             "divider": "0x0001",
             "rbr": "0x00",
             "ier": "0x05",
             "iir": "0xc1",
             "lcr": "0x13",
             "mcr": "0x0b",
             "lsr": "0x60",
             "msr": "0xb0",
             "scr": "0x00",
             "fcr_vmstate": "0x81"
         }
     },
     "0000:00:03.0/virtio-mem (52)": {
         "virtio": "00 00 00 02 f4 1a 58 10 07 01 10 00 01 00 ff [...]"
     "0000:00:04.0/virtio-mem (54)": {
         "virtio": "00 00 00 02 f4 1a 58 10 07 01 10 00 01 00 ff [...]"

The data of both "virtio" blobs is extremely large, with a lot of 0x00 bytes -- no idea what
the virtio core stores in there.

Note that vmstate_virtio_mem_device ("virtio-mem-device") will be included by virtio core in the
"virtio" blob.

I can send you a full savevm file privately, just ping me.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-13 13:47             ` David Hildenbrand
@ 2023-01-13 15:20               ` Peter Xu
  2023-01-13 15:27                 ` Peter Xu
  2023-01-13 15:28                 ` David Hildenbrand
  0 siblings, 2 replies; 37+ messages in thread
From: Peter Xu @ 2023-01-13 15:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dr. David Alan Gilbert, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Fri, Jan 13, 2023 at 02:47:24PM +0100, David Hildenbrand wrote:
> I'd prefer to not go down that path for now. QEMU_VM_SECTION_START without
> QEMU_VM_SECTION_PART and QEMU_VM_SECTION_END feels pretty incomplete and
> wrong to me.

That's fine.

> 
> If we want to do that in the future, we should conditionally send
> QEMU_VM_SECTION_START only if we have se->ops I assume?

Yes.  START/FULL frames are mostly replaceable afaiu in the stream ABI, so
we always have space to change no matter what.  Let's leave that as-is.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-13 15:20               ` Peter Xu
@ 2023-01-13 15:27                 ` Peter Xu
  2023-01-16 10:35                   ` David Hildenbrand
  2023-01-13 15:28                 ` David Hildenbrand
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Xu @ 2023-01-13 15:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dr. David Alan Gilbert, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Fri, Jan 13, 2023 at 10:20:31AM -0500, Peter Xu wrote:
> On Fri, Jan 13, 2023 at 02:47:24PM +0100, David Hildenbrand wrote:
> > I'd prefer to not go down that path for now. QEMU_VM_SECTION_START without
> > QEMU_VM_SECTION_PART and QEMU_VM_SECTION_END feels pretty incomplete and
> > wrong to me.
> 
> That's fine.
> 
> > 
> > If we want to do that in the future, we should conditionally send
> > QEMU_VM_SECTION_START only if we have se->ops I assume?
> 
> Yes.  START/FULL frames are mostly replaceable afaiu in the stream ABI, so
> we always have space to change no matter what.  Let's leave that as-is.

If so, please consider adding one more paragraph describing the difference
in vmsd early_setup comments (on using FULL for early vmsd and START for
save_setup), hopefully it'll make things clearer.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-13 15:20               ` Peter Xu
  2023-01-13 15:27                 ` Peter Xu
@ 2023-01-13 15:28                 ` David Hildenbrand
  1 sibling, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-01-13 15:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 13.01.23 16:20, Peter Xu wrote:
> On Fri, Jan 13, 2023 at 02:47:24PM +0100, David Hildenbrand wrote:
>> I'd prefer to not go down that path for now. QEMU_VM_SECTION_START without
>> QEMU_VM_SECTION_PART and QEMU_VM_SECTION_END feels pretty incomplete and
>> wrong to me.
> 
> That's fine.
> 
>>
>> If we want to do that in the future, we should conditionally send
>> QEMU_VM_SECTION_START only if we have se->ops I assume?
> 
> Yes.  START/FULL frames are mostly replaceable afaiu in the stream ABI, so
> we always have space to change no matter what.  Let's leave that as-is.

Thanks Peter! I'll send a new version early next week.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-13 15:27                 ` Peter Xu
@ 2023-01-16 10:35                   ` David Hildenbrand
  2023-01-16 14:56                     ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: David Hildenbrand @ 2023-01-16 10:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 13.01.23 16:27, Peter Xu wrote:
> On Fri, Jan 13, 2023 at 10:20:31AM -0500, Peter Xu wrote:
>> On Fri, Jan 13, 2023 at 02:47:24PM +0100, David Hildenbrand wrote:
>>> I'd prefer to not go down that path for now. QEMU_VM_SECTION_START without
>>> QEMU_VM_SECTION_PART and QEMU_VM_SECTION_END feels pretty incomplete and
>>> wrong to me.
>>
>> That's fine.
>>
>>>
>>> If we want to do that in the future, we should conditionally send
>>> QEMU_VM_SECTION_START only if we have se->ops I assume?
>>
>> Yes.  START/FULL frames are mostly replaceable afaiu in the stream ABI, so
>> we always have space to change no matter what.  Let's leave that as-is.
> 
> If so, please consider adding one more paragraph describing the difference
> in vmsd early_setup comments (on using FULL for early vmsd and START for
> save_setup), hopefully it'll make things clearer.

What about the following:

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 7bc0cd9de9..cc910cab0f 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -188,6 +188,11 @@ struct VMStateDescription {
       * One SaveStateEntry should either have the save_setup() specified or
       * the vmsd with early_setup set to true. It should never have both
       * things set.
+     *
+     * Note that for now, a SaveStateEntry cannot have a VMSD and
+     * operations (e.g., save_setup()) set at the same time. For this reason,
+     * also early_setup VMSDs are migrated in a QEMU_VM_SECTION_FULL section,
+     * while save_setup() data is migrated in a QEMU_VM_SECTION_START section.
       */
      bool early_setup;
      int version_id;

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-16 10:35                   ` David Hildenbrand
@ 2023-01-16 14:56                     ` Peter Xu
  2023-01-16 14:57                       ` David Hildenbrand
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2023-01-16 14:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dr. David Alan Gilbert, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On Mon, Jan 16, 2023 at 11:35:22AM +0100, David Hildenbrand wrote:
> What about the following:
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 7bc0cd9de9..cc910cab0f 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -188,6 +188,11 @@ struct VMStateDescription {
>       * One SaveStateEntry should either have the save_setup() specified or
>       * the vmsd with early_setup set to true. It should never have both
>       * things set.
> +     *
> +     * Note that for now, a SaveStateEntry cannot have a VMSD and
> +     * operations (e.g., save_setup()) set at the same time. For this reason,

This slightly duplicates the above?

> +     * also early_setup VMSDs are migrated in a QEMU_VM_SECTION_FULL section,
> +     * while save_setup() data is migrated in a QEMU_VM_SECTION_START section.
>       */

This looks good.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-01-16 14:56                     ` Peter Xu
@ 2023-01-16 14:57                       ` David Hildenbrand
  0 siblings, 0 replies; 37+ messages in thread
From: David Hildenbrand @ 2023-01-16 14:57 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, qemu-devel, Juan Quintela,
	Michael S . Tsirkin, Michal Privoznik

On 16.01.23 15:56, Peter Xu wrote:
> On Mon, Jan 16, 2023 at 11:35:22AM +0100, David Hildenbrand wrote:
>> What about the following:
>>
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 7bc0cd9de9..cc910cab0f 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -188,6 +188,11 @@ struct VMStateDescription {
>>        * One SaveStateEntry should either have the save_setup() specified or
>>        * the vmsd with early_setup set to true. It should never have both
>>        * things set.
>> +     *
>> +     * Note that for now, a SaveStateEntry cannot have a VMSD and
>> +     * operations (e.g., save_setup()) set at the same time. For this reason,
> 
> This slightly duplicates with above?

Right, will merge both sections and simplify.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 37+ messages in thread

Thread overview: 37+ messages
2023-01-12 16:43 [PATCH v3 0/8] virtio-mem: Handle preallocation with migration David Hildenbrand
2023-01-12 16:43 ` [PATCH v3 1/8] migration/savevm: Move more savevm handling into vmstate_save() David Hildenbrand
2023-01-12 16:58   ` Dr. David Alan Gilbert
2023-01-12 17:49     ` David Hildenbrand
2023-01-12 18:36       ` Dr. David Alan Gilbert
2023-01-13 12:59         ` David Hildenbrand
2023-01-12 16:43 ` [PATCH v3 2/8] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() David Hildenbrand
2023-01-12 17:43   ` Dr. David Alan Gilbert
2023-01-12 17:47     ` David Hildenbrand
2023-01-12 18:40       ` Dr. David Alan Gilbert
2023-01-12 22:06         ` Peter Xu
2023-01-13 13:01           ` David Hildenbrand
2023-01-13 13:05             ` David Hildenbrand
2023-01-12 16:43 ` [PATCH v3 3/8] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) David Hildenbrand
2023-01-12 17:56   ` Dr. David Alan Gilbert
2023-01-12 18:21     ` David Hildenbrand
2023-01-12 19:52       ` Dr. David Alan Gilbert
2023-01-12 22:14         ` Peter Xu
2023-01-12 22:28           ` Peter Xu
2023-01-13 13:47             ` David Hildenbrand
2023-01-13 15:20               ` Peter Xu
2023-01-13 15:27                 ` Peter Xu
2023-01-16 10:35                   ` David Hildenbrand
2023-01-16 14:56                     ` Peter Xu
2023-01-16 14:57                       ` David Hildenbrand
2023-01-13 15:28                 ` David Hildenbrand
2023-01-12 16:43 ` [PATCH v3 4/8] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() David Hildenbrand
2023-01-12 16:44 ` [PATCH v3 5/8] migration/ram: Factor out check for advised postcopy David Hildenbrand
2023-01-12 18:23   ` Dr. David Alan Gilbert
2023-01-12 16:44 ` [PATCH v3 6/8] virtio-mem: Fail if a memory backend with "prealloc=on" is specified David Hildenbrand
2023-01-12 18:33   ` Dr. David Alan Gilbert
2023-01-12 16:44 ` [PATCH v3 7/8] virtio-mem: Migrate immutable properties early David Hildenbrand
2023-01-12 19:44   ` Dr. David Alan Gilbert
2023-01-13 13:59     ` David Hildenbrand
2023-01-12 16:44 ` [PATCH v3 8/8] virtio-mem: Proper support for preallocation with migration David Hildenbrand
2023-01-12 19:50   ` Dr. David Alan Gilbert
2023-01-12 16:45 ` [PATCH v3 0/8] virtio-mem: Handle " David Hildenbrand
