qemu-devel.nongnu.org archive mirror
* [RFC 0/7] migration patches for VFIO
@ 2022-10-03  3:15 Juan Quintela
  2022-10-03  3:15 ` [RFC 1/7] migration: Remove res_compatible parameter Juan Quintela
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Juan Quintela @ 2022-10-03  3:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Juan Quintela, Paolo Bonzini, qemu-block, Eric Farman,
	Richard Henderson, David Hildenbrand

Hi

VFIO migration has two requirements:
- the size of the state is only known when the guest is stopped
- it may need to send a very large amount of data.

This series only addresses the first problem.

What the patches do:
- The res_compatible parameter was not used anywhere; fold that information into res_postcopy.
- Remove the QEMUFile parameter from save_live_pending().
- Split save_live_pending() into
  * state_pending_estimate(): the pending state size, without trying too hard
  * state_pending_exact(): the real pending state size; it is called with the guest stopped.
- Now state_pending_* don't need the threshold parameter.
- HACK a way to stop the guest before calling qemu_savevm_state_pending_exact() (a rough sketch of the resulting flow follows).
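
For reference, a rough sketch of the resulting decision flow in
migration_iteration_run() (simplified from the patches below; the
stand-in hooks and the numbers here are made up for illustration and
are not the real QEMU code):

    #include <inttypes.h>
    #include <stdio.h>

    /* Stand-ins for the estimate/exact hooks introduced in this series. */
    static void state_pending_estimate(uint64_t *pre, uint64_t *post)
    {
        *pre = 4096;            /* cheap guess, guest keeps running */
        *post = 0;
    }

    static void state_pending_exact(uint64_t *pre, uint64_t *post)
    {
        *pre = 512;             /* precise size, guest stopped */
        *post = 0;
    }

    int main(void)
    {
        uint64_t threshold = 1024, pend_pre, pend_post;

        state_pending_estimate(&pend_pre, &pend_post);
        uint64_t pending = pend_pre + pend_post;

        /* Only pay for the expensive, guest-stopping query when the
         * estimate says we might be close enough to converge. */
        if (pend_pre <= threshold) {
            state_pending_exact(&pend_pre, &pend_post);
            pending = pend_pre + pend_post;
        }

        if (pending < threshold) {
            printf("complete migration (pending=%" PRIu64 ")\n", pending);
        } else {
            printf("iterate again (pending=%" PRIu64 ")\n", pending);
        }
        return 0;
    }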

ToDo:
- The autoconverge test is broken; no real clue why, but it is possible that the test itself is wrong.

- Add a mechanism to be able to send a massive amount of data in the save state stage (probably more multifd channels).

- Avoid having to restart the guest between checking the pending state size and migration_completion().

Please review.

Thanks, Juan.

Juan Quintela (7):
  migration: Remove res_compatible parameter
  migration: No save_live_pending() method uses the QEMUFile parameter
  migration: Block migration comment or code is wrong
  migration: Split save_live_pending() into state_pending_*
  migration: Remove unused threshold_size parameter
  migration: simplify migration_iteration_run()
  migration: call qemu_savevm_state_pending_exact() with the guest
    stopped

 docs/devel/migration.rst       | 18 ++++++------
 docs/devel/vfio-migration.rst  |  4 +--
 include/migration/register.h   | 29 ++++++++++---------
 migration/savevm.h             |  8 +++---
 hw/s390x/s390-stattrib.c       | 11 ++++---
 hw/vfio/migration.c            | 17 +++++------
 migration/block-dirty-bitmap.c | 14 ++++-----
 migration/block.c              | 17 ++++++-----
 migration/migration.c          | 52 ++++++++++++++++++++++------------
 migration/ram.c                | 35 ++++++++++++++++-------
 migration/savevm.c             | 37 +++++++++++++++++-------
 tests/qtest/migration-test.c   |  3 +-
 hw/vfio/trace-events           |  2 +-
 migration/trace-events         |  7 +++--
 14 files changed, 148 insertions(+), 106 deletions(-)

-- 
2.37.2




* [RFC 1/7] migration: Remove res_compatible parameter
  2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
@ 2022-10-03  3:15 ` Juan Quintela
  2022-11-22 13:54   ` Dr. David Alan Gilbert
  2022-10-03  3:15 ` [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter Juan Quintela
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-03  3:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Juan Quintela, Paolo Bonzini, qemu-block, Eric Farman,
	Richard Henderson, David Hildenbrand

It was only used for RAM, where it meant that this amount of memory
data could be migrated in any phase.  Account it in res_postcopy
instead and delete the parameter from all callers.
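
To make the resulting accounting concrete, here is a minimal sketch of
a pending hook after this change (illustrative only: the function name,
the made-up "remaining" value and the simplified signature are not from
QEMU; the real per-device changes are in the hunks below):

    #include <stdbool.h>
    #include <stdint.h>

    static void example_save_live_pending(void *opaque,
                                          uint64_t *res_precopy,
                                          uint64_t *res_postcopy)
    {
        bool postcopy_possible = true;    /* hypothetical condition */
        uint64_t remaining = 1 << 20;     /* hypothetical dirty data */

        (void)opaque;
        if (postcopy_possible) {
            /* Was: *res_compatible += remaining; */
            *res_postcopy += remaining;
        } else {
            *res_precopy += remaining;
        }
        /* After this patch, res_precopy + res_postcopy is the whole
         * amount of pending data. */
    }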

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/migration/register.h   | 20 ++++++++++----------
 migration/savevm.h             |  4 +---
 hw/s390x/s390-stattrib.c       |  6 ++----
 hw/vfio/migration.c            | 10 ++++------
 migration/block-dirty-bitmap.c |  7 +++----
 migration/block.c              |  7 +++----
 migration/migration.c          |  9 ++++-----
 migration/ram.c                |  8 +++-----
 migration/savevm.c             | 14 +++++---------
 hw/vfio/trace-events           |  2 +-
 migration/trace-events         |  2 +-
 11 files changed, 37 insertions(+), 52 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index c1dcff0f90..1950fee6a8 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -48,18 +48,18 @@ typedef struct SaveVMHandlers {
     int (*save_setup)(QEMUFile *f, void *opaque);
     void (*save_live_pending)(QEMUFile *f, void *opaque,
                               uint64_t threshold_size,
-                              uint64_t *res_precopy_only,
-                              uint64_t *res_compatible,
-                              uint64_t *res_postcopy_only);
+                              uint64_t *rest_precopy,
+                              uint64_t *rest_postcopy);
     /* Note for save_live_pending:
-     * - res_precopy_only is for data which must be migrated in precopy phase
-     *     or in stopped state, in other words - before target vm start
-     * - res_compatible is for data which may be migrated in any phase
-     * - res_postcopy_only is for data which must be migrated in postcopy phase
-     *     or in stopped state, in other words - after source vm stop
+     * - res_precopy is for data which must be migrated in precopy
+     *     phase or in stopped state, in other words - before target
+     *     vm start
+     * - res_postcopy is for data which must be migrated in postcopy
+     *     phase or in stopped state, in other words - after source vm
+     *     stop
      *
-     * Sum of res_postcopy_only, res_compatible and res_postcopy_only is the
-     * whole amount of pending data.
+     * Sum of res_precopy and res_postcopy is the whole amount of
+     * pending data.
      */
 
 
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..9bd55c336c 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -41,9 +41,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
-                               uint64_t *res_precopy_only,
-                               uint64_t *res_compatible,
-                               uint64_t *res_postcopy_only);
+                               uint64_t *res_precopy, uint64_t *res_postcopy);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
 int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 9eda1c3b2a..ee60b53da4 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -183,16 +183,14 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
 }
 
 static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
-                              uint64_t *res_precopy_only,
-                              uint64_t *res_compatible,
-                              uint64_t *res_postcopy_only)
+                              uint64_t *res_precopy, uint64_t *res_postcopy)
 {
     S390StAttribState *sas = S390_STATTRIB(opaque);
     S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
     long long res = sac->get_dirtycount(sas);
 
     if (res >= 0) {
-        *res_precopy_only += res;
+        *res_precopy += res;
     }
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3de4252111..3423f113f0 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -458,9 +458,8 @@ static void vfio_save_cleanup(void *opaque)
 
 static void vfio_save_pending(QEMUFile *f, void *opaque,
                               uint64_t threshold_size,
-                              uint64_t *res_precopy_only,
-                              uint64_t *res_compatible,
-                              uint64_t *res_postcopy_only)
+                              uint64_t *res_precopy,
+                              uint64_t *res_postcopy)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -471,10 +470,9 @@ static void vfio_save_pending(QEMUFile *f, void *opaque,
         return;
     }
 
-    *res_precopy_only += migration->pending_bytes;
+    *res_precopy += migration->pending_bytes;
 
-    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
-                            *res_postcopy_only, *res_compatible);
+    trace_vfio_save_pending(vbasedev->name, *res_precopy, *res_postcopy);
 }
 
 static int vfio_save_iterate(QEMUFile *f, void *opaque)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9aba7d9c22..dfea546330 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -763,9 +763,8 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 
 static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
                                       uint64_t max_size,
-                                      uint64_t *res_precopy_only,
-                                      uint64_t *res_compatible,
-                                      uint64_t *res_postcopy_only)
+                                      uint64_t *res_precopy,
+                                      uint64_t *res_postcopy)
 {
     DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms;
@@ -785,7 +784,7 @@ static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
 
     trace_dirty_bitmap_save_pending(pending, max_size);
 
-    *res_postcopy_only += pending;
+    *res_postcopy += pending;
 }
 
 /* First occurrence of this bitmap. It should be created if doesn't exist */
diff --git a/migration/block.c b/migration/block.c
index 3577c815a9..4ae8f837b0 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -863,9 +863,8 @@ static int block_save_complete(QEMUFile *f, void *opaque)
 }
 
 static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
-                               uint64_t *res_precopy_only,
-                               uint64_t *res_compatible,
-                               uint64_t *res_postcopy_only)
+                               uint64_t *res_precopy,
+                               uint64_t *res_postcopy)
 {
     /* Estimate pending number of bytes to send */
     uint64_t pending;
@@ -886,7 +885,7 @@ static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
 
     trace_migration_block_save_pending(pending);
     /* We don't do postcopy */
-    *res_precopy_only += pending;
+    *res_precopy += pending;
 }
 
 static int block_load(QEMUFile *f, void *opaque, int version_id)
diff --git a/migration/migration.c b/migration/migration.c
index bb8bbddfe4..440aa62f16 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3734,15 +3734,14 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-    uint64_t pending_size, pend_pre, pend_compat, pend_post;
+    uint64_t pending_size, pend_pre, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
     qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
-                              &pend_compat, &pend_post);
-    pending_size = pend_pre + pend_compat + pend_post;
+                              &pend_post);
+    pending_size = pend_pre + pend_post;
 
-    trace_migrate_pending(pending_size, s->threshold_size,
-                          pend_pre, pend_compat, pend_post);
+    trace_migrate_pending(pending_size, s->threshold_size, pend_pre, pend_post);
 
     if (pending_size && pending_size >= s->threshold_size) {
         /* Still a significant amount to transfer */
diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc..20167e1102 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3435,9 +3435,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 
 static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
-                             uint64_t *res_precopy_only,
-                             uint64_t *res_compatible,
-                             uint64_t *res_postcopy_only)
+                             uint64_t *res_precopy, uint64_t *res_postcopy)
 {
     RAMState **temp = opaque;
     RAMState *rs = *temp;
@@ -3457,9 +3455,9 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
 
     if (migrate_postcopy_ram()) {
         /* We can do postcopy, and all the data is postcopiable */
-        *res_compatible += remaining_size;
+        *res_postcopy += remaining_size;
     } else {
-        *res_precopy_only += remaining_size;
+        *res_precopy += remaining_size;
     }
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 48e85c052c..a752ff9ea1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1472,16 +1472,13 @@ flush:
  * for units that can't do postcopy.
  */
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
-                               uint64_t *res_precopy_only,
-                               uint64_t *res_compatible,
-                               uint64_t *res_postcopy_only)
+                               uint64_t *res_precopy,
+                               uint64_t *res_postcopy)
 {
     SaveStateEntry *se;
 
-    *res_precopy_only = 0;
-    *res_compatible = 0;
-    *res_postcopy_only = 0;
-
+    *res_precopy = 0;
+    *res_postcopy = 0;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops || !se->ops->save_live_pending) {
@@ -1493,8 +1490,7 @@ void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
             }
         }
         se->ops->save_live_pending(f, se->opaque, threshold_size,
-                                   res_precopy_only, res_compatible,
-                                   res_postcopy_only);
+                                   res_precopy, res_postcopy);
     }
 }
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 73dffe9e00..a21cbd2a56 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -157,7 +157,7 @@ vfio_save_cleanup(const char *name) " (%s)"
 vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
 vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
 vfio_save_device_config_state(const char *name) " (%s)"
-vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
+vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64
 vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
diff --git a/migration/trace-events b/migration/trace-events
index 57003edcbd..f2a873fd6c 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -150,7 +150,7 @@ migrate_fd_cleanup(void) ""
 migrate_fd_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
-migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
+migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
-- 
2.37.2




* [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter
  2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
  2022-10-03  3:15 ` [RFC 1/7] migration: Remove res_compatible parameter Juan Quintela
@ 2022-10-03  3:15 ` Juan Quintela
  2022-11-22 17:48   ` Dr. David Alan Gilbert
  2022-10-03  3:15 ` [RFC 3/7] migration: Block migration comment or code is wrong Juan Quintela
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-03  3:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Juan Quintela, Paolo Bonzini, qemu-block, Eric Farman,
	Richard Henderson, David Hildenbrand

So remove it everywhere.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/migration/register.h   | 6 ++----
 migration/savevm.h             | 2 +-
 hw/s390x/s390-stattrib.c       | 2 +-
 hw/vfio/migration.c            | 6 ++----
 migration/block-dirty-bitmap.c | 5 ++---
 migration/block.c              | 2 +-
 migration/migration.c          | 3 +--
 migration/ram.c                | 2 +-
 migration/savevm.c             | 5 ++---
 9 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 1950fee6a8..5b5424ed8f 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -46,10 +46,8 @@ typedef struct SaveVMHandlers {
 
     /* This runs outside the iothread lock!  */
     int (*save_setup)(QEMUFile *f, void *opaque);
-    void (*save_live_pending)(QEMUFile *f, void *opaque,
-                              uint64_t threshold_size,
-                              uint64_t *rest_precopy,
-                              uint64_t *rest_postcopy);
+    void (*save_live_pending)(void *opaque,  uint64_t threshold_size,
+                              uint64_t *rest_precopy, uint64_t *rest_postcopy);
     /* Note for save_live_pending:
      * - res_precopy is for data which must be migrated in precopy
      *     phase or in stopped state, in other words - before target
diff --git a/migration/savevm.h b/migration/savevm.h
index 9bd55c336c..98fae6f9b3 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,7 +40,7 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
-void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+void qemu_savevm_state_pending(uint64_t max_size,
                                uint64_t *res_precopy, uint64_t *res_postcopy);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index ee60b53da4..9b74eeadf3 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void cmma_save_pending(void *opaque, uint64_t max_size,
                               uint64_t *res_precopy, uint64_t *res_postcopy)
 {
     S390StAttribState *sas = S390_STATTRIB(opaque);
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3423f113f0..760d5f3c5c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -456,10 +456,8 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_save_pending(QEMUFile *f, void *opaque,
-                              uint64_t threshold_size,
-                              uint64_t *res_precopy,
-                              uint64_t *res_postcopy)
+static void vfio_save_pending(void *opaque,  uint64_t threshold_size,
+                              uint64_t *res_precopy, uint64_t *res_postcopy)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index dfea546330..a445bdc3c3 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -761,9 +761,8 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
-                                      uint64_t max_size,
-                                      uint64_t *res_precopy,
+static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
+                                      uint64_t *res_precopy, 
                                       uint64_t *res_postcopy)
 {
     DBMSaveState *s = &((DBMState *)opaque)->save;
diff --git a/migration/block.c b/migration/block.c
index 4ae8f837b0..b3d680af75 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -862,7 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void block_save_pending(void *opaque, uint64_t max_size,
                                uint64_t *res_precopy,
                                uint64_t *res_postcopy)
 {
diff --git a/migration/migration.c b/migration/migration.c
index 440aa62f16..038fc58a96 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3737,8 +3737,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     uint64_t pending_size, pend_pre, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-    qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
-                              &pend_post);
+    qemu_savevm_state_pending(s->threshold_size, &pend_pre, &pend_post);
     pending_size = pend_pre + pend_post;
 
     trace_migrate_pending(pending_size, s->threshold_size, pend_pre, pend_post);
diff --git a/migration/ram.c b/migration/ram.c
index 20167e1102..48a31b87c8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3434,7 +3434,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void ram_save_pending(void *opaque, uint64_t max_size,
                              uint64_t *res_precopy, uint64_t *res_postcopy)
 {
     RAMState **temp = opaque;
diff --git a/migration/savevm.c b/migration/savevm.c
index a752ff9ea1..d937ab0b2e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1471,8 +1471,7 @@ flush:
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
  */
-void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
-                               uint64_t *res_precopy,
+void qemu_savevm_state_pending(uint64_t threshold_size, uint64_t *res_precopy,
                                uint64_t *res_postcopy)
 {
     SaveStateEntry *se;
@@ -1489,7 +1488,7 @@ void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
                 continue;
             }
         }
-        se->ops->save_live_pending(f, se->opaque, threshold_size,
+        se->ops->save_live_pending(se->opaque, threshold_size,
                                    res_precopy, res_postcopy);
     }
 }
-- 
2.37.2




* [RFC 3/7] migration: Block migration comment or code is wrong
  2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
  2022-10-03  3:15 ` [RFC 1/7] migration: Remove res_compatible parameter Juan Quintela
  2022-10-03  3:15 ` [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter Juan Quintela
@ 2022-10-03  3:15 ` Juan Quintela
  2022-10-03 19:12   ` Stefan Hajnoczi
  2022-10-03  3:15 ` [RFC 4/7] migration: Split save_live_pending() into state_pending_* Juan Quintela
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-03  3:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Juan Quintela, Paolo Bonzini, qemu-block, Eric Farman,
	Richard Henderson, David Hildenbrand

And it appears that what is wrong is the code: during the bulk stage
we need to make sure that at least one block is reported as pending,
but there is no reason to play games with max_size at all.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index b3d680af75..39ce4003c6 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -879,8 +879,8 @@ static void block_save_pending(void *opaque, uint64_t max_size,
     blk_mig_unlock();
 
     /* Report at least one block pending during bulk phase */
-    if (pending <= max_size && !block_mig_state.bulk_completed) {
-        pending = max_size + BLK_MIG_BLOCK_SIZE;
+    if (!pending && !block_mig_state.bulk_completed) {
+        pending = BLK_MIG_BLOCK_SIZE;
     }
 
     trace_migration_block_save_pending(pending);
-- 
2.37.2




* [RFC 4/7] migration: Split save_live_pending() into state_pending_*
  2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
                   ` (2 preceding siblings ...)
  2022-10-03  3:15 ` [RFC 3/7] migration: Block migration comment or code is wrong Juan Quintela
@ 2022-10-03  3:15 ` Juan Quintela
  2022-11-22 20:08   ` Dr. David Alan Gilbert
  2022-10-03  3:15 ` [RFC 5/7] migration: Remove unused threshold_size parameter Juan Quintela
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-03  3:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Juan Quintela, Paolo Bonzini, qemu-block, Eric Farman,
	Richard Henderson, David Hildenbrand

We split the function in two:

- state_pending_estimate(): estimates the remaining state size without
  stopping the machine.

- state_pending_exact(): calculates the exact amount of remaining
  state.

The only "device" that implements different functions for _estimate()
and _exact() is ram.
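
Devices that have no cheaper estimate can simply register the same
function for both hooks, as the s390-stattrib and vfio hunks below do.
A minimal sketch of that pattern (the struct is a cut-down stand-in
for SaveVMHandlers, with the signatures simplified; the threshold_size
parameter still present at this point in the series is dropped in the
next patch):

    #include <stdint.h>

    typedef struct ExampleHandlers {
        void (*state_pending_exact)(void *opaque, uint64_t *res_precopy,
                                    uint64_t *res_postcopy);
        void (*state_pending_estimate)(void *opaque, uint64_t *res_precopy,
                                       uint64_t *res_postcopy);
    } ExampleHandlers;

    static void dev_state_pending(void *opaque, uint64_t *res_precopy,
                                  uint64_t *res_postcopy)
    {
        (void)opaque;
        *res_precopy += 0;    /* device-specific accounting goes here */
    }

    static const ExampleHandlers dev_handlers = {
        .state_pending_exact = dev_state_pending,
        .state_pending_estimate = dev_state_pending,
    };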

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 docs/devel/migration.rst       | 18 ++++++++++--------
 docs/devel/vfio-migration.rst  |  4 ++--
 include/migration/register.h   | 12 ++++++++----
 migration/savevm.h             |  8 ++++++--
 hw/s390x/s390-stattrib.c       |  7 ++++---
 hw/vfio/migration.c            |  9 +++++----
 migration/block-dirty-bitmap.c | 11 ++++++-----
 migration/block.c              | 11 ++++++-----
 migration/migration.c          | 13 +++++++++----
 migration/ram.c                | 31 ++++++++++++++++++++++++-------
 migration/savevm.c             | 34 +++++++++++++++++++++++++++++-----
 hw/vfio/trace-events           |  2 +-
 migration/trace-events         |  7 ++++---
 13 files changed, 114 insertions(+), 53 deletions(-)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 3e9656d8e0..6f65c23b47 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -482,15 +482,17 @@ An iterative device must provide:
   - A ``load_setup`` function that initialises the data structures on the
     destination.
 
-  - A ``save_live_pending`` function that is called repeatedly and must
-    indicate how much more data the iterative data must save.  The core
-    migration code will use this to determine when to pause the CPUs
-    and complete the migration.
+  - A ``state_pending_exact`` function that indicates how much more
+    data we must save.  The core migration code will use this to
+    determine when to pause the CPUs and complete the migration.
 
-  - A ``save_live_iterate`` function (called after ``save_live_pending``
-    when there is significant data still to be sent).  It should send
-    a chunk of data until the point that stream bandwidth limits tell it
-    to stop.  Each call generates one section.
+  - A ``state_pending_estimate`` function that indicates how much more
+    data we must save.  When the estimated amount is smaller than the
+    threshold, we call ``state_pending_exact``.
+
+  - A ``save_live_iterate`` function should send a chunk of data until
+    the point that stream bandwidth limits tell it to stop.  Each call
+    generates one section.
 
   - A ``save_live_complete_precopy`` function that must transmit the
     last section for the device containing any remaining data.
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 9ff6163c88..673057c90d 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -28,7 +28,7 @@ VFIO implements the device hooks for the iterative approach as follows:
 * A ``load_setup`` function that sets up the migration region on the
   destination and sets _RESUMING flag in the VFIO device state.
 
-* A ``save_live_pending`` function that reads pending_bytes from the vendor
+* A ``state_pending_exact`` function that reads pending_bytes from the vendor
   driver, which indicates the amount of data that the vendor driver has yet to
   save for the VFIO device.
 
@@ -114,7 +114,7 @@ Live migration save path
                     (RUNNING, _SETUP, _RUNNING|_SAVING)
                                   |
                     (RUNNING, _ACTIVE, _RUNNING|_SAVING)
-             If device is active, get pending_bytes by .save_live_pending()
+             If device is active, get pending_bytes by .state_pending_exact()
           If total pending_bytes >= threshold_size, call .save_live_iterate()
                   Data of VFIO device for pre-copy phase is copied
         Iterate till total pending bytes converge and are less than threshold
diff --git a/include/migration/register.h b/include/migration/register.h
index 5b5424ed8f..313b8e1c3b 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -46,9 +46,7 @@ typedef struct SaveVMHandlers {
 
     /* This runs outside the iothread lock!  */
     int (*save_setup)(QEMUFile *f, void *opaque);
-    void (*save_live_pending)(void *opaque,  uint64_t threshold_size,
-                              uint64_t *rest_precopy, uint64_t *rest_postcopy);
-    /* Note for save_live_pending:
+    /* Note for state_pending_*:
      * - res_precopy is for data which must be migrated in precopy
      *     phase or in stopped state, in other words - before target
      *     vm start
@@ -59,7 +57,13 @@ typedef struct SaveVMHandlers {
      * Sum of res_precopy and res_postcopy is the whole amount of
      * pending data.
      */
-
+    /* This calculate the exact remaining data to transfer */
+    void (*state_pending_exact)(void *opaque,  uint64_t threshold_size,
+                                uint64_t *rest_precopy, uint64_t *rest_postcopy);
+    /* This estimates the remaining data to transfer */
+    void (*state_pending_estimate)(void *opaque,  uint64_t threshold_size,
+                                   uint64_t *rest_precopy,
+                                   uint64_t *rest_postcopy);
 
     LoadStateHandler *load_state;
     int (*load_setup)(QEMUFile *f, void *opaque);
diff --git a/migration/savevm.h b/migration/savevm.h
index 98fae6f9b3..613f85e717 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,8 +40,12 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
-void qemu_savevm_state_pending(uint64_t max_size,
-                               uint64_t *res_precopy, uint64_t *res_postcopy);
+void qemu_savevm_state_pending_exact(uint64_t max_size,
+                                     uint64_t *res_precopy,
+                                     uint64_t *res_postcopy);
+void qemu_savevm_state_pending_estimate(uint64_t max_size,
+                                        uint64_t *res_precopy,
+                                        uint64_t *res_postcopy);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
 int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 9b74eeadf3..dfb95eb20c 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -182,8 +182,8 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void cmma_save_pending(void *opaque, uint64_t max_size,
-                              uint64_t *res_precopy, uint64_t *res_postcopy)
+static void cmma_state_pending(void *opaque, uint64_t *res_precopy,
+                               uint64_t *res_postcopy)
 {
     S390StAttribState *sas = S390_STATTRIB(opaque);
     S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
@@ -369,7 +369,8 @@ static SaveVMHandlers savevm_s390_stattrib_handlers = {
     .save_setup = cmma_save_setup,
     .save_live_iterate = cmma_save_iterate,
     .save_live_complete_precopy = cmma_save_complete,
-    .save_live_pending = cmma_save_pending,
+    .state_pending_exact = cmma_state_pending,
+    .state_pending_estimate = cmma_state_pending,
     .save_cleanup = cmma_save_cleanup,
     .load_state = cmma_load,
     .is_active = cmma_active,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 760d5f3c5c..680cf4df6e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -456,8 +456,8 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_save_pending(void *opaque,  uint64_t threshold_size,
-                              uint64_t *res_precopy, uint64_t *res_postcopy)
+static void vfio_state_pending(void *opaque,  uint64_t threshold_size,
+                               uint64_t *res_precopy, uint64_t *res_postcopy)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -470,7 +470,7 @@ static void vfio_save_pending(void *opaque,  uint64_t threshold_size,
 
     *res_precopy += migration->pending_bytes;
 
-    trace_vfio_save_pending(vbasedev->name, *res_precopy, *res_postcopy);
+    trace_vfio_state_pending(vbasedev->name, *res_precopy, *res_postcopy);
 }
 
 static int vfio_save_iterate(QEMUFile *f, void *opaque)
@@ -681,7 +681,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
 static SaveVMHandlers savevm_vfio_handlers = {
     .save_setup = vfio_save_setup,
     .save_cleanup = vfio_save_cleanup,
-    .save_live_pending = vfio_save_pending,
+    .state_pending_exact = vfio_state_pending,
+    .state_pending_estimate = vfio_state_pending,
     .save_live_iterate = vfio_save_iterate,
     .save_live_complete_precopy = vfio_save_complete_precopy,
     .save_state = vfio_save_state,
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index a445bdc3c3..5b24007650 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -761,9 +761,9 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
-                                      uint64_t *res_precopy, 
-                                      uint64_t *res_postcopy)
+static void dirty_bitmap_state_pending(void *opaque, uint64_t max_size,
+                                       uint64_t *res_precopy,
+                                       uint64_t *res_postcopy)
 {
     DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms;
@@ -781,7 +781,7 @@ static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
 
     qemu_mutex_unlock_iothread();
 
-    trace_dirty_bitmap_save_pending(pending, max_size);
+    trace_dirty_bitmap_state_pending(pending);
 
     *res_postcopy += pending;
 }
@@ -1250,7 +1250,8 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
     .save_live_complete_postcopy = dirty_bitmap_save_complete,
     .save_live_complete_precopy = dirty_bitmap_save_complete,
     .has_postcopy = dirty_bitmap_has_postcopy,
-    .save_live_pending = dirty_bitmap_save_pending,
+    .state_pending_exact = dirty_bitmap_state_pending,
+    .state_pending_estimate = dirty_bitmap_state_pending,
     .save_live_iterate = dirty_bitmap_save_iterate,
     .is_active_iterate = dirty_bitmap_is_active_iterate,
     .load_state = dirty_bitmap_load,
diff --git a/migration/block.c b/migration/block.c
index 39ce4003c6..8e6ad1c468 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -862,9 +862,9 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void block_save_pending(void *opaque, uint64_t max_size,
-                               uint64_t *res_precopy,
-                               uint64_t *res_postcopy)
+static void block_state_pending(void *opaque, uint64_t max_size,
+                                uint64_t *res_precopy,
+                                uint64_t *res_postcopy)
 {
     /* Estimate pending number of bytes to send */
     uint64_t pending;
@@ -883,7 +883,7 @@ static void block_save_pending(void *opaque, uint64_t max_size,
         pending = BLK_MIG_BLOCK_SIZE;
     }
 
-    trace_migration_block_save_pending(pending);
+    trace_migration_block_state_pending(pending);
     /* We don't do postcopy */
     *res_precopy += pending;
 }
@@ -1018,7 +1018,8 @@ static SaveVMHandlers savevm_block_handlers = {
     .save_setup = block_save_setup,
     .save_live_iterate = block_save_iterate,
     .save_live_complete_precopy = block_save_complete,
-    .save_live_pending = block_save_pending,
+    .state_pending_exact = block_state_pending,
+    .state_pending_estimate = block_state_pending,
     .load_state = block_load,
     .save_cleanup = block_migration_cleanup,
     .is_active = block_is_active,
diff --git a/migration/migration.c b/migration/migration.c
index 038fc58a96..4676568699 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3734,13 +3734,18 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-    uint64_t pending_size, pend_pre, pend_post;
+    uint64_t pend_pre, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-    qemu_savevm_state_pending(s->threshold_size, &pend_pre, &pend_post);
-    pending_size = pend_pre + pend_post;
+    qemu_savevm_state_pending_estimate(s->threshold_size, &pend_pre, &pend_post);
+    uint64_t pending_size = pend_pre + pend_post;
+    trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
 
-    trace_migrate_pending(pending_size, s->threshold_size, pend_pre, pend_post);
+    if (pend_pre <= s->threshold_size) {
+        qemu_savevm_state_pending_exact(s->threshold_size, &pend_pre, &pend_post);
+        pending_size = pend_pre + pend_post;
+        trace_migrate_pending_exact(pending_size, s->threshold_size, pend_pre, pend_post);
+    }
 
     if (pending_size && pending_size >= s->threshold_size) {
         /* Still a significant amount to transfer */
diff --git a/migration/ram.c b/migration/ram.c
index 48a31b87c8..8d989d51db 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3434,17 +3434,33 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void ram_save_pending(void *opaque, uint64_t max_size,
-                             uint64_t *res_precopy, uint64_t *res_postcopy)
+static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
+                                       uint64_t *res_precopy,
+                                       uint64_t *res_postcopy)
 {
     RAMState **temp = opaque;
     RAMState *rs = *temp;
-    uint64_t remaining_size;
 
-    remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
+    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
 
-    if (!migration_in_postcopy() &&
-        remaining_size < max_size) {
+    if (migrate_postcopy_ram()) {
+        /* We can do postcopy, and all the data is postcopiable */
+        *res_postcopy += remaining_size;
+    } else {
+        *res_precopy += remaining_size;
+    }
+}
+
+static void ram_state_pending_exact(void *opaque, uint64_t max_size,
+                                    uint64_t *res_precopy,
+                                    uint64_t *res_postcopy)
+{
+    RAMState **temp = opaque;
+    RAMState *rs = *temp;
+
+    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
+
+    if (!migration_in_postcopy()) {
         qemu_mutex_lock_iothread();
         WITH_RCU_READ_LOCK_GUARD() {
             migration_bitmap_sync_precopy(rs);
@@ -4600,7 +4616,8 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_complete_postcopy = ram_save_complete,
     .save_live_complete_precopy = ram_save_complete,
     .has_postcopy = ram_has_postcopy,
-    .save_live_pending = ram_save_pending,
+    .state_pending_exact = ram_state_pending_exact,
+    .state_pending_estimate = ram_state_pending_estimate,
     .load_state = ram_load,
     .save_cleanup = ram_save_cleanup,
     .load_setup = ram_load_setup,
diff --git a/migration/savevm.c b/migration/savevm.c
index d937ab0b2e..976ece3f3f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1471,8 +1471,9 @@ flush:
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
  */
-void qemu_savevm_state_pending(uint64_t threshold_size, uint64_t *res_precopy,
-                               uint64_t *res_postcopy)
+void qemu_savevm_state_pending_exact(uint64_t threshold_size,
+                                     uint64_t *res_precopy,
+                                     uint64_t *res_postcopy)
 {
     SaveStateEntry *se;
 
@@ -1480,7 +1481,7 @@ void qemu_savevm_state_pending(uint64_t threshold_size, uint64_t *res_precopy,
     *res_postcopy = 0;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->save_live_pending) {
+        if (!se->ops || !se->ops->state_pending_exact) {
             continue;
         }
         if (se->ops->is_active) {
@@ -1488,8 +1489,31 @@ void qemu_savevm_state_pending(uint64_t threshold_size, uint64_t *res_precopy,
                 continue;
             }
         }
-        se->ops->save_live_pending(se->opaque, threshold_size,
-                                   res_precopy, res_postcopy);
+        se->ops->state_pending_exact(se->opaque, threshold_size,
+                                     res_precopy, res_postcopy);
+    }
+}
+
+void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
+                                        uint64_t *res_precopy,
+                                        uint64_t *res_postcopy)
+{
+    SaveStateEntry *se;
+
+    *res_precopy = 0;
+    *res_postcopy = 0;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->state_pending_estimate) {
+            continue;
+        }
+        if (se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        se->ops->state_pending_estimate(se->opaque, threshold_size,
+                                        res_precopy, res_postcopy);
     }
 }
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a21cbd2a56..90a8aecb37 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -157,7 +157,7 @@ vfio_save_cleanup(const char *name) " (%s)"
 vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
 vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
 vfio_save_device_config_state(const char *name) " (%s)"
-vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64
+vfio_state_pending(const char *name, uint64_t precopy, uint64_t postcopy) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64
 vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
diff --git a/migration/trace-events b/migration/trace-events
index f2a873fd6c..84352f310a 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -150,7 +150,8 @@ migrate_fd_cleanup(void) ""
 migrate_fd_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
-migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
+migrate_pending_estimate(uint64_t size, uint64_t max, uint64_t pre, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
+migrate_pending_exact(uint64_t size, uint64_t max, uint64_t pre, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
@@ -330,7 +331,7 @@ send_bitmap_bits(uint32_t flags, uint64_t start_sector, uint32_t nr_sectors, uin
 dirty_bitmap_save_iterate(int in_postcopy) "in postcopy: %d"
 dirty_bitmap_save_complete_enter(void) ""
 dirty_bitmap_save_complete_finish(void) ""
-dirty_bitmap_save_pending(uint64_t pending, uint64_t max_size) "pending %" PRIu64 " max: %" PRIu64
+dirty_bitmap_state_pending(uint64_t pending) "pending %" PRIu64
 dirty_bitmap_load_complete(void) ""
 dirty_bitmap_load_bits_enter(uint64_t first_sector, uint32_t nr_sectors) "chunk: %" PRIu64 " %" PRIu32
 dirty_bitmap_load_bits_zeroes(void) ""
@@ -355,7 +356,7 @@ migration_block_save_device_dirty(int64_t sector) "Error reading sector %" PRId6
 migration_block_flush_blks(const char *action, int submitted, int read_done, int transferred) "%s submitted %d read_done %d transferred %d"
 migration_block_save(const char *mig_stage, int submitted, int transferred) "Enter save live %s submitted %d transferred %d"
 migration_block_save_complete(void) "Block migration completed"
-migration_block_save_pending(uint64_t pending) "Enter save live pending  %" PRIu64
+migration_block_state_pending(uint64_t pending) "Enter save live pending  %" PRIu64
 
 # page_cache.c
 migration_pagecache_init(int64_t max_num_items) "Setting cache buckets to %" PRId64
-- 
2.37.2




* [RFC 5/7] migration: Remove unused threshold_size parameter
  2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
                   ` (3 preceding siblings ...)
  2022-10-03  3:15 ` [RFC 4/7] migration: Split save_live_pending() into state_pending_* Juan Quintela
@ 2022-10-03  3:15 ` Juan Quintela
  2022-11-23 16:38   ` Dr. David Alan Gilbert
  2022-10-03  3:15 ` [RFC 6/7] migration: simplify migration_iteration_run() Juan Quintela
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-03  3:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Juan Quintela, Paolo Bonzini, qemu-block, Eric Farman,
	Richard Henderson, David Hildenbrand

Until the previous commit, the threshold_size parameter was used by
ram's save_live_pending().  Now, with the split into
state_pending_estimate() and state_pending_exact(), it is not needed
anymore, so remove it everywhere.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/migration/register.h   |  7 +++----
 migration/savevm.h             |  6 ++----
 hw/vfio/migration.c            |  6 +++---
 migration/block-dirty-bitmap.c |  3 +--
 migration/block.c              |  3 +--
 migration/migration.c          |  4 ++--
 migration/ram.c                |  6 ++----
 migration/savevm.c             | 12 ++++--------
 8 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 313b8e1c3b..5f08204fb2 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -58,11 +58,10 @@ typedef struct SaveVMHandlers {
      * pending data.
      */
     /* This calculate the exact remaining data to transfer */
-    void (*state_pending_exact)(void *opaque,  uint64_t threshold_size,
-                                uint64_t *rest_precopy, uint64_t *rest_postcopy);
+    void (*state_pending_exact)(void *opaque, uint64_t *rest_precopy,
+                                uint64_t *rest_postcopy);
     /* This estimates the remaining data to transfer */
-    void (*state_pending_estimate)(void *opaque,  uint64_t threshold_size,
-                                   uint64_t *rest_precopy,
+    void (*state_pending_estimate)(void *opaque, uint64_t *rest_precopy,
                                    uint64_t *rest_postcopy);
 
     LoadStateHandler *load_state;
diff --git a/migration/savevm.h b/migration/savevm.h
index 613f85e717..24f2d2a28b 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,11 +40,9 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
-void qemu_savevm_state_pending_exact(uint64_t max_size,
-                                     uint64_t *res_precopy,
+void qemu_savevm_state_pending_exact(uint64_t *res_precopy,
                                      uint64_t *res_postcopy);
-void qemu_savevm_state_pending_estimate(uint64_t max_size,
-                                        uint64_t *res_precopy,
+void qemu_savevm_state_pending_estimate(uint64_t *res_precopy,
                                         uint64_t *res_postcopy);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 680cf4df6e..d9598ce070 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -456,8 +456,8 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_state_pending(void *opaque,  uint64_t threshold_size,
-                               uint64_t *res_precopy, uint64_t *res_postcopy)
+static void vfio_state_pending(void *opaque, uint64_t *res_precopy,
+                               uint64_t *res_postcopy)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -511,7 +511,7 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
     }
 
     /*
-     * Reset pending_bytes as .save_live_pending is not called during savevm or
+     * Reset pending_bytes as .state_pending_* is not called during savevm or
      * snapshot case, in such case vfio_update_pending() at the start of this
      * function updates pending_bytes.
      */
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 5b24007650..8a11577252 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -761,8 +761,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void dirty_bitmap_state_pending(void *opaque, uint64_t max_size,
-                                       uint64_t *res_precopy,
+static void dirty_bitmap_state_pending(void *opaque, uint64_t *res_precopy,
                                        uint64_t *res_postcopy)
 {
     DBMSaveState *s = &((DBMState *)opaque)->save;
diff --git a/migration/block.c b/migration/block.c
index 8e6ad1c468..4f1f7c0f8d 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -862,8 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void block_state_pending(void *opaque, uint64_t max_size,
-                                uint64_t *res_precopy,
+static void block_state_pending(void *opaque, uint64_t *res_precopy,
                                 uint64_t *res_postcopy)
 {
     /* Estimate pending number of bytes to send */
diff --git a/migration/migration.c b/migration/migration.c
index 4676568699..97fefd579e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3737,12 +3737,12 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     uint64_t pend_pre, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-    qemu_savevm_state_pending_estimate(s->threshold_size, &pend_pre, &pend_post);
+    qemu_savevm_state_pending_estimate(&pend_pre, &pend_post);
     uint64_t pending_size = pend_pre + pend_post;
     trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
 
     if (pend_pre <= s->threshold_size) {
-        qemu_savevm_state_pending_exact(s->threshold_size, &pend_pre, &pend_post);
+        qemu_savevm_state_pending_exact(&pend_pre, &pend_post);
         pending_size = pend_pre + pend_post;
         trace_migrate_pending_exact(pending_size, s->threshold_size, pend_pre, pend_post);
     }
diff --git a/migration/ram.c b/migration/ram.c
index 8d989d51db..53a5dd64a8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3434,8 +3434,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
-                                       uint64_t *res_precopy,
+static void ram_state_pending_estimate(void *opaque, uint64_t *res_precopy,
                                        uint64_t *res_postcopy)
 {
     RAMState **temp = opaque;
@@ -3451,8 +3450,7 @@ static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
     }
 }
 
-static void ram_state_pending_exact(void *opaque, uint64_t max_size,
-                                    uint64_t *res_precopy,
+static void ram_state_pending_exact(void *opaque, uint64_t *res_precopy,
                                     uint64_t *res_postcopy)
 {
     RAMState **temp = opaque;
diff --git a/migration/savevm.c b/migration/savevm.c
index 976ece3f3f..1e0328e088 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1471,8 +1471,7 @@ flush:
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
  */
-void qemu_savevm_state_pending_exact(uint64_t threshold_size,
-                                     uint64_t *res_precopy,
+void qemu_savevm_state_pending_exact(uint64_t *res_precopy,
                                      uint64_t *res_postcopy)
 {
     SaveStateEntry *se;
@@ -1489,13 +1488,11 @@ void qemu_savevm_state_pending_exact(uint64_t threshold_size,
                 continue;
             }
         }
-        se->ops->state_pending_exact(se->opaque, threshold_size,
-                                     res_precopy, res_postcopy);
+        se->ops->state_pending_exact(se->opaque, res_precopy, res_postcopy);
     }
 }
 
-void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
-                                        uint64_t *res_precopy,
+void qemu_savevm_state_pending_estimate(uint64_t *res_precopy,
                                         uint64_t *res_postcopy)
 {
     SaveStateEntry *se;
@@ -1512,8 +1509,7 @@ void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
                 continue;
             }
         }
-        se->ops->state_pending_estimate(se->opaque, threshold_size,
-                                        res_precopy, res_postcopy);
+        se->ops->state_pending_estimate(se->opaque, res_precopy, res_postcopy);
     }
 }
 
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 6/7] migration: simplify migration_iteration_run()
  2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
                   ` (4 preceding siblings ...)
  2022-10-03  3:15 ` [RFC 5/7] migration: Remove unused threshold_size parameter Juan Quintela
@ 2022-10-03  3:15 ` Juan Quintela
  2022-11-23 17:39   ` Dr. David Alan Gilbert
  2022-10-03  3:16 ` [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped Juan Quintela
  2022-10-20 14:56 ` [RFC 0/7] migration patches for VFIO Yishai Hadas
  7 siblings, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-03  3:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Juan Quintela, Paolo Bonzini, qemu-block, Eric Farman,
	Richard Henderson, David Hildenbrand

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 97fefd579e..35e512887a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3747,23 +3747,23 @@ static MigIterateState migration_iteration_run(MigrationState *s)
         trace_migrate_pending_exact(pending_size, s->threshold_size, pend_pre, pend_post);
     }
 
-    if (pending_size && pending_size >= s->threshold_size) {
-        /* Still a significant amount to transfer */
-        if (!in_postcopy && pend_pre <= s->threshold_size &&
-            qatomic_read(&s->start_postcopy)) {
-            if (postcopy_start(s)) {
-                error_report("%s: postcopy failed to start", __func__);
-            }
-            return MIG_ITERATE_SKIP;
-        }
-        /* Just another iteration step */
-        qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
-    } else {
+    if (pending_size < s->threshold_size) {
         trace_migration_thread_low_pending(pending_size);
         migration_completion(s);
         return MIG_ITERATE_BREAK;
     }
 
+    /* Still a significant amount to transfer */
+    if (!in_postcopy && pend_pre <= s->threshold_size &&
+        qatomic_read(&s->start_postcopy)) {
+        if (postcopy_start(s)) {
+            error_report("%s: postcopy failed to start", __func__);
+        }
+        return MIG_ITERATE_SKIP;
+    }
+
+    /* Just another iteration step */
+    qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
     return MIG_ITERATE_RESUME;
 }
 
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
                   ` (5 preceding siblings ...)
  2022-10-03  3:15 ` [RFC 6/7] migration: simplify migration_iteration_run() Juan Quintela
@ 2022-10-03  3:16 ` Juan Quintela
  2022-10-13 12:25   ` Joao Martins
  2022-10-20 14:56 ` [RFC 0/7] migration patches for VFIO Yishai Hadas
  7 siblings, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-03  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Juan Quintela, Paolo Bonzini, qemu-block, Eric Farman,
	Richard Henderson, David Hildenbrand

HACK ahead.

There are devices that require the guest to be stopped to tell us the
size of their state.  So we need to stop the vm "before" we
call the functions.

It is a hack because:
- we are "starting" the guest again to stop it in migration_complete()
  I know, I know, but it is not trivial to get all the information
  easily to migration_complete(), so this hack.

- auto_converge test fails with this hack.  I think that it is related
  to the previous problem.  We start the guest when it is supposed to be
  stopped for convergence reasons.

- All the experiments I did to do the proper thing failed, either with
  the iothread already locked or trying to unlock() it when not locked.

- several of the pending functions take the iothread lock
  themselves, so I need to split them into two versions: one for the
  _estimate() case that takes the iothread lock, and another for the
  _exact() case that runs without it.  I want comments about
  this approach before I continue in this direction.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.c        | 13 +++++++++++++
 tests/qtest/migration-test.c |  3 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 35e512887a..7374884818 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3742,7 +3742,20 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
 
     if (pend_pre <= s->threshold_size) {
+        int old_state = s->state;
+        qemu_mutex_lock_iothread();
+        /* is this really necessary?  it works for me both ways. */
+        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+        s->vm_was_running = runstate_is_running();
+        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+        qemu_mutex_unlock_iothread();
         qemu_savevm_state_pending_exact(&pend_pre, &pend_post);
+        qemu_mutex_lock_iothread();
+        runstate_set(old_state);
+        if (s->vm_was_running) {
+            vm_start();
+        }
+        qemu_mutex_unlock_iothread();
         pending_size = pend_pre + pend_post;
         trace_migrate_pending_exact(pending_size, s->threshold_size, pend_pre, pend_post);
     }
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0d153d6b5e..0541a842ec 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2564,7 +2564,8 @@ int main(int argc, char **argv)
     qtest_add_func("/migration/validate_uuid_dst_not_set",
                    test_validate_uuid_dst_not_set);
 
-    qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
+    if (0)
+        qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
     qtest_add_func("/migration/multifd/tcp/plain/none",
                    test_multifd_tcp_none);
     qtest_add_func("/migration/multifd/tcp/plain/cancel",
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC 3/7] migration: Block migration comment or code is wrong
  2022-10-03  3:15 ` [RFC 3/7] migration: Block migration comment or code is wrong Juan Quintela
@ 2022-10-03 19:12   ` Stefan Hajnoczi
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2022-10-03 19:12 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Alex Williamson, Eric Blake, Fam Zheng, qemu-s390x,
	Cornelia Huck, Thomas Huth, Vladimir Sementsov-Ogievskiy,
	Laurent Vivier, John Snow, Dr. David Alan Gilbert,
	Christian Borntraeger, Halil Pasic, Paolo Bonzini, qemu-block,
	Eric Farman, Richard Henderson, David Hildenbrand

[-- Attachment #1: Type: text/plain, Size: 422 bytes --]

On Mon, Oct 03, 2022 at 05:15:56AM +0200, Juan Quintela wrote:
> And it appears that what is wrong is the code. During the bulk stage we
> need to make sure that some block is dirty, but no games with
> max_size at all.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration/block.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-03  3:16 ` [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped Juan Quintela
@ 2022-10-13 12:25   ` Joao Martins
  2022-10-13 16:08     ` Juan Quintela
  2022-10-14 19:49     ` Jason Gunthorpe
  0 siblings, 2 replies; 23+ messages in thread
From: Joao Martins @ 2022-10-13 12:25 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Avihai Horon, Jason Gunthorpe, qemu-devel

+Avihai, +Jason

On 03/10/2022 04:16, Juan Quintela wrote:
> HACK ahead.
> 
> There are devices that require the guest to be stopped to tell us what
> is the size of its state. 

"... and have transferred said device state" if we are talking current vfio.

We can't query the size of the data_fd right now.

It would need a @data_size in addition to @data_fd in
vfio_device_feature_mig_state, or getting fseek supported over the fd.
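
Something like the following, where the @data_size field is purely
hypothetical (today the structure only carries @device_state and @data_fd):

    struct vfio_device_feature_mig_state {
            __u32 device_state;      /* VFIO_DEVICE_STATE_* */
            __s32 data_fd;           /* stream fd the state is read from */
            __aligned_u64 data_size; /* hypothetical: total state length */
    };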

> So we need to stop the vm "before" we
> call the functions.
> 
> It is a hack because:
> - we are "starting" the guest again to stop it in migration_complete()
>   I know, I know, but it is not trivial to get all the information
>   easily to migration_complete(), so this hack.
> 

Could you expand on that? Naively, I was assuming that by 'all information' you
meant the locally stored counters in migration_iteration_run() that aren't
present in MigrationState?

> - auto_converge test fails with this hack.  I think that it is related
>   to previous problem.  We start the guest when it is supposed to be
>   stopped for convergence reasons.
> 
> - All experiments that I did to do the proper thing failed with having
>   the iothread_locked() or try to unlock() it when not locked.
> 
> - several of the pending functions are using the iothread lock
>   themselves, so I need to split it to have two versions (one for the
>   _estimate() case with the iothread lock), and another for the
>   _exact() case without the iothread_lock().  I want comments about
>   this approach before I try to continue on this direction.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  migration/migration.c        | 13 +++++++++++++
>  tests/qtest/migration-test.c |  3 ++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 35e512887a..7374884818 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3742,7 +3742,20 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>      trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
>  
>      if (pend_pre <= s->threshold_size) {
> +        int old_state = s->state;
> +        qemu_mutex_lock_iothread();
> +        /* is this really necessary?  it works for me both ways. */
> +        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> +        s->vm_was_running = runstate_is_running();
> +        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +        qemu_mutex_unlock_iothread();
>          qemu_savevm_state_pending_exact(&pend_pre, &pend_post);
> +        qemu_mutex_lock_iothread();
> +        runstate_set(old_state);
> +        if (s->vm_was_running) {
> +            vm_start();
> +        }
> +        qemu_mutex_unlock_iothread();

Couldn't we just have an extra patch that just stores pend_pre and pending_size
in MigrationState, which would allow all of this check to be moved into
migration_completion()? Or maybe that wasn't an option for some other reason?

Additionally, what about having a migration helper function that the
vfio_save_complete_precopy() callback uses to check whether the expected device
state size meets the threshold/downtime as it is saving the device state, and
otherwise fail earlier when saving would go beyond the threshold?

It would allow supporting both the (current UAPI) case where you need to
transfer the state to get the device state size (so constantly checking
pending_pre against threshold_size would avoid violating the SLA) as well as
any other UAPI improvement to fseek()/data_size that lets you fail even earlier.

Seems like it at least keeps some of the rules (both run under the iothread
lock) as this patch does.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-13 12:25   ` Joao Martins
@ 2022-10-13 16:08     ` Juan Quintela
  2022-10-14 10:36       ` Joao Martins
  2022-10-14 19:49     ` Jason Gunthorpe
  1 sibling, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-13 16:08 UTC (permalink / raw)
  To: Joao Martins
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Avihai Horon, Jason Gunthorpe, qemu-devel

Joao Martins <joao.m.martins@oracle.com> wrote:
> +Avihai, +Jason
>
> On 03/10/2022 04:16, Juan Quintela wrote:
>> HACK ahead.
>> 
>> There are devices that require the guest to be stopped to tell us what
>> is the size of its state. 
>
> "... and have transferred said device state" if we are talking current vfio.

Yeap.

> We can't query size of the data_fd right now

Oops.  My understanding was that once the guest is stopped you can say
how big it is.  That is a deal breaker.  We can't continue if we don't
know the size.  Copying the whole state just to know the size looks like too much.

> It would need a @data_size in addition to @data_fd in
> vfio_device_feature_mig_state, or getting fseek supported over the fd

Ok, we can decide how to do it, but I think that putting fseek into it
will only make things more complicated.

>> So we need to stop the vm "before" we
>> call the functions.
>> 
>> It is a hack because:
>> - we are "starting" the guest again to stop it in migration_complete()
>>   I know, I know, but it is not trivial to get all the information
>>   easily to migration_complete(), so this hack.
>> 
>
> Could you expand on that? Naively, I was assuming that by 'all information' the
> locally stored counters in migration_iteration_run() that aren't present in
> MigrateState?

This is not related to vfio at all.

The problem is that the current code assumes that the guest is still
running, and then does qemu_mutex_lock_iothread() and unlock() inside the
pending handlers.  To stop the guest, we need to lock the iothread
first.  So this was going to hang.  I fixed it by doing the lock/unlock
twice.  That is the hack.  I will change the callers once we decide
what is the right path ahead.  This is not a problem related to vfio; it
is a problem related to how we can stop/resume guests programmatically
in the middle of qemu code.

>> - auto_converge test fails with this hack.  I think that it is related
>>   to previous problem.  We start the guest when it is supposed to be
>>   stopped for convergence reasons.
>> 
>> - All experiments that I did to do the proper thing failed with having
>>   the iothread_locked() or try to unlock() it when not locked.
>> 
>> - several of the pending functions are using the iothread lock
>>   themselves, so I need to split it to have two versions (one for the
>>   _estimate() case with the iothread lock), and another for the
>>   _exact() case without the iothread_lock().  I want comments about
>>   this approach before I try to continue on this direction.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  migration/migration.c        | 13 +++++++++++++
>>  tests/qtest/migration-test.c |  3 ++-
>>  2 files changed, 15 insertions(+), 1 deletion(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 35e512887a..7374884818 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3742,7 +3742,20 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>>      trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
>>  
>>      if (pend_pre <= s->threshold_size) {
>> +        int old_state = s->state;
>> +        qemu_mutex_lock_iothread();
>> +        /* is this really necessary?  it works for me both ways. */
>> +        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>> +        s->vm_was_running = runstate_is_running();
>> +        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> +        qemu_mutex_unlock_iothread();
>>          qemu_savevm_state_pending_exact(&pend_pre, &pend_post);
>> +        qemu_mutex_lock_iothread();
>> +        runstate_set(old_state);
>> +        if (s->vm_was_running) {
>> +            vm_start();
>> +        }
>> +        qemu_mutex_unlock_iothread();
>
> Couldn't we just have an extra patch that just stores pend_pre and pending_size
> in MigrateState, which would allow all this check to be moved into
> migration_completion(). Or maybe that wasn't an option for some other reason?

This is not an option, because we don't have a way to get back from
migration_completion() to migration_iteration_run() if there is not enough
space to send all the state.

> Additionally what about having a migration helper function that
> vfio_save_complete_precopy() callback needs to use into to check if the
> expected-device state size meets the threshold/downtime as it is saving the
> device state and otherwise fail earlier accordingly when saving beyond the
> threshold?

That is what I tried to do with the code.
See patch 4, ram.c.

Now I have two functions:
- ram_state_pending_estimate()
- ram_state_pending_exact()

The 1st should work without any locking, i.e. just a best estimate without
too much work; the 2nd should give the real value.

What we had discussed before for vfio is that the device will "cache"
the value returned from previous ram_state_pending_exact().


> It would allow supporting both the (current UAPI) case where you need to
> transfer the state to get device state size (so checking against threshold_size
> pending_pre constantly would allow to not violate the SLA) as well as any other
> UAPI improvement to fseek()/data_size that lets you fail even earlier.
>
> Seems like at least it keeps some of the rules (both under iothread lock) as
> this patch

Worst case, you need to read the state into a local buffer
and then calculate the size in _exact()?  Looks like overkill, but in your
case, I can't see another way to do it.

My understanding was that for new hardware you have an easy way to
calculate this value "if the guest was stopped".

My next series is a way to do the read/send of the state in parallel
with the migration_thread(), but that is a different
problem.  I need a way to calculate sizes to start with, otherwise I
have no way to decide if I can enter the completion stage (or for old
devices one can lie and assume the size is zero, but then you are going
to have bigger downtimes).

Later, Juan.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-13 16:08     ` Juan Quintela
@ 2022-10-14 10:36       ` Joao Martins
  2022-10-14 11:28         ` Juan Quintela
  0 siblings, 1 reply; 23+ messages in thread
From: Joao Martins @ 2022-10-14 10:36 UTC (permalink / raw)
  To: quintela
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Avihai Horon, Jason Gunthorpe, qemu-devel

On 13/10/2022 17:08, Juan Quintela wrote:
> Oops.  My understanding was that once the guest is stopped you can say
> how big it is.

It's worth keeping in mind that conceptually a VF won't stop (e.g. DMA) until
really told through VFIO. So, stopping CPUs (guest) just means that guest code
does not arm the VF for more I/O, but that is still a weak guarantee as the VF still has
outstanding I/O to deal with until VFIO tells it to quiesce DMA (for devices
that support it).

> That is a deal breaker.  We can't continue if we don't
> know the size.  Copying the whole state to know the size looks too much.
> 

It's fair to say that the kernel could know the size of the state once the VF is
stopped, but right now it is only known on the STOP -> STOP_COPY arc (which is
when it is saved to kernel pages, regardless of userspace reading it). And it
will block read() until the device has finished transferring it (unless you do
a nonblocking read ofc). Knowing the device state size would need to be
reflected in the vfio UAPI and needs that adjustment. Providing the total length
ahead of the device transfer might depend on the hardware, but current vfio
vendor drivers look capable of that (from looking at the code).
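
For reference, the STOP -> STOP_COPY arc being discussed looks roughly like
this with the v2 UAPI (a sketch; error handling omitted, device_fd is the
VFIO device fd):

    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
                              sizeof(struct vfio_device_feature_mig_state),
                              sizeof(uint64_t))] = {};
    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
    struct vfio_device_feature_mig_state *mig =
            (struct vfio_device_feature_mig_state *)feature->data;

    feature->argsz = sizeof(buf);
    feature->flags = VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
    mig->device_state = VFIO_DEVICE_STATE_STOP_COPY;
    ioctl(device_fd, VFIO_DEVICE_FEATURE, feature);
    /* the device state is then read() from mig->data_fd, which blocks
     * until the device has finished producing it */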

>> It would need a @data_size in addition to @data_fd in
>> vfio_device_feature_mig_state, or getting fseek supported over the fd
> 
> Ok, we can decide how to do it, but I think that putting fseek into it
> will only make things more complicated.
>

fseek() was just a suggestion as a way to calculate the file length (device state
length) with current file APIs; over a raw fd that would actually be lseek():

	off_t size = lseek(data_fd, 0, SEEK_END);
	lseek(data_fd, 0, SEEK_SET);	/* rewind before reading the state */

@data_size way is likely better (or simpler), but it would need to get an extra
u64 added into  VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE ioctl

I am sure there are other alternatives

>>> So we need to stop the vm "before" we
>>> call the functions.
>>>
>>> It is a hack because:
>>> - we are "starting" the guest again to stop it in migration_complete()
>>>   I know, I know, but it is not trivial to get all the information
>>>   easily to migration_complete(), so this hack.
>>>
>>
>> Could you expand on that? Naively, I was assuming that by 'all information' the
>> locally stored counters in migration_iteration_run() that aren't present in
>> MigrateState?
> 
> This is not related to vfio at all.
> > The problem is that current code assumes that the guest is still
> running, and then do qemu_mutex_lock_iothread() and unlock() inside the
> pending handlers.  To stop the guest, we need to lock the iothread
> first.  So this was going to hang.  I fixed it doing the lock/unlock
> twice.  That is the hack.  I will change the callers once that we decide
> what is the right path ahead.  This is not a problem related to vfio. it
> is a problem related to how we can stop/resume guests programmatically
> in the middle of qemu code.
>
/me nods

>>> - auto_converge test fails with this hack.  I think that it is related
>>>   to previous problem.  We start the guest when it is supposed to be
>>>   stopped for convergence reasons.
>>>
>>> - All experiments that I did to do the proper thing failed with having
>>>   the iothread_locked() or try to unlock() it when not locked.
>>>
>>> - several of the pending functions are using the iothread lock
>>>   themselves, so I need to split it to have two versions (one for the
>>>   _estimate() case with the iothread lock), and another for the
>>>   _exact() case without the iothread_lock().  I want comments about
>>>   this approach before I try to continue on this direction.
>>>
>>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>>> ---
>>>  migration/migration.c        | 13 +++++++++++++
>>>  tests/qtest/migration-test.c |  3 ++-
>>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 35e512887a..7374884818 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3742,7 +3742,20 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>>>      trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
>>>  
>>>      if (pend_pre <= s->threshold_size) {
>>> +        int old_state = s->state;
>>> +        qemu_mutex_lock_iothread();
>>> +        /* is this really necessary?  it works for me both ways. */
>>> +        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>> +        s->vm_was_running = runstate_is_running();
>>> +        vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>> +        qemu_mutex_unlock_iothread();
>>>          qemu_savevm_state_pending_exact(&pend_pre, &pend_post);
>>> +        qemu_mutex_lock_iothread();
>>> +        runstate_set(old_state);
>>> +        if (s->vm_was_running) {
>>> +            vm_start();
>>> +        }
>>> +        qemu_mutex_unlock_iothread();
>>
>> Couldn't we just have an extra patch that just stores pend_pre and pending_size
>> in MigrateState, which would allow all this check to be moved into
>> migration_completion(). Or maybe that wasn't an option for some other reason?
> 
> This is not an option, because we don't have a way to get back from
> migration_completion() to migrate_iteration_run() if there is not enough
> space to send all the state.
> 
>> Additionally what about having a migration helper function that
>> vfio_save_complete_precopy() callback needs to use into to check if the
>> expected-device state size meets the threshold/downtime as it is saving the
>> device state and otherwise fail earlier accordingly when saving beyond the
>> threshold?
> 
> That is what I tried to do with the code.
> See patch 4. ram.c
> 
So what I was saying earlier was to have something like a:

	migration_check_pending(ms, device_state_size)

Which would call into the migration core to check that the SLA is still met, but
callable from the client (hw/vfio/migration) as opposed to the core calling into
the client. This allows the client to control when to stop the VF.
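
A sketch of what that hypothetical helper could look like (the name and the
threshold_size comparison are assumptions, not an existing API):

    /* Hypothetical: lets a client such as hw/vfio/migration ask the
     * migration core whether device_state_size still fits the SLA. */
    bool migration_check_pending(MigrationState *ms, uint64_t device_state_size)
    {
        /* threshold_size is derived from max-downtime and bandwidth */
        return device_state_size <= ms->threshold_size;
    }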

The copy to userspace could probably be amortized via pread() at
an arbitrary offset to read 1 byte, and get an estimated size. Say you could
check that the size is readable at a @threshold_size + 1 offset without
necessarily copying the whole thing to userspace. If it reads successfully it
would bail out earlier (failing the migration) -- and it would avoid doing
the full copy to userspace.
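
A minimal sketch of that probe, assuming the data_fd gained support for
pread() at arbitrary offsets (it does not have that today):

    char byte;
    /* probe one byte just past the downtime budget: if this read
     * succeeds, the device state alone already exceeds the SLA */
    if (pread(data_fd, &byte, 1, threshold_size + 1) == 1) {
        /* fail the migration attempt early, skip the full copy */
    }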

But the one gotcha is that the STOP -> STOP_COPY transition needs to
happen, and that is what triggers the transfer of the device state into
kernel-managed pages before userspace decides to do anything to access/copy
said state.

> Now I have two functions:
> - ram_state_pending_estimate()
> - ram_state_pending_exact()
> 
> 1st should work without any locking, i.e. just best estimation without
> too much work, second should give the real value.
> 
Right -- I did notice that

> What we had discussed before for vfio is that the device will "cache"
> the value returned from previous ram_state_pending_exact().
>

A concern I see is that if we stop and resume the VF on a regular basis just to
make a calculation available throughout the migration flow -- isn't it going to
be a little invasive from a guest performance/behaviour PoV?

Perhaps this check ought to be amortized at different major stage transitions of
migration (as opposed to at every iteration), as this can end up happening very often
as soon as non-VFIO state + dirty pages get to a small enough threshold?

> 
>> It would allow supporting both the (current UAPI) case where you need to
>> transfer the state to get device state size (so checking against threshold_size
>> pending_pre constantly would allow to not violate the SLA) as well as any other
>> UAPI improvement to fseek()/data_size that lets you fail even earlier.
>>
>> Seems like at least it keeps some of the rules (both under iothread lock) as
>> this patch
> 
> Coming to worse thing, you need to read the state into a local buffer
> and then calculate the size in exact?  Looks overkill, but in your case,
> I can't see other way to do it.
> 
> My understanding was that for new hardware you have an easy way to
> calculate this value "if the guest was stopped".
> 
s/guest/VF

> My next series are a way to do in parallel the read/send of the state
> with respect to the migration_thread(), but that is a different
> problem. 

There's also non-blocking reads; not sure if that helps towards the asynchronous transfer?

> I need a way to calculate sizes to start with, 
> otherwise  I
> have no way to decide if I can enter the completion stage (or for old
> devices one can lie and assume the size is zero,

To be fair, that's what happens right now with the migration v1 protocol -- it's
not unprecedented IIUC

> but then you are going
> to have bigger downtimes).
> 
> Later, Juan.
> 
Thanks


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-14 10:36       ` Joao Martins
@ 2022-10-14 11:28         ` Juan Quintela
  2022-10-14 12:29           ` Joao Martins
  0 siblings, 1 reply; 23+ messages in thread
From: Juan Quintela @ 2022-10-14 11:28 UTC (permalink / raw)
  To: Joao Martins
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Avihai Horon, Jason Gunthorpe, qemu-devel

Joao Martins <joao.m.martins@oracle.com> wrote:
> On 13/10/2022 17:08, Juan Quintela wrote:
>> Oops.  My understanding was that once the guest is stopped you can say
>> how big it is.

Hi

> It's worth keeping in mind that conceptually a VF won't stop (e.g. DMA) until
> really told through VFIO. So, stopping CPUs (guest) just means that guest code
> does not arm the VF for more I/O but still is a weak guarantee as VF still has
> outstanding I/O to deal with until VFIO tells it to quiesce DMA (for devices
> that support it).

How can we make sure about that?

i.e. I know I have a vfio device.  I need two things:
- in the iterative stage, I need to check the size, but an estimate is ok.
  for example with RAM, we use whatever is the size of the dirty bitmap
  at that moment.
  If the estimated size is smaller than the threshold, we
   * stop the guest
   * sync dirty bitmap
   * now we test the (real) dirty bitmap size

How can we do something like that with a vfio device?
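
For RAM, in migration_iteration_run() terms, the flow just described is
roughly the following (a sketch, mirroring the hack patch earlier in the
thread):

    qemu_savevm_state_pending_estimate(&pend_pre, &pend_post);
    if (pend_pre <= s->threshold_size) {
        /* stop the guest, sync the dirty bitmap, ... */
        qemu_savevm_state_pending_exact(&pend_pre, &pend_post);
        /* ... and test the (real) pending size against the threshold */
    }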


>> That is a deal breaker.  We can't continue if we don't
>> know the size.  Copying the whole state to know the size looks too much.
>> 
>
> It's fair to say that the kernel could know the size of the state once the VF is
> stopped but right now it is only on the STOP -> STOP_COPY arc (which is when it
> is saved to kernel pages, regardless of userspace reading it). And it will block
> read() until device has finished transferring it (unless you do a nonblocking
> read ofc). Knowing the device state would need to be reflected in the vfio UAPI
> and needs that adjustment. Providing total length ahead of device transfer might
> depend on the hardware, but current vfio vendor drivers look capable of that
> (from looking at the code).

I have no clue about vfio, so I need help here.  I can only provide
hooks.  But I expect that we should be able to convince firmware to give
us that information.

>>> It would need a @data_size in addition to @data_fd in
>>> vfio_device_feature_mig_state, or getting fseek supported over the fd
>> 
>> Ok, we can decide how to do it, but I think that putting fseek into it
>> will only make things more complicated.
>>
>
> fseek() was just a suggestion as a way to calculate file length (device state
> length) with current file APIs:
>
> 	fseek(data_fd, 0L, SEEK_END);
> 	size = ftell(data_fd);
>
> @data_size way is likely better (or simpler), but it would need to get an extra
> u64 added into  VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE ioctl
>
> I am sure there are other alternatives

My understanding from NVidia folks was that newer firmware has an ioctl
to return that information.

>>>> So we need to stop the vm "before" we
>>>> call the functions.
>>>>
>>>> It is a hack because:
>>>> - we are "starting" the guest again to stop it in migration_complete()
>>>>   I know, I know, but it is not trivial to get all the information
>>>>   easily to migration_complete(), so this hack.
>>>>
>>>
>>> Could you expand on that? Naively, I was assuming that by 'all information' the
>>> locally stored counters in migration_iteration_run() that aren't present in
>>> MigrateState?
>> 
>> This is not related to vfio at all.
>> > The problem is that current code assumes that the guest is still
>> running, and then do qemu_mutex_lock_iothread() and unlock() inside the
>> pending handlers.  To stop the guest, we need to lock the iothread
>> first.  So this was going to hang.  I fixed it doing the lock/unlock
>> twice.  That is the hack.  I will change the callers once that we decide
>> what is the right path ahead.  This is not a problem related to vfio. it
>> is a problem related to how we can stop/resume guests programmatically
>> in the middle of qemu code.
>>
> /me nods
>>> Couldn't we just have an extra patch that just stores pend_pre and pending_size
>>> in MigrateState, which would allow all this check to be moved into
>>> migration_completion(). Or maybe that wasn't an option for some other reason?
>> 
>> This is not an option, because we don't have a way to get back from
>> migration_completion() to migration_iteration_run() if there is not enough
>> space to send all the state.
>> 
>>> Additionally what about having a migration helper function that
>>> vfio_save_complete_precopy() callback needs to use into to check if the
>>> expected-device state size meets the threshold/downtime as it is saving the
>>> device state and otherwise fail earlier accordingly when saving beyond the
>>> threshold?
>> 
>> That is what I tried to do with the code.
>> See patch 4. ram.c
>> 
> So what I was saying earlier was to have something like a:
>
> 	migration_check_pending(ms, device_state_size)
>
> Which would call into migration core to check the SLA is still met. But callable
> from the client (hw/vfio/migration) as opposed to the core calling into the
> client. This allows that the client controls when to stop the VF
>
> The copy to userspace one could probably be amortized via pread() at
> at an arbritary offset to read 1 byte, and get an estimate size. Say you could
> check that the size is readable with a @threshold_size + 1 offset without
> necessarily copying the whole thing to userspace. If it reads succesfully it
> would bailing off earlier (failing the migration) -- and it would avoid doing
> the full copy to userspace.
>
> But the one gotcha is the STOP -> STOP_COPY needs to
> happen and that is what triggers the transfer the device state into
> kernel-managed pages before userspace decides to do anything to access/copy said
> state.
>
>> Now I have two functions:
>> - ram_state_pending_estimate()
>> - ram_state_pending_exact()
>> 
>> 1st should work without any locking, i.e. just best estimation without
>> too much work, second should give the real value.
>> 
> Right -- I did notice that
>
>> What we had discussed before for vfio is that the device will "cache"
>> the value returned from previous ram_state_pending_exact().
>>
>
> A concern I see is that if we stop and resume the VF on a regular basis to
> accommodate a calculation just to be made available throughout all migration
> flow -- it is going to be a little invasive from guest performance/behaviour PoV?

It is *kind of* invasive (it makes things slower), but we already have the
problem that in the current code we are not counting the size of the device
state in the calculation, and we should.  This adds a hook for all devices
to include this information.

I see two options:
- we stop the guest for the calculations (it makes the last iteration
  downtime faster), but it makes the end of the other iterations
  slower.  Notice that we should not have so many iterations to start with.

> Perhaps this check ought to be amortized at different major stage transitions of
> migration (as opposed to any iteration) as this can end up happening very often
> as soon as non-VFIO state + dirty pages get to a small enough threshold?

This is the reason why we add the estimated size for vfio.  If the
estimate is good enough, we shouldn't stop so many times.
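
Concretely, with the state_pending_* split from patch 4, a device would
register both hooks (a sketch; the vfio_-prefixed handler names are
placeholders, not existing code):

    static SaveVMHandlers savevm_vfio_handlers = {
        /* cheap guess, called while the guest can still be running */
        .state_pending_estimate = vfio_state_pending_estimate,
        /* real size, called with the guest stopped */
        .state_pending_exact = vfio_state_pending_exact,
        /* ... */
    };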

>>> It would allow supporting both the (current UAPI) case where you need to
>>> transfer the state to get device state size (so checking against threshold_size
>>> pending_pre constantly would allow to not violate the SLA) as well as any other
>>> UAPI improvement to fseek()/data_size that lets you fail even earlier.
>>>
>>> Seems like at least it keeps some of the rules (both under iothread lock) as
>>> this patch
>> 
>> Coming to worse thing, you need to read the state into a local buffer
>> and then calculate the size in exact?  Looks overkill, but in your case,
>> I can't see other way to do it.
>> 
>> My understanding was that for new hardware you have an easy way to
>> calculate this value "if the guest was stopped".
>> 
> s/guest/VF

The use case that I was discussing had the whole card assigned to the
guest, not a VF.  I have no clue if that makes the problem easier or
more difficult.

>> My next series are a way to do in parallel the read/send of the state
>> with respect to the migration_thread(), but that is a different
>> problem. 
>
> There's also non-blocking reads not sure it helps towards the asynchronous transfer?

My current "cunning" plan is to create a new channel for each vfio
device, hat allow the device to do whatever they want, and using a
blocking interface.

>> I need a way to calculate sizes to start with, 
>> otherwise  I
>> have no way to decide if I can enter the completion stage (or for old
>> devices one can lie and assume than size is zero, 
>
> to be fair that's what happens right now with migration v1 protocol -- it's not
> unprecedented IIUC

Ok, thanks for the comments.

Later, Juan.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-14 11:28         ` Juan Quintela
@ 2022-10-14 12:29           ` Joao Martins
  2022-10-18 12:22             ` Jason Gunthorpe via
  0 siblings, 1 reply; 23+ messages in thread
From: Joao Martins @ 2022-10-14 12:29 UTC (permalink / raw)
  To: quintela
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Avihai Horon, Jason Gunthorpe, qemu-devel

On 14/10/2022 12:28, Juan Quintela wrote:
> Joao Martins <joao.m.martins@oracle.com> wrote:
>> On 13/10/2022 17:08, Juan Quintela wrote:
>>> Oops.  My understanding was that once the guest is stopped you can say
>>> how big it is.
> 
> Hi
> 
>> It's worth keeping in mind that conceptually a VF won't stop (e.g. DMA) until
>> really told through VFIO. So, stopping CPUs (guest) just means that guest code
>> does not arm the VF for more I/O but still is a weak guarantee as VF still has
>> outstanding I/O to deal with until VFIO tells it to quiesce DMA (for devices
>> that support it).
> 
> How can we make sure about that?
> 
> i.e. I know I have a vfio device.  I need two things:
> - in the iterative stage, I need to check the size, but an estimate is ok.
>   for example with RAM, we use whatever is the size of the dirty bitmap
>   at that moment.
>   If the estimated size is smaller than the threshold, we
>    * stop the guest
>    * sync dirty bitmap
>    * now we test the (real) dirty bitmap size
> 
> How can we do something like that with a vfio device.
> 
You would have an extra intermediate step that stops the VF prior to asking for
the device state length. What I am not sure about is whether stopping
vCPUs can be skipped as an optimization. In theory you could, but perhaps it
is best to be conservative and handle it with the same assumptions as any other
guest-related state.

> 
>>> That is a deal breaker.  We can't continue if we don't
>>> know the size.  Copying the whole state to know the size looks too much.
>>>
>>
>> It's fair to say that the kernel could know the size of the state once the VF is
>> stopped but right now it is only on the STOP -> STOP_COPY arc (which is when it
>> is saved to kernel pages, regardless of userspace reading it). And it will block
>> read() until device has finished transferring it (unless you do a nonblocking
>> read ofc). Knowing the device state would need to be reflected in the vfio UAPI
>> and needs that adjustment. Providing total length ahead of device transfer might
>> depend on the hardware, but current vfio vendor drivers look capable of that
>> (from looking at the code).
> 
> I have no clue about vfio, so I need help here.  I can only provide
> hooks.  But I expect that we should be able to convince firmware to give
> us that information.
> 
The above was me talking about the kernel interface.

>>>> It would need a @data_size in addition to @data_fd in
>>>> vfio_device_feature_mig_state, or getting fseek supported over the fd
>>>
>>> Ok, we can decide how to do it, but I think that putting fseek into it
>>> will only make things more complicated.
>>>
>>
>> fseek() was just a suggestion as a way to calculate the file length (device state
>> length) with current file APIs; over a raw fd that would actually be lseek():
>>
>> 	off_t size = lseek(data_fd, 0, SEEK_END);
>> 	lseek(data_fd, 0, SEEK_SET);	/* rewind before reading the state */
>>
>> @data_size way is likely better (or simpler), but it would need to get an extra
>> u64 added into  VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE ioctl
>>
>> I am sure there are other alternatives
> 
> My understanding from NVidia folks was that newer firmware has an ioctl
> to return that information.
> 

Maybe there's something new. I was thinking of this here:

https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/mlx5/cmd.c#L47

mlx5vf_cmd_query_vhca_migration_state()

The hisilicon one looks to always be the same size.

What both probably assume is that the device has been suspended prior to making
this call. We would use those interfaces to return this @data_size as said
above, or whatever reasonable alternative folks had discussed.

>>>>> So we need to stop the vm "before" we
>>>>> call the functions.
>>>>>
>>>>> It is a hack because:
>>>>> - we are "starting" the guest again to stop it in migration_complete()
>>>>>   I know, I know, but it is not trivial to get all the information
>>>>>   easily to migration_complete(), so this hack.
>>>>>
>>>>
>>>> Could you expand on that? Naively, I was assuming that by 'all information' the
>>>> locally stored counters in migration_iteration_run() that aren't present in
>>>> MigrateState?
>>>
>>> This is not related to vfio at all.
>>>> The problem is that current code assumes that the guest is still
>>> running, and then do qemu_mutex_lock_iothread() and unlock() inside the
>>> pending handlers.  To stop the guest, we need to lock the iothread
>>> first.  So this was going to hang.  I fixed it doing the lock/unlock
>>> twice.  That is the hack.  I will change the callers once that we decide
>>> what is the right path ahead.  This is not a problem related to vfio. it
>>> is a problem related to how we can stop/resume guests programmatically
>>> in the middle of qemu code.
>>>
>> /me nods
>>>> Couldn't we just have an extra patch that just stores pend_pre and pending_size
>>>> in MigrateState, which would allow all this check to be moved into
>>>> migration_completion(). Or maybe that wasn't an option for some other reason?
>>>
>>> This is not an option, because we don't have a way to get back from
>>> migration_completion() to migration_iteration_run() if there is not enough
>>> space to send all the state.
>>>
>>>> Additionally what about having a migration helper function that
>>>> vfio_save_complete_precopy() callback needs to use into to check if the
>>>> expected-device state size meets the threshold/downtime as it is saving the
>>>> device state and otherwise fail earlier accordingly when saving beyond the
>>>> threshold?
>>>
>>> That is what I tried to do with the code.
>>> See patch 4. ram.c
>>>
>> So what I was saying earlier was to have something like a:
>>
>> 	migration_check_pending(ms, device_state_size)
>>
>> Which would call into migration core to check the SLA is still met. But callable
>> from the client (hw/vfio/migration) as opposed to the core calling into the
>> client. This allows that the client controls when to stop the VF
>>
>> The copy to userspace could probably be amortized via pread() at
>> an arbitrary offset to read 1 byte, and get an estimated size. Say you could
>> check that the size is readable at a @threshold_size + 1 offset without
>> necessarily copying the whole thing to userspace. If it reads successfully it
>> would bail out earlier (failing the migration) -- and it would avoid doing
>> the full copy to userspace.
>>
>> But the one gotcha is the STOP -> STOP_COPY needs to
>> happen and that is what triggers the transfer the device state into
>> kernel-managed pages before userspace decides to do anything to access/copy said
>> state.
>>
>>> Now I have two functions:
>>> - ram_state_pending_estimate()
>>> - ram_state_pending_exact()
>>>
>>> 1st should work without any locking, i.e. just best estimation without
>>> too much work, second should give the real value.
>>>
>> Right -- I did notice that
>>
>>> What we had discussed before for vfio is that the device will "cache"
>>> the value returned from previous ram_state_pending_exact().
>>>
>>
>> A concern I see is that if we stop and resume the VF on a regular basis to
>> accommodate a calculation just to be made available throughout all migration
>> flow -- it is going to be a little invasive from guest performance/behaviour PoV?
> 
> It is *kind* of invasive (make things slower), but we already have the
> problem that current code, we are not counting the size of the devices
> state on calculation, and we should.  This adds a hook for all devices
> to include this information.
> 
I think one difference here is that the vmstate is all self-contained in
Qemu/VMM (except for accel-related state obtained from KVM). Here we are making
calls into hardware, stopping the VF and getting the state, only to resume it
again.

> I see two options:
> - we stop the guest for the calculations (it makes the last iteration
>   downtime faster), but it makes the rest of the end of iterations
>   slower.  Notice that we should not have so many iterations to start with.
> 
>> Perhaps this check ought to be amortized at different major stage transitions of
>> migration (as opposed to any iteration) as this can end up happening very often
>> as soon as non-VFIO state + dirty pages get to a small enough threshold?
> 
> This is the reason why we add the estimate size for the vfio.  if the
> estimate is good enough, we shouldn't stop so many times.
> 
I suppose, yes

>>>> It would allow supporting both the (current UAPI) case where you need to
>>>> transfer the state to get device state size (so checking against threshold_size
>>>> pending_pre constantly would allow to not violate the SLA) as well as any other
>>>> UAPI improvement to fseek()/data_size that lets you fail even earlier.
>>>>
>>>> Seems like at least it keeps some of the rules (both under iothread lock) as
>>>> this patch
>>>
>>> Coming to worse thing, you need to read the state into a local buffer
>>> and then calculate the size in exact?  Looks overkill, but in your case,
>>> I can't see other way to do it.
>>>
>>> My understanding was that for new hardware you have an easy way to
>>> calculate this value "if the guest was stopped".
>>>
>> s/guest/VF
> 
> The use case that I was discussing the whole card was assigned to the
> guest, not a VF.  I have no clue if that makes the problem easier or
> more difficult.
> 
>>> My next series are a way to do in parallel the read/send of the state
>>> with respect to the migration_thread(), but that is a different
>>> problem. 
>>
>> There's also non-blocking reads not sure it helps towards the asynchronous transfer?
> 
> My current "cunning" plan is to create a new channel for each vfio
> device, that allows the device to do whatever it wants, using a
> blocking interface.
> 
Looks sensible

>>> I need a way to calculate sizes to start with, 
>>> otherwise  I
>>> have no way to decide if I can enter the completion stage (or for old
>>> devices one can lie and assume the size is zero,
>>
>> to be fair that's what happens right now with migration v1 protocol -- it's not
>> unprecedented IIUC
> 
> Ok, thanks for the comments.
> 
Likewise

It would be nice if we could still move forward with v2 support, while newly
found gaps are addressed on the road to turning it into a non-experimental
feature. It would allow exercising a lot of the new interface, as the ecosystem
is a bit limited :(


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-13 12:25   ` Joao Martins
  2022-10-13 16:08     ` Juan Quintela
@ 2022-10-14 19:49     ` Jason Gunthorpe
  1 sibling, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2022-10-14 19:49 UTC (permalink / raw)
  To: Joao Martins
  Cc: Juan Quintela, Alex Williamson, Eric Blake, Stefan Hajnoczi,
	Fam Zheng, qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Avihai Horon, qemu-devel

On Thu, Oct 13, 2022 at 01:25:10PM +0100, Joao Martins wrote:

> It would allow supporting both the (current UAPI) case where you need to
> transfer the state to get device state size (so checking against threshold_size
> pending_pre constantly would allow to not violate the SLA) as well as any other
> UAPI improvement to fseek()/data_size that lets you fail even earlier.

We should not get fixated on missing parts of the current kernel
support; if we need a new query or something to make qemu happy then
we will add it. We don't need to worry about supporting the kernel-with-no-query
case in qemu.

Jason


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-14 12:29           ` Joao Martins
@ 2022-10-18 12:22             ` Jason Gunthorpe via
  2022-10-19 15:51               ` Yishai Hadas
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe via @ 2022-10-18 12:22 UTC (permalink / raw)
  To: Joao Martins
  Cc: quintela, Alex Williamson, Eric Blake, Stefan Hajnoczi,
	Fam Zheng, qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Avihai Horon, qemu-devel

On Fri, Oct 14, 2022 at 01:29:51PM +0100, Joao Martins wrote:
> On 14/10/2022 12:28, Juan Quintela wrote:
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >> On 13/10/2022 17:08, Juan Quintela wrote:
> >>> Oops.  My understanding was that once the guest is stopped you can say
> >>> how big it is.
> > 
> > Hi
> > 
> >> It's worth keeping in mind that conceptually a VF won't stop (e.g. DMA) until
> >> really told through VFIO. So, stopping CPUs (guest) just means that guest code
> >> does not arm the VF for more I/O but still is a weak guarantee as VF still has
> >> outstanding I/O to deal with until VFIO tells it to quiesce DMA (for devices
> >> that support it).
> > 
> > How can we make sure about that?
> > 
> > i.e. I know I have a vfio device.  I need two things:
> > - in the iterative stage, I need to check the size, but an estimate is ok.
> >   for example with RAM, we use whatever is the size of the dirty bitmap
> >   at that moment.
> >   If the estimated size is smaller than the threshold, we
> >    * stop the guest
> >    * sync dirty bitmap
> >    * now we test the (real) dirty bitmap size
> > 
> > How can we do something like that with a vfio device.
> > 
> You would have an extra intermediate step that stops the VF prior to asking
> the device state length. What I am not sure is whether stopping
> vCPUs can be skipped as an optimization.

It cannot: if you want to stop the VFIO device you must also stop the
vCPUs, because the device is not required to respond properly to MMIO
operations when stopped.

> > > My understanding from NVidia folks was that newer firmware has an
> > > ioctl to return that information.
> 
> Maybe there's something new. I was thinking from this here:

Juan is talking about the ioctl we had in the pre-copy series.

I expect it to come back in some different form to support this RFC.

Jason


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
  2022-10-18 12:22             ` Jason Gunthorpe via
@ 2022-10-19 15:51               ` Yishai Hadas
  0 siblings, 0 replies; 23+ messages in thread
From: Yishai Hadas @ 2022-10-19 15:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Joao Martins, quintela
  Cc: quintela, Alex Williamson, Eric Blake, Stefan Hajnoczi,
	Fam Zheng, qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Avihai Horon, qemu-devel, Maor Gottlieb

> From: Qemu-devel <qemu-devel-
> bounces+yishaih=nvidia.com@nongnu.org> On Behalf Of Jason Gunthorpe
> Sent: Tuesday, 18 October 2022 15:23
> To: Joao Martins <joao.m.martins@oracle.com>
> Cc: quintela@redhat.com; Alex Williamson <alex.williamson@redhat.com>;
> Eric Blake <eblake@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>;
> Fam Zheng <fam@euphon.net>; qemu-s390x@nongnu.org; Cornelia Huck
> <cohuck@redhat.com>; Thomas Huth <thuth@redhat.com>; Vladimir
> Sementsov-Ogievskiy <vsementsov@yandex-team.ru>; Laurent Vivier
> <lvivier@redhat.com>; John Snow <jsnow@redhat.com>; Dr. David Alan
> Gilbert <dgilbert@redhat.com>; Christian Borntraeger
> <borntraeger@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>; Paolo
> Bonzini <pbonzini@redhat.com>; qemu-block@nongnu.org; Eric Farman
> <farman@linux.ibm.com>; Richard Henderson
> <richard.henderson@linaro.org>; David Hildenbrand <david@redhat.com>;
> Avihai Horon <avihaih@nvidia.com>; qemu-devel@nongnu.org
> Subject: Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact()
> with the guest stopped
> 
> On Fri, Oct 14, 2022 at 01:29:51PM +0100, Joao Martins wrote:
> > On 14/10/2022 12:28, Juan Quintela wrote:
> > > Joao Martins <joao.m.martins@oracle.com> wrote:
> > >> On 13/10/2022 17:08, Juan Quintela wrote:
> > >>> Oops.  My understanding was that once the guest is stopped you can
> > >>> say how big it is.
> > >
> > > Hi
> > >
> > >> It's worth keeping in mind that conceptually a VF won't stop (e.g.
> > >> DMA) until really told through VFIO. So, stopping CPUs (guest) just
> > >> means that guest code does not arm the VF for more I/O but still is
> > >> a weak guarantee as VF still has outstanding I/O to deal with until
> > >> VFIO tells it to quiesce DMA (for devices that support it).
> > >
> > > How can we make sure about that?
> > >
> > > i.e. I know I have a vfio device.  I need two things:
> > > - in the iterative stage, I need to check the size, but an estimate is ok.
> > >   for example with RAM, we use whatever is the size of the dirty bitmap
> > >   at that moment.
> > >   If the estimated size is smaller than the threshold, we
> > >    * stop the guest
> > >    * sync dirty bitmap
> > >    * now we test the (real) dirty bitmap size
> > >
> > > How can we do something like that with a vfio device.
> > >
> > You would have an extra intermediate step that stops the VF prior to
> > asking the device state length. What I am not sure is whether stopping
> > vCPUs can be skipped as an optimization.
> 
> It cannot, if you want to stop the VFIO device you must also stop the vCPUs
> because the device is not required to respond properly to MMIO operations
> when stopped.
> 
> > > My understanding from NVidia folks was that newer firmware has an
> > > ioctl to return that information.
> >
> > Maybe there's something new. I was thinking from this here:
> 
> Juan is talking about the ioctl we had in the pre-copy series.
> 
> I expect it to come into some different form to support this RFC.
> 

Do we really need to STOP the VM to get the exact data length that will be required to complete STOP_COPY?

Can't we simply go with a close estimate while the device is running, and drop all the complexity in QEMU/kernel of STOPping and then RE-STARTing the VM if the threshold isn't met, etc.?
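
To make that concrete, the flow being debated looks roughly like this (a
C-style sketch of the scheme described above; the helper names are
illustrative, not the actual QEMU/VFIO API):

    /* Illustrative sketch only -- helper names are made up. */
    estimate = state_pending_estimate();      /* cheap, guest running */
    if (estimate >= threshold) {
        iterate();                            /* keep sending data */
    } else {
        stop_guest_and_quiesce_vf();          /* vCPUs *and* VF DMA */
        exact = state_pending_exact();        /* only now meaningful */
        if (exact < threshold) {
            complete_migration();
        } else {
            restart_guest();                  /* the step questioned here */
        }
    }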

Yishai

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [RFC 0/7] migration patches for VFIO
  2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
                   ` (6 preceding siblings ...)
  2022-10-03  3:16 ` [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped Juan Quintela
@ 2022-10-20 14:56 ` Yishai Hadas
  7 siblings, 0 replies; 23+ messages in thread
From: Yishai Hadas @ 2022-10-20 14:56 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Alex Williamson, Eric Blake, Stefan Hajnoczi, Fam Zheng,
	qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Dr. David Alan Gilbert, Christian Borntraeger, Halil Pasic,
	Paolo Bonzini, qemu-block, Eric Farman, Richard Henderson,
	David Hildenbrand, Jason Gunthorpe, Maor Gottlieb, Avihai Horon

> From: Qemu-devel <qemu-devel-
> bounces+yishaih=nvidia.com@nongnu.org> On Behalf Of Juan Quintela
> Sent: Monday, 3 October 2022 6:16
> To: qemu-devel@nongnu.org
> Cc: Alex Williamson <alex.williamson@redhat.com>; Eric Blake
> <eblake@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>; Fam
> Zheng <fam@euphon.net>; qemu-s390x@nongnu.org; Cornelia Huck
> <cohuck@redhat.com>; Thomas Huth <thuth@redhat.com>; Vladimir
> Sementsov-Ogievskiy <vsementsov@yandex-team.ru>; Laurent Vivier
> <lvivier@redhat.com>; John Snow <jsnow@redhat.com>; Dr. David Alan
> Gilbert <dgilbert@redhat.com>; Christian Borntraeger
> <borntraeger@linux.ibm.com>; Halil Pasic <pasic@linux.ibm.com>; Juan
> Quintela <quintela@redhat.com>; Paolo Bonzini <pbonzini@redhat.com>;
> qemu-block@nongnu.org; Eric Farman <farman@linux.ibm.com>; Richard
> Henderson <richard.henderson@linaro.org>; David Hildenbrand
> <david@redhat.com>
> Subject: [RFC 0/7] migration patches for VFIO
> 
> Hi
> 
> VFIO migration has several requirements:
> - the size of the state is only known when the guest is stopped

As was discussed in the conference call, I just sent a patch to the kernel mailing list that makes it possible to get the state size in each state.

See:
https://patchwork.kernel.org/project/kvm/patch/20221020132109.112708-1-yishaih@nvidia.com/

This removes the need to stop the guest to ask for that data.

So, I assume that you can drop some complexity and hacks from your RFC once you send the next series.

Specifically,
No need to stop the VM and re-start it in case the SLA can't be met; just read, while in RUNNING, the estimated data length that will be required to complete STOP_COPY, and use it.
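
Roughly, with such an interface QEMU could query the device while it is
still RUNNING (a sketch modeled on the patch linked above; the feature
name and payload struct come from that proposal, not from a released
uapi, so treat them as assumptions):

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Query the estimated STOP_COPY data size while the device is still
     * RUNNING.  VFIO_DEVICE_FEATURE_MIG_DATA_SIZE and
     * struct vfio_device_feature_mig_data_size are taken from the
     * proposed kernel patch above. */
    static int query_stop_copy_size(int device_fd, uint64_t *stop_copy_len)
    {
        uint64_t buf[4] = { 0 };    /* header + payload, 8-byte aligned */
        struct vfio_device_feature *feat = (void *)buf;
        struct vfio_device_feature_mig_data_size *size = (void *)feat->data;

        feat->argsz = sizeof(*feat) + sizeof(*size);
        feat->flags = VFIO_DEVICE_FEATURE_GET |
                      VFIO_DEVICE_FEATURE_MIG_DATA_SIZE;
        if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feat)) {
            return -errno;
        }
        *stop_copy_len = size->stop_copy_length;
        return 0;
    }

The migration loop could then keep the guest running and only stop it once
the reported stop_copy_length fits under the threshold.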

Yishai

> - they need to send possibly lots of data.
> 
> this series only addresses the 1st set of problems.
> 
> What they do:
> - the res_compatible parameter was not used anywhere, so just add that
> information to res_postcopy.
> - Remove QEMUFILE parameter from save_live_pending
> - Split save_live_pending into
>   * save_pending_estimate(): the pending state size without trying too hard
>   * save_pending_exact(): the real pending state size; it is called with the
> guest stopped.
> - Now save_pending_* don't need the threshold parameter
> - HACK a way to stop the guest before moving there.
> 
> ToDo:
> - autoconverge test is broken, no real clue why, but it is possible that the test
> is wrong.
> 
> - Make an artifact to be able to send massive amounts of data in the save
> state stage (probably more multifd channels).
> 
> - Be able to avoid restarting the guest between checking the state pending
> size and migration_completion().
> 
> Please review.
> 
> Thanks, Juan.
> 
> Juan Quintela (7):
>   migration: Remove res_compatible parameter
>   migration: No save_live_pending() method uses the QEMUFile parameter
>   migration: Block migration comment or code is wrong
>   migration: Split save_live_pending() into state_pending_*
>   migration: Remove unused threshold_size parameter
>   migration: simplify migration_iteration_run()
>   migration: call qemu_savevm_state_pending_exact() with the guest
>     stopped
> 
>  docs/devel/migration.rst       | 18 ++++++------
>  docs/devel/vfio-migration.rst  |  4 +--
>  include/migration/register.h   | 29 ++++++++++---------
>  migration/savevm.h             |  8 +++---
>  hw/s390x/s390-stattrib.c       | 11 ++++---
>  hw/vfio/migration.c            | 17 +++++------
>  migration/block-dirty-bitmap.c | 14 ++++-----
>  migration/block.c              | 17 ++++++-----
>  migration/migration.c          | 52 ++++++++++++++++++++++------------
>  migration/ram.c                | 35 ++++++++++++++++-------
>  migration/savevm.c             | 37 +++++++++++++++++-------
>  tests/qtest/migration-test.c   |  3 +-
>  hw/vfio/trace-events           |  2 +-
>  migration/trace-events         |  7 +++--
>  14 files changed, 148 insertions(+), 106 deletions(-)
> 
> --
> 2.37.2
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 1/7] migration: Remove res_compatible parameter
  2022-10-03  3:15 ` [RFC 1/7] migration: Remove res_compatible parameter Juan Quintela
@ 2022-11-22 13:54   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 23+ messages in thread
From: Dr. David Alan Gilbert @ 2022-11-22 13:54 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Alex Williamson, Eric Blake, Stefan Hajnoczi,
	Fam Zheng, qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Christian Borntraeger, Halil Pasic, Paolo Bonzini, qemu-block,
	Eric Farman, Richard Henderson, David Hildenbrand

* Juan Quintela (quintela@redhat.com) wrote:
> It was only used for RAM, and in that case, it means that this amount
> of data was sent for memory.  Just delete the field in all callers.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  include/migration/register.h   | 20 ++++++++++----------
>  migration/savevm.h             |  4 +---
>  hw/s390x/s390-stattrib.c       |  6 ++----
>  hw/vfio/migration.c            | 10 ++++------
>  migration/block-dirty-bitmap.c |  7 +++----
>  migration/block.c              |  7 +++----
>  migration/migration.c          |  9 ++++-----
>  migration/ram.c                |  8 +++-----
>  migration/savevm.c             | 14 +++++---------
>  hw/vfio/trace-events           |  2 +-
>  migration/trace-events         |  2 +-
>  11 files changed, 37 insertions(+), 52 deletions(-)
> 
> diff --git a/include/migration/register.h b/include/migration/register.h
> index c1dcff0f90..1950fee6a8 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -48,18 +48,18 @@ typedef struct SaveVMHandlers {
>      int (*save_setup)(QEMUFile *f, void *opaque);
>      void (*save_live_pending)(QEMUFile *f, void *opaque,
>                                uint64_t threshold_size,
> -                              uint64_t *res_precopy_only,
> -                              uint64_t *res_compatible,
> -                              uint64_t *res_postcopy_only);
> +                              uint64_t *rest_precopy,
> +                              uint64_t *rest_postcopy);
>      /* Note for save_live_pending:
> -     * - res_precopy_only is for data which must be migrated in precopy phase
> -     *     or in stopped state, in other words - before target vm start
> -     * - res_compatible is for data which may be migrated in any phase
> -     * - res_postcopy_only is for data which must be migrated in postcopy phase
> -     *     or in stopped state, in other words - after source vm stop
> +     * - res_precopy is for data which must be migrated in precopy
> +     *     phase or in stopped state, in other words - before target
> +     *     vm start
> +     * - res_postcopy is for data which must be migrated in postcopy
> +     *     phase or in stopped state, in other words - after source vm
> +     *     stop
>       *
> -     * Sum of res_postcopy_only, res_compatible and res_postcopy_only is the
> -     * whole amount of pending data.
> +     * Sum of res_precopy and res_postcopy is the whole amount of
> +     * pending data.

I'm not sure if this is correct; I think we really do have three
different kinds of device:
  a) Those that don't know how to postcopy
  b) Those that can only send data in postcopy (which I think is what
the dirty-bitmap block stuff does)
  c) Devices that know how to postcopy, like RAM
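
In terms of the two counters that remain after this patch, those map
roughly as follows (an illustrative sketch, not the actual handlers):

    #include <stdint.h>

    enum dev_kind { NO_POSTCOPY, POSTCOPY_ONLY, EITHER_PHASE };

    /* Illustrative only: how the three kinds above fill the two
     * remaining counters. */
    static void pending_sketch(enum dev_kind kind, uint64_t remaining,
                               uint64_t *res_precopy, uint64_t *res_postcopy)
    {
        switch (kind) {
        case NO_POSTCOPY:    /* (a): must finish before target vm start */
            *res_precopy += remaining;
            break;
        case POSTCOPY_ONLY:  /* (b): e.g. dirty-bitmap, after source stop */
            *res_postcopy += remaining;
            break;
        case EITHER_PHASE:   /* (c): RAM -- the data res_compatible used
                              *      to describe; with two counters it has
                              *      to be folded into one of them */
            *res_postcopy += remaining;
            break;
        }
    }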
  
<snip>

> diff --git a/migration/ram.c b/migration/ram.c
> index dc1de9ddbc..20167e1102 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3435,9 +3435,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>  }
>  
>  static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> -                             uint64_t *res_precopy_only,
> -                             uint64_t *res_compatible,
> -                             uint64_t *res_postcopy_only)
> +                             uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>      RAMState **temp = opaque;
>      RAMState *rs = *temp;
> @@ -3457,9 +3455,9 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
>  
>      if (migrate_postcopy_ram()) {
>          /* We can do postcopy, and all the data is postcopiable */
> -        *res_compatible += remaining_size;
> +        *res_postcopy += remaining_size;

migrate_postcopy_ram() says that postcopy is enabled, not active, so here
you are saying that RAM is 'postcopy only' according to your definition
above.  I don't think it actually changes behaviour at the moment, but it
doesn't feel right to assign that to something that says 'postcopy only'.
migration_iteration_run doesn't enforce the postcopy-only part very hard;
it's just part of the threshold as far as I can tell.
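
i.e. the two tests answer different questions; a sketch of the distinction
(helper names as in QEMU, but the bodies here are hypothetical stand-ins):

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for the real helpers; return values are hypothetical. */
    static bool migrate_postcopy_ram_(void)  { return true;  } /* enabled */
    static bool migration_in_postcopy_(void) { return false; } /* not active */

    static void ram_pending_sketch(uint64_t remaining,
                                   uint64_t *res_precopy,
                                   uint64_t *res_postcopy)
    {
        /* Even while migration_in_postcopy_() is false (still in the
         * precopy phase), the capability test below counts all of RAM
         * as res_postcopy, i.e. 'postcopy only'. */
        if (migrate_postcopy_ram_()) {
            *res_postcopy += remaining;
        } else {
            *res_precopy += remaining;
        }
    }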

Dave

>      } else {
> -        *res_precopy_only += remaining_size;
> +        *res_precopy += remaining_size;
>      }
>  }

> diff --git a/migration/migration.c b/migration/migration.c
> index bb8bbddfe4..440aa62f16 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3734,15 +3734,14 @@ typedef enum {
>   */
>  static MigIterateState migration_iteration_run(MigrationState *s)
>  {
> -    uint64_t pending_size, pend_pre, pend_compat, pend_post;
> +    uint64_t pending_size, pend_pre, pend_post;
>      bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>  
>      qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
> -                              &pend_compat, &pend_post);
> -    pending_size = pend_pre + pend_compat + pend_post;
> +                              &pend_post);
> +    pending_size = pend_pre + pend_post;
>  
> -    trace_migrate_pending(pending_size, s->threshold_size,
> -                          pend_pre, pend_compat, pend_post);
> +    trace_migrate_pending(pending_size, s->threshold_size, pend_pre, pend_post);
>  
>      if (pending_size && pending_size >= s->threshold_size) {
>          /* Still a significant amount to transfer */
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 48e85c052c..a752ff9ea1 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1472,16 +1472,13 @@ flush:
>   * for units that can't do postcopy.
>   */
>  void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
> -                               uint64_t *res_precopy_only,
> -                               uint64_t *res_compatible,
> -                               uint64_t *res_postcopy_only)
> +                               uint64_t *res_precopy,
> +                               uint64_t *res_postcopy)
>  {
>      SaveStateEntry *se;
>  
> -    *res_precopy_only = 0;
> -    *res_compatible = 0;
> -    *res_postcopy_only = 0;
> -
> +    *res_precopy = 0;
> +    *res_postcopy = 0;
>  
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          if (!se->ops || !se->ops->save_live_pending) {
> @@ -1493,8 +1490,7 @@ void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
>              }
>          }
>          se->ops->save_live_pending(f, se->opaque, threshold_size,
> -                                   res_precopy_only, res_compatible,
> -                                   res_postcopy_only);
> +                                   res_precopy, res_postcopy);
>      }
>  }
>  
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 73dffe9e00..a21cbd2a56 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -157,7 +157,7 @@ vfio_save_cleanup(const char *name) " (%s)"
>  vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
>  vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
>  vfio_save_device_config_state(const char *name) " (%s)"
> -vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
> +vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64
>  vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
>  vfio_save_complete_precopy(const char *name) " (%s)"
>  vfio_load_device_config_state(const char *name) " (%s)"
> diff --git a/migration/trace-events b/migration/trace-events
> index 57003edcbd..f2a873fd6c 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -150,7 +150,7 @@ migrate_fd_cleanup(void) ""
>  migrate_fd_error(const char *error_desc) "error=%s"
>  migrate_fd_cancel(void) ""
>  migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
> -migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
> +migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
>  migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
>  migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
>  migration_completion_file_err(void) ""
> -- 
> 2.37.2
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter
  2022-10-03  3:15 ` [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter Juan Quintela
@ 2022-11-22 17:48   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 23+ messages in thread
From: Dr. David Alan Gilbert @ 2022-11-22 17:48 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Alex Williamson, Eric Blake, Stefan Hajnoczi,
	Fam Zheng, qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Christian Borntraeger, Halil Pasic, Paolo Bonzini, qemu-block,
	Eric Farman, Richard Henderson, David Hildenbrand

* Juan Quintela (quintela@redhat.com) wrote:
> So remove it everywhere.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  include/migration/register.h   | 6 ++----
>  migration/savevm.h             | 2 +-
>  hw/s390x/s390-stattrib.c       | 2 +-
>  hw/vfio/migration.c            | 6 ++----
>  migration/block-dirty-bitmap.c | 5 ++---
>  migration/block.c              | 2 +-
>  migration/migration.c          | 3 +--
>  migration/ram.c                | 2 +-
>  migration/savevm.c             | 5 ++---
>  9 files changed, 13 insertions(+), 20 deletions(-)
> 
> diff --git a/include/migration/register.h b/include/migration/register.h
> index 1950fee6a8..5b5424ed8f 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -46,10 +46,8 @@ typedef struct SaveVMHandlers {
>  
>      /* This runs outside the iothread lock!  */
>      int (*save_setup)(QEMUFile *f, void *opaque);
> -    void (*save_live_pending)(QEMUFile *f, void *opaque,
> -                              uint64_t threshold_size,
> -                              uint64_t *rest_precopy,
> -                              uint64_t *rest_postcopy);
> +    void (*save_live_pending)(void *opaque,  uint64_t threshold_size,
> +                              uint64_t *rest_precopy, uint64_t *rest_postcopy);
>      /* Note for save_live_pending:
>       * - res_precopy is for data which must be migrated in precopy
>       *     phase or in stopped state, in other words - before target
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 9bd55c336c..98fae6f9b3 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -40,7 +40,7 @@ void qemu_savevm_state_cleanup(void);
>  void qemu_savevm_state_complete_postcopy(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
>                                         bool inactivate_disks);
> -void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
> +void qemu_savevm_state_pending(uint64_t max_size,
>                                 uint64_t *res_precopy, uint64_t *res_postcopy);
>  void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
>  void qemu_savevm_send_open_return_path(QEMUFile *f);
> diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
> index ee60b53da4..9b74eeadf3 100644
> --- a/hw/s390x/s390-stattrib.c
> +++ b/hw/s390x/s390-stattrib.c
> @@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> +static void cmma_save_pending(void *opaque, uint64_t max_size,
>                                uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>      S390StAttribState *sas = S390_STATTRIB(opaque);
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3423f113f0..760d5f3c5c 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -456,10 +456,8 @@ static void vfio_save_cleanup(void *opaque)
>      trace_vfio_save_cleanup(vbasedev->name);
>  }
>  
> -static void vfio_save_pending(QEMUFile *f, void *opaque,
> -                              uint64_t threshold_size,
> -                              uint64_t *res_precopy,
> -                              uint64_t *res_postcopy)
> +static void vfio_save_pending(void *opaque,  uint64_t threshold_size,
> +                              uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>      VFIODevice *vbasedev = opaque;
>      VFIOMigration *migration = vbasedev->migration;
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index dfea546330..a445bdc3c3 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -761,9 +761,8 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
> -                                      uint64_t max_size,
> -                                      uint64_t *res_precopy,
> +static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
> +                                      uint64_t *res_precopy, 
>                                        uint64_t *res_postcopy)
>  {
>      DBMSaveState *s = &((DBMState *)opaque)->save;
> diff --git a/migration/block.c b/migration/block.c
> index 4ae8f837b0..b3d680af75 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -862,7 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> +static void block_save_pending(void *opaque, uint64_t max_size,
>                                 uint64_t *res_precopy,
>                                 uint64_t *res_postcopy)
>  {
> diff --git a/migration/migration.c b/migration/migration.c
> index 440aa62f16..038fc58a96 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3737,8 +3737,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>      uint64_t pending_size, pend_pre, pend_post;
>      bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>  
> -    qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
> -                              &pend_post);
> +    qemu_savevm_state_pending(s->threshold_size, &pend_pre, &pend_post);
>      pending_size = pend_pre + pend_post;
>  
>      trace_migrate_pending(pending_size, s->threshold_size, pend_pre, pend_post);
> diff --git a/migration/ram.c b/migration/ram.c
> index 20167e1102..48a31b87c8 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3434,7 +3434,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> +static void ram_save_pending(void *opaque, uint64_t max_size,
>                               uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>      RAMState **temp = opaque;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index a752ff9ea1..d937ab0b2e 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1471,8 +1471,7 @@ flush:
>   * the result is split into the amount for units that can and
>   * for units that can't do postcopy.
>   */
> -void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
> -                               uint64_t *res_precopy,
> +void qemu_savevm_state_pending(uint64_t threshold_size, uint64_t *res_precopy,
>                                 uint64_t *res_postcopy)
>  {
>      SaveStateEntry *se;
> @@ -1489,7 +1488,7 @@ void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
>                  continue;
>              }
>          }
> -        se->ops->save_live_pending(f, se->opaque, threshold_size,
> +        se->ops->save_live_pending(se->opaque, threshold_size,
>                                     res_precopy, res_postcopy);
>      }
>  }
> -- 
> 2.37.2
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 4/7] migration: Split save_live_pending() into state_pending_*
  2022-10-03  3:15 ` [RFC 4/7] migration: Split save_live_pending() into state_pending_* Juan Quintela
@ 2022-11-22 20:08   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 23+ messages in thread
From: Dr. David Alan Gilbert @ 2022-11-22 20:08 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Alex Williamson, Eric Blake, Stefan Hajnoczi,
	Fam Zheng, qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Christian Borntraeger, Halil Pasic, Paolo Bonzini, qemu-block,
	Eric Farman, Richard Henderson, David Hildenbrand

* Juan Quintela (quintela@redhat.com) wrote:
> We split the function in two:
> 
> - state_pending_estimate: We estimate the remaining state size without
>   stopping the machine.
> 
> - state_pending_exact: We calculate the exact amount of remaining
>   state.
> 
> The only "device" that implements different functions for _estimate()
> and _exact() is ram.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Yeh, I think that's OK; I'm a little worried about whether you end up calling
the two functions in migration_iteration_run a lot, but still:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  docs/devel/migration.rst       | 18 ++++++++++--------
>  docs/devel/vfio-migration.rst  |  4 ++--
>  include/migration/register.h   | 12 ++++++++----
>  migration/savevm.h             |  8 ++++++--
>  hw/s390x/s390-stattrib.c       |  7 ++++---
>  hw/vfio/migration.c            |  9 +++++----
>  migration/block-dirty-bitmap.c | 11 ++++++-----
>  migration/block.c              | 11 ++++++-----
>  migration/migration.c          | 13 +++++++++----
>  migration/ram.c                | 31 ++++++++++++++++++++++++-------
>  migration/savevm.c             | 34 +++++++++++++++++++++++++++++-----
>  hw/vfio/trace-events           |  2 +-
>  migration/trace-events         |  7 ++++---
>  13 files changed, 114 insertions(+), 53 deletions(-)
> 
> diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
> index 3e9656d8e0..6f65c23b47 100644
> --- a/docs/devel/migration.rst
> +++ b/docs/devel/migration.rst
> @@ -482,15 +482,17 @@ An iterative device must provide:
>    - A ``load_setup`` function that initialises the data structures on the
>      destination.
>  
> -  - A ``save_live_pending`` function that is called repeatedly and must
> -    indicate how much more data the iterative data must save.  The core
> -    migration code will use this to determine when to pause the CPUs
> -    and complete the migration.
> +  - A ``state_pending_exact`` function that indicates how much more
> +    data we must save.  The core migration code will use this to
> +    determine when to pause the CPUs and complete the migration.
>  
> -  - A ``save_live_iterate`` function (called after ``save_live_pending``
> -    when there is significant data still to be sent).  It should send
> -    a chunk of data until the point that stream bandwidth limits tell it
> -    to stop.  Each call generates one section.
> +  - A ``state_pending_estimate`` function that indicates how much more
> +    data we must save.  When the estimated amount is smaller than the
> +    threshold, we call ``state_pending_exact``.
> +
> +  - A ``save_live_iterate`` function should send a chunk of data until
> +    the point that stream bandwidth limits tell it to stop.  Each call
> +    generates one section.
>  
>    - A ``save_live_complete_precopy`` function that must transmit the
>      last section for the device containing any remaining data.
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index 9ff6163c88..673057c90d 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -28,7 +28,7 @@ VFIO implements the device hooks for the iterative approach as follows:
>  * A ``load_setup`` function that sets up the migration region on the
>    destination and sets _RESUMING flag in the VFIO device state.
>  
> -* A ``save_live_pending`` function that reads pending_bytes from the vendor
> +* A ``state_pending_exact`` function that reads pending_bytes from the vendor
>    driver, which indicates the amount of data that the vendor driver has yet to
>    save for the VFIO device.
>  
> @@ -114,7 +114,7 @@ Live migration save path
>                      (RUNNING, _SETUP, _RUNNING|_SAVING)
>                                    |
>                      (RUNNING, _ACTIVE, _RUNNING|_SAVING)
> -             If device is active, get pending_bytes by .save_live_pending()
> +             If device is active, get pending_bytes by .state_pending_exact()
>            If total pending_bytes >= threshold_size, call .save_live_iterate()
>                    Data of VFIO device for pre-copy phase is copied
>          Iterate till total pending bytes converge and are less than threshold
> diff --git a/include/migration/register.h b/include/migration/register.h
> index 5b5424ed8f..313b8e1c3b 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -46,9 +46,7 @@ typedef struct SaveVMHandlers {
>  
>      /* This runs outside the iothread lock!  */
>      int (*save_setup)(QEMUFile *f, void *opaque);
> -    void (*save_live_pending)(void *opaque,  uint64_t threshold_size,
> -                              uint64_t *rest_precopy, uint64_t *rest_postcopy);
> -    /* Note for save_live_pending:
> +    /* Note for state_pending_*:
>       * - res_precopy is for data which must be migrated in precopy
>       *     phase or in stopped state, in other words - before target
>       *     vm start
> @@ -59,7 +57,13 @@ typedef struct SaveVMHandlers {
>       * Sum of res_precopy and res_postcopy is the whole amount of
>       * pending data.
>       */
> -
> +    /* This calculate the exact remaining data to transfer */
> +    void (*state_pending_exact)(void *opaque,  uint64_t threshold_size,
> +                                uint64_t *rest_precopy, uint64_t *rest_postcopy);
> +    /* This estimates the remaining data to transfer */
> +    void (*state_pending_estimate)(void *opaque,  uint64_t threshold_size,
> +                                   uint64_t *rest_precopy,
> +                                   uint64_t *rest_postcopy);
>  
>      LoadStateHandler *load_state;
>      int (*load_setup)(QEMUFile *f, void *opaque);
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 98fae6f9b3..613f85e717 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -40,8 +40,12 @@ void qemu_savevm_state_cleanup(void);
>  void qemu_savevm_state_complete_postcopy(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
>                                         bool inactivate_disks);
> -void qemu_savevm_state_pending(uint64_t max_size,
> -                               uint64_t *res_precopy, uint64_t *res_postcopy);
> +void qemu_savevm_state_pending_exact(uint64_t max_size,
> +                                     uint64_t *res_precopy,
> +                                     uint64_t *res_postcopy);
> +void qemu_savevm_state_pending_estimate(uint64_t max_size,
> +                                        uint64_t *res_precopy,
> +                                        uint64_t *res_postcopy);
>  void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
>  void qemu_savevm_send_open_return_path(QEMUFile *f);
>  int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
> diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
> index 9b74eeadf3..dfb95eb20c 100644
> --- a/hw/s390x/s390-stattrib.c
> +++ b/hw/s390x/s390-stattrib.c
> @@ -182,8 +182,8 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void cmma_save_pending(void *opaque, uint64_t max_size,
> -                              uint64_t *res_precopy, uint64_t *res_postcopy)
> +static void cmma_state_pending(void *opaque, uint64_t *res_precopy,
> +                               uint64_t *res_postcopy)
>  {
>      S390StAttribState *sas = S390_STATTRIB(opaque);
>      S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
> @@ -369,7 +369,8 @@ static SaveVMHandlers savevm_s390_stattrib_handlers = {
>      .save_setup = cmma_save_setup,
>      .save_live_iterate = cmma_save_iterate,
>      .save_live_complete_precopy = cmma_save_complete,
> -    .save_live_pending = cmma_save_pending,
> +    .state_pending_exact = cmma_state_pending,
> +    .state_pending_estimate = cmma_state_pending,
>      .save_cleanup = cmma_save_cleanup,
>      .load_state = cmma_load,
>      .is_active = cmma_active,
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 760d5f3c5c..680cf4df6e 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -456,8 +456,8 @@ static void vfio_save_cleanup(void *opaque)
>      trace_vfio_save_cleanup(vbasedev->name);
>  }
>  
> -static void vfio_save_pending(void *opaque,  uint64_t threshold_size,
> -                              uint64_t *res_precopy, uint64_t *res_postcopy)
> +static void vfio_state_pending(void *opaque,  uint64_t threshold_size,
> +                               uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>      VFIODevice *vbasedev = opaque;
>      VFIOMigration *migration = vbasedev->migration;
> @@ -470,7 +470,7 @@ static void vfio_save_pending(void *opaque,  uint64_t threshold_size,
>  
>      *res_precopy += migration->pending_bytes;
>  
> -    trace_vfio_save_pending(vbasedev->name, *res_precopy, *res_postcopy);
> +    trace_vfio_state_pending(vbasedev->name, *res_precopy, *res_postcopy);
>  }
>  
>  static int vfio_save_iterate(QEMUFile *f, void *opaque)
> @@ -681,7 +681,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>  static SaveVMHandlers savevm_vfio_handlers = {
>      .save_setup = vfio_save_setup,
>      .save_cleanup = vfio_save_cleanup,
> -    .save_live_pending = vfio_save_pending,
> +    .state_pending_exact = vfio_state_pending,
> +    .state_pending_estimate = vfio_state_pending,
>      .save_live_iterate = vfio_save_iterate,
>      .save_live_complete_precopy = vfio_save_complete_precopy,
>      .save_state = vfio_save_state,
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index a445bdc3c3..5b24007650 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -761,9 +761,9 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
> -                                      uint64_t *res_precopy, 
> -                                      uint64_t *res_postcopy)
> +static void dirty_bitmap_state_pending(void *opaque, uint64_t max_size,
> +                                       uint64_t *res_precopy,
> +                                       uint64_t *res_postcopy)
>  {
>      DBMSaveState *s = &((DBMState *)opaque)->save;
>      SaveBitmapState *dbms;
> @@ -781,7 +781,7 @@ static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
>  
>      qemu_mutex_unlock_iothread();
>  
> -    trace_dirty_bitmap_save_pending(pending, max_size);
> +    trace_dirty_bitmap_state_pending(pending);
>  
>      *res_postcopy += pending;
>  }
> @@ -1250,7 +1250,8 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
>      .save_live_complete_postcopy = dirty_bitmap_save_complete,
>      .save_live_complete_precopy = dirty_bitmap_save_complete,
>      .has_postcopy = dirty_bitmap_has_postcopy,
> -    .save_live_pending = dirty_bitmap_save_pending,
> +    .state_pending_exact = dirty_bitmap_state_pending,
> +    .state_pending_estimate = dirty_bitmap_state_pending,
>      .save_live_iterate = dirty_bitmap_save_iterate,
>      .is_active_iterate = dirty_bitmap_is_active_iterate,
>      .load_state = dirty_bitmap_load,
> diff --git a/migration/block.c b/migration/block.c
> index 39ce4003c6..8e6ad1c468 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -862,9 +862,9 @@ static int block_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void block_save_pending(void *opaque, uint64_t max_size,
> -                               uint64_t *res_precopy,
> -                               uint64_t *res_postcopy)
> +static void block_state_pending(void *opaque, uint64_t max_size,
> +                                uint64_t *res_precopy,
> +                                uint64_t *res_postcopy)
>  {
>      /* Estimate pending number of bytes to send */
>      uint64_t pending;
> @@ -883,7 +883,7 @@ static void block_save_pending(void *opaque, uint64_t max_size,
>          pending = BLK_MIG_BLOCK_SIZE;
>      }
>  
> -    trace_migration_block_save_pending(pending);
> +    trace_migration_block_state_pending(pending);
>      /* We don't do postcopy */
>      *res_precopy += pending;
>  }
> @@ -1018,7 +1018,8 @@ static SaveVMHandlers savevm_block_handlers = {
>      .save_setup = block_save_setup,
>      .save_live_iterate = block_save_iterate,
>      .save_live_complete_precopy = block_save_complete,
> -    .save_live_pending = block_save_pending,
> +    .state_pending_exact = block_state_pending,
> +    .state_pending_estimate = block_state_pending,
>      .load_state = block_load,
>      .save_cleanup = block_migration_cleanup,
>      .is_active = block_is_active,
> diff --git a/migration/migration.c b/migration/migration.c
> index 038fc58a96..4676568699 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3734,13 +3734,18 @@ typedef enum {
>   */
>  static MigIterateState migration_iteration_run(MigrationState *s)
>  {
> -    uint64_t pending_size, pend_pre, pend_post;
> +    uint64_t pend_pre, pend_post;
>      bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>  
> -    qemu_savevm_state_pending(s->threshold_size, &pend_pre, &pend_post);
> -    pending_size = pend_pre + pend_post;
> +    qemu_savevm_state_pending_estimate(s->threshold_size, &pend_pre, &pend_post);
> +    uint64_t pending_size = pend_pre + pend_post;
> +    trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
>  
> -    trace_migrate_pending(pending_size, s->threshold_size, pend_pre, pend_post);
> +    if (pend_pre <= s->threshold_size) {
> +        qemu_savevm_state_pending_exact(s->threshold_size, &pend_pre, &pend_post);
> +        pending_size = pend_pre + pend_post;
> +        trace_migrate_pending_exact(pending_size, s->threshold_size, pend_pre, pend_post);
> +    }
>  
>      if (pending_size && pending_size >= s->threshold_size) {
>          /* Still a significant amount to transfer */
> diff --git a/migration/ram.c b/migration/ram.c
> index 48a31b87c8..8d989d51db 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3434,17 +3434,33 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void ram_save_pending(void *opaque, uint64_t max_size,
> -                             uint64_t *res_precopy, uint64_t *res_postcopy)
> +static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
> +                                       uint64_t *res_precopy,
> +                                       uint64_t *res_postcopy)
>  {
>      RAMState **temp = opaque;
>      RAMState *rs = *temp;
> -    uint64_t remaining_size;
>  
> -    remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> +    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
>  
> -    if (!migration_in_postcopy() &&
> -        remaining_size < max_size) {
> +    if (migrate_postcopy_ram()) {
> +        /* We can do postcopy, and all the data is postcopiable */
> +        *res_postcopy += remaining_size;
> +    } else {
> +        *res_precopy += remaining_size;
> +    }
> +}
> +
> +static void ram_state_pending_exact(void *opaque, uint64_t max_size,
> +                                    uint64_t *res_precopy,
> +                                    uint64_t *res_postcopy)
> +{
> +    RAMState **temp = opaque;
> +    RAMState *rs = *temp;
> +
> +    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> +
> +    if (!migration_in_postcopy()) {
>          qemu_mutex_lock_iothread();
>          WITH_RCU_READ_LOCK_GUARD() {
>              migration_bitmap_sync_precopy(rs);
> @@ -4600,7 +4616,8 @@ static SaveVMHandlers savevm_ram_handlers = {
>      .save_live_complete_postcopy = ram_save_complete,
>      .save_live_complete_precopy = ram_save_complete,
>      .has_postcopy = ram_has_postcopy,
> -    .save_live_pending = ram_save_pending,
> +    .state_pending_exact = ram_state_pending_exact,
> +    .state_pending_estimate = ram_state_pending_estimate,
>      .load_state = ram_load,
>      .save_cleanup = ram_save_cleanup,
>      .load_setup = ram_load_setup,
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d937ab0b2e..976ece3f3f 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1471,8 +1471,9 @@ flush:
>   * the result is split into the amount for units that can and
>   * for units that can't do postcopy.
>   */
> -void qemu_savevm_state_pending(uint64_t threshold_size, uint64_t *res_precopy,
> -                               uint64_t *res_postcopy)
> +void qemu_savevm_state_pending_exact(uint64_t threshold_size,
> +                                     uint64_t *res_precopy,
> +                                     uint64_t *res_postcopy)
>  {
>      SaveStateEntry *se;
>  
> @@ -1480,7 +1481,7 @@ void qemu_savevm_state_pending(uint64_t threshold_size, uint64_t *res_precopy,
>      *res_postcopy = 0;
>  
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> -        if (!se->ops || !se->ops->save_live_pending) {
> +        if (!se->ops || !se->ops->state_pending_exact) {
>              continue;
>          }
>          if (se->ops->is_active) {
> @@ -1488,8 +1489,31 @@ void qemu_savevm_state_pending(uint64_t threshold_size, uint64_t *res_precopy,
>                  continue;
>              }
>          }
> -        se->ops->save_live_pending(se->opaque, threshold_size,
> -                                   res_precopy, res_postcopy);
> +        se->ops->state_pending_exact(se->opaque, threshold_size,
> +                                     res_precopy, res_postcopy);
> +    }
> +}
> +
> +void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
> +                                        uint64_t *res_precopy,
> +                                        uint64_t *res_postcopy)
> +{
> +    SaveStateEntry *se;
> +
> +    *res_precopy = 0;
> +    *res_postcopy = 0;
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (!se->ops || !se->ops->state_pending_estimate) {
> +            continue;
> +        }
> +        if (se->ops->is_active) {
> +            if (!se->ops->is_active(se->opaque)) {
> +                continue;
> +            }
> +        }
> +        se->ops->state_pending_estimate(se->opaque, threshold_size,
> +                                        res_precopy, res_postcopy);
>      }
>  }
>  
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index a21cbd2a56..90a8aecb37 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -157,7 +157,7 @@ vfio_save_cleanup(const char *name) " (%s)"
>  vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
>  vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
>  vfio_save_device_config_state(const char *name) " (%s)"
> -vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64
> +vfio_state_pending(const char *name, uint64_t precopy, uint64_t postcopy) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64
>  vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
>  vfio_save_complete_precopy(const char *name) " (%s)"
>  vfio_load_device_config_state(const char *name) " (%s)"
> diff --git a/migration/trace-events b/migration/trace-events
> index f2a873fd6c..84352f310a 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -150,7 +150,8 @@ migrate_fd_cleanup(void) ""
>  migrate_fd_error(const char *error_desc) "error=%s"
>  migrate_fd_cancel(void) ""
>  migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
> -migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
> +migrate_pending_estimate(uint64_t size, uint64_t max, uint64_t pre, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
> +migrate_pending_exact(uint64_t size, uint64_t max, uint64_t pre, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
>  migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
>  migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
>  migration_completion_file_err(void) ""
> @@ -330,7 +331,7 @@ send_bitmap_bits(uint32_t flags, uint64_t start_sector, uint32_t nr_sectors, uin
>  dirty_bitmap_save_iterate(int in_postcopy) "in postcopy: %d"
>  dirty_bitmap_save_complete_enter(void) ""
>  dirty_bitmap_save_complete_finish(void) ""
> -dirty_bitmap_save_pending(uint64_t pending, uint64_t max_size) "pending %" PRIu64 " max: %" PRIu64
> +dirty_bitmap_state_pending(uint64_t pending) "pending %" PRIu64
>  dirty_bitmap_load_complete(void) ""
>  dirty_bitmap_load_bits_enter(uint64_t first_sector, uint32_t nr_sectors) "chunk: %" PRIu64 " %" PRIu32
>  dirty_bitmap_load_bits_zeroes(void) ""
> @@ -355,7 +356,7 @@ migration_block_save_device_dirty(int64_t sector) "Error reading sector %" PRId6
>  migration_block_flush_blks(const char *action, int submitted, int read_done, int transferred) "%s submitted %d read_done %d transferred %d"
>  migration_block_save(const char *mig_stage, int submitted, int transferred) "Enter save live %s submitted %d transferred %d"
>  migration_block_save_complete(void) "Block migration completed"
> -migration_block_save_pending(uint64_t pending) "Enter save live pending  %" PRIu64
> +migration_block_state_pending(uint64_t pending) "Enter save live pending  %" PRIu64
>  
>  # page_cache.c
>  migration_pagecache_init(int64_t max_num_items) "Setting cache buckets to %" PRId64
> -- 
> 2.37.2
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 5/7] migration: Remove unused threshold_size parameter
  2022-10-03  3:15 ` [RFC 5/7] migration: Remove unused threshold_size parameter Juan Quintela
@ 2022-11-23 16:38   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 23+ messages in thread
From: Dr. David Alan Gilbert @ 2022-11-23 16:38 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Alex Williamson, Eric Blake, Stefan Hajnoczi,
	Fam Zheng, qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Christian Borntraeger, Halil Pasic, Paolo Bonzini, qemu-block,
	Eric Farman, Richard Henderson, David Hildenbrand

* Juan Quintela (quintela@redhat.com) wrote:
> Until the previous commit, the threshold_size parameter of
> save_live_pending() was used for RAM.  Now, with the split into
> state_pending_estimate() and state_pending_exact(), it is not needed
> anymore, so remove it.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  include/migration/register.h   |  7 +++----
>  migration/savevm.h             |  6 ++----
>  hw/vfio/migration.c            |  6 +++---
>  migration/block-dirty-bitmap.c |  3 +--
>  migration/block.c              |  3 +--
>  migration/migration.c          |  4 ++--
>  migration/ram.c                |  6 ++----
>  migration/savevm.c             | 12 ++++--------
>  8 files changed, 18 insertions(+), 29 deletions(-)
> 
> diff --git a/include/migration/register.h b/include/migration/register.h
> index 313b8e1c3b..5f08204fb2 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -58,11 +58,10 @@ typedef struct SaveVMHandlers {
>       * pending data.
>       */
>      /* This calculate the exact remaining data to transfer */
> -    void (*state_pending_exact)(void *opaque,  uint64_t threshold_size,
> -                                uint64_t *rest_precopy, uint64_t *rest_postcopy);
> +    void (*state_pending_exact)(void *opaque, uint64_t *rest_precopy,
> +                                uint64_t *rest_postcopy);
>      /* This estimates the remaining data to transfer */
> -    void (*state_pending_estimate)(void *opaque,  uint64_t threshold_size,
> -                                   uint64_t *rest_precopy,
> +    void (*state_pending_estimate)(void *opaque, uint64_t *rest_precopy,
>                                     uint64_t *rest_postcopy);
>  
>      LoadStateHandler *load_state;
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 613f85e717..24f2d2a28b 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -40,11 +40,9 @@ void qemu_savevm_state_cleanup(void);
>  void qemu_savevm_state_complete_postcopy(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
>                                         bool inactivate_disks);
> -void qemu_savevm_state_pending_exact(uint64_t max_size,
> -                                     uint64_t *res_precopy,
> +void qemu_savevm_state_pending_exact(uint64_t *res_precopy,
>                                       uint64_t *res_postcopy);
> -void qemu_savevm_state_pending_estimate(uint64_t max_size,
> -                                        uint64_t *res_precopy,
> +void qemu_savevm_state_pending_estimate(uint64_t *res_precopy,
>                                          uint64_t *res_postcopy);
>  void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
>  void qemu_savevm_send_open_return_path(QEMUFile *f);
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 680cf4df6e..d9598ce070 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -456,8 +456,8 @@ static void vfio_save_cleanup(void *opaque)
>      trace_vfio_save_cleanup(vbasedev->name);
>  }
>  
> -static void vfio_state_pending(void *opaque,  uint64_t threshold_size,
> -                               uint64_t *res_precopy, uint64_t *res_postcopy)
> +static void vfio_state_pending(void *opaque, uint64_t *res_precopy,
> +                               uint64_t *res_postcopy)
>  {
>      VFIODevice *vbasedev = opaque;
>      VFIOMigration *migration = vbasedev->migration;
> @@ -511,7 +511,7 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
>      }
>  
>      /*
> -     * Reset pending_bytes as .save_live_pending is not called during savevm or
> +     * Reset pending_bytes as .state_pending_* is not called during savevm or

That comment change feels like it lives in a different patch, other than
that:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>       * snapshot case, in such case vfio_update_pending() at the start of this
>       * function updates pending_bytes.
>       */
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 5b24007650..8a11577252 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -761,8 +761,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void dirty_bitmap_state_pending(void *opaque, uint64_t max_size,
> -                                       uint64_t *res_precopy,
> +static void dirty_bitmap_state_pending(void *opaque, uint64_t *res_precopy,
>                                         uint64_t *res_postcopy)
>  {
>      DBMSaveState *s = &((DBMState *)opaque)->save;
> diff --git a/migration/block.c b/migration/block.c
> index 8e6ad1c468..4f1f7c0f8d 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -862,8 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void block_state_pending(void *opaque, uint64_t max_size,
> -                                uint64_t *res_precopy,
> +static void block_state_pending(void *opaque, uint64_t *res_precopy,
>                                  uint64_t *res_postcopy)
>  {
>      /* Estimate pending number of bytes to send */
> diff --git a/migration/migration.c b/migration/migration.c
> index 4676568699..97fefd579e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3737,12 +3737,12 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>      uint64_t pend_pre, pend_post;
>      bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>  
> -    qemu_savevm_state_pending_estimate(s->threshold_size, &pend_pre, &pend_post);
> +    qemu_savevm_state_pending_estimate(&pend_pre, &pend_post);
>      uint64_t pending_size = pend_pre + pend_post;
>      trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
>  
>      if (pend_pre <= s->threshold_size) {
> -        qemu_savevm_state_pending_exact(s->threshold_size, &pend_pre, &pend_post);
> +        qemu_savevm_state_pending_exact(&pend_pre, &pend_post);
>          pending_size = pend_pre + pend_post;
>          trace_migrate_pending_exact(pending_size, s->threshold_size, pend_pre, pend_post);
>      }
> diff --git a/migration/ram.c b/migration/ram.c
> index 8d989d51db..53a5dd64a8 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3434,8 +3434,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>      return 0;
>  }
>  
> -static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
> -                                       uint64_t *res_precopy,
> +static void ram_state_pending_estimate(void *opaque, uint64_t *res_precopy,
>                                         uint64_t *res_postcopy)
>  {
>      RAMState **temp = opaque;
> @@ -3451,8 +3450,7 @@ static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
>      }
>  }
>  
> -static void ram_state_pending_exact(void *opaque, uint64_t max_size,
> -                                    uint64_t *res_precopy,
> +static void ram_state_pending_exact(void *opaque, uint64_t *res_precopy,
>                                      uint64_t *res_postcopy)
>  {
>      RAMState **temp = opaque;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 976ece3f3f..1e0328e088 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1471,8 +1471,7 @@ flush:
>   * the result is split into the amount for units that can and
>   * for units that can't do postcopy.
>   */
> -void qemu_savevm_state_pending_exact(uint64_t threshold_size,
> -                                     uint64_t *res_precopy,
> +void qemu_savevm_state_pending_exact(uint64_t *res_precopy,
>                                       uint64_t *res_postcopy)
>  {
>      SaveStateEntry *se;
> @@ -1489,13 +1488,11 @@ void qemu_savevm_state_pending_exact(uint64_t threshold_size,
>                  continue;
>              }
>          }
> -        se->ops->state_pending_exact(se->opaque, threshold_size,
> -                                     res_precopy, res_postcopy);
> +        se->ops->state_pending_exact(se->opaque, res_precopy, res_postcopy);
>      }
>  }
>  
> -void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
> -                                        uint64_t *res_precopy,
> +void qemu_savevm_state_pending_estimate(uint64_t *res_precopy,
>                                          uint64_t *res_postcopy)
>  {
>      SaveStateEntry *se;
> @@ -1512,8 +1509,7 @@ void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
>                  continue;
>              }
>          }
> -        se->ops->state_pending_estimate(se->opaque, threshold_size,
> -                                        res_precopy, res_postcopy);
> +        se->ops->state_pending_estimate(se->opaque, res_precopy, res_postcopy);
>      }
>  }
>  
> -- 
> 2.37.2
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 6/7] migration: simplify migration_iteration_run()
  2022-10-03  3:15 ` [RFC 6/7] migration: simplify migration_iteration_run() Juan Quintela
@ 2022-11-23 17:39   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 23+ messages in thread
From: Dr. David Alan Gilbert @ 2022-11-23 17:39 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Alex Williamson, Eric Blake, Stefan Hajnoczi,
	Fam Zheng, qemu-s390x, Cornelia Huck, Thomas Huth,
	Vladimir Sementsov-Ogievskiy, Laurent Vivier, John Snow,
	Christian Borntraeger, Halil Pasic, Paolo Bonzini, qemu-block,
	Eric Farman, Richard Henderson, David Hildenbrand

* Juan Quintela (quintela@redhat.com) wrote:
> Signed-off-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 97fefd579e..35e512887a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3747,23 +3747,23 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>          trace_migrate_pending_exact(pending_size, s->threshold_size, pend_pre, pend_post);
>      }
>  
> -    if (pending_size && pending_size >= s->threshold_size) {
> -        /* Still a significant amount to transfer */
> -        if (!in_postcopy && pend_pre <= s->threshold_size &&
> -            qatomic_read(&s->start_postcopy)) {
> -            if (postcopy_start(s)) {
> -                error_report("%s: postcopy failed to start", __func__);
> -            }
> -            return MIG_ITERATE_SKIP;
> -        }
> -        /* Just another iteration step */
> -        qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
> -    } else {
> +    if (pending_size < s->threshold_size) {
>          trace_migration_thread_low_pending(pending_size);
>          migration_completion(s);
>          return MIG_ITERATE_BREAK;
>      }
>  
> +    /* Still a significant amount to transfer */
> +    if (!in_postcopy && pend_pre <= s->threshold_size &&
> +        qatomic_read(&s->start_postcopy)) {
> +        if (postcopy_start(s)) {
> +            error_report("%s: postcopy failed to start", __func__);
> +        }
> +        return MIG_ITERATE_SKIP;
> +    }
> +
> +    /* Just another iteration step */
> +    qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
>      return MIG_ITERATE_RESUME;
>  }
>  
> -- 
> 2.37.2
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-11-23 17:40 UTC | newest]

Thread overview: 23+ messages
2022-10-03  3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
2022-10-03  3:15 ` [RFC 1/7] migration: Remove res_compatible parameter Juan Quintela
2022-11-22 13:54   ` Dr. David Alan Gilbert
2022-10-03  3:15 ` [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter Juan Quintela
2022-11-22 17:48   ` Dr. David Alan Gilbert
2022-10-03  3:15 ` [RFC 3/7] migration: Block migration comment or code is wrong Juan Quintela
2022-10-03 19:12   ` Stefan Hajnoczi
2022-10-03  3:15 ` [RFC 4/7] migration: Split save_live_pending() into state_pending_* Juan Quintela
2022-11-22 20:08   ` Dr. David Alan Gilbert
2022-10-03  3:15 ` [RFC 5/7] migration: Remove unused threshold_size parameter Juan Quintela
2022-11-23 16:38   ` Dr. David Alan Gilbert
2022-10-03  3:15 ` [RFC 6/7] migration: simplify migration_iteration_run() Juan Quintela
2022-11-23 17:39   ` Dr. David Alan Gilbert
2022-10-03  3:16 ` [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped Juan Quintela
2022-10-13 12:25   ` Joao Martins
2022-10-13 16:08     ` Juan Quintela
2022-10-14 10:36       ` Joao Martins
2022-10-14 11:28         ` Juan Quintela
2022-10-14 12:29           ` Joao Martins
2022-10-18 12:22             ` Jason Gunthorpe via
2022-10-19 15:51               ` Yishai Hadas
2022-10-14 19:49     ` Jason Gunthorpe
2022-10-20 14:56 ` [RFC 0/7] migration patches for VFIO Yishai Hadas
