* [PULL 00/26] Next patches
@ 2023-02-02 16:06 Juan Quintela
  2023-02-02 16:06 ` [PULL 01/26] migration: Fix migration crash when target psize larger than host Juan Quintela
                   ` (27 more replies)
  0 siblings, 28 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x

The following changes since commit deabea6e88f7c4c3c12a36ee30051c6209561165:

  Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2023-02-02 10:10:07 +0000)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request

for you to fetch changes up to 5ee6d3d1eeccd85aa2a835e82b8d9e1b4f7441e1:

  migration: check magic value for deciding the mapping of channels (2023-02-02 17:04:16 +0100)

----------------------------------------------------------------
Migration PULL request, new try

Hi

It includes:
- David Hildenbrand fixes for virtio-mem
- David Gilbert canary to detect problems
- Fix for rdma return values (Fiona)
- Peter Xu uffd_open fixes
- Peter Xu show right downtime for postcopy
- manish.mishra MSG_PEEK fixes
- my vfio changes.

Please apply.

----------------------------------------------------------------

David Hildenbrand (13):
  migration/ram: Fix populate_read_range()
  migration/ram: Fix error handling in ram_write_tracking_start()
  migration/ram: Don't explicitly unprotect when unregistering uffd-wp
  migration/ram: Rely on used_length for uffd_change_protection()
  migration/ram: Optimize ram_write_tracking_start() for
    RamDiscardManager
  migration/savevm: Move more savevm handling into vmstate_save()
  migration/savevm: Prepare vmdesc json writer in
    qemu_savevm_state_setup()
  migration/savevm: Allow immutable device state to be migrated early
    (i.e., before RAM)
  migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and
    VMSTATE_BITMAP_TEST()
  migration/ram: Factor out check for advised postcopy
  virtio-mem: Fail if a memory backend with "prealloc=on" is specified
  virtio-mem: Migrate immutable properties early
  virtio-mem: Proper support for preallocation with migration

Dr. David Alan Gilbert (2):
  migration: Add canary to VMSTATE_END_OF_LIST
  migration: Perform vmsd structure check during tests

Fiona Ebner (1):
  migration/rdma: fix return value for qio_channel_rdma_{readv,writev}

Juan Quintela (4):
  migration: No save_live_pending() method uses the QEMUFile parameter
  migration: Split save_live_pending() into state_pending_*
  migration: Remove unused threshold_size parameter
  migration: simplify migration_iteration_run()

Peter Xu (3):
  migration: Fix migration crash when target psize larger than host
  util/userfaultfd: Add uffd_open()
  migration: Show downtime during postcopy phase

Zhenzhong Duan (1):
  migration/dirtyrate: Show sample pages only in page-sampling mode

manish.mishra (2):
  io: Add support for MSG_PEEK for socket channel
  migration: check magic value for deciding the mapping of channels

 docs/devel/migration.rst            |  18 +--
 docs/devel/vfio-migration.rst       |   4 +-
 include/hw/virtio/virtio-mem.h      |   8 ++
 include/io/channel.h                |   6 +
 include/migration/misc.h            |   4 +-
 include/migration/register.h        |  17 +--
 include/migration/vmstate.h         |  35 +++++-
 include/qemu/userfaultfd.h          |   8 ++
 migration/channel.h                 |   5 +
 migration/migration.h               |   4 +
 migration/multifd.h                 |   2 +-
 migration/postcopy-ram.h            |   2 +-
 migration/savevm.h                  |  10 +-
 chardev/char-socket.c               |   4 +-
 hw/core/machine.c                   |   4 +-
 hw/s390x/s390-stattrib.c            |  11 +-
 hw/vfio/migration.c                 |  20 +--
 hw/virtio/virtio-mem.c              | 144 ++++++++++++++++++++-
 io/channel-buffer.c                 |   1 +
 io/channel-command.c                |   1 +
 io/channel-file.c                   |   1 +
 io/channel-null.c                   |   1 +
 io/channel-socket.c                 |  19 ++-
 io/channel-tls.c                    |   1 +
 io/channel-websock.c                |   1 +
 io/channel.c                        |  16 ++-
 migration/block-dirty-bitmap.c      |  14 +--
 migration/block.c                   |  13 +-
 migration/channel-block.c           |   1 +
 migration/channel.c                 |  45 +++++++
 migration/dirtyrate.c               |  10 +-
 migration/migration.c               | 119 ++++++++++++------
 migration/multifd.c                 |  19 +--
 migration/postcopy-ram.c            |  16 +--
 migration/ram.c                     | 120 +++++++++++++-----
 migration/rdma.c                    |  16 ++-
 migration/savevm.c                  | 187 ++++++++++++++++++++--------
 migration/vmstate.c                 |   2 +
 scsi/qemu-pr-helper.c               |   2 +-
 tests/qtest/migration-test.c        |   3 +-
 tests/qtest/tpm-emu.c               |   2 +-
 tests/unit/test-io-channel-socket.c |   1 +
 util/userfaultfd.c                  |  13 +-
 util/vhost-user-server.c            |   2 +-
 hw/vfio/trace-events                |   2 +-
 migration/trace-events              |   7 +-
 46 files changed, 715 insertions(+), 226 deletions(-)

-- 
2.39.1




* [PULL 01/26] migration: Fix migration crash when target psize larger than host
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 02/26] migration: No save_live_pending() method uses the QEMUFile parameter Juan Quintela
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu, qemu-stable

From: Peter Xu <peterx@redhat.com>

Commit d9e474ea56 overlooked the case where the target psize is larger
than the host psize.  One example is Alpha, which has an 8K page size:
migrating an Alpha guest on an x86 host crashes the source QEMU.

Fix it by detecting that case and setting the host start/end to cover
just the single page to be migrated.

This also slightly optimizes the common case where the host psize equals
the guest psize, since no roundups are needed there, but that's trivial.
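
The range selection described above can be sketched in standalone C.
This is only an illustration under stated assumptions, not the QEMU
code: struct page_range and host_page_range() are hypothetical names,
page sizes are passed in bytes rather than derived from
qemu_ram_pagesize() and TARGET_PAGE_BITS, and the rounding uses a
power-of-two mask instead of QEMU's ROUND_DOWN()/ROUND_UP() macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two PageSearchStatus fields being set. */
struct page_range {
    uint64_t start; /* first guest page index covered (inclusive) */
    uint64_t end;   /* one past the last guest page index covered */
};

/*
 * page is a guest page index; host_psize and guest_psize are page
 * sizes in bytes and are assumed to be non-zero powers of two.
 */
static struct page_range host_page_range(uint64_t page,
                                         uint64_t host_psize,
                                         uint64_t guest_psize)
{
    struct page_range r;
    uint64_t guest_pfns;

    assert(host_psize && (host_psize & (host_psize - 1)) == 0);
    assert(guest_psize && (guest_psize & (guest_psize - 1)) == 0);

    /* Integer division: 0 when the guest page is larger than the host page. */
    guest_pfns = host_psize / guest_psize;

    if (guest_pfns <= 1) {
        /* Equal sizes, or guest page larger than host page: one page. */
        r.start = page;
        r.end = page + 1;
    } else {
        /* Host page spans several guest pages: cover all of them. */
        r.start = page & ~(guest_pfns - 1);
        r.end = r.start + guest_pfns;
    }
    return r;
}
```

With a 4K host and an 8K guest, host_page_range(5, 4096, 8192) gives
the single-page range [5, 6).  With a 16K host page and 4K guest pages,
page 5 falls in [4, 8).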

Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1456
Fixes: d9e474ea56 ("migration: Teach PSS about host page")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 334309f1c6..68a45338e3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2319,8 +2319,25 @@ static void pss_host_page_prepare(PageSearchStatus *pss)
     size_t guest_pfns = qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
 
     pss->host_page_sending = true;
-    pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns);
-    pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns);
+    if (guest_pfns <= 1) {
+        /*
+         * This covers both when guest psize == host psize, or when guest
+         * has larger psize than the host (guest_pfns==0).
+         *
+         * For the latter, we always send one whole guest page per
+         * iteration of the host page (example: an Alpha VM on x86 host
+         * will have guest psize 8K while host psize 4K).
+         */
+        pss->host_page_start = pss->page;
+        pss->host_page_end = pss->page + 1;
+    } else {
+        /*
+         * The host page spans over multiple guest pages, we send them
+         * within the same host page iteration.
+         */
+        pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns);
+        pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns);
+    }
 }
 
 /*
-- 
2.39.1




* [PULL 02/26] migration: No save_live_pending() method uses the QEMUFile parameter
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
  2023-02-02 16:06 ` [PULL 01/26] migration: Fix migration crash when target psize larger than host Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 03/26] migration: Split save_live_pending() into state_pending_* Juan Quintela
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x

So remove it everywhere.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/register.h   | 2 +-
 migration/savevm.h             | 2 +-
 hw/s390x/s390-stattrib.c       | 2 +-
 hw/vfio/migration.c            | 2 +-
 migration/block-dirty-bitmap.c | 2 +-
 migration/block.c              | 2 +-
 migration/migration.c          | 2 +-
 migration/ram.c                | 2 +-
 migration/savevm.c             | 4 ++--
 9 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index c1dcff0f90..6ca71367af 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -46,7 +46,7 @@ typedef struct SaveVMHandlers {
 
     /* This runs outside the iothread lock!  */
     int (*save_setup)(QEMUFile *f, void *opaque);
-    void (*save_live_pending)(QEMUFile *f, void *opaque,
+    void (*save_live_pending)(void *opaque,
                               uint64_t threshold_size,
                               uint64_t *res_precopy_only,
                               uint64_t *res_compatible,
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..524cf12f25 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,7 +40,7 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
-void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+void qemu_savevm_state_pending(uint64_t max_size,
                                uint64_t *res_precopy_only,
                                uint64_t *res_compatible,
                                uint64_t *res_postcopy_only);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 9eda1c3b2a..a553a1e850 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void cmma_save_pending(void *opaque, uint64_t max_size,
                               uint64_t *res_precopy_only,
                               uint64_t *res_compatible,
                               uint64_t *res_postcopy_only)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c74453e0b5..b2125c7607 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -456,7 +456,7 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_save_pending(QEMUFile *f, void *opaque,
+static void vfio_save_pending(void *opaque,
                               uint64_t threshold_size,
                               uint64_t *res_precopy_only,
                               uint64_t *res_compatible,
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 15127d489a..c27ef9b033 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -762,7 +762,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
+static void dirty_bitmap_save_pending(void *opaque,
                                       uint64_t max_size,
                                       uint64_t *res_precopy_only,
                                       uint64_t *res_compatible,
diff --git a/migration/block.c b/migration/block.c
index 5da15a62de..47852b8d58 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -863,7 +863,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void block_save_pending(void *opaque, uint64_t max_size,
                                uint64_t *res_precopy_only,
                                uint64_t *res_compatible,
                                uint64_t *res_postcopy_only)
diff --git a/migration/migration.c b/migration/migration.c
index 52b5d39244..76524cc56e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3751,7 +3751,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     uint64_t pending_size, pend_pre, pend_compat, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-    qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
+    qemu_savevm_state_pending(s->threshold_size, &pend_pre,
                               &pend_compat, &pend_post);
     pending_size = pend_pre + pend_compat + pend_post;
 
diff --git a/migration/ram.c b/migration/ram.c
index 68a45338e3..389739f162 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3409,7 +3409,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void ram_save_pending(void *opaque, uint64_t max_size,
                              uint64_t *res_precopy_only,
                              uint64_t *res_compatible,
                              uint64_t *res_postcopy_only)
diff --git a/migration/savevm.c b/migration/savevm.c
index a783789430..5e4bccb966 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1472,7 +1472,7 @@ flush:
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
  */
-void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
+void qemu_savevm_state_pending(uint64_t threshold_size,
                                uint64_t *res_precopy_only,
                                uint64_t *res_compatible,
                                uint64_t *res_postcopy_only)
@@ -1493,7 +1493,7 @@ void qemu_savevm_state_pending(QEMUFile *f, uint64_t threshold_size,
                 continue;
             }
         }
-        se->ops->save_live_pending(f, se->opaque, threshold_size,
+        se->ops->save_live_pending(se->opaque, threshold_size,
                                    res_precopy_only, res_compatible,
                                    res_postcopy_only);
     }
-- 
2.39.1




* [PULL 03/26] migration: Split save_live_pending() into state_pending_*
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
  2023-02-02 16:06 ` [PULL 01/26] migration: Fix migration crash when target psize larger than host Juan Quintela
  2023-02-02 16:06 ` [PULL 02/26] migration: No save_live_pending() method uses the QEMUFile parameter Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 04/26] migration: Remove unused threshold_size parameter Juan Quintela
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x

We split the function into two:

- state_pending_estimate: We estimate the remaining state size without
  stopping the machine.

- state_pending_exact: We calculate the exact amount of remaining
  state.

The only "device" that implements different functions for _estimate()
and _exact() is ram.
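
The estimate-then-exact flow can be sketched as follows.  This is a
simplified illustration, not the code in this patch: struct pending_ops,
struct device, struct toy_ram and query_pending() are hypothetical
names, a single counter replaces the three res_* outputs, and the toy
"sync" in the exact callback is an assumption made only to give the two
callbacks different costs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified handler table: one counter instead of three. */
struct pending_ops {
    void (*state_pending_estimate)(void *opaque, uint64_t *pending);
    void (*state_pending_exact)(void *opaque, uint64_t *pending);
};

struct device {
    const struct pending_ops *ops;
    void *opaque;
};

/* Toy RAM state (assumption): some dirty bytes are only seen after a sync. */
struct toy_ram {
    uint64_t known_dirty;    /* bytes already known to be dirty */
    uint64_t unsynced_dirty; /* bytes dirtied since the last sync */
    unsigned syncs;          /* number of syncs performed so far */
};

/* Cheap: report what is already known, without syncing. */
static void toy_ram_estimate(void *opaque, uint64_t *pending)
{
    struct toy_ram *ram = opaque;

    *pending += ram->known_dirty;
}

/* Expensive: fold newly dirtied bytes in first, then report the total. */
static void toy_ram_exact(void *opaque, uint64_t *pending)
{
    struct toy_ram *ram = opaque;

    ram->known_dirty += ram->unsynced_dirty;
    ram->unsynced_dirty = 0;
    ram->syncs++;
    *pending += ram->known_dirty;
}

static const struct pending_ops toy_ram_ops = {
    .state_pending_estimate = toy_ram_estimate,
    .state_pending_exact = toy_ram_exact,
};

/* Ask for the exact amount only once the estimate is within the threshold. */
static uint64_t query_pending(struct device *devs, size_t n,
                              uint64_t threshold)
{
    uint64_t pending = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        devs[i].ops->state_pending_estimate(devs[i].opaque, &pending);
    }
    if (pending <= threshold) {
        pending = 0;
        for (i = 0; i < n; i++) {
            devs[i].ops->state_pending_exact(devs[i].opaque, &pending);
        }
    }
    return pending;
}
```

A device whose estimate is already exact can register the same function
in both slots, which is what the non-ram handlers in this patch do.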

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/devel/migration.rst       | 18 ++++++++-------
 docs/devel/vfio-migration.rst  |  4 ++--
 include/migration/register.h   | 19 +++++++++------
 migration/savevm.h             | 12 ++++++----
 hw/s390x/s390-stattrib.c       | 11 +++++----
 hw/vfio/migration.c            | 21 +++++++++--------
 migration/block-dirty-bitmap.c | 15 ++++++------
 migration/block.c              | 13 ++++++-----
 migration/migration.c          | 20 +++++++++++-----
 migration/ram.c                | 35 ++++++++++++++++++++--------
 migration/savevm.c             | 42 +++++++++++++++++++++++++++-------
 hw/vfio/trace-events           |  2 +-
 migration/trace-events         |  7 +++---
 13 files changed, 143 insertions(+), 76 deletions(-)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 3e9656d8e0..6f65c23b47 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -482,15 +482,17 @@ An iterative device must provide:
   - A ``load_setup`` function that initialises the data structures on the
     destination.
 
-  - A ``save_live_pending`` function that is called repeatedly and must
-    indicate how much more data the iterative data must save.  The core
-    migration code will use this to determine when to pause the CPUs
-    and complete the migration.
+  - A ``state_pending_exact`` function that indicates how much more
+    data we must save.  The core migration code will use this to
+    determine when to pause the CPUs and complete the migration.
 
-  - A ``save_live_iterate`` function (called after ``save_live_pending``
-    when there is significant data still to be sent).  It should send
-    a chunk of data until the point that stream bandwidth limits tell it
-    to stop.  Each call generates one section.
+  - A ``state_pending_estimate`` function that indicates how much more
+    data we must save.  When the estimated amount is smaller than the
+    threshold, we call ``state_pending_exact``.
+
+  - A ``save_live_iterate`` function should send a chunk of data until
+    the point that stream bandwidth limits tell it to stop.  Each call
+    generates one section.
 
   - A ``save_live_complete_precopy`` function that must transmit the
     last section for the device containing any remaining data.
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 9ff6163c88..673057c90d 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -28,7 +28,7 @@ VFIO implements the device hooks for the iterative approach as follows:
 * A ``load_setup`` function that sets up the migration region on the
   destination and sets _RESUMING flag in the VFIO device state.
 
-* A ``save_live_pending`` function that reads pending_bytes from the vendor
+* A ``state_pending_exact`` function that reads pending_bytes from the vendor
   driver, which indicates the amount of data that the vendor driver has yet to
   save for the VFIO device.
 
@@ -114,7 +114,7 @@ Live migration save path
                     (RUNNING, _SETUP, _RUNNING|_SAVING)
                                   |
                     (RUNNING, _ACTIVE, _RUNNING|_SAVING)
-             If device is active, get pending_bytes by .save_live_pending()
+             If device is active, get pending_bytes by .state_pending_exact()
           If total pending_bytes >= threshold_size, call .save_live_iterate()
                   Data of VFIO device for pre-copy phase is copied
         Iterate till total pending bytes converge and are less than threshold
diff --git a/include/migration/register.h b/include/migration/register.h
index 6ca71367af..15cf32994d 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -46,11 +46,6 @@ typedef struct SaveVMHandlers {
 
     /* This runs outside the iothread lock!  */
     int (*save_setup)(QEMUFile *f, void *opaque);
-    void (*save_live_pending)(void *opaque,
-                              uint64_t threshold_size,
-                              uint64_t *res_precopy_only,
-                              uint64_t *res_compatible,
-                              uint64_t *res_postcopy_only);
     /* Note for save_live_pending:
      * - res_precopy_only is for data which must be migrated in precopy phase
      *     or in stopped state, in other words - before target vm start
@@ -61,8 +56,18 @@ typedef struct SaveVMHandlers {
      * Sum of res_postcopy_only, res_compatible and res_postcopy_only is the
      * whole amount of pending data.
      */
-
-
+    /* This estimates the remaining data to transfer */
+    void (*state_pending_estimate)(void *opaque,
+                                   uint64_t threshold_size,
+                                   uint64_t *res_precopy_only,
+                                   uint64_t *res_compatible,
+                                   uint64_t *res_postcopy_only);
+    /* This calculates the exact remaining data to transfer */
+    void (*state_pending_exact)(void *opaque,
+                                uint64_t threshold_size,
+                                uint64_t *res_precopy_only,
+                                uint64_t *res_compatible,
+                                uint64_t *res_postcopy_only);
     LoadStateHandler *load_state;
     int (*load_setup)(QEMUFile *f, void *opaque);
     int (*load_cleanup)(void *opaque);
diff --git a/migration/savevm.h b/migration/savevm.h
index 524cf12f25..5d2cff4411 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,10 +40,14 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
-void qemu_savevm_state_pending(uint64_t max_size,
-                               uint64_t *res_precopy_only,
-                               uint64_t *res_compatible,
-                               uint64_t *res_postcopy_only);
+void qemu_savevm_state_pending_exact(uint64_t threshold_size,
+                                     uint64_t *res_precopy_only,
+                                     uint64_t *res_compatible,
+                                     uint64_t *res_postcopy_only);
+void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
+                                        uint64_t *res_precopy_only,
+                                        uint64_t *res_compatible,
+                                        uint64_t *res_postcopy_only);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
 int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index a553a1e850..8f573ebb10 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -182,10 +182,10 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void cmma_save_pending(void *opaque, uint64_t max_size,
-                              uint64_t *res_precopy_only,
-                              uint64_t *res_compatible,
-                              uint64_t *res_postcopy_only)
+static void cmma_state_pending(void *opaque, uint64_t max_size,
+                               uint64_t *res_precopy_only,
+                               uint64_t *res_compatible,
+                               uint64_t *res_postcopy_only)
 {
     S390StAttribState *sas = S390_STATTRIB(opaque);
     S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
@@ -371,7 +371,8 @@ static SaveVMHandlers savevm_s390_stattrib_handlers = {
     .save_setup = cmma_save_setup,
     .save_live_iterate = cmma_save_iterate,
     .save_live_complete_precopy = cmma_save_complete,
-    .save_live_pending = cmma_save_pending,
+    .state_pending_exact = cmma_state_pending,
+    .state_pending_estimate = cmma_state_pending,
     .save_cleanup = cmma_save_cleanup,
     .load_state = cmma_load,
     .is_active = cmma_active,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index b2125c7607..c49ca466d4 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -456,11 +456,11 @@ static void vfio_save_cleanup(void *opaque)
     trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_save_pending(void *opaque,
-                              uint64_t threshold_size,
-                              uint64_t *res_precopy_only,
-                              uint64_t *res_compatible,
-                              uint64_t *res_postcopy_only)
+static void vfio_state_pending(void *opaque,
+                               uint64_t threshold_size,
+                               uint64_t *res_precopy_only,
+                               uint64_t *res_compatible,
+                               uint64_t *res_postcopy_only)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -473,7 +473,7 @@ static void vfio_save_pending(void *opaque,
 
     *res_precopy_only += migration->pending_bytes;
 
-    trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
+    trace_vfio_state_pending(vbasedev->name, *res_precopy_only,
                             *res_postcopy_only, *res_compatible);
 }
 
@@ -515,9 +515,9 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
     }
 
     /*
-     * Reset pending_bytes as .save_live_pending is not called during savevm or
-     * snapshot case, in such case vfio_update_pending() at the start of this
-     * function updates pending_bytes.
+     * Reset pending_bytes as state_pending* are not called during
+     * savevm or snapshot case, in such case vfio_update_pending() at
+     * the start of this function updates pending_bytes.
      */
     migration->pending_bytes = 0;
     trace_vfio_save_iterate(vbasedev->name, data_size);
@@ -685,7 +685,8 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
 static SaveVMHandlers savevm_vfio_handlers = {
     .save_setup = vfio_save_setup,
     .save_cleanup = vfio_save_cleanup,
-    .save_live_pending = vfio_save_pending,
+    .state_pending_exact = vfio_state_pending,
+    .state_pending_estimate = vfio_state_pending,
     .save_live_iterate = vfio_save_iterate,
     .save_live_complete_precopy = vfio_save_complete_precopy,
     .save_state = vfio_save_state,
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index c27ef9b033..6fac9fb34f 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -762,11 +762,11 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void dirty_bitmap_save_pending(void *opaque,
-                                      uint64_t max_size,
-                                      uint64_t *res_precopy_only,
-                                      uint64_t *res_compatible,
-                                      uint64_t *res_postcopy_only)
+static void dirty_bitmap_state_pending(void *opaque,
+                                       uint64_t max_size,
+                                       uint64_t *res_precopy_only,
+                                       uint64_t *res_compatible,
+                                       uint64_t *res_postcopy_only)
 {
     DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms;
@@ -784,7 +784,7 @@ static void dirty_bitmap_save_pending(void *opaque,
 
     qemu_mutex_unlock_iothread();
 
-    trace_dirty_bitmap_save_pending(pending, max_size);
+    trace_dirty_bitmap_state_pending(pending);
 
     *res_postcopy_only += pending;
 }
@@ -1253,7 +1253,8 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
     .save_live_complete_postcopy = dirty_bitmap_save_complete,
     .save_live_complete_precopy = dirty_bitmap_save_complete,
     .has_postcopy = dirty_bitmap_has_postcopy,
-    .save_live_pending = dirty_bitmap_save_pending,
+    .state_pending_exact = dirty_bitmap_state_pending,
+    .state_pending_estimate = dirty_bitmap_state_pending,
     .save_live_iterate = dirty_bitmap_save_iterate,
     .is_active_iterate = dirty_bitmap_is_active_iterate,
     .load_state = dirty_bitmap_load,
diff --git a/migration/block.c b/migration/block.c
index 47852b8d58..544e74e9c5 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -863,10 +863,10 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void block_save_pending(void *opaque, uint64_t max_size,
-                               uint64_t *res_precopy_only,
-                               uint64_t *res_compatible,
-                               uint64_t *res_postcopy_only)
+static void block_state_pending(void *opaque, uint64_t max_size,
+                                uint64_t *res_precopy_only,
+                                uint64_t *res_compatible,
+                                uint64_t *res_postcopy_only)
 {
     /* Estimate pending number of bytes to send */
     uint64_t pending;
@@ -885,7 +885,7 @@ static void block_save_pending(void *opaque, uint64_t max_size,
         pending = BLK_MIG_BLOCK_SIZE;
     }
 
-    trace_migration_block_save_pending(pending);
+    trace_migration_block_state_pending(pending);
     /* We don't do postcopy */
     *res_precopy_only += pending;
 }
@@ -1020,7 +1020,8 @@ static SaveVMHandlers savevm_block_handlers = {
     .save_setup = block_save_setup,
     .save_live_iterate = block_save_iterate,
     .save_live_complete_precopy = block_save_complete,
-    .save_live_pending = block_save_pending,
+    .state_pending_exact = block_state_pending,
+    .state_pending_estimate = block_state_pending,
     .load_state = block_load,
     .save_cleanup = block_migration_cleanup,
     .is_active = block_is_active,
diff --git a/migration/migration.c b/migration/migration.c
index 76524cc56e..e7b4b94348 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3748,15 +3748,23 @@ typedef enum {
  */
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
-    uint64_t pending_size, pend_pre, pend_compat, pend_post;
+    uint64_t pend_pre, pend_compat, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-    qemu_savevm_state_pending(s->threshold_size, &pend_pre,
-                              &pend_compat, &pend_post);
-    pending_size = pend_pre + pend_compat + pend_post;
+    qemu_savevm_state_pending_estimate(s->threshold_size, &pend_pre,
+                                       &pend_compat, &pend_post);
+    uint64_t pending_size = pend_pre + pend_compat + pend_post;
 
-    trace_migrate_pending(pending_size, s->threshold_size,
-                          pend_pre, pend_compat, pend_post);
+    trace_migrate_pending_estimate(pending_size, s->threshold_size,
+                                   pend_pre, pend_compat, pend_post);
+
+    if (pend_pre + pend_compat <= s->threshold_size) {
+        qemu_savevm_state_pending_exact(s->threshold_size, &pend_pre,
+                                        &pend_compat, &pend_post);
+        pending_size = pend_pre + pend_compat + pend_post;
+        trace_migrate_pending_exact(pending_size, s->threshold_size,
+                                    pend_pre, pend_compat, pend_post);
+    }
 
     if (pending_size && pending_size >= s->threshold_size) {
         /* Still a significant amount to transfer */
diff --git a/migration/ram.c b/migration/ram.c
index 389739f162..56ff9cd29d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3409,19 +3409,35 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void ram_save_pending(void *opaque, uint64_t max_size,
-                             uint64_t *res_precopy_only,
-                             uint64_t *res_compatible,
-                             uint64_t *res_postcopy_only)
+static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
+                                       uint64_t *res_precopy_only,
+                                       uint64_t *res_compatible,
+                                       uint64_t *res_postcopy_only)
 {
     RAMState **temp = opaque;
     RAMState *rs = *temp;
-    uint64_t remaining_size;
 
-    remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
+    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
 
-    if (!migration_in_postcopy() &&
-        remaining_size < max_size) {
+    if (migrate_postcopy_ram()) {
+        /* We can do postcopy, and all the data is postcopiable */
+        *res_postcopy_only += remaining_size;
+    } else {
+        *res_precopy_only += remaining_size;
+    }
+}
+
+static void ram_state_pending_exact(void *opaque, uint64_t max_size,
+                                    uint64_t *res_precopy_only,
+                                    uint64_t *res_compatible,
+                                    uint64_t *res_postcopy_only)
+{
+    RAMState **temp = opaque;
+    RAMState *rs = *temp;
+
+    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
+
+    if (!migration_in_postcopy()) {
         qemu_mutex_lock_iothread();
         WITH_RCU_READ_LOCK_GUARD() {
             migration_bitmap_sync_precopy(rs);
@@ -4577,7 +4593,8 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_live_complete_postcopy = ram_save_complete,
     .save_live_complete_precopy = ram_save_complete,
     .has_postcopy = ram_has_postcopy,
-    .save_live_pending = ram_save_pending,
+    .state_pending_exact = ram_state_pending_exact,
+    .state_pending_estimate = ram_state_pending_estimate,
     .load_state = ram_load,
     .save_cleanup = ram_save_cleanup,
     .load_setup = ram_load_setup,
diff --git a/migration/savevm.c b/migration/savevm.c
index 5e4bccb966..7f9f770c1e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1472,10 +1472,10 @@ flush:
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
  */
-void qemu_savevm_state_pending(uint64_t threshold_size,
-                               uint64_t *res_precopy_only,
-                               uint64_t *res_compatible,
-                               uint64_t *res_postcopy_only)
+void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
+                                        uint64_t *res_precopy_only,
+                                        uint64_t *res_compatible,
+                                        uint64_t *res_postcopy_only)
 {
     SaveStateEntry *se;
 
@@ -1485,7 +1485,7 @@ void qemu_savevm_state_pending(uint64_t threshold_size,
 
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->save_live_pending) {
+        if (!se->ops || !se->ops->state_pending_exact) {
             continue;
         }
         if (se->ops->is_active) {
@@ -1493,9 +1493,35 @@ void qemu_savevm_state_pending(uint64_t threshold_size,
                 continue;
             }
         }
-        se->ops->save_live_pending(se->opaque, threshold_size,
-                                   res_precopy_only, res_compatible,
-                                   res_postcopy_only);
+        se->ops->state_pending_exact(se->opaque, threshold_size,
+                                     res_precopy_only, res_compatible,
+                                     res_postcopy_only);
+    }
+}
+
+void qemu_savevm_state_pending_exact(uint64_t threshold_size,
+                                     uint64_t *res_precopy_only,
+                                     uint64_t *res_compatible,
+                                     uint64_t *res_postcopy_only)
+{
+    SaveStateEntry *se;
+
+    *res_precopy_only = 0;
+    *res_compatible = 0;
+    *res_postcopy_only = 0;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->state_pending_estimate) {
+            continue;
+        }
+        if (se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        se->ops->state_pending_estimate(se->opaque, threshold_size,
+                                        res_precopy_only, res_compatible,
+                                        res_postcopy_only);
     }
 }
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 73dffe9e00..52de1c84f8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -157,7 +157,7 @@ vfio_save_cleanup(const char *name) " (%s)"
 vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
 vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
 vfio_save_device_config_state(const char *name) " (%s)"
-vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
+vfio_state_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64
 vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_complete_precopy(const char *name) " (%s)"
 vfio_load_device_config_state(const char *name) " (%s)"
diff --git a/migration/trace-events b/migration/trace-events
index 57003edcbd..adb680b0e6 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -150,7 +150,8 @@ migrate_fd_cleanup(void) ""
 migrate_fd_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
-migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
+migrate_pending_exact(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "exact pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
+migrate_pending_estimate(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "estimate pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
@@ -330,7 +331,7 @@ send_bitmap_bits(uint32_t flags, uint64_t start_sector, uint32_t nr_sectors, uin
 dirty_bitmap_save_iterate(int in_postcopy) "in postcopy: %d"
 dirty_bitmap_save_complete_enter(void) ""
 dirty_bitmap_save_complete_finish(void) ""
-dirty_bitmap_save_pending(uint64_t pending, uint64_t max_size) "pending %" PRIu64 " max: %" PRIu64
+dirty_bitmap_state_pending(uint64_t pending) "pending %" PRIu64
 dirty_bitmap_load_complete(void) ""
 dirty_bitmap_load_bits_enter(uint64_t first_sector, uint32_t nr_sectors) "chunk: %" PRIu64 " %" PRIu32
 dirty_bitmap_load_bits_zeroes(void) ""
@@ -355,7 +356,7 @@ migration_block_save_device_dirty(int64_t sector) "Error reading sector %" PRId6
 migration_block_flush_blks(const char *action, int submitted, int read_done, int transferred) "%s submitted %d read_done %d transferred %d"
 migration_block_save(const char *mig_stage, int submitted, int transferred) "Enter save live %s submitted %d transferred %d"
 migration_block_save_complete(void) "Block migration completed"
-migration_block_save_pending(uint64_t pending) "Enter save live pending  %" PRIu64
+migration_block_state_pending(uint64_t pending) "Enter save live pending  %" PRIu64
 
 # page_cache.c
 migration_pagecache_init(int64_t max_num_items) "Setting cache buckets to %" PRId64
-- 
2.39.1




* [PULL 04/26] migration: Remove unused threshold_size parameter
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (2 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 03/26] migration: Split save_live_pending() into state_pending_* Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 05/26] migration: simplify migration_iteration_run() Juan Quintela
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x

Until the previous commit, save_live_pending() used the threshold_size
parameter for RAM.  Now that it has been split into
state_pending_estimate() and state_pending_exact(), the parameter is
no longer needed, so remove it.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
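
Note (not part of the commit): a minimal, self-contained sketch of the
callback shape after this change, assuming a trimmed-down model of
SaveVMHandlers.  Handlers, Entry, dev_pending() and pending_estimate()
are hypothetical names used only for illustration; they are not QEMU
code.  As in the hunks of this patch, the caller zeroes the three
counters and each handler adds to them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of SaveVMHandlers after this patch:
 * neither callback takes a threshold_size argument any more. */
typedef struct {
    void (*state_pending_estimate)(void *opaque, uint64_t *pre,
                                   uint64_t *compat, uint64_t *post);
    void (*state_pending_exact)(void *opaque, uint64_t *pre,
                                uint64_t *compat, uint64_t *post);
} Handlers;

typedef struct {
    const Handlers *ops;
    void *opaque;
} Entry;

/* Example device: precopy-only, reports the byte count stored in opaque. */
static void dev_pending(void *opaque, uint64_t *pre,
                        uint64_t *compat, uint64_t *post)
{
    (void)compat;
    (void)post;
    *pre += *(uint64_t *)opaque;
}

/* A device without a cheaper estimate can register one function twice. */
static const Handlers dev_ops = {
    .state_pending_estimate = dev_pending,
    .state_pending_exact = dev_pending,
};

/* Zero the counters, then let every entry that has the callback add to them. */
static void pending_estimate(const Entry *entries, size_t n, uint64_t *pre,
                             uint64_t *compat, uint64_t *post)
{
    assert(pre && compat && post);
    *pre = *compat = *post = 0;

    for (size_t i = 0; i < n; i++) {
        const Handlers *ops = entries[i].ops;

        if (!ops || !ops->state_pending_estimate) {
            continue;
        }
        ops->state_pending_estimate(entries[i].opaque, pre, compat, post);
    }
}
```

In the savevm.c hunks of this patch as posted,
qemu_savevm_state_pending_estimate() invokes the state_pending_exact
callback and qemu_savevm_state_pending_exact() invokes
state_pending_estimate, which looks inverted relative to the names.
The sketch follows the names.
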
 include/migration/register.h   |  2 --
 migration/savevm.h             |  6 ++----
 hw/s390x/s390-stattrib.c       |  2 +-
 hw/vfio/migration.c            |  1 -
 migration/block-dirty-bitmap.c |  1 -
 migration/block.c              |  2 +-
 migration/migration.c          | 10 ++++------
 migration/ram.c                |  4 ++--
 migration/savevm.c             | 11 ++++-------
 migration/trace-events         |  4 ++--
 10 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 15cf32994d..b91a0cdbf8 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -58,13 +58,11 @@ typedef struct SaveVMHandlers {
      */
     /* This estimates the remaining data to transfer */
     void (*state_pending_estimate)(void *opaque,
-                                   uint64_t threshold_size,
                                    uint64_t *res_precopy_only,
                                    uint64_t *res_compatible,
                                    uint64_t *res_postcopy_only);
     /* This calculate the exact remaining data to transfer */
     void (*state_pending_exact)(void *opaque,
-                                uint64_t threshold_size,
                                 uint64_t *res_precopy_only,
                                 uint64_t *res_compatible,
                                 uint64_t *res_postcopy_only);
diff --git a/migration/savevm.h b/migration/savevm.h
index 5d2cff4411..b1901e68d5 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,12 +40,10 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
                                        bool inactivate_disks);
-void qemu_savevm_state_pending_exact(uint64_t threshold_size,
-                                     uint64_t *res_precopy_only,
+void qemu_savevm_state_pending_exact(uint64_t *res_precopy_only,
                                      uint64_t *res_compatible,
                                      uint64_t *res_postcopy_only);
-void qemu_savevm_state_pending_estimate(uint64_t thershold_size,
-                                        uint64_t *res_precopy_only,
+void qemu_savevm_state_pending_estimate(uint64_t *res_precopy_only,
                                         uint64_t *res_compatible,
                                         uint64_t *res_postcopy_only);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 8f573ebb10..3e32002eab 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void cmma_state_pending(void *opaque, uint64_t max_size,
+static void cmma_state_pending(void *opaque,
                                uint64_t *res_precopy_only,
                                uint64_t *res_compatible,
                                uint64_t *res_postcopy_only)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c49ca466d4..b3318f0f20 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -457,7 +457,6 @@ static void vfio_save_cleanup(void *opaque)
 }
 
 static void vfio_state_pending(void *opaque,
-                               uint64_t threshold_size,
                                uint64_t *res_precopy_only,
                                uint64_t *res_compatible,
                                uint64_t *res_postcopy_only)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 6fac9fb34f..5a621419d3 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -763,7 +763,6 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 }
 
 static void dirty_bitmap_state_pending(void *opaque,
-                                       uint64_t max_size,
                                        uint64_t *res_precopy_only,
                                        uint64_t *res_compatible,
                                        uint64_t *res_postcopy_only)
diff --git a/migration/block.c b/migration/block.c
index 544e74e9c5..29f69025af 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -863,7 +863,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void block_state_pending(void *opaque, uint64_t max_size,
+static void block_state_pending(void *opaque,
                                 uint64_t *res_precopy_only,
                                 uint64_t *res_compatible,
                                 uint64_t *res_postcopy_only)
diff --git a/migration/migration.c b/migration/migration.c
index e7b4b94348..594a42f085 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3751,18 +3751,16 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     uint64_t pend_pre, pend_compat, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-    qemu_savevm_state_pending_estimate(s->threshold_size, &pend_pre,
-                                       &pend_compat, &pend_post);
+    qemu_savevm_state_pending_estimate(&pend_pre, &pend_compat, &pend_post);
     uint64_t pending_size = pend_pre + pend_compat + pend_post;
 
-    trace_migrate_pending_estimate(pending_size, s->threshold_size,
+    trace_migrate_pending_estimate(pending_size,
                                    pend_pre, pend_compat, pend_post);
 
     if (pend_pre + pend_compat <= s->threshold_size) {
-        qemu_savevm_state_pending_exact(s->threshold_size, &pend_pre,
-                                        &pend_compat, &pend_post);
+        qemu_savevm_state_pending_exact(&pend_pre, &pend_compat, &pend_post);
         pending_size = pend_pre + pend_compat + pend_post;
-        trace_migrate_pending_exact(pending_size, s->threshold_size,
+        trace_migrate_pending_exact(pending_size,
                                     pend_pre, pend_compat, pend_post);
     }
 
diff --git a/migration/ram.c b/migration/ram.c
index 56ff9cd29d..885d7dbf23 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3409,7 +3409,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
+static void ram_state_pending_estimate(void *opaque,
                                        uint64_t *res_precopy_only,
                                        uint64_t *res_compatible,
                                        uint64_t *res_postcopy_only)
@@ -3427,7 +3427,7 @@ static void ram_state_pending_estimate(void *opaque, uint64_t max_size,
     }
 }
 
-static void ram_state_pending_exact(void *opaque, uint64_t max_size,
+static void ram_state_pending_exact(void *opaque,
                                     uint64_t *res_precopy_only,
                                     uint64_t *res_compatible,
                                     uint64_t *res_postcopy_only)
diff --git a/migration/savevm.c b/migration/savevm.c
index 7f9f770c1e..e1caa3ea7c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1472,8 +1472,7 @@ flush:
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
  */
-void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
-                                        uint64_t *res_precopy_only,
+void qemu_savevm_state_pending_estimate(uint64_t *res_precopy_only,
                                         uint64_t *res_compatible,
                                         uint64_t *res_postcopy_only)
 {
@@ -1483,7 +1482,6 @@ void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
     *res_compatible = 0;
     *res_postcopy_only = 0;
 
-
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops || !se->ops->state_pending_exact) {
             continue;
@@ -1493,14 +1491,13 @@ void qemu_savevm_state_pending_estimate(uint64_t threshold_size,
                 continue;
             }
         }
-        se->ops->state_pending_exact(se->opaque, threshold_size,
+        se->ops->state_pending_exact(se->opaque,
                                      res_precopy_only, res_compatible,
                                      res_postcopy_only);
     }
 }
 
-void qemu_savevm_state_pending_exact(uint64_t threshold_size,
-                                     uint64_t *res_precopy_only,
+void qemu_savevm_state_pending_exact(uint64_t *res_precopy_only,
                                      uint64_t *res_compatible,
                                      uint64_t *res_postcopy_only)
 {
@@ -1519,7 +1516,7 @@ void qemu_savevm_state_pending_exact(uint64_t threshold_size,
                 continue;
             }
         }
-        se->ops->state_pending_estimate(se->opaque, threshold_size,
+        se->ops->state_pending_estimate(se->opaque,
                                         res_precopy_only, res_compatible,
                                         res_postcopy_only);
     }
diff --git a/migration/trace-events b/migration/trace-events
index adb680b0e6..67b65a70ff 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -150,8 +150,8 @@ migrate_fd_cleanup(void) ""
 migrate_fd_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
-migrate_pending_exact(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "exact pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
-migrate_pending_estimate(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "estimate pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
+migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t compat, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
+migrate_pending_estimate(uint64_t size, uint64_t pre, uint64_t compat, uint64_t post) "estimate pending size %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
-- 
2.39.1




* [PULL 05/26] migration: simplify migration_iteration_run()
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (3 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 04/26] migration: Remove unused threshold_size parameter Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 06/26] util/userfaultfd: Add uffd_open() Juan Quintela
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
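
Note (not part of the commit): the restructured tail of
migration_iteration_run() can be modelled as a pure decision function.
This is a sketch under assumptions: IterInputs, IterState and
iteration_decision() are hypothetical names, and the real function
performs side effects (migration_completion(), postcopy_start(),
qemu_savevm_state_iterate()) where the sketch only returns a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ITER_RESUME, ITER_SKIP, ITER_BREAK } IterState;

/* Hypothetical stand-in for the values migration_iteration_run() reads. */
typedef struct {
    uint64_t pend_pre, pend_compat, pend_post;
    uint64_t threshold;
    bool in_postcopy;
    bool start_postcopy_requested;
} IterInputs;

static IterState iteration_decision(const IterInputs *in)
{
    assert(in != NULL);

    uint64_t pending = in->pend_pre + in->pend_compat + in->pend_post;

    /* Early exit first: little or nothing left, so complete the migration. */
    if (!pending || pending < in->threshold) {
        return ITER_BREAK;
    }

    /* Still a significant amount to transfer: maybe switch to postcopy. */
    if (!in->in_postcopy && in->pend_pre <= in->threshold &&
        in->start_postcopy_requested) {
        return ITER_SKIP;
    }

    /* Just another iteration step. */
    return ITER_RESUME;
}
```

The early return is the negation of the old condition:
!(pending && pending >= threshold) equals
!pending || pending < threshold, so the three outcomes are reached for
the same inputs as before.
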
 migration/migration.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 594a42f085..cb9aee76c0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3764,23 +3764,23 @@ static MigIterateState migration_iteration_run(MigrationState *s)
                                     pend_pre, pend_compat, pend_post);
     }
 
-    if (pending_size && pending_size >= s->threshold_size) {
-        /* Still a significant amount to transfer */
-        if (!in_postcopy && pend_pre <= s->threshold_size &&
-            qatomic_read(&s->start_postcopy)) {
-            if (postcopy_start(s)) {
-                error_report("%s: postcopy failed to start", __func__);
-            }
-            return MIG_ITERATE_SKIP;
-        }
-        /* Just another iteration step */
-        qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
-    } else {
+    if (!pending_size || pending_size < s->threshold_size) {
         trace_migration_thread_low_pending(pending_size);
         migration_completion(s);
         return MIG_ITERATE_BREAK;
     }
 
+    /* Still a significant amount to transfer */
+    if (!in_postcopy && pend_pre <= s->threshold_size &&
+        qatomic_read(&s->start_postcopy)) {
+        if (postcopy_start(s)) {
+            error_report("%s: postcopy failed to start", __func__);
+        }
+        return MIG_ITERATE_SKIP;
+    }
+
+    /* Just another iteration step */
+    qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
     return MIG_ITERATE_RESUME;
 }
 
-- 
2.39.1




* [PULL 06/26] util/userfaultfd: Add uffd_open()
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (4 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 05/26] migration: simplify migration_iteration_run() Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 07/26] migration/ram: Fix populate_read_range() Juan Quintela
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: Peter Xu <peterx@redhat.com>

Add a helper to create the uffd handle.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
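
Note (not part of the commit): a minimal usage sketch for the new
helper, assuming a Linux host.  uffd_open() is copied in the shape
this patch adds; uffd_probe() is a hypothetical caller written only
for illustration.  The callers touched by this patch are not uniform:
those in util/userfaultfd.c test "< 0", while those in
migration/postcopy-ram.c test "== -1".  The sketch uses "< 0", which
also covers the -EINVAL fallback taken when __NR_userfaultfd is not
defined.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same shape as the helper added by this patch. */
static int uffd_open(int flags)
{
#if defined(__NR_userfaultfd)
    return (int)syscall(__NR_userfaultfd, flags);
#else
    return -EINVAL;
#endif
}

/*
 * Hypothetical caller: report whether a userfaultfd handle can be
 * created in the current context, closing it again on success.
 */
static bool uffd_probe(void)
{
    int fd = uffd_open(O_CLOEXEC);

    if (fd < 0) {
        return false;
    }
    close(fd);
    return true;
}
```

Whether uffd_probe() returns true depends on the kernel and on the
privileges of the calling process, so no particular result is implied
here.
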
 include/qemu/userfaultfd.h   |  8 ++++++++
 migration/postcopy-ram.c     | 11 +++++------
 tests/qtest/migration-test.c |  3 ++-
 util/userfaultfd.c           | 13 +++++++++++--
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/qemu/userfaultfd.h b/include/qemu/userfaultfd.h
index 6b74f92792..2101115f70 100644
--- a/include/qemu/userfaultfd.h
+++ b/include/qemu/userfaultfd.h
@@ -17,6 +17,14 @@
 #include "exec/hwaddr.h"
 #include <linux/userfaultfd.h>
 
+/**
+ * uffd_open(): Open a userfaultfd handle for the current context.
+ *
+ * @flags: The flags we want to pass in when creating the handle.
+ *
+ * Returns: the uffd handle if >=0, or <0 if an error happens.
+ */
+int uffd_open(int flags);
 int uffd_query_features(uint64_t *features);
 int uffd_create_fd(uint64_t features, bool non_blocking);
 void uffd_close_fd(int uffd_fd);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b9a37ef255..0c55df0e52 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -37,6 +37,7 @@
 #include "qemu-file.h"
 #include "yank_functions.h"
 #include "tls.h"
+#include "qemu/userfaultfd.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -226,11 +227,9 @@ static bool receive_ufd_features(uint64_t *features)
     int ufd;
     bool ret = true;
 
-    /* if we are here __NR_userfaultfd should exists */
-    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    ufd = uffd_open(O_CLOEXEC);
     if (ufd == -1) {
-        error_report("%s: syscall __NR_userfaultfd failed: %s", __func__,
-                     strerror(errno));
+        error_report("%s: uffd_open() failed: %s", __func__, strerror(errno));
         return false;
     }
 
@@ -375,7 +374,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
         goto out;
     }
 
-    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    ufd = uffd_open(O_CLOEXEC);
     if (ufd == -1) {
         error_report("%s: userfaultfd not available: %s", __func__,
                      strerror(errno));
@@ -1160,7 +1159,7 @@ static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
 int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
 {
     /* Open the fd for the kernel to give us userfaults */
-    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+    mis->userfault_fd = uffd_open(O_CLOEXEC | O_NONBLOCK);
     if (mis->userfault_fd == -1) {
         error_report("%s: Failed to open userfault fd: %s", __func__,
                      strerror(errno));
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 1dd32c9506..7a5d1922dd 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -62,13 +62,14 @@ static bool uffd_feature_thread_id;
 #include <sys/eventfd.h>
 #include <sys/ioctl.h>
 #include <linux/userfaultfd.h>
+#include "qemu/userfaultfd.h"
 
 static bool ufd_version_check(void)
 {
     struct uffdio_api api_struct;
     uint64_t ioctl_mask;
 
-    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    int ufd = uffd_open(O_CLOEXEC);
 
     if (ufd == -1) {
         g_test_message("Skipping test: userfaultfd not available");
diff --git a/util/userfaultfd.c b/util/userfaultfd.c
index f1cd6af2b1..4953b3137d 100644
--- a/util/userfaultfd.c
+++ b/util/userfaultfd.c
@@ -19,6 +19,15 @@
 #include <sys/syscall.h>
 #include <sys/ioctl.h>
 
+int uffd_open(int flags)
+{
+#if defined(__NR_userfaultfd)
+    return syscall(__NR_userfaultfd, flags);
+#else
+    return -EINVAL;
+#endif
+}
+
 /**
  * uffd_query_features: query UFFD features
  *
@@ -32,7 +41,7 @@ int uffd_query_features(uint64_t *features)
     struct uffdio_api api_struct = { 0 };
     int ret = -1;
 
-    uffd_fd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    uffd_fd = uffd_open(O_CLOEXEC);
     if (uffd_fd < 0) {
         trace_uffd_query_features_nosys(errno);
         return -1;
@@ -69,7 +78,7 @@ int uffd_create_fd(uint64_t features, bool non_blocking)
     uint64_t ioctl_mask = BIT(_UFFDIO_REGISTER) | BIT(_UFFDIO_UNREGISTER);
 
     flags = O_CLOEXEC | (non_blocking ? O_NONBLOCK : 0);
-    uffd_fd = syscall(__NR_userfaultfd, flags);
+    uffd_fd = uffd_open(flags);
     if (uffd_fd < 0) {
         trace_uffd_create_fd_nosys(errno);
         return -1;
-- 
2.39.1




* [PULL 07/26] migration/ram: Fix populate_read_range()
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (5 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 06/26] util/userfaultfd: Add uffd_open() Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 08/26] migration/ram: Fix error handling in ram_write_tracking_start() Juan Quintela
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, qemu-stable, Peter Xu

From: David Hildenbrand <david@redhat.com>

Unfortunately, commit f7b9dcfbcf44 broke populate_read_range(): the loop
end condition compares the offset against the size rather than against
offset + size, so the function does not populate the full range. Let's
fix that.

Fixes: f7b9dcfbcf44 ("migration/ram: Factor out populating pages readable in ram_block_populate_pages()")
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
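
Note (not part of the commit): a small sketch of why the old loop
bound is wrong, assuming a simplified model that only counts
iterations instead of reading memory.  pages_touched() and its use_end
flag are hypothetical and exist only to compare the two bounds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many pages a loop shaped like the one in
 * populate_read_range() would visit.  use_end selects the fixed bound
 * (offset + size); otherwise the pre-fix bound (size) is used.
 */
static size_t pages_touched(size_t offset, size_t size, size_t page_size,
                            int use_end)
{
    assert(page_size > 0);

    const size_t bound = use_end ? offset + size : size;
    size_t n = 0;

    for (; offset < bound; offset += page_size) {
        n++;
    }
    return n;
}
```

With 4096-byte pages, a range of 8192 bytes starting at offset 8192
visits no page with the old bound and two pages with the fixed one.
The two bounds agree only when the offset is 0.
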
 migration/ram.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 885d7dbf23..ba228eead4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1774,13 +1774,15 @@ out:
 static inline void populate_read_range(RAMBlock *block, ram_addr_t offset,
                                        ram_addr_t size)
 {
+    const ram_addr_t end = offset + size;
+
     /*
      * We read one byte of each page; this will preallocate page tables if
      * required and populate the shared zeropage on MAP_PRIVATE anonymous memory
      * where no page was populated yet. This might require adaption when
      * supporting other mappings, like shmem.
      */
-    for (; offset < size; offset += block->page_size) {
+    for (; offset < end; offset += block->page_size) {
         char tmp = *((char *)block->host + offset);
 
         /* Don't optimize the read out */
-- 
2.39.1




* [PULL 08/26] migration/ram: Fix error handling in ram_write_tracking_start()
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (6 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 07/26] migration/ram: Fix populate_read_range() Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 09/26] migration/ram: Don't explicitly unprotect when unregistering uffd-wp Juan Quintela
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, qemu-stable, Peter Xu

From: David Hildenbrand <david@redhat.com>

If something goes wrong during uffd_change_protection(), we would fail
to unregister uffd-wp and would not release our reference. Fix it by
performing the uffd_change_protection(true) last.

Note that a uffd_change_protection(false) on the recovery path without a
prior uffd_change_protection(true) is fine.
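
The ordering idea can be sketched outside of QEMU as follows. This is a
simplified model, not the actual implementation: struct block, its fields
and both functions are hypothetical names introduced only for this example.
The point is that the block is marked as tracked before the step that can
fail, so the recovery path finds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAMBlock; none of these fields exist in QEMU. */
struct block {
    bool registered;    /* models the uffd-wp registration */
    bool tracked;       /* models RAM_UF_WRITEPROTECT plus the region ref */
    bool protect_fails; /* knob to simulate a failing protection call */
};

/* Returns 0 on success, -1 after rolling back every tracked block. */
static int start_tracking(struct block *blocks, size_t n)
{
    size_t i;

    assert(blocks || n == 0);
    for (i = 0; i < n; i++) {
        blocks[i].registered = true;
        /* Mark the block before the fallible step so cleanup sees it. */
        blocks[i].tracked = true;
        if (blocks[i].protect_fails) {
            goto fail;
        }
    }
    return 0;

fail:
    for (i = 0; i < n; i++) {
        if (!blocks[i].tracked) {
            continue;
        }
        blocks[i].registered = false;
        blocks[i].tracked = false;
    }
    return -1;
}

/* Counts blocks that are still registered, i.e. leaked after a failure. */
static size_t count_registered(const struct block *blocks, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        count += blocks[i].registered;
    }
    return count;
}
```

If the flag were set only after the fallible step, the block that failed
would stay registered because the recovery loop would skip it.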

Fixes: 278e2f551a09 ("migration: support UFFD write fault processing in ram_save_iterate()")
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ba228eead4..73e5ca93e5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1896,13 +1896,14 @@ int ram_write_tracking_start(void)
                 block->max_length, UFFDIO_REGISTER_MODE_WP, NULL)) {
             goto fail;
         }
+        block->flags |= RAM_UF_WRITEPROTECT;
+        memory_region_ref(block->mr);
+
         /* Apply UFFD write protection to the block memory range */
         if (uffd_change_protection(rs->uffdio_fd, block->host,
                 block->max_length, true, false)) {
             goto fail;
         }
-        block->flags |= RAM_UF_WRITEPROTECT;
-        memory_region_ref(block->mr);
 
         trace_ram_write_tracking_ramblock_start(block->idstr, block->page_size,
                 block->host, block->max_length);
-- 
2.39.1




* [PULL 09/26] migration/ram: Don't explicitly unprotect when unregistering uffd-wp
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (7 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 08/26] migration/ram: Fix error handling in ram_write_tracking_start() Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 10/26] migration/ram: Rely on used_length for uffd_change_protection() Juan Quintela
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

When unregistering uffd-wp, kernels before commit f369b07c86143
("mm/uffd:reset write protection when unregister with wp-mode") won't
clear the uffd-wp PTE bits. When re-registering uffd-wp, the previous
uffd-wp PTE bits would trigger again. With the above commit, the kernel
clears the uffd-wp PTE bits itself when unregistering.

Consequently, on such newer kernels we now clear the uffd-wp PTE bits
twice, even though we don't care about clearing them at all: a new
background snapshot will re-register uffd-wp and re-protect all memory
either way.

So let's skip the manual clearing of uffd-wp. If ever relevant, we
could clear conditionally in uffd_unregister_memory(); we would just
need a way to detect more recent kernels.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 73e5ca93e5..efaae07dd8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1918,12 +1918,6 @@ fail:
         if ((block->flags & RAM_UF_WRITEPROTECT) == 0) {
             continue;
         }
-        /*
-         * In case some memory block failed to be write-protected
-         * remove protection and unregister all succeeded RAM blocks
-         */
-        uffd_change_protection(rs->uffdio_fd, block->host, block->max_length,
-                false, false);
         uffd_unregister_memory(rs->uffdio_fd, block->host, block->max_length);
         /* Cleanup flags and remove reference */
         block->flags &= ~RAM_UF_WRITEPROTECT;
@@ -1949,9 +1943,6 @@ void ram_write_tracking_stop(void)
         if ((block->flags & RAM_UF_WRITEPROTECT) == 0) {
             continue;
         }
-        /* Remove protection and unregister all affected RAM blocks */
-        uffd_change_protection(rs->uffdio_fd, block->host, block->max_length,
-                false, false);
         uffd_unregister_memory(rs->uffdio_fd, block->host, block->max_length);
 
         trace_ram_write_tracking_ramblock_stop(block->idstr, block->page_size,
-- 
2.39.1




* [PULL 10/26] migration/ram: Rely on used_length for uffd_change_protection()
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (8 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 09/26] migration/ram: Don't explicitly unprotect when unregistering uffd-wp Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 11/26] migration/ram: Optimize ram_write_tracking_start() for RamDiscardManager Juan Quintela
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

ram_mig_ram_block_resized() will abort migration (including background
snapshots) when resizing a RAMBlock. ram_block_populate_read() will only
populate RAM up to used_length, so, at least for anonymous memory, trying
to protect the range between used_length and max_length won't actually
protect anything and is just a NOP.

So let's only protect everything up to used_length.

Note: it still makes sense to register uffd-wp for max_length, such
that RAM_UF_WRITEPROTECT is independent of a changing used_length.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index efaae07dd8..a6956c9e7d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1901,7 +1901,7 @@ int ram_write_tracking_start(void)
 
         /* Apply UFFD write protection to the block memory range */
         if (uffd_change_protection(rs->uffdio_fd, block->host,
-                block->max_length, true, false)) {
+                                   block->used_length, true, false)) {
             goto fail;
         }
 
-- 
2.39.1




* [PULL 11/26] migration/ram: Optimize ram_write_tracking_start() for RamDiscardManager
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (9 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 10/26] migration/ram: Rely on used_length for uffd_change_protection() Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 12/26] migration/savevm: Move more savevm handling into vmstate_save() Juan Quintela
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

ram_block_populate_read() already optimizes for RamDiscardManager.
However, ram_write_tracking_start() will still try protecting discarded
memory ranges.

Let's optimize, because discarded ranges don't map any pages and

(1) For anonymous memory, trying to protect using uffd-wp without a mapped
    page is ignored by the kernel and consequently a NOP.

(2) For shared/file-backed memory, we will fill present page tables in the
    range with PTE markers. However, we will even allocate page tables
    just to fill them with unnecessary PTE markers and effectively
    waste memory.

So let's exclude these ranges, just like ram_block_populate_read()
already does.
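
As a rough illustration of replaying only populated ranges, consider the
following standalone sketch. It does not use the RamDiscardManager API:
the populated[] bitmap, the range_cb type and both functions are
assumptions made for this example. Contiguous populated chunks are merged
into one range and handed to a callback; discarded chunks are never
touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: called once per populated range. */
typedef int (*range_cb)(size_t offset, size_t size, void *opaque);

/* Invoke cb for each run of populated chunks; stop at the first error. */
static int replay_populated(const bool *populated, size_t nchunks,
                            size_t chunk_size, range_cb cb, void *opaque)
{
    size_t i = 0;

    assert(cb);
    while (i < nchunks) {
        size_t start;
        int ret;

        if (!populated[i]) {
            i++;            /* discarded chunk: skip it entirely */
            continue;
        }
        start = i;
        while (i < nchunks && populated[i]) {
            i++;
        }
        ret = cb(start * chunk_size, (i - start) * chunk_size, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: accumulate how many bytes would be protected. */
static int add_protected_bytes(size_t offset, size_t size, void *opaque)
{
    (void)offset;
    *(size_t *)opaque += size;
    return 0;
}
```

With a map of four 4096-byte chunks where only the third is discarded, the
callback runs twice and accounts for three chunks in total.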

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index a6956c9e7d..7f6d5efe8d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1865,6 +1865,39 @@ void ram_write_tracking_prepare(void)
     }
 }
 
+static inline int uffd_protect_section(MemoryRegionSection *section,
+                                       void *opaque)
+{
+    const hwaddr size = int128_get64(section->size);
+    const hwaddr offset = section->offset_within_region;
+    RAMBlock *rb = section->mr->ram_block;
+    int uffd_fd = (uintptr_t)opaque;
+
+    return uffd_change_protection(uffd_fd, rb->host + offset, size, true,
+                                  false);
+}
+
+static int ram_block_uffd_protect(RAMBlock *rb, int uffd_fd)
+{
+    assert(rb->flags & RAM_UF_WRITEPROTECT);
+
+    /* See ram_block_populate_read() */
+    if (rb->mr && memory_region_has_ram_discard_manager(rb->mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(rb->mr);
+        MemoryRegionSection section = {
+            .mr = rb->mr,
+            .offset_within_region = 0,
+            .size = rb->mr->size,
+        };
+
+        return ram_discard_manager_replay_populated(rdm, &section,
+                                                    uffd_protect_section,
+                                                    (void *)(uintptr_t)uffd_fd);
+    }
+    return uffd_change_protection(uffd_fd, rb->host,
+                                  rb->used_length, true, false);
+}
+
 /*
  * ram_write_tracking_start: start UFFD-WP memory tracking
  *
@@ -1900,8 +1933,7 @@ int ram_write_tracking_start(void)
         memory_region_ref(block->mr);
 
         /* Apply UFFD write protection to the block memory range */
-        if (uffd_change_protection(rs->uffdio_fd, block->host,
-                                   block->used_length, true, false)) {
+        if (ram_block_uffd_protect(block, uffd_fd)) {
             goto fail;
         }
 
-- 
2.39.1




* [PULL 12/26] migration/savevm: Move more savevm handling into vmstate_save()
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (10 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 11/26] migration/ram: Optimize ram_write_tracking_start() for RamDiscardManager Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 13/26] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() Juan Quintela
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

Let's move more code into vmstate_save(), reducing code duplication and
preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We
have to move vmstate_save() to make the compiler happy.
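
The shape of this refactoring can be sketched in isolation. The code below
is not QEMU code: struct entry, the text "stream" and all function names
are hypothetical. It only shows one helper owning the skip checks, the
header, the payload and the footer, so that callers shrink to a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: has_state and needed model the two skip checks. */
struct entry {
    const char *name;
    bool has_state;
    bool needed;
};

/* Append s to the NUL-terminated buffer out, truncating at cap. */
static void emit(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);

    assert(used < cap);
    strncat(out, s, cap - used - 1);
}

/* One helper owns skip checks, header, payload and footer. */
static int save_entry(char *out, size_t cap, const struct entry *e)
{
    if (!e->has_state || !e->needed) {
        return 0; /* nothing to write; not an error */
    }
    emit(out, cap, "<");
    emit(out, cap, e->name);
    emit(out, cap, ">state</");
    emit(out, cap, e->name);
    emit(out, cap, ">");
    return 0;
}

/* Callers no longer repeat the checks or the header/footer handling. */
static int save_all(char *out, size_t cap, const struct entry *entries,
                    size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ret = save_entry(out, cap, &entries[i]);

        if (ret) {
            return ret;
        }
    }
    return 0;
}
```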

We'll now also trace from qemu_save_device_state(), triggering the same
tracepoints as previously called from
qemu_savevm_state_complete_precopy_non_iterable() only. Note that
qemu_save_device_state() ignores iterable device state, such as RAM,
and consequently doesn't trigger some other trace points (e.g.,
trace_savevm_state_setup()).

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/savevm.c | 79 ++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 42 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index e1caa3ea7c..3e3631652e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -898,17 +898,6 @@ static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se,
     }
 }
 
-static int vmstate_save(QEMUFile *f, SaveStateEntry *se,
-                        JSONWriter *vmdesc)
-{
-    trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
-    if (!se->vmsd) {
-        vmstate_save_old_style(f, se, vmdesc);
-        return 0;
-    }
-    return vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
-}
-
 /*
  * Write the header for device section (QEMU_VM_SECTION START/END/PART/FULL)
  */
@@ -942,6 +931,43 @@ static void save_section_footer(QEMUFile *f, SaveStateEntry *se)
     }
 }
 
+static int vmstate_save(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc)
+{
+    int ret;
+
+    if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
+        return 0;
+    }
+    if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
+        trace_savevm_section_skip(se->idstr, se->section_id);
+        return 0;
+    }
+
+    trace_savevm_section_start(se->idstr, se->section_id);
+    save_section_header(f, se, QEMU_VM_SECTION_FULL);
+    if (vmdesc) {
+        json_writer_start_object(vmdesc, NULL);
+        json_writer_str(vmdesc, "name", se->idstr);
+        json_writer_int64(vmdesc, "instance_id", se->instance_id);
+    }
+
+    trace_vmstate_save(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
+    if (!se->vmsd) {
+        vmstate_save_old_style(f, se, vmdesc);
+    } else {
+        ret = vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    trace_savevm_section_end(se->idstr, se->section_id, 0);
+    save_section_footer(f, se);
+    if (vmdesc) {
+        json_writer_end_object(vmdesc);
+    }
+    return 0;
+}
 /**
  * qemu_savevm_command_send: Send a 'QEMU_VM_COMMAND' type element with the
  *                           command and associated data.
@@ -1375,31 +1401,11 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
     json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
     json_writer_start_array(vmdesc, "devices");
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-
-        if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
-            continue;
-        }
-        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
-            trace_savevm_section_skip(se->idstr, se->section_id);
-            continue;
-        }
-
-        trace_savevm_section_start(se->idstr, se->section_id);
-
-        json_writer_start_object(vmdesc, NULL);
-        json_writer_str(vmdesc, "name", se->idstr);
-        json_writer_int64(vmdesc, "instance_id", se->instance_id);
-
-        save_section_header(f, se, QEMU_VM_SECTION_FULL);
         ret = vmstate_save(f, se, vmdesc);
         if (ret) {
             qemu_file_set_error(f, ret);
             return ret;
         }
-        trace_savevm_section_end(se->idstr, se->section_id, 0);
-        save_section_footer(f, se);
-
-        json_writer_end_object(vmdesc);
     }
 
     if (inactivate_disks) {
@@ -1618,21 +1624,10 @@ int qemu_save_device_state(QEMUFile *f)
         if (se->is_ram) {
             continue;
         }
-        if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
-            continue;
-        }
-        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
-            continue;
-        }
-
-        save_section_header(f, se, QEMU_VM_SECTION_FULL);
-
         ret = vmstate_save(f, se, NULL);
         if (ret) {
             return ret;
         }
-
-        save_section_footer(f, se);
     }
 
     qemu_put_byte(f, QEMU_VM_EOF);
-- 
2.39.1




* [PULL 13/26] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (11 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 12/26] migration/savevm: Move more savevm handling into vmstate_save() Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 14/26] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) Juan Quintela
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

... and store it in the migration state. This is a preparation for
storing selected VMSDs already during qemu_savevm_state_setup().

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.h |  4 ++++
 migration/migration.c |  2 ++
 migration/savevm.c    | 18 ++++++++++++------
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..66511ce532 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -17,6 +17,7 @@
 #include "exec/cpu-common.h"
 #include "hw/qdev-core.h"
 #include "qapi/qapi-types-migration.h"
+#include "qapi/qmp/json-writer.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine_int.h"
 #include "io/channel.h"
@@ -366,6 +367,9 @@ struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
+    JSONWriter *vmdesc;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/migration.c b/migration/migration.c
index cb9aee76c0..3344c95d26 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1903,6 +1903,8 @@ static void migrate_fd_cleanup(MigrationState *s)
 
     g_free(s->hostname);
     s->hostname = NULL;
+    json_writer_free(s->vmdesc);
+    s->vmdesc = NULL;
 
     qemu_savevm_state_cleanup();
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 3e3631652e..28f88b5521 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -42,7 +42,6 @@
 #include "postcopy-ram.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
-#include "qapi/qmp/json-writer.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-builtin-visit.h"
 #include "qapi/qmp/qerror.h"
@@ -1190,10 +1189,16 @@ bool qemu_savevm_state_guest_unplug_pending(void)
 
 void qemu_savevm_state_setup(QEMUFile *f)
 {
+    MigrationState *ms = migrate_get_current();
     SaveStateEntry *se;
     Error *local_err = NULL;
     int ret;
 
+    ms->vmdesc = json_writer_new(false);
+    json_writer_start_object(ms->vmdesc, NULL);
+    json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
+    json_writer_start_array(ms->vmdesc, "devices");
+
     trace_savevm_state_setup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops || !se->ops->save_setup) {
@@ -1391,15 +1396,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                     bool in_postcopy,
                                                     bool inactivate_disks)
 {
-    g_autoptr(JSONWriter) vmdesc = NULL;
+    MigrationState *ms = migrate_get_current();
+    JSONWriter *vmdesc = ms->vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
 
-    vmdesc = json_writer_new(false);
-    json_writer_start_object(vmdesc, NULL);
-    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
-    json_writer_start_array(vmdesc, "devices");
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         ret = vmstate_save(f, se, vmdesc);
         if (ret) {
@@ -1434,6 +1436,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
     }
 
+    /* Free it now to detect any inconsistencies. */
+    json_writer_free(vmdesc);
+    ms->vmdesc = NULL;
+
     return 0;
 }
 
-- 
2.39.1




* [PULL 14/26] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (12 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 13/26] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 15/26] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() Juan Quintela
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

For virtio-mem, we want to have the plugged/unplugged state of memory
blocks available before migrating any actual RAM content, and perform
sanity checks before touching anything on the destination. This
information is immutable on the migration source while migration is active.

We want to use this information for proper preallocation support with
migration: currently, we don't preallocate memory on the migration target,
and especially with hugetlb, we can easily run out of hugetlb pages during
RAM migration and will crash (SIGBUS) instead of catching this gracefully
via preallocation.

Migrating device state via a VMSD before we start iterating is currently
impossible: the only approach that would be possible is avoiding a VMSD
and migrating state manually during save_setup(), to be restored during
load_state().

Let's allow for migrating device state via a VMSD early, during the
setup phase in qemu_savevm_state_setup(). To keep it simple, we
indicate applicable VMSD's using an "early_setup" flag.
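
The resulting two-phase behaviour can be modelled with a small standalone
sketch. It is a simplification, not the QEMU implementation: struct entry
and both phase functions are hypothetical, and "saving" is reduced to
incrementing a counter. Flagged entries are written during setup and
skipped at completion, so every entry is written exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry; early_setup models VMStateDescription.early_setup. */
struct entry {
    bool early_setup;
    int saved;          /* how many times this entry was written */
};

/* Setup phase: write only entries flagged as early. */
static void setup_phase(struct entry *entries, size_t n)
{
    size_t i;

    assert(entries || n == 0);
    for (i = 0; i < n; i++) {
        if (entries[i].early_setup) {
            entries[i].saved++;
        }
    }
}

/* Completion phase: write everything that was not already sent early. */
static void completion_phase(struct entry *entries, size_t n)
{
    size_t i;

    assert(entries || n == 0);
    for (i = 0; i < n; i++) {
        if (entries[i].early_setup) {
            continue;   /* already saved during the setup phase */
        }
        entries[i].saved++;
    }
}
```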

Note that only a few select devices (i.e., ones seriously messing with
RAM setup) are supposed to make use of such early state migration.

While at it, also use a bool for the "unmigratable" member.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/migration/vmstate.h | 16 +++++++++++++++-
 migration/savevm.c          | 14 ++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ad24aa1934..64680d824e 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -178,7 +178,21 @@ struct VMStateField {
 
 struct VMStateDescription {
     const char *name;
-    int unmigratable;
+    bool unmigratable;
+    /*
+     * This VMSD describes something that should be sent during setup phase
+     * of migration. It plays similar role as save_setup() for explicitly
+     * registered vmstate entries, so it can be seen as a way to describe
+     * save_setup() in VMSD structures.
+     *
+     * Note that for now, a SaveStateEntry cannot have a VMSD and
+     * operations (e.g., save_setup()) set at the same time. Consequently,
+     * save_setup() and a VMSD with early_setup set to true are mutually
+     * exclusive. For this reason, also early_setup VMSDs are migrated in a
+     * QEMU_VM_SECTION_FULL section, while save_setup() data is migrated in
+     * a QEMU_VM_SECTION_START section.
+     */
+    bool early_setup;
     int version_id;
     int minimum_version_id;
     MigrationPriority priority;
diff --git a/migration/savevm.c b/migration/savevm.c
index 28f88b5521..6d985ad4af 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1201,6 +1201,15 @@ void qemu_savevm_state_setup(QEMUFile *f)
 
     trace_savevm_state_setup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->vmsd && se->vmsd->early_setup) {
+            ret = vmstate_save(f, se, ms->vmdesc);
+            if (ret) {
+                qemu_file_set_error(f, ret);
+                break;
+            }
+            continue;
+        }
+
         if (!se->ops || !se->ops->save_setup) {
             continue;
         }
@@ -1403,6 +1412,11 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
     int ret;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->vmsd && se->vmsd->early_setup) {
+            /* Already saved during qemu_savevm_state_setup(). */
+            continue;
+        }
+
         ret = vmstate_save(f, se, vmdesc);
         if (ret) {
             qemu_file_set_error(f, ret);
-- 
2.39.1




* [PULL 15/26] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST()
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (13 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 14/26] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 16/26] migration/ram: Factor out check for advised postcopy Juan Quintela
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

We'll make use of both next in the context of virtio-mem.
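
The macro pattern used here, a _TEST variant taking a predicate plus a
plain variant that passes NULL, can be sketched as follows. This is a
simplified model and not the vmstate API: struct field, FIELD_TEST(),
FIELD() and both functions are hypothetical, and the predicate signature
omits the version argument that the real field_exists callback takes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field descriptor with an optional existence predicate. */
struct field {
    const char *name;
    bool (*field_exists)(void *opaque);
};

/* Variant with a predicate, and a plain variant that defaults to NULL. */
#define FIELD_TEST(_name, _test) { .name = (_name), .field_exists = (_test) }
#define FIELD(_name) FIELD_TEST(_name, NULL)

/* A field without a predicate is always considered present. */
static bool field_present(const struct field *f, void *opaque)
{
    assert(f);
    return !f->field_exists || f->field_exists(opaque);
}

/* Example predicate that always hides the field. */
static bool never_present(void *opaque)
{
    (void)opaque;
    return false;
}
```

Defining the plain macro in terms of the _TEST variant keeps a single
initializer to maintain, which mirrors how VMSTATE_WITH_TMP() and
VMSTATE_BITMAP() are redefined in this patch.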

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/migration/vmstate.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 64680d824e..28a3b92aa1 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -719,8 +719,9 @@ extern const VMStateInfo vmstate_info_qlist;
  *        '_state' type
  *    That the pointer is right at the start of _tmp_type.
  */
-#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) {                 \
+#define VMSTATE_WITH_TMP_TEST(_state, _test, _tmp_type, _vmsd) {     \
     .name         = "tmp",                                           \
+    .field_exists = (_test),                                         \
     .size         = sizeof(_tmp_type) +                              \
                     QEMU_BUILD_BUG_ON_ZERO(offsetof(_tmp_type, parent) != 0) + \
                     type_check_pointer(_state,                       \
@@ -729,6 +730,9 @@ extern const VMStateInfo vmstate_info_qlist;
     .info         = &vmstate_info_tmp,                               \
 }
 
+#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) \
+    VMSTATE_WITH_TMP_TEST(_state, NULL, _tmp_type, _vmsd)
+
 #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {              \
     .name         = "unused",                                        \
     .field_exists = (_test),                                         \
@@ -752,8 +756,9 @@ extern const VMStateInfo vmstate_info_qlist;
 /* _field_size should be a int32_t field in the _state struct giving the
  * size of the bitmap _field in bits.
  */
-#define VMSTATE_BITMAP(_field, _state, _version, _field_size) {      \
+#define VMSTATE_BITMAP_TEST(_field, _state, _test, _version, _field_size) { \
     .name         = (stringify(_field)),                             \
+    .field_exists = (_test),                                         \
     .version_id   = (_version),                                      \
     .size_offset  = vmstate_offset_value(_state, _field_size, int32_t),\
     .info         = &vmstate_info_bitmap,                            \
@@ -761,6 +766,9 @@ extern const VMStateInfo vmstate_info_qlist;
     .offset       = offsetof(_state, _field),                        \
 }
 
+#define VMSTATE_BITMAP(_field, _state, _version, _field_size) \
+    VMSTATE_BITMAP_TEST(_field, _state, NULL, _version, _field_size)
+
 /* For migrating a QTAILQ.
  * Target QTAILQ needs be properly initialized.
  * _type: type of QTAILQ element
-- 
2.39.1




* [PULL 16/26] migration/ram: Factor out check for advised postcopy
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (14 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 15/26] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 17/26] virtio-mem: Fail if a memory backend with "prealloc=on" is specified Juan Quintela
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

Let's factor out this check, to be used in virtio-mem context next.

While at it, fix a spelling error in a related comment.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/migration/misc.h | 4 +++-
 migration/migration.c    | 7 +++++++
 migration/ram.c          | 8 +-------
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 465906710d..8b49841016 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -67,8 +67,10 @@ bool migration_has_failed(MigrationState *);
 /* ...and after the device transmission */
 bool migration_in_postcopy_after_devices(MigrationState *);
 void migration_global_dump(Monitor *mon);
-/* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */
+/* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */
 bool migration_in_incoming_postcopy(void);
+/* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */
+bool migration_incoming_postcopy_advised(void);
 /* True if background snapshot is active */
 bool migration_in_bg_snapshot(void);
 
diff --git a/migration/migration.c b/migration/migration.c
index 3344c95d26..73225064e1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2096,6 +2096,13 @@ bool migration_in_incoming_postcopy(void)
     return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END;
 }
 
+bool migration_incoming_postcopy_advised(void)
+{
+    PostcopyState ps = postcopy_state_get();
+
+    return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
+}
+
 bool migration_in_bg_snapshot(void)
 {
     MigrationState *s = migrate_get_current();
diff --git a/migration/ram.c b/migration/ram.c
index 7f6d5efe8d..b966e148c2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4150,12 +4150,6 @@ int ram_load_postcopy(QEMUFile *f, int channel)
     return ret;
 }
 
-static bool postcopy_is_advised(void)
-{
-    PostcopyState ps = postcopy_state_get();
-    return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END;
-}
-
 static bool postcopy_is_running(void)
 {
     PostcopyState ps = postcopy_state_get();
@@ -4226,7 +4220,7 @@ static int ram_load_precopy(QEMUFile *f)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
     /* ADVISE is earlier, it shows the source has the postcopy capability on */
-    bool postcopy_advised = postcopy_is_advised();
+    bool postcopy_advised = migration_incoming_postcopy_advised();
     if (!migrate_use_compression()) {
         invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
     }
-- 
2.39.1




* [PULL 17/26] virtio-mem: Fail if a memory backend with "prealloc=on" is specified
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (15 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 16/26] migration/ram: Factor out check for advised postcopy Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 18/26] virtio-mem: Migrate immutable properties early Juan Quintela
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Michal Privoznik, Peter Xu

From: David Hildenbrand <david@redhat.com>

"prealloc=on" for the memory backend does not work as expected, as
virtio-mem will simply discard all preallocated memory immediately again.
In the best case, it's an expensive NOP. In the worst case, it's an
unexpected allocation error.

Instead, "prealloc=on" should be specified for the virtio-mem device only,
such that virtio-mem will try preallocating memory before plugging
memory dynamically to the guest. Fail if such a memory backend is
provided.

Tested-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 hw/virtio/virtio-mem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 1ed1f5a4af..02f7b5469a 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -772,6 +772,12 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "'%s' property specifies an unsupported memdev",
                    VIRTIO_MEM_MEMDEV_PROP);
         return;
+    } else if (vmem->memdev->prealloc) {
+        error_setg(errp, "'%s' property specifies a memdev with preallocation"
+                   " enabled: %s. Instead, specify 'prealloc=on' for the"
+                   " virtio-mem device. ", VIRTIO_MEM_MEMDEV_PROP,
+                   object_get_canonical_path_component(OBJECT(vmem->memdev)));
+        return;
     }
 
     if ((nb_numa_nodes && vmem->node >= nb_numa_nodes) ||
-- 
2.39.1




* [PULL 18/26] virtio-mem: Migrate immutable properties early
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (16 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 17/26] virtio-mem: Fail if a memory backend with "prealloc=on" is specified Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 19/26] virtio-mem: Proper support for preallocation with migration Juan Quintela
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu

From: David Hildenbrand <david@redhat.com>

The bitmap and the size are immutable while migration is active: see
virtio_mem_is_busy(). We can migrate this information early, before
migrating any actual RAM content. Further, all information we need for
sanity checks is immutable as well.

Having this information in place early will, for example, allow for
properly preallocating memory before touching these memory locations
during RAM migration: this way, we can make sure that all memory was
actually preallocated and that any user errors (e.g., insufficient
hugetlb pages) can be handled gracefully.

In contrast, usable_region_size and requested_size can theoretically
still be modified on the source while the VM is running. Keep migrating
these properties the usual, late, way.

Use a new device property to keep behavior of compat machines
unmodified.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/hw/virtio/virtio-mem.h |  8 ++++++
 hw/core/machine.c              |  4 ++-
 hw/virtio/virtio-mem.c         | 51 ++++++++++++++++++++++++++++++++--
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 7745cfc1a3..f15e561785 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass,
 #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
 #define VIRTIO_MEM_ADDR_PROP "memaddr"
 #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible"
+#define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration"
 #define VIRTIO_MEM_PREALLOC_PROP "prealloc"
 
 struct VirtIOMEM {
@@ -74,6 +75,13 @@ struct VirtIOMEM {
     /* whether to prealloc memory when plugging new blocks */
     bool prealloc;
 
+    /*
+     * Whether we migrate properties that are immutable while migration is
+     * active early, before state of other devices and especially, before
+     * migrating any RAM content.
+     */
+    bool early_migration;
+
     /* notifiers to notify when "size" changes */
     NotifierList size_change_notifiers;
 
diff --git a/hw/core/machine.c b/hw/core/machine.c
index f7761baab5..b5cd42cd8c 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -41,7 +41,9 @@
 #include "hw/virtio/virtio-pci.h"
 #include "qom/object_interfaces.h"
 
-GlobalProperty hw_compat_7_2[] = {};
+GlobalProperty hw_compat_7_2[] = {
+    { "virtio-mem", "x-early-migration", "false" },
+};
 const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2);
 
 GlobalProperty hw_compat_7_1[] = {
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 02f7b5469a..ca37949df8 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -31,6 +31,8 @@
 #include CONFIG_DEVICES
 #include "trace.h"
 
+static const VMStateDescription vmstate_virtio_mem_device_early;
+
 /*
  * We only had legacy x86 guests that did not support
  * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy guests.
@@ -878,6 +880,10 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
 
     host_memory_backend_set_mapped(vmem->memdev, true);
     vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
+    if (vmem->early_migration) {
+        vmstate_register(VMSTATE_IF(vmem), VMSTATE_INSTANCE_ID_ANY,
+                         &vmstate_virtio_mem_device_early, vmem);
+    }
     qemu_register_reset(virtio_mem_system_reset, vmem);
 
     /*
@@ -899,6 +905,10 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
      */
     memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     qemu_unregister_reset(virtio_mem_system_reset, vmem);
+    if (vmem->early_migration) {
+        vmstate_unregister(VMSTATE_IF(vmem), &vmstate_virtio_mem_device_early,
+                           vmem);
+    }
     vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
     host_memory_backend_set_mapped(vmem->memdev, false);
     virtio_del_queue(vdev, 0);
@@ -1015,18 +1025,53 @@ static const VMStateDescription vmstate_virtio_mem_sanity_checks = {
     },
 };
 
+static bool virtio_mem_vmstate_field_exists(void *opaque, int version_id)
+{
+    const VirtIOMEM *vmem = VIRTIO_MEM(opaque);
+
+    /* With early migration, these fields were already migrated. */
+    return !vmem->early_migration;
+}
+
 static const VMStateDescription vmstate_virtio_mem_device = {
     .name = "virtio-mem-device",
     .minimum_version_id = 1,
     .version_id = 1,
     .priority = MIG_PRI_VIRTIO_MEM,
     .post_load = virtio_mem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_WITH_TMP_TEST(VirtIOMEM, virtio_mem_vmstate_field_exists,
+                              VirtIOMEMMigSanityChecks,
+                              vmstate_virtio_mem_sanity_checks),
+        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
+        VMSTATE_UINT64_TEST(size, VirtIOMEM, virtio_mem_vmstate_field_exists),
+        VMSTATE_UINT64(requested_size, VirtIOMEM),
+        VMSTATE_BITMAP_TEST(bitmap, VirtIOMEM, virtio_mem_vmstate_field_exists,
+                            0, bitmap_size),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+/*
+ * Transfer properties that are immutable while migration is active early,
+ * such that we have this information around before migrating any RAM
+ * content.
+ *
+ * Note that virtio_mem_is_busy() makes sure these properties can no longer
+ * change on the migration source until migration completed.
+ *
+ * With QEMU compat machines, we transmit these properties later, via
+ * vmstate_virtio_mem_device instead -- see virtio_mem_vmstate_field_exists().
+ */
+static const VMStateDescription vmstate_virtio_mem_device_early = {
+    .name = "virtio-mem-device-early",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .early_setup = true,
     .fields = (VMStateField[]) {
         VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
                          vmstate_virtio_mem_sanity_checks),
-        VMSTATE_UINT64(usable_region_size, VirtIOMEM),
         VMSTATE_UINT64(size, VirtIOMEM),
-        VMSTATE_UINT64(requested_size, VirtIOMEM),
         VMSTATE_BITMAP(bitmap, VirtIOMEM, 0, bitmap_size),
         VMSTATE_END_OF_LIST()
     },
@@ -1211,6 +1256,8 @@ static Property virtio_mem_properties[] = {
     DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, VirtIOMEM,
                             unplugged_inaccessible, ON_OFF_AUTO_AUTO),
 #endif
+    DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM,
+                     early_migration, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.39.1




* [PULL 19/26] virtio-mem: Proper support for preallocation with migration
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (17 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 18/26] virtio-mem: Migrate immutable properties early Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 20/26] migration: Show downtime during postcopy phase Juan Quintela
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Jing Qi, Peter Xu

From: David Hildenbrand <david@redhat.com>

Ordinary memory preallocation runs when QEMU starts up and creates the
memory backends, before processing the incoming migration stream. With
virtio-mem, we don't know which memory blocks to preallocate before
migration started. Now that we migrate the virtio-mem bitmap early, before
migrating any RAM content, we can safely preallocate memory for all plugged
memory blocks before migrating any RAM content.

This is especially relevant for the following cases:

(1) User errors

With hugetlb/files, if we don't have sufficient backend memory available on
the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
when running out of backend memory. Preallocating memory before actual
RAM migration allows for failing gracefully and informing the user about
the setup problem.

(2) Excluded memory ranges during migration

For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we won't crash during RAM
migration, but later, when running the VM on the destination, which is
bad.

To fix this for new QEMU machines that migrate the bitmap early,
preallocate the memory early, before any RAM migration. Warn with old
QEMU machines.
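
The core of the early preallocation is a walk over the plugged-block
bitmap that hands each contiguous run of set bits to a callback. The
following self-contained sketch shows that walk under simplifying
assumptions: the bitmap is a plain byte array, and every name
(demo_for_each_plugged_range, demo_range_cb, DemoTotals, ...) is
hypothetical. It does not use QEMU's bitmap helpers, and the callback
only accounts for sizes instead of preallocating anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: invoked once per contiguous plugged range. */
typedef int (*demo_range_cb)(void *arg, uint64_t offset, uint64_t size);

/* Test one bit in a byte-array bitmap (bit 0 is the LSB of byte 0). */
static bool demo_test_bit(const uint8_t *bitmap, size_t bit)
{
    return (bitmap[bit / 8] & (1u << (bit % 8))) != 0;
}

/*
 * Call cb for every maximal run of set bits, translating bit indexes
 * into byte offsets and sizes. Stops at the first non-zero return value.
 */
static int demo_for_each_plugged_range(const uint8_t *bitmap, size_t nbits,
                                       uint64_t block_size, void *arg,
                                       demo_range_cb cb)
{
    size_t bit = 0;

    assert(cb != NULL);
    while (bit < nbits) {
        size_t first;
        int ret;

        if (!demo_test_bit(bitmap, bit)) {
            bit++;
            continue;
        }
        first = bit;
        while (bit < nbits && demo_test_bit(bitmap, bit)) {
            bit++;
        }
        ret = cb(arg, first * block_size, (bit - first) * block_size);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: count ranges and sum their sizes. */
typedef struct DemoTotals {
    unsigned ranges;
    uint64_t bytes;
} DemoTotals;

static int demo_account_cb(void *arg, uint64_t offset, uint64_t size)
{
    DemoTotals *t = arg;

    (void)offset;
    t->ranges++;
    t->bytes += size;
    return 0;
}
```

Returning early on the first failing range matches the intent described
above: a single range that cannot be backed is enough to fail the load
gracefully.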

Getting postcopy right is a bit tricky, but we essentially now implement
the same (problematic) preallocation logic as ordinary preallocation:
preallocate memory early and discard it again before precopy starts. During
ordinary preallocation, discarding of RAM happens when postcopy is advised.
As the state (bitmap) is loaded after postcopy was advised but before
postcopy starts listening, we have to discard memory we preallocated
immediately again ourselves.

Note that nothing (not even hugetlb reservations) guarantees for postcopy
that backend memory (especially hugetlb pages) is still free after it
was freed once while discarding RAM. Still, allocating that memory at
least once helps catch some basic setup problems.

Before this change, trying to restore a VM when insufficient hugetlb
pages are around results in the process crashing due to a "Bus error"
(SIGBUS). With this change, QEMU fails gracefully:

  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
  qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
  qemu-system-x86_64: load of migration failed: Cannot allocate memory

And we can even introspect the early migration data, including the
bitmap:
  $ ./scripts/analyze-migration.py -f STATEFILE
  {
  "ram (2)": {
      "section sizes": {
          "0000:00:03.0/mem0": "0x0000000780000000",
          "0000:00:04.0/mem1": "0x0000000780000000",
          "pc.ram": "0x0000000100000000",
          "/rom@etc/acpi/tables": "0x0000000000020000",
          "pc.bios": "0x0000000000040000",
          "0000:00:02.0/e1000.rom": "0x0000000000040000",
          "pc.rom": "0x0000000000020000",
          "/rom@etc/table-loader": "0x0000000000001000",
          "/rom@etc/acpi/rsdp": "0x0000000000001000"
      }
  },
  "0000:00:03.0/virtio-mem-device-early (51)": {
      "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
      "size": "0x0000000040000000",
      "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
  },
  "0000:00:04.0/virtio-mem-device-early (53)": {
      "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
      "size": "0x00000001fa400000",
      "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
  },
  [...]

Reported-by: Jing Qi <jinqi@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 hw/virtio/virtio-mem.c | 87 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index ca37949df8..957fe77dc0 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -204,6 +204,30 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
+static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg,
+                                             virtio_mem_range_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size);
+    while (first_bit < vmem->bitmap_size) {
+        offset = first_bit * vmem->block_size;
+        last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * vmem->block_size;
+
+        ret = cb(vmem, arg, offset, size);
+        if (ret) {
+            break;
+        }
+        first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
+                                  last_bit + 2);
+    }
+    return ret;
+}
+
 /*
  * Adjust the memory section to cover the intersection with the given range.
  *
@@ -938,6 +962,10 @@ static int virtio_mem_post_load(void *opaque, int version_id)
     RamDiscardListener *rdl;
     int ret;
 
+    if (vmem->prealloc && !vmem->early_migration) {
+        warn_report("Proper preallocation with migration requires a newer QEMU machine");
+    }
+
     /*
      * We started out with all memory discarded and our memory region is mapped
      * into an address space. Replay, now that we updated the bitmap.
@@ -957,6 +985,64 @@ static int virtio_mem_post_load(void *opaque, int version_id)
     return virtio_mem_restore_unplugged(vmem);
 }
 
+static int virtio_mem_prealloc_range_cb(const VirtIOMEM *vmem, void *arg,
+                                        uint64_t offset, uint64_t size)
+{
+    void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset;
+    int fd = memory_region_get_fd(&vmem->memdev->mr);
+    Error *local_err = NULL;
+
+    qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -ENOMEM;
+    }
+    return 0;
+}
+
+static int virtio_mem_post_load_early(void *opaque, int version_id)
+{
+    VirtIOMEM *vmem = VIRTIO_MEM(opaque);
+    RAMBlock *rb = vmem->memdev->mr.ram_block;
+    int ret;
+
+    if (!vmem->prealloc) {
+        return 0;
+    }
+
+    /*
+     * We restored the bitmap and verified that the basic properties
+     * match on source and destination, so we can go ahead and preallocate
+     * memory for all plugged memory blocks, before actual RAM migration starts
+     * touching this memory.
+     */
+    ret = virtio_mem_for_each_plugged_range(vmem, NULL,
+                                            virtio_mem_prealloc_range_cb);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * This is tricky: postcopy wants to start with a clean slate. On
+     * POSTCOPY_INCOMING_ADVISE, postcopy code discards all (ordinarily
+     * preallocated) RAM such that postcopy will work as expected later.
+     *
+     * However, we run after POSTCOPY_INCOMING_ADVISE -- but before actual
+     * RAM migration. So let's discard all memory again. This looks like an
+     * expensive NOP, but actually serves a purpose: we made sure that we
+     * were able to allocate all required backend memory once. We cannot
+     * guarantee that the backend memory we will free will remain free
+     * until we need it during postcopy, but at least we can catch the
+     * obvious setup issues this way.
+     */
+    if (migration_incoming_postcopy_advised()) {
+        if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
+            return -EBUSY;
+        }
+    }
+    return 0;
+}
+
 typedef struct VirtIOMEMMigSanityChecks {
     VirtIOMEM *parent;
     uint64_t addr;
@@ -1068,6 +1154,7 @@ static const VMStateDescription vmstate_virtio_mem_device_early = {
     .minimum_version_id = 1,
     .version_id = 1,
     .early_setup = true,
+    .post_load = virtio_mem_post_load_early,
     .fields = (VMStateField[]) {
         VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
                          vmstate_virtio_mem_sanity_checks),
-- 
2.39.1




* [PULL 20/26] migration: Show downtime during postcopy phase
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (18 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 19/26] virtio-mem: Proper support for preallocation with migration Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 21/26] migration/rdma: fix return value for qio_channel_rdma_{readv, writev} Juan Quintela
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Xu, Leonardo Bras

From: Peter Xu <peterx@redhat.com>

The downtime should be displayed during the postcopy phase, because the
switchover has already completed by then.  Conversely, showing "expected
downtime" at that point is confusing: it is unclear what the value means
once the switchover has happened.

This is a slight ABI change on QMP, but I assume it shouldn't affect
anyone.

Reviewed-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 73225064e1..6509203080 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1021,20 +1021,30 @@ bool migration_is_running(int state)
     }
 }
 
+static bool migrate_show_downtime(MigrationState *s)
+{
+    return (s->state == MIGRATION_STATUS_COMPLETED) || migration_in_postcopy();
+}
+
 static void populate_time_info(MigrationInfo *info, MigrationState *s)
 {
     info->has_status = true;
     info->has_setup_time = true;
     info->setup_time = s->setup_time;
+
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         info->has_total_time = true;
         info->total_time = s->total_time;
-        info->has_downtime = true;
-        info->downtime = s->downtime;
     } else {
         info->has_total_time = true;
         info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
                            s->start_time;
+    }
+
+    if (migrate_show_downtime(s)) {
+        info->has_downtime = true;
+        info->downtime = s->downtime;
+    } else {
         info->has_expected_downtime = true;
         info->expected_downtime = s->expected_downtime;
     }
-- 
2.39.1




* [PULL 21/26] migration/rdma: fix return value for qio_channel_rdma_{readv, writev}
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (19 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 20/26] migration: Show downtime during postcopy phase Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 22/26] migration: Add canary to VMSTATE_END_OF_LIST Juan Quintela
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Fiona Ebner, Zhang Chen

From: Fiona Ebner <f.ebner@proxmox.com>

Fix the return value upon errors. As the documentation in
include/io/channel.h states, only -1 and QIO_CHANNEL_ERR_BLOCK should be
returned upon error. Other values have the potential to confuse the call
sites.

error_setg is used rather than error_setg_errno, because there are
certain code paths where -1 (as a non-errno) is propagated up (e.g.
starting from qemu_rdma_block_for_wrid or qemu_rdma_post_recv_control)
all the way to qio_channel_rdma_{readv,writev}.

Similar to a216ec85b7 ("migration/channel-block: fix return value for
qio_channel_block_{readv,writev}").
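
The convention being enforced can be sketched in isolation: whatever
negative value a lower layer hands back, the channel function records it
in an error object and returns a plain -1. The sketch below is a
self-contained model with hypothetical names (DemoError, demo_error_setg,
demo_lower_layer, demo_channel_write); it does not use QEMU's Error or
QIOChannel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, minimal error object standing in for an Error **errp. */
typedef struct DemoError {
    bool set;
    char msg[96];
} DemoError;

/* Record a message that preserves the original return value. */
static void demo_error_setg(DemoError *err, const char *what, int ret)
{
    if (err) {
        err->set = true;
        snprintf(err->msg, sizeof(err->msg), "%s returned %d", what, ret);
    }
}

/* Hypothetical lower layer that may return an arbitrary negative value. */
static int demo_lower_layer(int simulated_ret)
{
    return simulated_ret;
}

/*
 * Channel-style write: on failure, never leak the lower layer's value to
 * the caller; report it via err and return -1. On success, return len.
 */
static long demo_channel_write(int simulated_ret, long len, DemoError *err)
{
    int ret = demo_lower_layer(simulated_ret);

    assert(len >= 0);
    if (ret < 0) {
        demo_error_setg(err, "demo_lower_layer", ret);
        return -1;
    }
    return len;
}
```

Keeping the original value only in the message, rather than in the return
code, means callers can rely on a single failure value while the detail
remains available for diagnostics.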

Suggested-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/rdma.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 94a55dd95b..0ba1668d70 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2785,7 +2785,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     rdma = qatomic_rcu_read(&rioc->rdmaout);
 
     if (!rdma) {
-        return -EIO;
+        error_setg(errp, "RDMA control channel output is not set");
+        return -1;
     }
 
     CHECK_ERROR_STATE();
@@ -2797,7 +2798,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
         rdma->error_state = ret;
-        return ret;
+        error_setg(errp, "qemu_rdma_write_flush returned %d", ret);
+        return -1;
     }
 
     for (i = 0; i < niov; i++) {
@@ -2816,7 +2818,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
-                return ret;
+                error_setg(errp, "qemu_rdma_exchange_send returned %d", ret);
+                return -1;
             }
 
             data += len;
@@ -2867,7 +2870,8 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     rdma = qatomic_rcu_read(&rioc->rdmain);
 
     if (!rdma) {
-        return -EIO;
+        error_setg(errp, "RDMA control channel input is not set");
+        return -1;
     }
 
     CHECK_ERROR_STATE();
@@ -2903,7 +2907,8 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
-            return ret;
+            error_setg(errp, "qemu_rdma_exchange_recv returned %d", ret);
+            return -1;
         }
 
         /*
-- 
2.39.1




* [PULL 22/26] migration: Add canary to VMSTATE_END_OF_LIST
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (20 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 21/26] migration/rdma: fix return value for qio_channel_rdma_{readv, writev} Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 23/26] migration: Perform vmsd structure check during tests Juan Quintela
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Maydell,
	Philippe Mathieu-Daudé

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We fairly regularly leave the VMSTATE_END_OF_LIST marker off field
descriptions; given that the current check is only for ->name being NULL,
sometimes we get unlucky and the code apparently works and no one spots
the error.

Explicitly add a flag, VMS_END, that the terminator must set, and assert
that it is set at the end of each traversal.
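
A minimal standalone sketch of the idea, using hypothetical simplified
types (ToyField, TOY_FLAG_END and toy_count_fields are illustrative
names, not the QEMU definitions). Unlike the patch, which asserts, the
sketch returns -1 so that the failure case can be observed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a field entry and its flags. */
enum {
    TOY_FLAG_DATA = 0x1,
    TOY_FLAG_END  = 0x10000,   /* canary carried only by the terminator */
};

typedef struct {
    const char *name;
    int flags;
} ToyField;

#define TOY_END_OF_LIST() { .flags = TOY_FLAG_END }

/*
 * Walk the list until a NULL name, then verify the canary.
 * Returns the number of fields, or -1 if the entry that stopped the
 * walk is not a real terminator.
 */
static int toy_count_fields(const ToyField *field)
{
    int n = 0;

    while (field->name) {
        n++;
        field++;
    }
    return field->flags == TOY_FLAG_END ? n : -1;
}
```

An all-zero entry, which is what a forgotten terminator followed by
zeroed memory looks like, stops the walk but fails the canary check.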

Note: This can't go in until we update the copy of vmstate.h in slirp.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/migration/vmstate.h | 7 ++++++-
 migration/savevm.c          | 1 +
 migration/vmstate.c         | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 28a3b92aa1..084f5e784a 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -147,6 +147,9 @@ enum VMStateFlags {
      * VMStateField.struct_version_id to tell which version of the
      * structure we are referencing to use. */
     VMS_VSTRUCT           = 0x8000,
+
+    /* Marker for end of list */
+    VMS_END = 0x10000
 };
 
 typedef enum {
@@ -1183,7 +1186,9 @@ extern const VMStateInfo vmstate_info_qlist;
     VMSTATE_UNUSED_BUFFER(_test, 0, _size)
 
 #define VMSTATE_END_OF_LIST()                                         \
-    {}
+    {                     \
+        .flags = VMS_END, \
+    }
 
 int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                        void *opaque, int version_id);
diff --git a/migration/savevm.c b/migration/savevm.c
index 6d985ad4af..5c3e5b1bb5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -585,6 +585,7 @@ static void dump_vmstate_vmsd(FILE *out_file,
             field++;
             first = false;
         }
+        assert(field->flags == VMS_END);
         fprintf(out_file, "\n%*s]", indent, "");
     }
     if (vmsd->subsections != NULL) {
diff --git a/migration/vmstate.c b/migration/vmstate.c
index 924494bda3..83ca4c7d3e 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -154,6 +154,7 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
         }
         field++;
     }
+    assert(field->flags == VMS_END);
     ret = vmstate_subsection_load(f, vmsd, opaque);
     if (ret != 0) {
         return ret;
@@ -408,6 +409,7 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
         }
         field++;
     }
+    assert(field->flags == VMS_END);
 
     if (vmdesc) {
         json_writer_end_array(vmdesc);
-- 
2.39.1




* [PULL 23/26] migration: Perform vmsd structure check during tests
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (21 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 22/26] migration: Add canary to VMSTATE_END_OF_LIST Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 24/26] migration/dirtyrate: Show sample pages only in page-sampling mode Juan Quintela
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Peter Maydell

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Perform a check on vmsd structures during test runs in the hope
of catching any missing terminators and other simple screwups.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/savevm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 5c3e5b1bb5..e9cf4999ad 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -66,6 +66,7 @@
 #include "net/announce.h"
 #include "qemu/yank.h"
 #include "yank_functions.h"
+#include "sysemu/qtest.h"
 
 const unsigned int postcopy_ram_discard_version;
 
@@ -804,6 +805,42 @@ void unregister_savevm(VMStateIf *obj, const char *idstr, void *opaque)
     }
 }
 
+/*
+ * Perform some basic checks on vmsd's at registration
+ * time.
+ */
+static void vmstate_check(const VMStateDescription *vmsd)
+{
+    const VMStateField *field = vmsd->fields;
+    const VMStateDescription **subsection = vmsd->subsections;
+
+    if (field) {
+        while (field->name) {
+            if (field->flags & (VMS_STRUCT | VMS_VSTRUCT)) {
+                /* Recurse to sub structures */
+                vmstate_check(field->vmsd);
+            }
+            /* Carry on */
+            field++;
+        }
+        /* Check for the end of field list canary */
+        if (field->flags != VMS_END) {
+            error_report("VMSTATE not ending with VMS_END: %s", vmsd->name);
+            g_assert_not_reached();
+        }
+    }
+
+    while (subsection && *subsection) {
+        /*
+         * The name of a subsection should start with the name of the
+         * current object.
+         */
+        assert(!strncmp(vmsd->name, (*subsection)->name, strlen(vmsd->name)));
+        vmstate_check(*subsection);
+        subsection++;
+    }
+}
+
 int vmstate_register_with_alias_id(VMStateIf *obj, uint32_t instance_id,
                                    const VMStateDescription *vmsd,
                                    void *opaque, int alias_id,
@@ -849,6 +886,11 @@ int vmstate_register_with_alias_id(VMStateIf *obj, uint32_t instance_id,
     } else {
         se->instance_id = instance_id;
     }
+
+    /* Perform a recursive sanity check during the test runs */
+    if (qtest_enabled()) {
+        vmstate_check(vmsd);
+    }
     assert(!se->compat || se->instance_id == 0);
     savevm_state_handler_insert(se);
     return 0;
-- 
2.39.1




* [PULL 24/26] migration/dirtyrate: Show sample pages only in page-sampling mode
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (22 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 23/26] migration: Perform vmsd structure check during tests Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 25/26] io: Add support for MSG_PEEK for socket channel Juan Quintela
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, Zhenzhong Duan, Peter Xu

From: Zhenzhong Duan <zhenzhong.duan@intel.com>

The value of "Sample Pages" is confusing in modes other than page-sampling.
See below:

(qemu) calc_dirty_rate -b 10 520
(qemu) info dirty_rate
Status: measuring
Start Time: 11646834 (ms)
Sample Pages: 520 (per GB)
Period: 10 (sec)
Mode: dirty-bitmap
Dirty rate: (not ready)

(qemu) info dirty_rate
Status: measured
Start Time: 11646834 (ms)
Sample Pages: 0 (per GB)
Period: 10 (sec)
Mode: dirty-bitmap
Dirty rate: 2 (MB/s)

Since the value is meaningless in dirty-ring and dirty-bitmap modes, show
it only in page-sampling mode.
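
The two rules (reject sample-pages outside page-sampling mode, and
print the value only in that mode) can be sketched standalone as
follows. The enum and function names are hypothetical stand-ins, not
the QEMU ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the measurement mode enum. */
typedef enum {
    TOY_MODE_PAGE_SAMPLING,
    TOY_MODE_DIRTY_BITMAP,
    TOY_MODE_DIRTY_RING,
} ToyMeasureMode;

/* Returns NULL if the request is acceptable, else an error message. */
static const char *toy_check_sample_pages(bool has_sample_pages,
                                          ToyMeasureMode mode)
{
    if (has_sample_pages && mode != TOY_MODE_PAGE_SAMPLING) {
        return "sample-pages is used only in page-sampling mode";
    }
    return NULL;
}

/* Only page-sampling mode has a meaningful "Sample Pages" line. */
static bool toy_show_sample_pages(ToyMeasureMode mode)
{
    return mode == TOY_MODE_PAGE_SAMPLING;
}
```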

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/dirtyrate.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 4bfb97fc68..575d48c397 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -714,8 +714,8 @@ void qmp_calc_dirty_rate(int64_t calc_time,
         mode =  DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
     }
 
-    if (has_sample_pages && mode == DIRTY_RATE_MEASURE_MODE_DIRTY_RING) {
-        error_setg(errp, "either sample-pages or dirty-ring can be specified.");
+    if (has_sample_pages && mode != DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING) {
+        error_setg(errp, "sample-pages is used only in page-sampling mode");
         return;
     }
 
@@ -785,8 +785,10 @@ void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
                    DirtyRateStatus_str(info->status));
     monitor_printf(mon, "Start Time: %"PRIi64" (ms)\n",
                    info->start_time);
-    monitor_printf(mon, "Sample Pages: %"PRIu64" (per GB)\n",
-                   info->sample_pages);
+    if (info->mode == DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING) {
+        monitor_printf(mon, "Sample Pages: %"PRIu64" (per GB)\n",
+                       info->sample_pages);
+    }
     monitor_printf(mon, "Period: %"PRIi64" (sec)\n",
                    info->calc_time);
     monitor_printf(mon, "Mode: %s\n",
-- 
2.39.1




* [PULL 25/26] io: Add support for MSG_PEEK for socket channel
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (23 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 24/26] migration/dirtyrate: Show sample pages only in page-sampling mode Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-02 16:06 ` [PULL 26/26] migration: check magic value for deciding the mapping of channels Juan Quintela
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, manish.mishra, Peter Xu

From: "manish.mishra" <manish.mishra@nutanix.com>

MSG_PEEK peeks at the channel: the data is treated as unread, and
the next read will still return it. This support is currently
added only for the socket class. An extra parameter 'flags' is added
to the io_readv calls to pass read flags such as MSG_PEEK.
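
The underlying socket behaviour can be seen in a small standalone POSIX
sketch. It uses plain send()/recv() on a local socket pair rather than
the QIOChannel API, and toy_peek_then_read is a hypothetical helper
name:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `len` bytes over a local socket pair, peek at them, then read
 * them for real. Returns 0 on success, -1 on any failure.
 */
static int toy_peek_then_read(const char *msg, size_t len,
                              char *peeked, char *consumed)
{
    int sv[2];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (send(sv[0], msg, len, 0) == (ssize_t)len &&
        /* MSG_PEEK copies the data out but leaves it queued. */
        recv(sv[1], peeked, len, MSG_PEEK) == (ssize_t)len &&
        /* A normal read therefore sees the same bytes again. */
        recv(sv[1], consumed, len, 0) == (ssize_t)len) {
        rc = 0;
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Both buffers end up holding the same bytes, which is what lets a caller
inspect the start of a stream without disturbing later readers.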

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/io/channel.h                |  6 ++++++
 chardev/char-socket.c               |  4 ++--
 io/channel-buffer.c                 |  1 +
 io/channel-command.c                |  1 +
 io/channel-file.c                   |  1 +
 io/channel-null.c                   |  1 +
 io/channel-socket.c                 | 19 ++++++++++++++++++-
 io/channel-tls.c                    |  1 +
 io/channel-websock.c                |  1 +
 io/channel.c                        | 16 ++++++++++++----
 migration/channel-block.c           |  1 +
 migration/rdma.c                    |  1 +
 scsi/qemu-pr-helper.c               |  2 +-
 tests/qtest/tpm-emu.c               |  2 +-
 tests/unit/test-io-channel-socket.c |  1 +
 util/vhost-user-server.c            |  2 +-
 16 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 78b15f7870..153fbd2904 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -34,6 +34,8 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 #define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
 
+#define QIO_CHANNEL_READ_FLAG_MSG_PEEK 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;
 
 enum QIOChannelFeature {
@@ -41,6 +43,7 @@ enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
     QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
+    QIO_CHANNEL_FEATURE_READ_MSG_PEEK,
 };
 
 
@@ -114,6 +117,7 @@ struct QIOChannelClass {
                         size_t niov,
                         int **fds,
                         size_t *nfds,
+                        int flags,
                         Error **errp);
     int (*io_close)(QIOChannel *ioc,
                     Error **errp);
@@ -188,6 +192,7 @@ void qio_channel_set_name(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: pointer to an array that will received file handles
  * @nfds: pointer filled with number of elements in @fds on return
+ * @flags: read flags (QIO_CHANNEL_READ_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Read data from the IO channel, storing it in the
@@ -224,6 +229,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
                                size_t niov,
                                int **fds,
                                size_t *nfds,
+                               int flags,
                                Error **errp);
 
 
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 29ffe5075e..c2265436ac 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -283,11 +283,11 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t len)
     if (qio_channel_has_feature(s->ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
         ret = qio_channel_readv_full(s->ioc, &iov, 1,
                                      &msgfds, &msgfds_num,
-                                     NULL);
+                                     0, NULL);
     } else {
         ret = qio_channel_readv_full(s->ioc, &iov, 1,
                                      NULL, NULL,
-                                     NULL);
+                                     0, NULL);
     }
 
     if (msgfds_num) {
diff --git a/io/channel-buffer.c b/io/channel-buffer.c
index bf52011be2..8096180f85 100644
--- a/io/channel-buffer.c
+++ b/io/channel-buffer.c
@@ -54,6 +54,7 @@ static ssize_t qio_channel_buffer_readv(QIOChannel *ioc,
                                         size_t niov,
                                         int **fds,
                                         size_t *nfds,
+                                        int flags,
                                         Error **errp)
 {
     QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc);
diff --git a/io/channel-command.c b/io/channel-command.c
index 74516252ba..e7edd091af 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -203,6 +203,7 @@ static ssize_t qio_channel_command_readv(QIOChannel *ioc,
                                          size_t niov,
                                          int **fds,
                                          size_t *nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
diff --git a/io/channel-file.c b/io/channel-file.c
index b67687c2aa..d76663e6ae 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -86,6 +86,7 @@ static ssize_t qio_channel_file_readv(QIOChannel *ioc,
                                       size_t niov,
                                       int **fds,
                                       size_t *nfds,
+                                      int flags,
                                       Error **errp)
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
diff --git a/io/channel-null.c b/io/channel-null.c
index 75e3781507..4fafdb770d 100644
--- a/io/channel-null.c
+++ b/io/channel-null.c
@@ -60,6 +60,7 @@ qio_channel_null_readv(QIOChannel *ioc,
                        size_t niov,
                        int **fds G_GNUC_UNUSED,
                        size_t *nfds G_GNUC_UNUSED,
+                       int flags,
                        Error **errp)
 {
     QIOChannelNull *nioc = QIO_CHANNEL_NULL(ioc);
diff --git a/io/channel-socket.c b/io/channel-socket.c
index b76dca9cc1..7aca84f61a 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -173,6 +173,9 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
     }
 #endif
 
+    qio_channel_set_feature(QIO_CHANNEL(ioc),
+                            QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
+
     return 0;
 }
 
@@ -406,6 +409,9 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
     }
 #endif /* WIN32 */
 
+    qio_channel_set_feature(QIO_CHANNEL(cioc),
+                            QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
+
     trace_qio_channel_socket_accept_complete(ioc, cioc, cioc->fd);
     return cioc;
 
@@ -496,6 +502,7 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
                                         size_t niov,
                                         int **fds,
                                         size_t *nfds,
+                                        int flags,
                                         Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
@@ -517,6 +524,10 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
 
     }
 
+    if (flags & QIO_CHANNEL_READ_FLAG_MSG_PEEK) {
+        sflags |= MSG_PEEK;
+    }
+
  retry:
     ret = recvmsg(sioc->fd, &msg, sflags);
     if (ret < 0) {
@@ -624,11 +635,17 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
                                         size_t niov,
                                         int **fds,
                                         size_t *nfds,
+                                        int flags,
                                         Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
     ssize_t done = 0;
     ssize_t i;
+    int sflags = 0;
+
+    if (flags & QIO_CHANNEL_READ_FLAG_MSG_PEEK) {
+        sflags |= MSG_PEEK;
+    }
 
     for (i = 0; i < niov; i++) {
         ssize_t ret;
@@ -636,7 +653,7 @@ static ssize_t qio_channel_socket_readv(QIOChannel *ioc,
         ret = recv(sioc->fd,
                    iov[i].iov_base,
                    iov[i].iov_len,
-                   0);
+                   sflags);
         if (ret < 0) {
             if (errno == EAGAIN) {
                 if (done) {
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 4ce890a538..c730cb8ec5 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -260,6 +260,7 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
                                      size_t niov,
                                      int **fds,
                                      size_t *nfds,
+                                     int flags,
                                      Error **errp)
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
diff --git a/io/channel-websock.c b/io/channel-websock.c
index fb4932ade7..a12acc27cf 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -1081,6 +1081,7 @@ static ssize_t qio_channel_websock_readv(QIOChannel *ioc,
                                          size_t niov,
                                          int **fds,
                                          size_t *nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelWebsock *wioc = QIO_CHANNEL_WEBSOCK(ioc);
diff --git a/io/channel.c b/io/channel.c
index 0640941ac5..a8c7f11649 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -52,6 +52,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
                                size_t niov,
                                int **fds,
                                size_t *nfds,
+                               int flags,
                                Error **errp)
 {
     QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
@@ -63,7 +64,14 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
         return -1;
     }
 
-    return klass->io_readv(ioc, iov, niov, fds, nfds, errp);
+    if ((flags & QIO_CHANNEL_READ_FLAG_MSG_PEEK) &&
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
+        error_setg_errno(errp, EINVAL,
+                         "Channel does not support peek read");
+        return -1;
+    }
+
+    return klass->io_readv(ioc, iov, niov, fds, nfds, flags, errp);
 }
 
 
@@ -146,7 +154,7 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
     while ((nlocal_iov > 0) || local_fds) {
         ssize_t len;
         len = qio_channel_readv_full(ioc, local_iov, nlocal_iov, local_fds,
-                                     local_nfds, errp);
+                                     local_nfds, 0, errp);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_IN);
@@ -284,7 +292,7 @@ ssize_t qio_channel_readv(QIOChannel *ioc,
                           size_t niov,
                           Error **errp)
 {
-    return qio_channel_readv_full(ioc, iov, niov, NULL, NULL, errp);
+    return qio_channel_readv_full(ioc, iov, niov, NULL, NULL, 0, errp);
 }
 
 
@@ -303,7 +311,7 @@ ssize_t qio_channel_read(QIOChannel *ioc,
                          Error **errp)
 {
     struct iovec iov = { .iov_base = buf, .iov_len = buflen };
-    return qio_channel_readv_full(ioc, &iov, 1, NULL, NULL, errp);
+    return qio_channel_readv_full(ioc, &iov, 1, NULL, NULL, 0, errp);
 }
 
 
diff --git a/migration/channel-block.c b/migration/channel-block.c
index f4ab53acdb..b7374363c3 100644
--- a/migration/channel-block.c
+++ b/migration/channel-block.c
@@ -53,6 +53,7 @@ qio_channel_block_readv(QIOChannel *ioc,
                         size_t niov,
                         int **fds,
                         size_t *nfds,
+                        int flags,
                         Error **errp)
 {
     QIOChannelBlock *bioc = QIO_CHANNEL_BLOCK(ioc);
diff --git a/migration/rdma.c b/migration/rdma.c
index 0ba1668d70..288eadc2d2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2857,6 +2857,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
                                       size_t niov,
                                       int **fds,
                                       size_t *nfds,
+                                      int flags,
                                       Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c
index 196b78c00d..199227a556 100644
--- a/scsi/qemu-pr-helper.c
+++ b/scsi/qemu-pr-helper.c
@@ -614,7 +614,7 @@ static int coroutine_fn prh_read(PRHelperClient *client, void *buf, int sz,
         iov.iov_base = buf;
         iov.iov_len = sz;
         n_read = qio_channel_readv_full(QIO_CHANNEL(client->ioc), &iov, 1,
-                                        &fds, &nfds, errp);
+                                        &fds, &nfds, 0, errp);
 
         if (n_read == QIO_CHANNEL_ERR_BLOCK) {
             qio_channel_yield(QIO_CHANNEL(client->ioc), G_IO_IN);
diff --git a/tests/qtest/tpm-emu.c b/tests/qtest/tpm-emu.c
index 73e0000a2c..f05fe12f01 100644
--- a/tests/qtest/tpm-emu.c
+++ b/tests/qtest/tpm-emu.c
@@ -115,7 +115,7 @@ void *tpm_emu_ctrl_thread(void *data)
         int *pfd = NULL;
         size_t nfd = 0;
 
-        qio_channel_readv_full(ioc, &iov, 1, &pfd, &nfd, &error_abort);
+        qio_channel_readv_full(ioc, &iov, 1, &pfd, &nfd, 0, &error_abort);
         cmd = be32_to_cpu(cmd);
         g_assert_cmpint(cmd, ==, CMD_SET_DATAFD);
         g_assert_cmpint(nfd, ==, 1);
diff --git a/tests/unit/test-io-channel-socket.c b/tests/unit/test-io-channel-socket.c
index b36a5d972a..b964bb202d 100644
--- a/tests/unit/test-io-channel-socket.c
+++ b/tests/unit/test-io-channel-socket.c
@@ -460,6 +460,7 @@ static void test_io_channel_unix_fd_pass(void)
                            G_N_ELEMENTS(iorecv),
                            &fdrecv,
                            &nfdrecv,
+                           0,
                            &error_abort);
 
     g_assert(nfdrecv == G_N_ELEMENTS(fdsend));
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 232984ace6..145eb17c08 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -116,7 +116,7 @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
          * qio_channel_readv_full may have short reads, keeping calling it
          * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
          */
-        rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
+        rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, 0, &local_err);
         if (rc < 0) {
             if (rc == QIO_CHANNEL_ERR_BLOCK) {
                 assert(local_err == NULL);
-- 
2.39.1




* [PULL 26/26] migration: check magic value for deciding the mapping of channels
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (24 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 25/26] io: Add support for MSG_PEEK for socket channel Juan Quintela
@ 2023-02-02 16:06 ` Juan Quintela
  2023-02-04 10:19 ` [PULL 00/26] Next patches Peter Maydell
  2023-02-07  0:49 ` Juan Quintela
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-02 16:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Juan Quintela, Fam Zheng, qemu-s390x, manish.mishra, Peter Xu

From: "manish.mishra" <manish.mishra@nutanix.com>

The current logic assumes that channel connections on the destination side
are always established in the same order as on the source, and that the
first one is always the main channel, followed by the multifd or post-copy
preemption channels. This is not always true: even if a connection has been
established on the source side, it can still be pending on the destination
side while a newer connection is established first, which causes
out-of-order mapping of channels on the destination side.

Currently, all channels except post-copy preempt send a magic number; this
patch uses that magic number to decide the type of channel. This logic
applies only to precopy (multifd) live migration because, as mentioned, the
post-copy preempt channel does not send any magic number. Also, TLS live
migration already performs the TLS handshake before creating other
channels, so this issue cannot occur with TLS, and the logic is skipped for
TLS live migrations. The patch uses a peeking read to check the magic
number so that the current data/control stream management remains
unaffected.
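
As a rough standalone illustration of classifying a channel by its
leading magic rather than by arrival order, consider the sketch below.
The magic values, the big-endian wire encoding, and all toy_* names are
assumptions made for the example; they are not taken from the QEMU
sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic values, chosen for illustration only. */
#define TOY_MAIN_MAGIC    0x4d41494eU  /* "MAIN" */
#define TOY_MULTIFD_MAGIC 0x4d464431U  /* "MFD1" */

typedef enum {
    TOY_CHANNEL_MAIN,
    TOY_CHANNEL_MULTIFD,
    TOY_CHANNEL_UNKNOWN,
} ToyChannelType;

/* Decode a 32-bit big-endian value from the first four peeked bytes. */
static uint32_t toy_load_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Classify a channel by its leading magic, independent of arrival order. */
static ToyChannelType toy_classify_channel(const unsigned char *peeked,
                                           size_t len)
{
    uint32_t magic;

    if (len < 4) {
        return TOY_CHANNEL_UNKNOWN;
    }
    magic = toy_load_be32(peeked);
    if (magic == TOY_MAIN_MAGIC) {
        return TOY_CHANNEL_MAIN;
    }
    if (magic == TOY_MULTIFD_MAGIC) {
        return TOY_CHANNEL_MULTIFD;
    }
    return TOY_CHANNEL_UNKNOWN;
}
```

Because the bytes are only peeked, the code that later consumes the
stream still finds the magic at the start of its input.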

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/channel.h      |  5 ++++
 migration/multifd.h      |  2 +-
 migration/postcopy-ram.h |  2 +-
 migration/channel.c      | 45 +++++++++++++++++++++++++++++++++
 migration/migration.c    | 54 ++++++++++++++++++++++++++++------------
 migration/multifd.c      | 19 +++++++-------
 migration/postcopy-ram.c |  5 +---
 7 files changed, 101 insertions(+), 31 deletions(-)

diff --git a/migration/channel.h b/migration/channel.h
index 67a461c28a..5bdb8208a7 100644
--- a/migration/channel.h
+++ b/migration/channel.h
@@ -24,4 +24,9 @@ void migration_channel_connect(MigrationState *s,
                                QIOChannel *ioc,
                                const char *hostname,
                                Error *error_in);
+
+int migration_channel_read_peek(QIOChannel *ioc,
+                                const char *buf,
+                                const size_t buflen,
+                                Error **errp);
 #endif
diff --git a/migration/multifd.h b/migration/multifd.h
index e2802a9ce2..ff3aa2e2e9 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -18,7 +18,7 @@ void multifd_save_cleanup(void);
 int multifd_load_setup(Error **errp);
 int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
-bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
+void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
 int multifd_send_sync_main(QEMUFile *f);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 6147bf7d1d..25881c4127 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -190,7 +190,7 @@ enum PostcopyChannels {
     RAM_CHANNEL_MAX,
 };
 
-bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
+void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
 int postcopy_preempt_setup(MigrationState *s, Error **errp);
 int postcopy_preempt_wait_channel(MigrationState *s);
 
diff --git a/migration/channel.c b/migration/channel.c
index 1b0815039f..ca3319a309 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -92,3 +92,48 @@ void migration_channel_connect(MigrationState *s,
     migrate_fd_connect(s, error);
     error_free(error);
 }
+
+
+/**
+ * @migration_channel_read_peek - Peek at the data on a migration channel
+ *     without removing it from the channel buffer.
+ *
+ * @ioc: the channel object
+ * @buf: the memory region to read data into
+ * @buflen: the number of bytes to peek into @buf
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Returns 0 on success; returns -1 and sets @errp on failure.
+ */
+int migration_channel_read_peek(QIOChannel *ioc,
+                                const char *buf,
+                                const size_t buflen,
+                                Error **errp)
+{
+    ssize_t len = 0;
+    struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
+
+    while (true) {
+        len = qio_channel_readv_full(ioc, &iov, 1, NULL, NULL,
+                                     QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
+
+        if (len <= 0 && len != QIO_CHANNEL_ERR_BLOCK) {
+            error_setg(errp,
+                       "Failed to peek at channel");
+            return -1;
+        }
+
+        if (len == buflen) {
+            break;
+        }
+
+        /* 1ms sleep. */
+        if (qemu_in_coroutine()) {
+            qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 1000000);
+        } else {
+            g_usleep(1000);
+        }
+    }
+
+    return 0;
+}
diff --git a/migration/migration.c b/migration/migration.c
index 6509203080..f4f7d207f0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -31,6 +31,7 @@
 #include "migration.h"
 #include "savevm.h"
 #include "qemu-file.h"
+#include "channel.h"
 #include "migration/vmstate.h"
 #include "block/block.h"
 #include "qapi/error.h"
@@ -663,10 +664,6 @@ static bool migration_incoming_setup(QEMUFile *f, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
 
-    if (multifd_load_setup(errp) != 0) {
-        return false;
-    }
-
     if (!mis->from_src_file) {
         mis->from_src_file = f;
     }
@@ -733,31 +730,56 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     Error *local_err = NULL;
-    bool start_migration;
     QEMUFile *f;
+    bool default_channel = true;
+    uint32_t channel_magic = 0;
+    int ret = 0;
 
-    if (!mis->from_src_file) {
-        /* The first connection (multifd may have multiple) */
+    if (migrate_use_multifd() && !migrate_postcopy_ram() &&
+        qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
+        /*
+         * With multiple channels, the destination may accept connections
+         * in a different order than the source created them, which would
+         * map source channels to the wrong destination channels. Peek at
+         * the channel MAGIC to decide the channel type. This is best
+         * effort: the postcopy preempt channel does not send a magic
+         * number, so skip the check for postcopy live migration. TLS live
+         * migration already performs the TLS handshake while initializing
+         * the main channel, so this issue cannot occur with TLS.
+         */
+        ret = migration_channel_read_peek(ioc, (void *)&channel_magic,
+                                          sizeof(channel_magic), &local_err);
+
+        if (ret != 0) {
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC));
+    } else {
+        default_channel = !mis->from_src_file;
+    }
+
+    if (multifd_load_setup(errp) != 0) {
+        error_setg(errp, "Failed to setup multifd channels");
+        return;
+    }
+
+    if (default_channel) {
         f = qemu_file_new_input(ioc);
 
         if (!migration_incoming_setup(f, errp)) {
             return;
         }
-
-        /*
-         * Common migration only needs one channel, so we can start
-         * right now.  Some features need more than one channel, we wait.
-         */
-        start_migration = !migration_needs_multiple_sockets();
     } else {
         /* Multiple connections */
         assert(migration_needs_multiple_sockets());
         if (migrate_use_multifd()) {
-            start_migration = multifd_recv_new_channel(ioc, &local_err);
+            multifd_recv_new_channel(ioc, &local_err);
         } else {
             assert(migrate_postcopy_preempt());
             f = qemu_file_new_input(ioc);
-            start_migration = postcopy_preempt_new_channel(mis, f);
+            postcopy_preempt_new_channel(mis, f);
         }
         if (local_err) {
             error_propagate(errp, local_err);
@@ -765,7 +787,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
         }
     }
 
-    if (start_migration) {
+    if (migration_has_all_channels()) {
         /* If it's a recovery, we're done */
         if (postcopy_try_recover()) {
             return;
diff --git a/migration/multifd.c b/migration/multifd.c
index 000ca4d4ec..eeb4fb87ee 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1164,9 +1164,14 @@ int multifd_load_setup(Error **errp)
     uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
     uint8_t i;
 
-    if (!migrate_use_multifd()) {
+    /*
+     * Return successfully if multiFD recv state is already initialised
+     * or multiFD is not enabled.
+     */
+    if (multifd_recv_state || !migrate_use_multifd()) {
         return 0;
     }
+
     if (!migrate_multi_channels_is_allowed()) {
         error_setg(errp, "multifd is not supported by current protocol");
         return -1;
@@ -1227,11 +1232,9 @@ bool multifd_recv_all_channels_created(void)
 
 /*
  * Try to receive all multifd channels to get ready for the migration.
- * - Return true and do not set @errp when correctly receiving all channels;
- * - Return false and do not set @errp when correctly receiving the current one;
- * - Return false and set @errp when failing to receive the current channel.
+ * Sets @errp when failing to receive the current channel.
  */
-bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
+void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 {
     MultiFDRecvParams *p;
     Error *local_err = NULL;
@@ -1244,7 +1247,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
                                 "failed to receive packet"
                                 " via multifd channel %d: ",
                                 qatomic_read(&multifd_recv_state->count));
-        return false;
+        return;
     }
     trace_multifd_recv_new_channel(id);
 
@@ -1254,7 +1257,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
                    id);
         multifd_recv_terminate_threads(local_err);
         error_propagate(errp, local_err);
-        return false;
+        return;
     }
     p->c = ioc;
     object_ref(OBJECT(ioc));
@@ -1265,6 +1268,4 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
                        QEMU_THREAD_JOINABLE);
     qatomic_inc(&multifd_recv_state->count);
-    return qatomic_read(&multifd_recv_state->count) ==
-           migrate_multifd_channels();
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 0c55df0e52..b98e95dab0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1538,7 +1538,7 @@ void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
     }
 }
 
-bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
+void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 {
     /*
      * The new loading channel has its own threads, so it needs to be
@@ -1547,9 +1547,6 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
     qemu_file_set_blocking(file, true);
     mis->postcopy_qemufile_dst = file;
     trace_postcopy_preempt_new_channel();
-
-    /* Start the migration immediately */
-    return true;
 }
 
 /*
-- 
2.39.1
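
For readers unfamiliar with the technique, here is a minimal standalone
sketch of peeking at a 4-byte big-endian magic on a plain POSIX stream
socket to decide what kind of channel a new connection is. It is not
QEMU code: demo_peek_magic(), demo_is_main_channel() and DEMO_MAIN_MAGIC
are hypothetical names invented for illustration, it uses recv() with
MSG_PEEK directly rather than the QIOChannel API, and it assumes a
blocking socket (the patch instead retries when the channel would
block).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical magic value, used by this sketch only ("DEMO"). */
#define DEMO_MAIN_MAGIC 0x44454d4fU

/*
 * Peek at the first 4 bytes queued on @fd without consuming them.
 * Assumes @fd is a blocking stream socket.
 * Returns 0 and stores the host-order value in @magic on success,
 * returns -1 on error or if the peer closed the connection.
 */
static int demo_peek_magic(int fd, uint32_t *magic)
{
    uint32_t wire = 0;
    const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };

    assert(magic != NULL);

    for (;;) {
        ssize_t len = recv(fd, &wire, sizeof(wire), MSG_PEEK);

        if (len < 0 && errno == EINTR) {
            continue;               /* interrupted, just retry */
        }
        if (len <= 0) {
            return -1;              /* error or end of stream */
        }
        if ((size_t)len == sizeof(wire)) {
            break;                  /* all 4 bytes are available */
        }
        nanosleep(&one_ms, NULL);   /* short peek: wait 1ms for the rest */
    }

    *magic = ntohl(wire);
    return 0;
}

/* True if the connection starts with the main-channel magic. */
static bool demo_is_main_channel(int fd)
{
    uint32_t magic = 0;

    return demo_peek_magic(fd, &magic) == 0 && magic == DEMO_MAIN_MAGIC;
}
```

Because the bytes stay in the socket buffer, whichever code path ends up
owning the connection can still read the magic itself afterwards, so the
classification step does not need to hand the value over.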




* Re: [PULL 00/26] Next patches
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (25 preceding siblings ...)
  2023-02-02 16:06 ` [PULL 26/26] migration: check magic value for deciding the mapping of channels Juan Quintela
@ 2023-02-04 10:19 ` Peter Maydell
  2023-02-06 22:06   ` Peter Xu
  2023-02-07  0:49 ` Juan Quintela
  27 siblings, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2023-02-04 10:19 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Richard Henderson, Michael S. Tsirkin,
	Laurent Vivier, Ilya Leoshkevich, Halil Pasic,
	Marc-André Lureau, Coiby Xu, Eric Farman, Alex Williamson,
	Christian Borntraeger, Stefan Hajnoczi,
	Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Fam Zheng, qemu-s390x

On Thu, 2 Feb 2023 at 16:07, Juan Quintela <quintela@redhat.com> wrote:
>
> The following changes since commit deabea6e88f7c4c3c12a36ee30051c6209561165:
>
>   Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2023-02-02 10:10:07 +0000)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request
>
> for you to fetch changes up to 5ee6d3d1eeccd85aa2a835e82b8d9e1b4f7441e1:
>
>   migration: check magic value for deciding the mapping of channels (2023-02-02 17:04:16 +0100)
>
> ----------------------------------------------------------------
> Migration PULL request, new try

Fails to build on anything that isn't Linux:

In file included from ../migration/postcopy-ram.c:40:
/private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cirrus-ci-build/include/qemu/userfaultfd.h:18:10:
fatal error: 'linux/userfaultfd.h' file not found

thanks
-- PMM



* Re: [PULL 00/26] Next patches
  2023-02-04 10:19 ` [PULL 00/26] Next patches Peter Maydell
@ 2023-02-06 22:06   ` Peter Xu
  2023-02-06 23:33     ` Juan Quintela
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Xu @ 2023-02-06 22:06 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Juan Quintela, qemu-devel, Richard Henderson, Michael S. Tsirkin,
	Laurent Vivier, Ilya Leoshkevich, Halil Pasic,
	Marc-André Lureau, Coiby Xu, Eric Farman, Alex Williamson,
	Christian Borntraeger, Stefan Hajnoczi,
	Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Fam Zheng, qemu-s390x

On Sat, Feb 04, 2023 at 10:19:34AM +0000, Peter Maydell wrote:
> On Thu, 2 Feb 2023 at 16:07, Juan Quintela <quintela@redhat.com> wrote:
> >
> > The following changes since commit deabea6e88f7c4c3c12a36ee30051c6209561165:
> >
> >   Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2023-02-02 10:10:07 +0000)
> >
> > are available in the Git repository at:
> >
> >   https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request
> >
> > for you to fetch changes up to 5ee6d3d1eeccd85aa2a835e82b8d9e1b4f7441e1:
> >
> >   migration: check magic value for deciding the mapping of channels (2023-02-02 17:04:16 +0100)
> >
> > ----------------------------------------------------------------
> > Migration PULL request, new try
> 
> Fails to build on anything that isn't Linux:
> 
> In file included from ../migration/postcopy-ram.c:40:
> /private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cirrus-ci-build/include/qemu/userfaultfd.h:18:10:
> fatal error: 'linux/userfaultfd.h' file not found

Oops, my fault.

Juan, please feel free to drop patch "util/userfaultfd: Add uffd_open()".
I'll respin with the whole set.

-- 
Peter Xu




* Re: [PULL 00/26] Next patches
  2023-02-06 22:06   ` Peter Xu
@ 2023-02-06 23:33     ` Juan Quintela
  0 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-06 23:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, open list:All patches CC here, Richard Henderson,
	Michael S. Tsirkin, Laurent Vivier, Ilya Leoshkevich,
	Halil Pasic, Marc-André Lureau, Coiby Xu, Eric Farman,
	Alex Williamson, Christian Borntraeger, Stefan Hajnoczi,
	Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, open list:Block layer core,
	Paolo Bonzini, Fam Zheng, open list:S390 TCG CPUs


On Mon, Feb 6, 2023, 23:07 Peter Xu <peterx@redhat.com> wrote:

> On Sat, Feb 04, 2023 at 10:19:34AM +0000, Peter Maydell wrote:
> > On Thu, 2 Feb 2023 at 16:07, Juan Quintela <quintela@redhat.com> wrote:
> > >
> > > The following changes since commit deabea6e88f7c4c3c12a36ee30051c6209561165:
> > >
> > >   Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2023-02-02 10:10:07 +0000)
> > >
> > > are available in the Git repository at:
> > >
> > >   https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request
> > >
> > > for you to fetch changes up to 5ee6d3d1eeccd85aa2a835e82b8d9e1b4f7441e1:
> > >
> > >   migration: check magic value for deciding the mapping of channels (2023-02-02 17:04:16 +0100)
> > >
> > > ----------------------------------------------------------------
> > > Migration PULL request, new try
> >
> > Fails to build on anything that isn't Linux:
> >
> > In file included from ../migration/postcopy-ram.c:40:
> > /private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cirrus-ci-build/include/qemu/userfaultfd.h:18:10:
> > fatal error: 'linux/userfaultfd.h' file not found
>
> Oops, my fault.
>
> Juan, please feel free to drop patch "util/userfaultfd: Add uffd_open()".
> I'll respin with the whole set.
>


Fixed it already

> --
> Peter Xu
>
>



* Re: [PULL 00/26] Next patches
  2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
                   ` (26 preceding siblings ...)
  2023-02-04 10:19 ` [PULL 00/26] Next patches Peter Maydell
@ 2023-02-07  0:49 ` Juan Quintela
  27 siblings, 0 replies; 31+ messages in thread
From: Juan Quintela @ 2023-02-07  0:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Michael S. Tsirkin, Laurent Vivier,
	Ilya Leoshkevich, Halil Pasic, Marc-André Lureau, Coiby Xu,
	Eric Farman, Alex Williamson, Christian Borntraeger,
	Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Stefan Berger, Eric Blake, Eduardo Habkost,
	Dr. David Alan Gilbert, Thomas Huth, David Hildenbrand,
	Marcel Apfelbaum, John Snow, Yanan Wang, Daniel P. Berrangé,
	Vladimir Sementsov-Ogievskiy, qemu-block, Paolo Bonzini,
	Fam Zheng, qemu-s390x

Juan Quintela <quintela@redhat.com> wrote:
> The following changes since commit deabea6e88f7c4c3c12a36ee30051c6209561165:
>
>   Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2023-02-02 10:10:07 +0000)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request
>
> for you to fetch changes up to 5ee6d3d1eeccd85aa2a835e82b8d9e1b4f7441e1:
>
>   migration: check magic value for deciding the mapping of channels (2023-02-02 17:04:16 +0100)
>
> ----------------------------------------------------------------
> Migration PULL request, new try

NACK

It has the same problem that Peter detected.
I have rebased it and fixed the file so that it is only included on Linux.
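
As an illustration of that kind of fix, here is a minimal standalone
sketch of keeping a Linux-only kernel header away from other hosts. It
is not the change that was applied to QEMU: demo_uffd_supported() and
demo_uffd_api() are hypothetical names, and the sketch tests the
compiler's __linux__ macro, whereas a project would normally use its own
configure-time symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __linux__
/* Only Linux ships this header, so include it only there. */
#include <linux/userfaultfd.h>
#endif

/* Tells callers whether the userfaultfd definitions were compiled in. */
static bool demo_uffd_supported(void)
{
#ifdef __linux__
    return true;
#else
    return false;
#endif
}

/*
 * Single entry point that builds on every host.
 * Stores the UFFD_API value in @api and returns 0 on Linux;
 * stores 0 and returns -1 elsewhere.
 */
static int demo_uffd_api(unsigned long long *api)
{
    assert(api != NULL);
#ifdef __linux__
    *api = (unsigned long long)UFFD_API;
    return 0;
#else
    *api = 0;
    return -1;
#endif
}
```

Callers then check demo_uffd_supported() (or the return value) at run
time instead of including the kernel header themselves, so files shared
by all hosts keep compiling.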




>
> Hi
>
> It includes:
> - David Hildenbrand fixes for virtio-mem
> - David Gilbert canary to detect problems
> - Fix for rdma return values (Fiona)
> - Peter Xu uffd_open fixes
> - Peter Xu show right downtime for postcopy
> - manish.mishra msg fix fixes
> - my vfio changes.
>
> Please apply.
>
> ----------------------------------------------------------------
>
> David Hildenbrand (13):
>   migration/ram: Fix populate_read_range()
>   migration/ram: Fix error handling in ram_write_tracking_start()
>   migration/ram: Don't explicitly unprotect when unregistering uffd-wp
>   migration/ram: Rely on used_length for uffd_change_protection()
>   migration/ram: Optimize ram_write_tracking_start() for
>     RamDiscardManager
>   migration/savevm: Move more savevm handling into vmstate_save()
>   migration/savevm: Prepare vmdesc json writer in
>     qemu_savevm_state_setup()
>   migration/savevm: Allow immutable device state to be migrated early
>     (i.e., before RAM)
>   migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and
>     VMSTATE_BITMAP_TEST()
>   migration/ram: Factor out check for advised postcopy
>   virtio-mem: Fail if a memory backend with "prealloc=on" is specified
>   virtio-mem: Migrate immutable properties early
>   virtio-mem: Proper support for preallocation with migration
>
> Dr. David Alan Gilbert (2):
>   migration: Add canary to VMSTATE_END_OF_LIST
>   migration: Perform vmsd structure check during tests
>
> Fiona Ebner (1):
>   migration/rdma: fix return value for qio_channel_rdma_{readv,writev}
>
> Juan Quintela (4):
>   migration: No save_live_pending() method uses the QEMUFile parameter
>   migration: Split save_live_pending() into state_pending_*
>   migration: Remove unused threshold_size parameter
>   migration: simplify migration_iteration_run()
>
> Peter Xu (3):
>   migration: Fix migration crash when target psize larger than host
>   util/userfaultfd: Add uffd_open()
>   migration: Show downtime during postcopy phase
>
> Zhenzhong Duan (1):
>   migration/dirtyrate: Show sample pages only in page-sampling mode
>
> manish.mishra (2):
>   io: Add support for MSG_PEEK for socket channel
>   migration: check magic value for deciding the mapping of channels
>
>  docs/devel/migration.rst            |  18 +--
>  docs/devel/vfio-migration.rst       |   4 +-
>  include/hw/virtio/virtio-mem.h      |   8 ++
>  include/io/channel.h                |   6 +
>  include/migration/misc.h            |   4 +-
>  include/migration/register.h        |  17 +--
>  include/migration/vmstate.h         |  35 +++++-
>  include/qemu/userfaultfd.h          |   8 ++
>  migration/channel.h                 |   5 +
>  migration/migration.h               |   4 +
>  migration/multifd.h                 |   2 +-
>  migration/postcopy-ram.h            |   2 +-
>  migration/savevm.h                  |  10 +-
>  chardev/char-socket.c               |   4 +-
>  hw/core/machine.c                   |   4 +-
>  hw/s390x/s390-stattrib.c            |  11 +-
>  hw/vfio/migration.c                 |  20 +--
>  hw/virtio/virtio-mem.c              | 144 ++++++++++++++++++++-
>  io/channel-buffer.c                 |   1 +
>  io/channel-command.c                |   1 +
>  io/channel-file.c                   |   1 +
>  io/channel-null.c                   |   1 +
>  io/channel-socket.c                 |  19 ++-
>  io/channel-tls.c                    |   1 +
>  io/channel-websock.c                |   1 +
>  io/channel.c                        |  16 ++-
>  migration/block-dirty-bitmap.c      |  14 +--
>  migration/block.c                   |  13 +-
>  migration/channel-block.c           |   1 +
>  migration/channel.c                 |  45 +++++++
>  migration/dirtyrate.c               |  10 +-
>  migration/migration.c               | 119 ++++++++++++------
>  migration/multifd.c                 |  19 +--
>  migration/postcopy-ram.c            |  16 +--
>  migration/ram.c                     | 120 +++++++++++++-----
>  migration/rdma.c                    |  16 ++-
>  migration/savevm.c                  | 187 ++++++++++++++++++++--------
>  migration/vmstate.c                 |   2 +
>  scsi/qemu-pr-helper.c               |   2 +-
>  tests/qtest/migration-test.c        |   3 +-
>  tests/qtest/tpm-emu.c               |   2 +-
>  tests/unit/test-io-channel-socket.c |   1 +
>  util/userfaultfd.c                  |  13 +-
>  util/vhost-user-server.c            |   2 +-
>  hw/vfio/trace-events                |   2 +-
>  migration/trace-events              |   7 +-
>  46 files changed, 715 insertions(+), 226 deletions(-)




end of thread, other threads:[~2023-02-07  0:50 UTC | newest]

Thread overview: 31+ messages
2023-02-02 16:06 [PULL 00/26] Next patches Juan Quintela
2023-02-02 16:06 ` [PULL 01/26] migration: Fix migration crash when target psize larger than host Juan Quintela
2023-02-02 16:06 ` [PULL 02/26] migration: No save_live_pending() method uses the QEMUFile parameter Juan Quintela
2023-02-02 16:06 ` [PULL 03/26] migration: Split save_live_pending() into state_pending_* Juan Quintela
2023-02-02 16:06 ` [PULL 04/26] migration: Remove unused threshold_size parameter Juan Quintela
2023-02-02 16:06 ` [PULL 05/26] migration: simplify migration_iteration_run() Juan Quintela
2023-02-02 16:06 ` [PULL 06/26] util/userfaultfd: Add uffd_open() Juan Quintela
2023-02-02 16:06 ` [PULL 07/26] migration/ram: Fix populate_read_range() Juan Quintela
2023-02-02 16:06 ` [PULL 08/26] migration/ram: Fix error handling in ram_write_tracking_start() Juan Quintela
2023-02-02 16:06 ` [PULL 09/26] migration/ram: Don't explicitly unprotect when unregistering uffd-wp Juan Quintela
2023-02-02 16:06 ` [PULL 10/26] migration/ram: Rely on used_length for uffd_change_protection() Juan Quintela
2023-02-02 16:06 ` [PULL 11/26] migration/ram: Optimize ram_write_tracking_start() for RamDiscardManager Juan Quintela
2023-02-02 16:06 ` [PULL 12/26] migration/savevm: Move more savevm handling into vmstate_save() Juan Quintela
2023-02-02 16:06 ` [PULL 13/26] migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() Juan Quintela
2023-02-02 16:06 ` [PULL 14/26] migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) Juan Quintela
2023-02-02 16:06 ` [PULL 15/26] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() Juan Quintela
2023-02-02 16:06 ` [PULL 16/26] migration/ram: Factor out check for advised postcopy Juan Quintela
2023-02-02 16:06 ` [PULL 17/26] virtio-mem: Fail if a memory backend with "prealloc=on" is specified Juan Quintela
2023-02-02 16:06 ` [PULL 18/26] virtio-mem: Migrate immutable properties early Juan Quintela
2023-02-02 16:06 ` [PULL 19/26] virtio-mem: Proper support for preallocation with migration Juan Quintela
2023-02-02 16:06 ` [PULL 20/26] migration: Show downtime during postcopy phase Juan Quintela
2023-02-02 16:06 ` [PULL 21/26] migration/rdma: fix return value for qio_channel_rdma_{readv, writev} Juan Quintela
2023-02-02 16:06 ` [PULL 22/26] migration: Add canary to VMSTATE_END_OF_LIST Juan Quintela
2023-02-02 16:06 ` [PULL 23/26] migration: Perform vmsd structure check during tests Juan Quintela
2023-02-02 16:06 ` [PULL 24/26] migration/dirtyrate: Show sample pages only in page-sampling mode Juan Quintela
2023-02-02 16:06 ` [PULL 25/26] io: Add support for MSG_PEEK for socket channel Juan Quintela
2023-02-02 16:06 ` [PULL 26/26] migration: check magic value for deciding the mapping of channels Juan Quintela
2023-02-04 10:19 ` [PULL 00/26] Next patches Peter Maydell
2023-02-06 22:06   ` Peter Xu
2023-02-06 23:33     ` Juan Quintela
2023-02-07  0:49 ` Juan Quintela
