* [PATCH v4 00/25] migration: Improve error reporting
@ 2024-03-06 13:34 Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 01/25] migration: Report error when shutdown fails Cédric Le Goater
                   ` (25 more replies)
  0 siblings, 26 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

Hello,

The motivation behind these changes is to improve error reporting to
the upper management layer (libvirt) with a more detailed error, so
that it can decide, depending on the reported error, whether to try
migration again later. This would be useful in cases where migration
fails due to a lack of HW resources on the host. For instance, some
adapters can only initiate a limited number of simultaneous dirty
tracking requests, and this imposes a limit on the number of VMs
that can be migrated simultaneously.

We are not quite ready for such a mechanism, but what we can do first is
to clean up the error reporting in the early save_setup sequence. This
is what the following changes propose, by adding an Error** argument to
various handlers and propagating it to the core migration subsystem.
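
As a rough illustration only (this is not code from the series), the
propagation pattern can be sketched in plain C as below. DemoError and
all demo_* functions are hypothetical stand-ins for QEMU's Error API
and for the save_setup() path; they are assumptions made for the
example, not real QEMU symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object; not the real type. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Store a message in *errp if the caller asked for error details. */
static void demo_error_set(DemoError **errp, const char *msg)
{
    if (!errp) {
        return;                 /* caller is not interested in details */
    }
    assert(*errp == NULL);      /* never overwrite an earlier error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Hypothetical device handler: fails when no tracking slot is free. */
static int demo_save_setup(int free_slots, DemoError **errp)
{
    if (free_slots <= 0) {
        demo_error_set(errp, "no dirty tracking slot available");
        return -1;
    }
    return 0;
}

/* Hypothetical core routine: forwards errp instead of printing locally. */
static bool demo_state_setup(int free_slots, DemoError **errp)
{
    return demo_save_setup(free_slots, errp) == 0;
}
```

A caller would pass the address of a NULL-initialised DemoError pointer,
decide what to do based on the message, and free the object afterwards.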


The patchset is organized as follows:

* [1-4] are already queued in migration-next.
  
  migration: Report error when shutdown fails
  migration: Remove SaveStateHandler and LoadStateHandler typedefs
  migration: Add documentation for SaveVMHandlers
  migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
  
* [5-9] are prerequisite changes in other components related to the
  migration save_setup() handler. They make sure a failure is not
  returned without setting an error.
  
  s390/stattrib: Add Error** argument to set_migrationmode() handler
  vfio: Always report an error in vfio_save_setup()
  migration: Always report an error in block_save_setup()
  migration: Always report an error in ram_save_setup()
  migration: Add Error** argument to vmstate_save()

* [10-15] are the core changes in migration and memory components to
  propagate an error reported in a save_setup() handler.

  migration: Add Error** argument to qemu_savevm_state_setup()
  migration: Add Error** argument to .save_setup() handler
  migration: Add Error** argument to .load_setup() handler
  memory: Add Error** argument to .log_global_start() handler
  memory: Add Error** argument to the global_dirty_log routines
  migration: Modify ram_init_bitmaps() to report dirty tracking errors

* [16-19] contain the VFIO changes we are interested in. They can go
  through vfio-next.

  vfio: Add Error** argument to .set_dirty_page_tracking() handler
  vfio: Add Error** argument to vfio_devices_dma_logging_start()
  vfio: Add Error** argument to vfio_devices_dma_logging_stop()
  vfio: Use new Error** argument in vfio_save_setup()

* [20-25] are follow-ups for better error handling in VFIO. Good to
  have but not necessary for the issue described in the intro. They can
  go through vfio-next.

  vfio: Add Error** argument to .vfio_save_config() handler
  vfio: Reverse test on vfio_get_dirty_bitmap()
  memory: Add Error** argument to memory_get_xlat_addr()
  vfio: Add Error** argument to .get_dirty_bitmap() handler
  vfio: Also trace event failures in vfio_save_complete_precopy()
  vfio: Extend vfio_set_migration_error() with Error* argument

Thanks,

C.

Changes in v4:

 - Fixed frenchism futur to future
 - Fixed typo in set_migrationmode() handler
 - Added error_free() in hmp_migrationmode()
 - Fixed state name printed out in error returned by vfio_save_setup()
 - Fixed test on error returned by qemu_file_get_error()
 - Added an error when bdrv_nb_sectors() returns a negative value 
 - Dropped log_global_stop() and log_global_sync() changes
 - Dropped MEMORY_LISTENER_CALL_LOG_GLOBAL
 - Modified memory_global_dirty_log_start() to loop on the list of
   listeners and handle errors directly.
 - Introduced memory_global_dirty_log_rollback() to revert operations
   previously done

Changes in v3:

 - New changes to make sure an error is always set in case of failure.
   This is the reason behind the 5/6 extra patches. (Markus)
 - Documentation fixup (Peter + Avihai)
 - Set migration state to MIGRATION_STATUS_FAILED always
 - Fixed error handling in bg_migration_thread() (Peter)
 - Fixed return value of vfio_listener_log_global_start/stop(). 
   Went unnoticed because value is not tested. (Peter)
 - Add ERRP_GUARD() when error_prepend is used 
 - Use error_setg_errno() when possible
    
Changes in v2:

- Removed v1 patches addressing the return-path thread termination as
  they are now superseded by:
  https://lore.kernel.org/qemu-devel/20240226203122.22894-1-farosas@suse.de/
- Documentation updates of handlers
- Removed call to PRECOPY_NOTIFY_SETUP notifiers in case of errors
- Modified routines taking an Error** argument to return a bool when
  possible and made adjustments in callers.
- Added new MEMORY_LISTENER_CALL_LOG_GLOBAL macro for .log_global*()
  handlers
- Handled SETUP state when migration terminates
- Modified memory_get_xlat_addr() to take an Error** argument
- Various refinements on error handling

Cédric Le Goater (25):
  migration: Report error when shutdown fails
  migration: Remove SaveStateHandler and LoadStateHandler typedefs
  migration: Add documentation for SaveVMHandlers
  migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
  s390/stattrib: Add Error** argument to set_migrationmode() handler
  vfio: Always report an error in vfio_save_setup()
  migration: Always report an error in block_save_setup()
  migration: Always report an error in ram_save_setup()
  migration: Add Error** argument to vmstate_save()
  migration: Add Error** argument to qemu_savevm_state_setup()
  migration: Add Error** argument to .save_setup() handler
  migration: Add Error** argument to .load_setup() handler
  memory: Add Error** argument to .log_global_start() handler
  memory: Add Error** argument to the global_dirty_log routines
  migration: Modify ram_init_bitmaps() to report dirty tracking errors
  vfio: Add Error** argument to .set_dirty_page_tracking() handler
  vfio: Add Error** argument to vfio_devices_dma_logging_start()
  vfio: Add Error** argument to vfio_devices_dma_logging_stop()
  vfio: Use new Error** argument in vfio_save_setup()
  vfio: Add Error** argument to .vfio_save_config() handler
  vfio: Reverse test on vfio_get_dirty_bitmap()
  memory: Add Error** argument to memory_get_xlat_addr()
  vfio: Add Error** argument to .get_dirty_bitmap() handler
  vfio: Also trace event failures in vfio_save_complete_precopy()
  vfio: Extend vfio_set_migration_error() with Error* argument

 include/exec/memory.h                 |  25 ++-
 include/hw/s390x/storage-attributes.h |   2 +-
 include/hw/vfio/vfio-common.h         |  29 ++-
 include/hw/vfio/vfio-container-base.h |  35 +++-
 include/migration/register.h          | 273 +++++++++++++++++++++++---
 include/qemu/typedefs.h               |   2 -
 migration/savevm.h                    |   2 +-
 hw/i386/xen/xen-hvm.c                 |   5 +-
 hw/ppc/spapr.c                        |   2 +-
 hw/s390x/s390-stattrib-kvm.c          |  12 +-
 hw/s390x/s390-stattrib.c              |  15 +-
 hw/vfio/common.c                      | 161 +++++++++------
 hw/vfio/container-base.c              |   9 +-
 hw/vfio/container.c                   |  19 +-
 hw/vfio/migration.c                   |  99 ++++++----
 hw/vfio/pci.c                         |   5 +-
 hw/virtio/vhost-vdpa.c                |   5 +-
 hw/virtio/vhost.c                     |   3 +-
 migration/block-dirty-bitmap.c        |   4 +-
 migration/block.c                     |  19 +-
 migration/dirtyrate.c                 |  13 +-
 migration/migration.c                 |  27 ++-
 migration/qemu-file.c                 |   5 +-
 migration/ram.c                       |  46 ++++-
 migration/savevm.c                    |  59 +++---
 system/memory.c                       |  56 +++++-
 26 files changed, 713 insertions(+), 219 deletions(-)

-- 
2.44.0




* [PATCH v4 01/25] migration: Report error when shutdown fails
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 02/25] migration: Remove SaveStateHandler and LoadStateHandler typedefs Cédric Le Goater
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

This will help detect issues regarding I/O channel usage.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 migration/qemu-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b10c8826296808d815d01ee4ed4912f0ca4313d9..a10882d47fcbc17f136653b9c4afd914552c8c8d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -63,6 +63,8 @@ struct QEMUFile {
  */
 int qemu_file_shutdown(QEMUFile *f)
 {
+    Error *err = NULL;
+
     /*
      * We must set qemufile error before the real shutdown(), otherwise
      * there can be a race window where we thought IO all went though
@@ -91,7 +93,8 @@ int qemu_file_shutdown(QEMUFile *f)
         return -ENOSYS;
     }
 
-    if (qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL) < 0) {
+    if (qio_channel_shutdown(f->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, &err) < 0) {
+        error_report_err(err);
         return -EIO;
     }
 
-- 
2.44.0




* [PATCH v4 02/25] migration: Remove SaveStateHandler and LoadStateHandler typedefs
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 01/25] migration: Report error when shutdown fails Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 03/25] migration: Add documentation for SaveVMHandlers Cédric Le Goater
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

They are only used once.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/migration/register.h | 4 ++--
 include/qemu/typedefs.h      | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 9ab1f79512c605f0c88a45b560c57486fa054441..2e6a7d766e62f64940086b7b511249c9ff21fa62 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -18,7 +18,7 @@
 
 typedef struct SaveVMHandlers {
     /* This runs inside the BQL.  */
-    SaveStateHandler *save_state;
+    void (*save_state)(QEMUFile *f, void *opaque);
 
     /*
      * save_prepare is called early, even before migration starts, and can be
@@ -71,7 +71,7 @@ typedef struct SaveVMHandlers {
     /* This calculate the exact remaining data to transfer */
     void (*state_pending_exact)(void *opaque, uint64_t *must_precopy,
                                 uint64_t *can_postcopy);
-    LoadStateHandler *load_state;
+    int (*load_state)(QEMUFile *f, void *opaque, int version_id);
     int (*load_setup)(QEMUFile *f, void *opaque);
     int (*load_cleanup)(void *opaque);
     /* Called when postcopy migration wants to resume from failure */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index a028dba4d0b67e87165f9f1a4e960e9e6b94477c..50c277cf0b467f782ba526041b2663207bf70945 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -151,8 +151,6 @@ typedef struct IRQState *qemu_irq;
 /*
  * Function types
  */
-typedef void SaveStateHandler(QEMUFile *f, void *opaque);
-typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
 typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
 
 #endif /* QEMU_TYPEDEFS_H */
-- 
2.44.0




* [PATCH v4 03/25] migration: Add documentation for SaveVMHandlers
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 01/25] migration: Report error when shutdown fails Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 02/25] migration: Remove SaveStateHandler and LoadStateHandler typedefs Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 04/25] migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error Cédric Le Goater
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

The SaveVMHandlers structure is still in use for complex subsystems
and devices. Document the handlers since we are going to modify a few
later.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/migration/register.h | 263 +++++++++++++++++++++++++++++++----
 1 file changed, 237 insertions(+), 26 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 2e6a7d766e62f64940086b7b511249c9ff21fa62..d7b70a8be68c9df47c7843bda7d430989d7ca384 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -16,30 +16,130 @@
 
 #include "hw/vmstate-if.h"
 
+/**
+ * struct SaveVMHandlers: handler structure to finely control
+ * migration of complex subsystems and devices, such as RAM, block and
+ * VFIO.
+ */
 typedef struct SaveVMHandlers {
-    /* This runs inside the BQL.  */
+
+    /* The following handlers run inside the BQL. */
+
+    /**
+     * @save_state
+     *
+     * Saves state section on the source using the latest state format
+     * version.
+     *
+     * Legacy method. Should be deprecated when all users are ported
+     * to VMStateDescription.
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     */
     void (*save_state)(QEMUFile *f, void *opaque);
 
-    /*
-     * save_prepare is called early, even before migration starts, and can be
-     * used to perform early checks.
+    /**
+     * @save_prepare
+     *
+     * Called early, even before migration starts, and can be used to
+     * perform early checks.
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     * @errp: pointer to Error*, to store an error if it happens.
+     *
+     * Returns zero to indicate success and negative for error
      */
     int (*save_prepare)(void *opaque, Error **errp);
+
+    /**
+     * @save_setup
+     *
+     * Initializes the data structures on the source and transmits
+     * first section containing information on the device
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*save_setup)(QEMUFile *f, void *opaque);
+
+    /**
+     * @save_cleanup
+     *
+     * Uninitializes the data structures on the source
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     */
     void (*save_cleanup)(void *opaque);
+
+    /**
+     * @save_live_complete_postcopy
+     *
+     * Called at the end of postcopy for all postcopyable devices.
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
+
+    /**
+     * @save_live_complete_precopy
+     *
+     * Transmits the last section for the device containing any
+     * remaining data at the end of a precopy phase. When postcopy is
+     * enabled, devices that support postcopy will skip this step,
+     * where the final data will be flushed at the end of postcopy via
+     * @save_live_complete_postcopy instead.
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
 
     /* This runs both outside and inside the BQL.  */
+
+    /**
+     * @is_active
+     *
+     * Will skip a state section if not active
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns true if state section is active else false
+     */
     bool (*is_active)(void *opaque);
+
+    /**
+     * @has_postcopy
+     *
+     * Checks if a device supports postcopy
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns true for postcopy support else false
+     */
     bool (*has_postcopy)(void *opaque);
 
-    /* is_active_iterate
-     * If it is not NULL then qemu_savevm_state_iterate will skip iteration if
-     * it returns false. For example, it is needed for only-postcopy-states,
-     * which needs to be handled by qemu_savevm_state_setup and
-     * qemu_savevm_state_pending, but do not need iterations until not in
-     * postcopy stage.
+    /**
+     * @is_active_iterate
+     *
+     * As #SaveVMHandlers.is_active(), will skip an inactive state
+     * section in qemu_savevm_state_iterate.
+     *
+     * For example, it is needed for only-postcopy-states, which needs
+     * to be handled by qemu_savevm_state_setup() and
+     * qemu_savevm_state_pending(), but do not need iterations until
+     * not in postcopy stage.
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns true if state section is active else false
      */
     bool (*is_active_iterate)(void *opaque);
 
@@ -48,44 +148,155 @@ typedef struct SaveVMHandlers {
      * use data that is local to the migration thread or protected
      * by other locks.
      */
+
+    /**
+     * @save_live_iterate
+     *
+     * Should send a chunk of data until the point that stream
+     * bandwidth limits tell it to stop. Each call generates one
+     * section.
+     *
+     * @f: QEMUFile where to send the data
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns 0 to indicate that there is still more data to send,
+     *         1 that there is no more data to send and
+     *         negative to indicate an error.
+     */
     int (*save_live_iterate)(QEMUFile *f, void *opaque);
 
     /* This runs outside the BQL!  */
-    /* Note for save_live_pending:
-     * must_precopy:
-     * - must be migrated in precopy or in stopped state
-     * - i.e. must be migrated before target start
-     *
-     * can_postcopy:
-     * - can migrate in postcopy or in stopped state
-     * - i.e. can migrate after target start
-     * - some can also be migrated during precopy (RAM)
-     * - some must be migrated after source stops (block-dirty-bitmap)
-     *
-     * Sum of can_postcopy and must_postcopy is the whole amount of
+
+    /**
+     * @state_pending_estimate
+     *
+     * This estimates the remaining data to transfer
+     *
+     * Sum of @can_postcopy and @must_postcopy is the whole amount of
      * pending data.
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     * @must_precopy: amount of data that must be migrated in precopy
+     *                or in stopped state, i.e. that must be migrated
+     *                before target start.
+     * @can_postcopy: amount of data that can be migrated in postcopy
+     *                or in stopped state, i.e. after target start.
+     *                Some can also be migrated during precopy (RAM).
+     *                Some must be migrated after source stops
+     *                (block-dirty-bitmap)
      */
-    /* This estimates the remaining data to transfer */
     void (*state_pending_estimate)(void *opaque, uint64_t *must_precopy,
                                    uint64_t *can_postcopy);
-    /* This calculate the exact remaining data to transfer */
+
+    /**
+     * @state_pending_exact
+     *
+     * This calculates the exact remaining data to transfer
+     *
+     * Sum of @can_postcopy and @must_postcopy is the whole amount of
+     * pending data.
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     * @must_precopy: amount of data that must be migrated in precopy
+     *                or in stopped state, i.e. that must be migrated
+     *                before target start.
+     * @can_postcopy: amount of data that can be migrated in postcopy
+     *                or in stopped state, i.e. after target start.
+     *                Some can also be migrated during precopy (RAM).
+     *                Some must be migrated after source stops
+     *                (block-dirty-bitmap)
+     */
     void (*state_pending_exact)(void *opaque, uint64_t *must_precopy,
                                 uint64_t *can_postcopy);
+
+    /**
+     * @load_state
+     *
+     * Load sections generated by any of the save functions that
+     * generate sections.
+     *
+     * Legacy method. Should be deprecated when all users are ported
+     * to VMStateDescription.
+     *
+     * @f: QEMUFile where to receive the data
+     * @opaque: data pointer passed to register_savevm_live()
+     * @version_id: the maximum version_id supported
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*load_state)(QEMUFile *f, void *opaque, int version_id);
+
+    /**
+     * @load_setup
+     *
+     * Initializes the data structures on the destination.
+     *
+     * @f: QEMUFile where to receive the data
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*load_setup)(QEMUFile *f, void *opaque);
+
+    /**
+     * @load_cleanup
+     *
+     * Uninitializes the data structures on the destination.
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*load_cleanup)(void *opaque);
-    /* Called when postcopy migration wants to resume from failure */
+
+    /**
+     * @resume_prepare
+     *
+     * Called when postcopy migration wants to resume from failure
+     *
+     * @s: Current migration state
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*resume_prepare)(MigrationState *s, void *opaque);
-    /* Checks if switchover ack should be used. Called only in dest */
+
+    /**
+     * @switchover_ack_needed
+     *
+     * Checks if switchover ack should be used. Called only on
+     * destination.
+     *
+     * @opaque: data pointer passed to register_savevm_live()
+     *
+     * Returns true if switchover ack should be used and false
+     * otherwise
+     */
     bool (*switchover_ack_needed)(void *opaque);
 } SaveVMHandlers;
 
+/**
+ * register_savevm_live: Register a set of custom migration handlers
+ *
+ * @idstr: state section identifier
+ * @instance_id: instance id
+ * @version_id: version id supported
+ * @ops: SaveVMHandlers structure
+ * @opaque: data pointer passed to SaveVMHandlers handlers
+ */
 int register_savevm_live(const char *idstr,
                          uint32_t instance_id,
                          int version_id,
                          const SaveVMHandlers *ops,
                          void *opaque);
 
+/**
+ * unregister_savevm: Unregister custom migration handlers
+ *
+ * @obj: object associated with state section
+ * @idstr:  state section identifier
+ * @opaque: data pointer passed to register_savevm_live()
+ */
 void unregister_savevm(VMStateIf *obj, const char *idstr, void *opaque);
 
 #endif
-- 
2.44.0




* [PATCH v4 04/25] migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (2 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 03/25] migration: Add documentation for SaveVMHandlers Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler Cédric Le Goater
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

When commit bd2270608fa0 ("migration/ram.c: add a notifier chain for
precopy") added PRECOPY_NOTIFY_SETUP notifiers at the end of
qemu_savevm_state_setup(), it didn't take into account a possible
error in the loop calling vmstate_save() or .save_setup() handlers.

Check ret value before calling the notifiers.
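
The control flow can be sketched as follows. This is a simplified,
standalone illustration and not the code in migration/savevm.c: the
demo_* names are hypothetical, and the sketch returns an int purely so
that the outcome can be observed, whereas qemu_savevm_state_setup()
returns void at this point in the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical setup handler type: 0 on success, negative on error. */
typedef int (*demo_setup_fn)(void);

static int demo_notified;   /* counts notifier invocations */

static int demo_ok(void)   { return 0; }
static int demo_fail(void) { return -5; }

/* Stand-in for the PRECOPY_NOTIFY_SETUP notifier chain. */
static void demo_notify_setup(void) { demo_notified++; }

/* Run every handler; call the notifiers only if all of them succeeded. */
static int demo_run_setup(const demo_setup_fn *handlers, size_t n)
{
    int ret = 0;    /* initialised so an empty list counts as success */

    for (size_t i = 0; i < n; i++) {
        ret = handlers[i]();
        if (ret < 0) {
            break;  /* stop at the first failing handler */
        }
    }
    if (ret) {
        return ret; /* skip the notifiers after a failed handler */
    }
    demo_notify_setup();
    return 0;
}
```

Initialising ret matters in the sketch for the same reason as in the
patch: if the loop body never assigns it, the check after the loop would
otherwise read an indeterminate value.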

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 migration/savevm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index dc1fb9c0d32bbf037471b810bd28e9361c2d7b87..63066f49f3ad5504be6d44ffdf9f7b759c0a25ef 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1317,7 +1317,7 @@ void qemu_savevm_state_setup(QEMUFile *f)
     MigrationState *ms = migrate_get_current();
     SaveStateEntry *se;
     Error *local_err = NULL;
-    int ret;
+    int ret = 0;
 
     json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
     json_writer_start_array(ms->vmdesc, "devices");
@@ -1351,6 +1351,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
         }
     }
 
+    if (ret) {
+        return;
+    }
+
     if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
         error_report_err(local_err);
     }
-- 
2.44.0




* [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (3 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 04/25] migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07 12:18   ` Fabiano Rosas
                     ` (2 more replies)
  2024-03-06 13:34 ` [PATCH v4 06/25] vfio: Always report an error in vfio_save_setup() Cédric Le Goater
                   ` (20 subsequent siblings)
  25 siblings, 3 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater,
	Halil Pasic, Christian Borntraeger, Thomas Huth

This will prepare ground for future changes adding an Error** argument
to the save_setup() handler. We need to make sure that on failure,
set_migrationmode() always sets a new error. See the Rules section in
qapi/error.h.
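
For readers unfamiliar with that rule, here is a minimal standalone
sketch of "every failure path sets an error". It does not use the real
QEMU API: DemoError, demo_error_set_errno() and the other demo_* names
are hypothetical, and the "text: strerror" message format is only an
approximation of what an errno-aware setter produces.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object; not the real type. */
typedef struct DemoError {
    char msg[160];
} DemoError;

/* Roughly what an errno-aware setter does: append strerror() text. */
static void demo_error_set_errno(DemoError **errp, int os_errno,
                                 const char *what)
{
    if (!errp) {
        return;                 /* caller passed NULL: ignore details */
    }
    assert(*errp == NULL);      /* an error must not be set twice */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             what, strerror(os_errno));
}

/* Hypothetical backend call: returns 0 or a negative errno value. */
static int demo_backend_ioctl(int fail)
{
    return fail ? -EBUSY : 0;
}

/* On failure, the return code and the error object are set together. */
static int demo_set_migrationmode(int fail, DemoError **errp)
{
    int r = demo_backend_ioctl(fail);

    if (r) {
        demo_error_set_errno(errp, -r, "setting migration mode failed");
    }
    return r;
}
```

A caller that wants to report the failure passes a pointer to a
NULL-initialised DemoError pointer and frees it afterwards; a cleanup
path that cannot act on the failure can pass NULL instead.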

Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Fixed state name printed out in error returned by vfio_save_setup()
 - Fixed test on error returned by qemu_file_get_error()

 include/hw/s390x/storage-attributes.h |  2 +-
 hw/s390x/s390-stattrib-kvm.c          | 12 ++++++++++--
 hw/s390x/s390-stattrib.c              | 15 ++++++++++-----
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/include/hw/s390x/storage-attributes.h b/include/hw/s390x/storage-attributes.h
index 5239eb538c1b087797867a247abfc14551af6a4d..8921a04d514bf64a3113255ee10ed33fc598ae06 100644
--- a/include/hw/s390x/storage-attributes.h
+++ b/include/hw/s390x/storage-attributes.h
@@ -39,7 +39,7 @@ struct S390StAttribClass {
     int (*set_stattr)(S390StAttribState *sa, uint64_t start_gfn,
                       uint32_t count, uint8_t *values);
     void (*synchronize)(S390StAttribState *sa);
-    int (*set_migrationmode)(S390StAttribState *sa, bool value);
+    int (*set_migrationmode)(S390StAttribState *sa, bool value, Error **errp);
     int (*get_active)(S390StAttribState *sa);
     long long (*get_dirtycount)(S390StAttribState *sa);
 };
diff --git a/hw/s390x/s390-stattrib-kvm.c b/hw/s390x/s390-stattrib-kvm.c
index 24cd01382e2d74d62c2d7e980eb6aca1077d893d..eeaa8110981c970e91a8948f027e398c34637321 100644
--- a/hw/s390x/s390-stattrib-kvm.c
+++ b/hw/s390x/s390-stattrib-kvm.c
@@ -17,6 +17,7 @@
 #include "sysemu/kvm.h"
 #include "exec/ram_addr.h"
 #include "kvm/kvm_s390x.h"
+#include "qapi/error.h"
 
 Object *kvm_s390_stattrib_create(void)
 {
@@ -137,14 +138,21 @@ static void kvm_s390_stattrib_synchronize(S390StAttribState *sa)
     }
 }
 
-static int kvm_s390_stattrib_set_migrationmode(S390StAttribState *sa, bool val)
+static int kvm_s390_stattrib_set_migrationmode(S390StAttribState *sa, bool val,
+                                               Error **errp)
 {
     struct kvm_device_attr attr = {
         .group = KVM_S390_VM_MIGRATION,
         .attr = val,
         .addr = 0,
     };
-    return kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attr);
+    int r;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attr);
+    if (r) {
+        error_setg_errno(errp, -r, "setting KVM_S390_VM_MIGRATION failed");
+    }
+    return r;
 }
 
 static long long kvm_s390_stattrib_get_dirtycount(S390StAttribState *sa)
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index c483b62a9b5f71772639fc180bdad15ecb6711cb..b743e8a2fee84c7374460ccea6df1cf447cda44b 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -60,11 +60,13 @@ void hmp_migrationmode(Monitor *mon, const QDict *qdict)
     S390StAttribState *sas = s390_get_stattrib_device();
     S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
     uint64_t what = qdict_get_int(qdict, "mode");
+    Error *local_err = NULL;
     int r;
 
-    r = sac->set_migrationmode(sas, what);
+    r = sac->set_migrationmode(sas, what, &local_err);
     if (r < 0) {
-        monitor_printf(mon, "Error: %s", strerror(-r));
+        monitor_printf(mon, "Error: %s", error_get_pretty(local_err));
+        error_free(local_err);
     }
 }
 
@@ -170,13 +172,15 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
 {
     S390StAttribState *sas = S390_STATTRIB(opaque);
     S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
+    Error *local_err = NULL;
     int res;
     /*
      * Signal that we want to start a migration, thus needing PGSTE dirty
      * tracking.
      */
-    res = sac->set_migrationmode(sas, 1);
+    res = sac->set_migrationmode(sas, true, &local_err);
     if (res) {
+        error_report_err(local_err);
         return res;
     }
     qemu_put_be64(f, STATTR_FLAG_EOS);
@@ -260,7 +264,7 @@ static void cmma_save_cleanup(void *opaque)
 {
     S390StAttribState *sas = S390_STATTRIB(opaque);
     S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
-    sac->set_migrationmode(sas, 0);
+    sac->set_migrationmode(sas, false, NULL);
 }
 
 static bool cmma_active(void *opaque)
@@ -293,7 +297,8 @@ static long long qemu_s390_get_dirtycount_stub(S390StAttribState *sa)
 {
     return 0;
 }
-static int qemu_s390_set_migrationmode_stub(S390StAttribState *sa, bool value)
+static int qemu_s390_set_migrationmode_stub(S390StAttribState *sa, bool value,
+                                            Error **errp)
 {
     return 0;
 }
-- 
2.44.0
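
Note on the convention used above: the handler fills errp on failure,
the setup path consumes the error, and the cleanup path passes NULL to
ignore it. Below is a minimal, self-contained sketch of that convention.
The Error struct and all demo_* helpers are hypothetical stand-ins
written for illustration only; they are not QEMU's qapi/error.h API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an Error object (illustration only). */
typedef struct Error {
    char msg[128];
} Error;

/* Store a message in *errp, unless the caller passed NULL to ignore it. */
static void demo_error_set(Error **errp, const char *what, int os_errno)
{
    Error *err;

    if (!errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    snprintf(err->msg, sizeof(err->msg), "%s: %s", what, strerror(os_errno));
    *errp = err;
}

/* Hypothetical handler: fails with -EBUSY when 'fail' is true. */
static int demo_set_migrationmode(bool enable, bool fail, Error **errp)
{
    (void)enable;
    if (fail) {
        demo_error_set(errp, "setting migration mode failed", EBUSY);
        return -EBUSY;
    }
    return 0;
}

/* Setup path: the caller owns the error and must consume and free it. */
static int demo_save_setup(bool fail, char *out, size_t outlen)
{
    Error *local_err = NULL;
    int r = demo_set_migrationmode(true, fail, &local_err);

    if (r) {
        /* Invariant the series relies on: failure implies an error is set. */
        assert(local_err != NULL);
        snprintf(out, outlen, "%s", local_err->msg);
        free(local_err);
    }
    return r;
}

/* Cleanup path: nothing useful can be done with an error, so pass NULL. */
static int demo_save_cleanup(bool fail)
{
    return demo_set_migrationmode(false, fail, NULL);
}
```

With this shape, a caller that later gains its own errp argument can
forward it to the handler instead of a local variable, without touching
the handler again.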




* [PATCH v4 06/25] vfio: Always report an error in vfio_save_setup()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (4 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  9:36   ` Eric Auger
  2024-03-06 13:34 ` [PATCH v4 07/25] migration: Always report an error in block_save_setup() Cédric Le Goater
                   ` (19 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

This prepares the ground for future changes adding an Error** argument
to the save_setup() handler. We need to make sure that, on failure,
vfio_save_setup() always sets a new error.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Fixed state name printed out in error returned by vfio_save_setup()
 - Fixed test on error returned by qemu_file_get_error()
 
 hw/vfio/migration.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 2050ac8897231ff89cc223f0570d5c7a65dede9e..330b3a28548e32b0b3268072895bb5e4875766a2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -383,6 +383,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
     uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE;
+    int ret;
 
     qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
 
@@ -397,13 +398,13 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
     }
 
     if (vfio_precopy_supported(vbasedev)) {
-        int ret;
-
         switch (migration->device_state) {
         case VFIO_DEVICE_STATE_RUNNING:
             ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
                                            VFIO_DEVICE_STATE_RUNNING);
             if (ret) {
+                error_report("%s: Failed to set new PRE_COPY state",
+                             vbasedev->name);
                 return ret;
             }
 
@@ -414,6 +415,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
             /* vfio_save_complete_precopy() will go to STOP_COPY */
             break;
         default:
+            error_report("%s: Invalid device state %d", vbasedev->name,
+                         migration->device_state);
             return -EINVAL;
         }
     }
@@ -422,7 +425,13 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
 
     qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
 
-    return qemu_file_get_error(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        error_report("%s: save setup failed : %s", vbasedev->name,
+                     strerror(-ret));
+    }
+
+    return ret;
 }
 
 static void vfio_save_cleanup(void *opaque)
-- 
2.44.0
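
Note on the final hunk above: qemu_file_get_error() is tested with
"ret < 0" and the value is negated before being handed to strerror(),
since strerror() expects a positive errno. A minimal sketch of that sign
convention follows. demo_describe_failure() is a hypothetical helper
written for illustration and is not part of QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a negative errno return value into a
 * message prefixed with the device name. Returns 'ret' unchanged so it
 * can be used directly in a return statement.
 */
static int demo_describe_failure(const char *dev, int ret,
                                 char *buf, size_t len)
{
    assert(buf && len > 0);

    if (ret < 0) {
        /* strerror() expects a positive errno, hence the negation. */
        snprintf(buf, len, "%s: save setup failed: %s", dev, strerror(-ret));
    } else {
        /* Success: leave no stale message behind. */
        buf[0] = '\0';
    }
    return ret;
}
```

Testing "ret < 0" rather than "ret" keeps a non-negative return value
from being misread as a failure.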




* [PATCH v4 07/25] migration: Always report an error in block_save_setup()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (5 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 06/25] vfio: Always report an error in vfio_save_setup() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07 12:28   ` Fabiano Rosas
  2024-03-08  6:59   ` Peter Xu
  2024-03-06 13:34 ` [PATCH v4 08/25] migration: Always report an error in ram_save_setup() Cédric Le Goater
                   ` (18 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater,
	Stefan Hajnoczi

This prepares the ground for future changes adding an Error** argument
to the save_setup() handler. We need to make sure that, on failure,
block_save_setup() always sets a new error.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Added an error when bdrv_nb_sectors() returns a negative value
 
 migration/block.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 8c6ebafacc1ffe930d1d4f19d968817b14852c69..aa65ec718c2875ad9d1c40c971035f14d8086a6e 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -367,7 +367,7 @@ static void unset_dirty_tracking(void)
     }
 }
 
-static int init_blk_migration(QEMUFile *f)
+static int init_blk_migration(QEMUFile *f, Error **errp)
 {
     BlockDriverState *bs;
     BlkMigDevState *bmds;
@@ -378,7 +378,6 @@ static int init_blk_migration(QEMUFile *f)
         BlkMigDevState *bmds;
         BlockDriverState *bs;
     } *bmds_bs;
-    Error *local_err = NULL;
     int ret;
 
     GRAPH_RDLOCK_GUARD_MAINLOOP();
@@ -404,6 +403,10 @@ static int init_blk_migration(QEMUFile *f)
         sectors = bdrv_nb_sectors(bs);
         if (sectors <= 0) {
             ret = sectors;
+            if (ret < 0) {
+                error_setg(errp, "Error getting length of block device %s",
+                           bdrv_get_device_name(bs));
+            }
             bdrv_next_cleanup(&it);
             goto out;
         }
@@ -439,9 +442,8 @@ static int init_blk_migration(QEMUFile *f)
         bs = bmds_bs[i].bs;
 
         if (bmds) {
-            ret = blk_insert_bs(bmds->blk, bs, &local_err);
+            ret = blk_insert_bs(bmds->blk, bs, errp);
             if (ret < 0) {
-                error_report_err(local_err);
                 goto out;
             }
 
@@ -711,6 +713,7 @@ static void block_migration_cleanup(void *opaque)
 static int block_save_setup(QEMUFile *f, void *opaque)
 {
     int ret;
+    Error *local_err = NULL;
 
     trace_migration_block_save("setup", block_mig_state.submitted,
                                block_mig_state.transferred);
@@ -718,18 +721,27 @@ static int block_save_setup(QEMUFile *f, void *opaque)
     warn_report("block migration is deprecated;"
                 " use blockdev-mirror with NBD instead");
 
-    ret = init_blk_migration(f);
+    ret = init_blk_migration(f, &local_err);
     if (ret < 0) {
+        error_report_err(local_err);
         return ret;
     }
 
     /* start track dirty blocks */
     ret = set_dirty_tracking();
     if (ret) {
+        error_setg_errno(&local_err, -ret,
+                         "Failed to start block dirty tracking");
+        error_report_err(local_err);
         return ret;
     }
 
     ret = flush_blks(f);
+    if (ret) {
+        error_setg_errno(&local_err, -ret, "Flushing block failed");
+        error_report_err(local_err);
+        return ret;
+    }
     blk_mig_reset_dirty_cursor();
     qemu_put_be64(f, BLK_MIG_FLAG_EOS);
 
-- 
2.44.0
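
Note on the bdrv_nb_sectors() hunk above: the "sectors <= 0" branch
covers two cases, and only the negative one is a failure that needs an
error; a zero length leaves ret at 0 and sets nothing. A minimal sketch
of that distinction follows. demo_probe_device() and its parameters are
hypothetical and written for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical probe: returns a negative value and fills 'err' only
 * when the length query itself failed. A zero-length device is flagged
 * through '*empty' and is not treated as an error.
 */
static int demo_probe_device(const char *name, int64_t sectors,
                             bool *empty, char *err, size_t len)
{
    assert(err && len > 0);
    err[0] = '\0';
    *empty = false;

    if (sectors <= 0) {
        if (sectors < 0) {
            snprintf(err, len, "Error getting length of block device %s",
                     name);
            return (int)sectors;
        }
        *empty = true;
    }
    return 0;
}
```

The property worth checking in review is that a negative return value
and a filled-in error always go together, so the "ret < 0" test in the
caller never reports an unset error.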




* [PATCH v4 08/25] migration: Always report an error in ram_save_setup()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (6 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 07/25] migration: Always report an error in block_save_setup() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07 12:28   ` Fabiano Rosas
  2024-03-06 13:34 ` [PATCH v4 09/25] migration: Add Error** argument to vmstate_save() Cédric Le Goater
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

This prepares the ground for future changes adding an Error** argument
to the save_setup() handler. We need to make sure that, on failure,
ram_save_setup() always sets a new error.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Fixed test on error returned by qemu_fflush() 
 
 migration/ram.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 003c28e1336a5fbe7a3877512b8fc3cf62f1bab3..3ac7f52a5f8e2c0d78a8cf150b3fa6611e12ffcc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3057,12 +3057,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     int ret, max_hg_page_size;
 
     if (compress_threads_save_setup()) {
+        error_report("%s: failed to start compress threads", __func__);
         return -1;
     }
 
     /* migration has already setup the bitmap, reuse it. */
     if (!migration_in_colo_state()) {
         if (ram_init_all(rsp) != 0) {
+            error_report("%s: failed to setup RAM for migration", __func__);
             compress_threads_save_cleanup();
             return -1;
         }
@@ -3099,12 +3101,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
     ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
     if (ret < 0) {
+        error_report("%s: failed to start RDMA registration", __func__);
         qemu_file_set_error(f, ret);
         return ret;
     }
 
     ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
     if (ret < 0) {
+        error_report("%s: failed to stop RDMA registration", __func__);
         qemu_file_set_error(f, ret);
         return ret;
     }
@@ -3116,6 +3120,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ret = multifd_send_sync_main();
     bql_lock();
     if (ret < 0) {
+        error_report("%s: multifd synchronization failed", __func__);
         return ret;
     }
 
@@ -3125,7 +3130,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     }
 
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
-    return qemu_fflush(f);
+    ret = qemu_fflush(f);
+    if (ret < 0) {
+        error_report("%s failed : %s", __func__, strerror(-ret));
+    }
+    return ret;
 }
 
 static void ram_save_file_bmap(QEMUFile *f)
-- 
2.44.0




* [PATCH v4 09/25] migration: Add Error** argument to vmstate_save()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (7 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 08/25] migration: Always report an error in ram_save_setup() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup() Cédric Le Goater
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

This prepares the ground for future changes adding an Error** argument
to qemu_savevm_state_setup().

Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 migration/savevm.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 63066f49f3ad5504be6d44ffdf9f7b759c0a25ef..ee31ffb5e88cea723039c754c30ce2c8a0ef35f3 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1009,11 +1009,10 @@ static void save_section_footer(QEMUFile *f, SaveStateEntry *se)
     }
 }
 
-static int vmstate_save(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc)
+static int vmstate_save(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc,
+                        Error **errp)
 {
     int ret;
-    Error *local_err = NULL;
-    MigrationState *s = migrate_get_current();
 
     if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
         return 0;
@@ -1035,10 +1034,9 @@ static int vmstate_save(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc)
     if (!se->vmsd) {
         vmstate_save_old_style(f, se, vmdesc);
     } else {
-        ret = vmstate_save_state_with_err(f, se->vmsd, se->opaque, vmdesc, &local_err);
+        ret = vmstate_save_state_with_err(f, se->vmsd, se->opaque, vmdesc,
+                                          errp);
         if (ret) {
-            migrate_set_error(s, local_err);
-            error_report_err(local_err);
             return ret;
         }
     }
@@ -1325,8 +1323,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
     trace_savevm_state_setup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (se->vmsd && se->vmsd->early_setup) {
-            ret = vmstate_save(f, se, ms->vmdesc);
+            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
             if (ret) {
+                migrate_set_error(ms, local_err);
+                error_report_err(local_err);
                 qemu_file_set_error(f, ret);
                 break;
             }
@@ -1545,6 +1545,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
     JSONWriter *vmdesc = ms->vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
+    Error *local_err = NULL;
     int ret;
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -1555,8 +1556,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 
         start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
 
-        ret = vmstate_save(f, se, vmdesc);
+        ret = vmstate_save(f, se, vmdesc, &local_err);
         if (ret) {
+            migrate_set_error(ms, local_err);
+            error_report_err(local_err);
             qemu_file_set_error(f, ret);
             return ret;
         }
@@ -1571,7 +1574,6 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
          * bdrv_activate_all() on the other end won't fail. */
         ret = bdrv_inactivate_all();
         if (ret) {
-            Error *local_err = NULL;
             error_setg(&local_err, "%s: bdrv_inactivate_all() failed (%d)",
                        __func__, ret);
             migrate_set_error(ms, local_err);
@@ -1767,6 +1769,8 @@ void qemu_savevm_live_state(QEMUFile *f)
 
 int qemu_save_device_state(QEMUFile *f)
 {
+    MigrationState *ms = migrate_get_current();
+    Error *local_err = NULL;
     SaveStateEntry *se;
 
     if (!migration_in_colo_state()) {
@@ -1781,8 +1785,10 @@ int qemu_save_device_state(QEMUFile *f)
         if (se->is_ram) {
             continue;
         }
-        ret = vmstate_save(f, se, NULL);
+        ret = vmstate_save(f, se, NULL, &local_err);
         if (ret) {
+            migrate_set_error(ms, local_err);
+            error_report_err(local_err);
             return ret;
         }
     }
-- 
2.44.0




* [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (8 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 09/25] migration: Add Error** argument to vmstate_save() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07 12:45   ` Fabiano Rosas
                     ` (2 more replies)
  2024-03-06 13:34 ` [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler Cédric Le Goater
                   ` (15 subsequent siblings)
  25 siblings, 3 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

This prepares the ground for the upcoming changes, which add an Error**
argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
now handle the error and fail earlier, setting the migration state from
MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.

In qemu_savevm_state(), move the cleanup to preserve the error
reported by .save_setup() handlers.

Since the previous behavior was to ignore errors at this step of
migration, this change should be examined closely to check that
cleanups are still correctly done.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:
 
 - Merged cleanup change in qemu_savevm_state()
   
 Changes in v3:
 
 - Set migration state to MIGRATION_STATUS_FAILED 
 - Fixed error handling to be done under lock in bg_migration_thread()
 - Made sure an error is always set in case of failure in
   qemu_savevm_state_setup()
   
 migration/savevm.h    |  2 +-
 migration/migration.c | 27 ++++++++++++++++++++++++---
 migration/savevm.c    | 26 +++++++++++++++-----------
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/migration/savevm.h b/migration/savevm.h
index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -32,7 +32,7 @@
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_non_migratable_list(strList **reasons);
 int qemu_savevm_state_prepare(Error **errp);
-void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
 bool qemu_savevm_state_guest_unplug_pending(void);
 int qemu_savevm_state_resume_prepare(MigrationState *s);
 void qemu_savevm_state_header(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     MigThrError thr_error;
     bool urgent = false;
+    Error *local_err = NULL;
+    int ret;
 
     thread = migration_threads_add("live_migration", qemu_get_thread_id());
 
@@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
     }
 
     bql_lock();
-    qemu_savevm_state_setup(s->to_dst_file);
+    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
     bql_unlock();
 
+    if (ret) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_FAILED);
+        goto out;
+    }
+
     qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                                MIGRATION_STATUS_ACTIVE);
 
@@ -3530,6 +3540,9 @@ static void *bg_migration_thread(void *opaque)
     MigThrError thr_error;
     QEMUFile *fb;
     bool early_fail = true;
+    bool setup_fail = true;
+    Error *local_err = NULL;
+    int ret;
 
     rcu_register_thread();
     object_ref(OBJECT(s));
@@ -3563,9 +3576,16 @@ static void *bg_migration_thread(void *opaque)
 
     bql_lock();
     qemu_savevm_state_header(s->to_dst_file);
-    qemu_savevm_state_setup(s->to_dst_file);
+    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
+    if (ret) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+        goto fail;
+    }
     bql_unlock();
 
+    setup_fail = false;
+
     qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                                MIGRATION_STATUS_ACTIVE);
 
@@ -3632,7 +3652,8 @@ static void *bg_migration_thread(void *opaque)
 
 fail:
     if (early_fail) {
-        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+        migrate_set_state(&s->state,
+                setup_fail ? MIGRATION_STATUS_SETUP : MIGRATION_STATUS_ACTIVE,
                 MIGRATION_STATUS_FAILED);
         bql_unlock();
     }
diff --git a/migration/savevm.c b/migration/savevm.c
index ee31ffb5e88cea723039c754c30ce2c8a0ef35f3..63fdbb5ad7d4dbfaef1d2094350bf302cc677602 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1310,11 +1310,11 @@ int qemu_savevm_state_prepare(Error **errp)
     return 0;
 }
 
-void qemu_savevm_state_setup(QEMUFile *f)
+int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
 {
+    ERRP_GUARD();
     MigrationState *ms = migrate_get_current();
     SaveStateEntry *se;
-    Error *local_err = NULL;
     int ret = 0;
 
     json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
@@ -1323,10 +1323,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
     trace_savevm_state_setup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (se->vmsd && se->vmsd->early_setup) {
-            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
+            ret = vmstate_save(f, se, ms->vmdesc, errp);
             if (ret) {
-                migrate_set_error(ms, local_err);
-                error_report_err(local_err);
+                migrate_set_error(ms, *errp);
                 qemu_file_set_error(f, ret);
                 break;
             }
@@ -1346,18 +1345,19 @@ void qemu_savevm_state_setup(QEMUFile *f)
         ret = se->ops->save_setup(f, se->opaque);
         save_section_footer(f, se);
         if (ret < 0) {
+            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
+                       "%d(%s): %d", se->section_id, se->idstr, ret);
             qemu_file_set_error(f, ret);
             break;
         }
     }
 
     if (ret) {
-        return;
+        return ret;
     }
 
-    if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
-        error_report_err(local_err);
-    }
+    /* TODO: Should we check that errp is set in case of failure ? */
+    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
 }
 
 int qemu_savevm_state_resume_prepare(MigrationState *s)
@@ -1728,7 +1728,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     ms->to_dst_file = f;
 
     qemu_savevm_state_header(f);
-    qemu_savevm_state_setup(f);
+    ret = qemu_savevm_state_setup(f, errp);
+    if (ret) {
+        goto cleanup;
+    }
 
     while (qemu_file_get_error(f) == 0) {
         if (qemu_savevm_state_iterate(f, false) > 0) {
@@ -1741,10 +1744,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
         qemu_savevm_state_complete_precopy(f, false, false);
         ret = qemu_file_get_error(f);
     }
-    qemu_savevm_state_cleanup();
     if (ret != 0) {
         error_setg_errno(errp, -ret, "Error while writing VM state");
     }
+cleanup:
+    qemu_savevm_state_cleanup();
 
     if (ret != 0) {
         status = MIGRATION_STATUS_FAILED;
-- 
2.44.0
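
Note on the setup_fail flag in bg_migration_thread(): the state change
takes an expected old state, so the failure path has to name the state
the migration is actually in (SETUP or ACTIVE) for the move to FAILED to
take effect. The sketch below assumes migrate_set_state() behaves as a
compare-and-exchange, which its old/new state arguments suggest. The
demo_* names and the enum are hypothetical and use C11 atomics rather
than QEMU's own helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of migration states (illustration only). */
enum demo_status { DEMO_SETUP, DEMO_ACTIVE, DEMO_FAILED };

/*
 * Move to new_state only if the current state equals old_state.
 * Returns true when the transition happened.
 */
static bool demo_set_state(_Atomic int *state, int old_state, int new_state)
{
    int expected = old_state;

    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/*
 * Failure path: pick the expected old state from the setup_fail flag,
 * then report the state that results.
 */
static int demo_fail(_Atomic int *state, bool setup_fail)
{
    demo_set_state(state, setup_fail ? DEMO_SETUP : DEMO_ACTIVE,
                   DEMO_FAILED);
    return atomic_load(state);
}
```

Under that assumption, a failure during setup that named ACTIVE as the
old state would leave the state at SETUP, which is the case the flag
guards against.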




* [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (9 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  9:53   ` Vladimir Sementsov-Ogievskiy
  2024-03-06 13:34 ` [PATCH v4 12/25] migration: Add Error** argument to .load_setup() handler Cédric Le Goater
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater,
	Nicholas Piggin, Harsh Prateek Bora, Halil Pasic, Thomas Huth,
	Eric Blake, Vladimir Sementsov-Ogievskiy, John Snow,
	Stefan Hajnoczi

The purpose is to record a potential error in the migration stream if
qemu_savevm_state_setup() fails. Most of the current .save_setup()
handlers can be modified to use the Error argument instead of managing
their own and calling error_report() locally.

Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Cc: John Snow <jsnow@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v3: 

 - Made sure an error is always set in case of failure in
   qemu_savevm_state_setup()

 Changes in v2: 

 - dropped qemu_file_set_error_obj(f, ret, local_err);
 
 include/migration/register.h   |  3 ++-
 hw/ppc/spapr.c                 |  2 +-
 hw/s390x/s390-stattrib.c       |  6 ++----
 hw/vfio/migration.c            | 17 ++++++++---------
 migration/block-dirty-bitmap.c |  4 +++-
 migration/block.c              | 13 ++++---------
 migration/ram.c                | 15 ++++++++-------
 migration/savevm.c             |  4 +---
 8 files changed, 29 insertions(+), 35 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index d7b70a8be68c9df47c7843bda7d430989d7ca384..64fc7c11036c82edd6d69513e56a0216d36c17aa 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -60,10 +60,11 @@ typedef struct SaveVMHandlers {
      *
      * @f: QEMUFile where to send the data
      * @opaque: data pointer passed to register_savevm_live()
+     * @errp: pointer to Error*, to store an error if it happens.
      *
      * Returns zero to indicate success and negative for error
      */
-    int (*save_setup)(QEMUFile *f, void *opaque);
+    int (*save_setup)(QEMUFile *f, void *opaque, Error **errp);
 
     /**
      * @save_cleanup
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 55263f0815ed7671b32ea20b394ae71c82e616cb..045c024ffa76eacfc496bd486cb6cafbee2df73e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2142,7 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
     }
 };
 
-static int htab_save_setup(QEMUFile *f, void *opaque)
+static int htab_save_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     SpaprMachineState *spapr = opaque;
 
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index b743e8a2fee84c7374460ccea6df1cf447cda44b..bc04187b2b69226db80219da1a964a87428adc0c 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -168,19 +168,17 @@ static int cmma_load(QEMUFile *f, void *opaque, int version_id)
     return ret;
 }
 
-static int cmma_save_setup(QEMUFile *f, void *opaque)
+static int cmma_save_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     S390StAttribState *sas = S390_STATTRIB(opaque);
     S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
-    Error *local_err = NULL;
     int res;
     /*
      * Signal that we want to start a migration, thus needing PGSTE dirty
      * tracking.
      */
-    res = sac->set_migrationmode(sas, true, &local_err);
+    res = sac->set_migrationmode(sas, true, errp);
     if (res) {
-        error_report_err(local_err);
         return res;
     }
     qemu_put_be64(f, STATTR_FLAG_EOS);
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 330b3a28548e32b0b3268072895bb5e4875766a2..3e893093ea6191fda35b7fdaddad5bff23e97a13 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -378,7 +378,7 @@ static int vfio_save_prepare(void *opaque, Error **errp)
     return 0;
 }
 
-static int vfio_save_setup(QEMUFile *f, void *opaque)
+static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
@@ -392,8 +392,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
                                       stop_copy_size);
     migration->data_buffer = g_try_malloc0(migration->data_buffer_size);
     if (!migration->data_buffer) {
-        error_report("%s: Failed to allocate migration data buffer",
-                     vbasedev->name);
+        error_setg(errp, "%s: Failed to allocate migration data buffer",
+                   vbasedev->name);
         return -ENOMEM;
     }
 
@@ -403,8 +403,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
             ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
                                            VFIO_DEVICE_STATE_RUNNING);
             if (ret) {
-                error_report("%s: Failed to set new PRE_COPY state",
-                             vbasedev->name);
+                error_setg(errp, "%s: Failed to set new PRE_COPY state",
+                           vbasedev->name);
                 return ret;
             }
 
@@ -415,8 +415,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
             /* vfio_save_complete_precopy() will go to STOP_COPY */
             break;
         default:
-            error_report("%s: Invalid device state %d", vbasedev->name,
-                         migration->device_state);
+            error_setg(errp, "%s: Invalid device state %d", vbasedev->name,
+                       migration->device_state);
             return -EINVAL;
         }
     }
@@ -427,8 +427,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
 
     ret = qemu_file_get_error(f);
     if (ret < 0) {
-        error_report("%s: save setup failed : %s", vbasedev->name,
-                     strerror(-ret));
+        error_setg_errno(errp, -ret, "%s: save setup failed", vbasedev->name);
     }
 
     return ret;
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 2708abf3d762de774ed294d3fdb8e56690d2974c..542a8c297b329abc30d1b3a205d29340fa59a961 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1213,12 +1213,14 @@ fail:
     return ret;
 }
 
-static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     DBMSaveState *s = &((DBMState *)opaque)->save;
     SaveBitmapState *dbms = NULL;
 
     if (init_dirty_bitmap_migration(s) < 0) {
+        error_setg(errp,
+                   "Failed to initialize dirty tracking bitmap for blocks");
         return -1;
     }
 
diff --git a/migration/block.c b/migration/block.c
index aa65ec718c2875ad9d1c40c971035f14d8086a6e..86c1e0e5dd9bfcca33d711600359f12f16f99b9a 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -710,10 +710,9 @@ static void block_migration_cleanup(void *opaque)
     blk_mig_unlock();
 }
 
-static int block_save_setup(QEMUFile *f, void *opaque)
+static int block_save_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     int ret;
-    Error *local_err = NULL;
 
     trace_migration_block_save("setup", block_mig_state.submitted,
                                block_mig_state.transferred);
@@ -721,25 +720,21 @@ static int block_save_setup(QEMUFile *f, void *opaque)
     warn_report("block migration is deprecated;"
                 " use blockdev-mirror with NBD instead");
 
-    ret = init_blk_migration(f, &local_err);
+    ret = init_blk_migration(f, errp);
     if (ret < 0) {
-        error_report_err(local_err);
         return ret;
     }
 
     /* start track dirty blocks */
     ret = set_dirty_tracking();
     if (ret) {
-        error_setg_errno(&local_err, -ret,
-                         "Failed to start block dirty tracking");
-        error_report_err(local_err);
+        error_setg_errno(errp, -ret, "Failed to start block dirty tracking");
         return ret;
     }
 
     ret = flush_blks(f);
     if (ret) {
-        error_setg_errno(&local_err, -ret, "Flushing block failed");
-        error_report_err(local_err);
+        error_setg_errno(errp, -ret, "Flushing block failed");
         return ret;
     }
     blk_mig_reset_dirty_cursor();
diff --git a/migration/ram.c b/migration/ram.c
index 3ac7f52a5f8e2c0d78a8cf150b3fa6611e12ffcc..52ad519b305532284003d78b93dd4a7186c767af 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3049,22 +3049,23 @@ static bool mapped_ram_read_header(QEMUFile *file, MappedRamHeader *header,
  *
  * @f: QEMUFile where to send the data
  * @opaque: RAMState pointer
+ * @errp: pointer to Error*, to store an error if it happens.
  */
-static int ram_save_setup(QEMUFile *f, void *opaque)
+static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     RAMState **rsp = opaque;
     RAMBlock *block;
     int ret, max_hg_page_size;
 
     if (compress_threads_save_setup()) {
-        error_report("%s: failed to start compress threads", __func__);
+        error_setg(errp, "%s: failed to start compress threads", __func__);
         return -1;
     }
 
     /* migration has already setup the bitmap, reuse it. */
     if (!migration_in_colo_state()) {
         if (ram_init_all(rsp) != 0) {
-            error_report("%s: failed to setup RAM for migration", __func__);
+            error_setg(errp, "%s: failed to setup RAM for migration", __func__);
             compress_threads_save_cleanup();
             return -1;
         }
@@ -3101,14 +3102,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
     ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
     if (ret < 0) {
-        error_report("%s: failed to start RDMA registration", __func__);
+        error_setg(errp, "%s: failed to start RDMA registration", __func__);
         qemu_file_set_error(f, ret);
         return ret;
     }
 
     ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
     if (ret < 0) {
-        error_report("%s: failed to stop RDMA registration", __func__);
+        error_setg(errp, "%s: failed to stop RDMA registration", __func__);
         qemu_file_set_error(f, ret);
         return ret;
     }
@@ -3120,7 +3121,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ret = multifd_send_sync_main();
     bql_lock();
     if (ret < 0) {
-        error_report("%s: multifd synchronization failed", __func__);
+        error_setg(errp, "%s: multifd synchronization failed", __func__);
         return ret;
     }
 
@@ -3132,7 +3133,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     ret = qemu_fflush(f);
     if (ret < 0) {
-        error_report("%s failed : %s", __func__, strerror(-ret));
+        error_setg_errno(errp, -ret, "%s failed", __func__);
     }
     return ret;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index 63fdbb5ad7d4dbfaef1d2094350bf302cc677602..52d35b2a72c6238bfe5dcb4d81c1af8d2bf73013 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1342,11 +1342,9 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
         }
         save_section_header(f, se, QEMU_VM_SECTION_START);
 
-        ret = se->ops->save_setup(f, se->opaque);
+        ret = se->ops->save_setup(f, se->opaque, errp);
         save_section_footer(f, se);
         if (ret < 0) {
-            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
-                       "%d(%s): %d", se->section_id, se->idstr, ret);
             qemu_file_set_error(f, ret);
             break;
         }
-- 
2.44.0




* [PATCH v4 12/25] migration: Add Error** argument to .load_setup() handler
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (10 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler Cédric Le Goater
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

This will be useful to report errors at a higher level, mostly in VFIO
today.
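
As an illustration of the pattern only (not QEMU code), here is a
minimal, self-contained sketch of a handler that sets an error and a
caller that prepends context to it. The Error struct and every
sketch_*() name below are hypothetical stand-ins for QEMU's Error API.
In QEMU itself, ERRP_GUARD() is what keeps error_prepend() usable when
the caller passes a NULL errp; the sketch simply skips the prepend in
that case.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MSG_LEN 256

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct Error {
    char msg[SKETCH_MSG_LEN];
} Error;

/* Allocate an error and format its message, if the caller wants one. */
static void sketch_error_set(Error **errp, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (!errp) {
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Put @prefix in front of an already-set error message. */
static void sketch_error_prepend(Error **errp, const char *prefix)
{
    char tmp[SKETCH_MSG_LEN];

    if (!errp || !*errp) {
        return;
    }
    snprintf(tmp, sizeof(tmp), "%s%s", prefix, (*errp)->msg);
    memcpy((*errp)->msg, tmp, sizeof(tmp));
}

/* Device-level handler: returns 0 or a negative value with @errp set. */
static int sketch_device_load_setup(int fail, Error **errp)
{
    if (fail) {
        sketch_error_set(errp, "Failed to set RESUMING state");
        return -1;
    }
    return 0;
}

/* Core-level caller: adds the device name in front of the error. */
int sketch_loadvm_state_setup(const char *idstr, int fail, Error **errp)
{
    int ret = sketch_device_load_setup(fail, errp);

    if (ret < 0) {
        char prefix[128];

        snprintf(prefix, sizeof(prefix),
                 "Load state of device %s failed: ", idstr);
        sketch_error_prepend(errp, prefix);
    }
    return ret;
}
```

The point of the split is that the handler knows the cause and the
caller knows which device was involved, so neither has to print
anything itself.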

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v3:

 - ERRP_GUARD() because of error_prepend use 
 - Made sure an error is always set in case of failure in
   vfio_load_setup()

 include/migration/register.h |  3 ++-
 hw/vfio/migration.c          |  9 +++++++--
 migration/ram.c              |  3 ++-
 migration/savevm.c           | 11 +++++++----
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 64fc7c11036c82edd6d69513e56a0216d36c17aa..f60e797894e5faacdf55d2d6de175074ac58944f 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -234,10 +234,11 @@ typedef struct SaveVMHandlers {
      *
      * @f: QEMUFile where to receive the data
      * @opaque: data pointer passed to register_savevm_live()
+     * @errp: pointer to Error*, to store an error if it happens.
      *
      * Returns zero to indicate success and negative for error
      */
-    int (*load_setup)(QEMUFile *f, void *opaque);
+    int (*load_setup)(QEMUFile *f, void *opaque, Error **errp);
 
     /**
      * @load_cleanup
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3e893093ea6191fda35b7fdaddad5bff23e97a13..a3bb1a92ba0b9c2c585efe54cfda0b774a81dcb9 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -588,12 +588,17 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
     }
 }
 
-static int vfio_load_setup(QEMUFile *f, void *opaque)
+static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     VFIODevice *vbasedev = opaque;
+    int ret;
 
-    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
                                    vbasedev->migration->device_state);
+    if (ret) {
+        error_setg(errp, "%s: Failed to set RESUMING state", vbasedev->name);
+    }
+    return ret;
 }
 
 static int vfio_load_cleanup(void *opaque)
diff --git a/migration/ram.c b/migration/ram.c
index 52ad519b305532284003d78b93dd4a7186c767af..c5149b7d717aefad7f590422af0ea4a40e7507be 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3678,8 +3678,9 @@ void colo_release_ram_cache(void)
  *
  * @f: QEMUFile where to receive the data
  * @opaque: RAMState pointer
+ * @errp: pointer to Error*, to store an error if it happens.
  */
-static int ram_load_setup(QEMUFile *f, void *opaque)
+static int ram_load_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     xbzrle_load_setup();
     ramblock_recv_map_init();
diff --git a/migration/savevm.c b/migration/savevm.c
index 52d35b2a72c6238bfe5dcb4d81c1af8d2bf73013..ed0d1f31bee9b671698a75c29ab448ee2812685d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2750,8 +2750,9 @@ static void qemu_loadvm_state_switchover_ack_needed(MigrationIncomingState *mis)
     trace_loadvm_state_switchover_ack_needed(mis->switchover_ack_pending_num);
 }
 
-static int qemu_loadvm_state_setup(QEMUFile *f)
+static int qemu_loadvm_state_setup(QEMUFile *f, Error **errp)
 {
+    ERRP_GUARD(); /* error_prepend use */
     SaveStateEntry *se;
     int ret;
 
@@ -2766,10 +2767,11 @@ static int qemu_loadvm_state_setup(QEMUFile *f)
             }
         }
 
-        ret = se->ops->load_setup(f, se->opaque);
+        ret = se->ops->load_setup(f, se->opaque, errp);
         if (ret < 0) {
+            error_prepend(errp, "Load state of device %s failed: ",
+                          se->idstr);
             qemu_file_set_error(f, ret);
-            error_report("Load state of device %s failed", se->idstr);
             return ret;
         }
     }
@@ -2950,7 +2952,8 @@ int qemu_loadvm_state(QEMUFile *f)
         return ret;
     }
 
-    if (qemu_loadvm_state_setup(f) != 0) {
+    if (qemu_loadvm_state_setup(f, &local_err) != 0) {
+        error_report_err(local_err);
         return -EINVAL;
     }
 
-- 
2.44.0




* [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (11 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 12/25] migration: Add Error** argument to .load_setup() handler Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-15 11:18   ` Peter Xu
  2024-03-06 13:34 ` [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines Cédric Le Goater
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater,
	Stefano Stabellini, Anthony Perard, Paul Durrant,
	Michael S. Tsirkin, Paolo Bonzini, David Hildenbrand

Modify all .log_global_start() handlers to take an Error** parameter
and return a bool. Adapt memory_global_dirty_log_start() to stop
looping over the handlers on the first error. In that case, a rollback
is performed to stop dirty logging on all listeners where it was
previously enabled.
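
The stop-on-first-error and rollback logic described above can be
sketched in isolation as follows. This is a simplified model under
stated assumptions, not the QEMU implementation: it uses a plain array
where QEMU walks a QTAILQ of listeners, and SketchListener and the
sketch_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a MemoryListener. */
typedef struct SketchListener {
    bool fail_on_start; /* simulate a handler that reports failure */
    bool logging;       /* true while dirty logging is enabled */
} SketchListener;

static bool sketch_log_start(SketchListener *l)
{
    if (l->fail_on_start) {
        return false;
    }
    l->logging = true;
    return true;
}

static void sketch_log_stop(SketchListener *l)
{
    l->logging = false;
}

/*
 * Start logging on every listener in order. On the first failure,
 * stop logging on the listeners already started, walking backwards
 * from the one just before the failing listener, and report failure.
 */
bool sketch_global_log_start(SketchListener *ls, size_t n)
{
    size_t i;

    assert(n == 0 || ls != NULL);
    for (i = 0; i < n; i++) {
        if (!sketch_log_start(&ls[i])) {
            while (i > 0) {
                sketch_log_stop(&ls[--i]);
            }
            return false;
        }
    }
    return true;
}
```

The failing listener itself is not stopped, which mirrors passing the
previous element of the list to the rollback helper.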

Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paul Durrant <paul@xen.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Dropped log_global_stop() and log_global_sync() changes
 - Dropped MEMORY_LISTENER_CALL_LOG_GLOBAL 
 - Modified memory_global_dirty_log_start() to loop on the list of
   listeners and handle errors directly.
 - Introduced memory_global_dirty_log_rollback() to revert operations
   previously done
   
 Changes in v3: 

 - Fixed return value of vfio_listener_log_global_start() and
   vfio_listener_log_global_stop(). Went unnoticed because not tested.
   
 include/exec/memory.h |  5 ++++-
 hw/i386/xen/xen-hvm.c |  3 ++-
 hw/vfio/common.c      |  4 +++-
 hw/virtio/vhost.c     |  3 ++-
 system/memory.c       | 43 +++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 8626a355b310ed7b1a1db7978ba4b394032c2f15..5555567bc4c9fdb53e8f63487f1400980275687d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -998,8 +998,11 @@ struct MemoryListener {
      * active at that time.
      *
      * @listener: The #MemoryListener.
+     * @errp: pointer to Error*, to store an error if it happens.
+     *
+     * Return: true on success, else false setting @errp with error.
      */
-    void (*log_global_start)(MemoryListener *listener);
+    bool (*log_global_start)(MemoryListener *listener, Error **errp);
 
     /**
      * @log_global_stop:
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index f42621e6742552035122ea58092c91c3458338ff..0608ca99f5166fd6379ee674442484e805eff9c0 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -446,11 +446,12 @@ static void xen_log_sync(MemoryListener *listener, MemoryRegionSection *section)
                           int128_get64(section->size));
 }
 
-static void xen_log_global_start(MemoryListener *listener)
+static bool xen_log_global_start(MemoryListener *listener, Error **errp)
 {
     if (xen_enabled()) {
         xen_in_migration = true;
     }
+    return true;
 }
 
 static void xen_log_global_stop(MemoryListener *listener)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ff88c3f31ca660b3c0a790601100fdc6116192a0..800ba0aeac84b8dcc83b042bb70c37b4bf78d3f4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1075,7 +1075,8 @@ out:
     return ret;
 }
 
-static void vfio_listener_log_global_start(MemoryListener *listener)
+static bool vfio_listener_log_global_start(MemoryListener *listener,
+                                           Error **errp)
 {
     VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
                                                  listener);
@@ -1092,6 +1093,7 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
                      ret, strerror(-ret));
         vfio_set_migration_error(ret);
     }
+    return !ret;
 }
 
 static void vfio_listener_log_global_stop(MemoryListener *listener)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2c9ac794680ea9b65eba6cc22e70cf141e90aa73..030b24db9270fc272024db3ff60a6cc449fba1ca 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1044,7 +1044,7 @@ check_dev_state:
     return r;
 }
 
-static void vhost_log_global_start(MemoryListener *listener)
+static bool vhost_log_global_start(MemoryListener *listener, Error **errp)
 {
     int r;
 
@@ -1052,6 +1052,7 @@ static void vhost_log_global_start(MemoryListener *listener)
     if (r < 0) {
         abort();
     }
+    return true;
 }
 
 static void vhost_log_global_stop(MemoryListener *listener)
diff --git a/system/memory.c b/system/memory.c
index a229a79988fce2aa3cb77e3a130db4c694e8cd49..3600e716149407c10a1f6bf8f0a81c2611cf15ba 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2914,9 +2914,27 @@ static unsigned int postponed_stop_flags;
 static VMChangeStateEntry *vmstate_change;
 static void memory_global_dirty_log_stop_postponed_run(void);
 
+/*
+ * Stop dirty logging on all listeners where it was previously enabled.
+ */
+static void memory_global_dirty_log_rollback(MemoryListener *listener,
+                                             unsigned int flags)
+{
+    global_dirty_tracking &= ~flags;
+    trace_global_dirty_changed(global_dirty_tracking);
+
+    while (listener) {
+        if (listener->log_global_stop) {
+            listener->log_global_stop(listener);
+        }
+        listener = QTAILQ_PREV(listener, link);
+    }
+}
+
 void memory_global_dirty_log_start(unsigned int flags)
 {
     unsigned int old_flags;
+    Error *local_err = NULL;
 
     assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
 
@@ -2936,7 +2954,25 @@ void memory_global_dirty_log_start(unsigned int flags)
     trace_global_dirty_changed(global_dirty_tracking);
 
     if (!old_flags) {
-        MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
+        MemoryListener *listener;
+        bool ret = true;
+
+        QTAILQ_FOREACH(listener, &memory_listeners, link) {
+            if (listener->log_global_start) {
+                ret = listener->log_global_start(listener, &local_err);
+                if (!ret) {
+                    break;
+                }
+            }
+        }
+
+        if (!ret) {
+            memory_global_dirty_log_rollback(QTAILQ_PREV(listener, link),
+                                             flags);
+            error_report_err(local_err);
+            return;
+        }
+
         memory_region_transaction_begin();
         memory_region_update_pending = true;
         memory_region_transaction_commit();
@@ -3009,13 +3045,16 @@ static void listener_add_address_space(MemoryListener *listener,
 {
     FlatView *view;
     FlatRange *fr;
+    Error *local_err = NULL;
 
     if (listener->begin) {
         listener->begin(listener);
     }
     if (global_dirty_tracking) {
         if (listener->log_global_start) {
-            listener->log_global_start(listener);
+            if (!listener->log_global_start(listener, &local_err)) {
+                error_report_err(local_err);
+            }
         }
     }
 
-- 
2.44.0




* [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (12 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-15 11:34   ` Peter Xu
  2024-03-16  2:41   ` Yong Huang
  2024-03-06 13:34 ` [PATCH v4 15/25] migration: Modify ram_init_bitmaps() to report dirty tracking errors Cédric Le Goater
                   ` (11 subsequent siblings)
  25 siblings, 2 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater,
	Stefano Stabellini, Anthony Perard, Paul Durrant,
	Michael S. Tsirkin, Paolo Bonzini, David Hildenbrand,
	Hyman Huang

Now that the .log_global_start() handlers take an Error** parameter
and return a bool, do the same for memory_global_dirty_log_start().
The error is reported in the callers for now; it will be propagated up
the call stack in the next changes.

Note the functional change in ram_init_bitmaps(): if the dirty page
logger fails to start, the dirty page bitmaps are no longer
synchronized, since there is no need to.
colo_incoming_start_dirty_log() could be modified in a similar way.
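
To make the control flow of that functional change explicit, here is a
rough standalone model of it: on a start failure the lock is still
released, but the bitmap sync and the later fixup are both skipped.
All names are hypothetical, a counter stands in for the RAM list
mutex, and a plain string stands in for the Error object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the RAM migration state. */
typedef struct SketchRam {
    bool start_fails; /* simulate a dirty logging start failure */
    int lock_depth;   /* stands in for the RAM list mutex */
    int syncs;        /* bitmap synchronizations performed */
    int fixups;       /* post-sync bitmap fixups performed */
} SketchRam;

static bool sketch_dirty_log_start(const SketchRam *rs, const char **errmsg)
{
    if (rs->start_fails) {
        if (errmsg) {
            *errmsg = "dirty logging failed to start";
        }
        return false;
    }
    return true;
}

/* Returns true on success, else false with @errmsg set. */
bool sketch_init_bitmaps(SketchRam *rs, const char **errmsg)
{
    bool ret;

    assert(rs != NULL);
    rs->lock_depth++;               /* take the lock */
    ret = sketch_dirty_log_start(rs, errmsg);
    if (!ret) {
        goto out_unlock;            /* no sync without dirty logging */
    }
    rs->syncs++;
out_unlock:
    rs->lock_depth--;               /* always release the lock */

    if (!ret) {
        return false;
    }
    rs->fixups++;                   /* only after a successful sync */
    return true;
}
```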

Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paul Durrant <paul@xen.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Dropped log_global_stop() and log_global_sync() changes
 
 include/exec/memory.h |  5 ++++-
 hw/i386/xen/xen-hvm.c |  2 +-
 migration/dirtyrate.c | 13 +++++++++++--
 migration/ram.c       | 22 ++++++++++++++++++++--
 system/memory.c       | 11 +++++------
 5 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5555567bc4c9fdb53e8f63487f1400980275687d..c129ee6db7162504bd72d4cfc69b5affb2cd87e8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2570,8 +2570,11 @@ void memory_listener_unregister(MemoryListener *listener);
  * memory_global_dirty_log_start: begin dirty logging for all regions
  *
  * @flags: purpose of starting dirty log, migration or dirty rate
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Return: true on success, else false setting @errp with error.
  */
-void memory_global_dirty_log_start(unsigned int flags);
+bool memory_global_dirty_log_start(unsigned int flags, Error **errp);
 
 /**
  * memory_global_dirty_log_stop: end dirty logging for all regions
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 0608ca99f5166fd6379ee674442484e805eff9c0..57cb7df50788a6c31eff68c95e8eaa856fdebede 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -654,7 +654,7 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length)
 void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
 {
     if (enable) {
-        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
+        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
     } else {
         memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
     }
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 1d2e85746fb7b10eb7f149976970f9a92125af8a..d02d70b7b4b86a29d4d5540ded416543536d8f98 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -90,9 +90,15 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
 
 void global_dirty_log_change(unsigned int flag, bool start)
 {
+    Error *local_err = NULL;
+    bool ret;
+
     bql_lock();
     if (start) {
-        memory_global_dirty_log_start(flag);
+        ret = memory_global_dirty_log_start(flag, &local_err);
+        if (!ret) {
+            error_report_err(local_err);
+        }
     } else {
         memory_global_dirty_log_stop(flag);
     }
@@ -608,9 +614,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
 {
     int64_t start_time;
     DirtyPageRecord dirty_pages;
+    Error *local_err = NULL;
 
     bql_lock();
-    memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
+    if (!memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE, &local_err)) {
+        error_report_err(local_err);
+    }
 
     /*
      * 1'round of log sync may return all 1 bits with
diff --git a/migration/ram.c b/migration/ram.c
index c5149b7d717aefad7f590422af0ea4a40e7507be..397b4c0f218a66d194e44f9c5f9fe8e9885c48b6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2836,18 +2836,31 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
 
 static void ram_init_bitmaps(RAMState *rs)
 {
+    Error *local_err = NULL;
+    bool ret = true;
+
     qemu_mutex_lock_ramlist();
 
     WITH_RCU_READ_LOCK_GUARD() {
         ram_list_init_bitmaps();
         /* We don't use dirty log with background snapshots */
         if (!migrate_background_snapshot()) {
-            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
+            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
+                                                &local_err);
+            if (!ret) {
+                error_report_err(local_err);
+                goto out_unlock;
+            }
             migration_bitmap_sync_precopy(rs, false);
         }
     }
+out_unlock:
     qemu_mutex_unlock_ramlist();
 
+    if (!ret) {
+        return;
+    }
+
     /*
      * After an eventual first bitmap sync, fixup the initial bitmap
      * containing all 1s to exclude any discarded pages from migration.
@@ -3631,6 +3644,8 @@ int colo_init_ram_cache(void)
 void colo_incoming_start_dirty_log(void)
 {
     RAMBlock *block = NULL;
+    Error *local_err = NULL;
+
     /* For memory_global_dirty_log_start below. */
     bql_lock();
     qemu_mutex_lock_ramlist();
@@ -3642,7 +3657,10 @@ void colo_incoming_start_dirty_log(void)
             /* Discard this dirty bitmap record */
             bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
         }
-        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
+        if (!memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
+                                           &local_err)) {
+            error_report_err(local_err);
+        }
     }
     ram_state->migration_dirty_pages = 0;
     qemu_mutex_unlock_ramlist();
diff --git a/system/memory.c b/system/memory.c
index 3600e716149407c10a1f6bf8f0a81c2611cf15ba..cbc098216b789f50460f1d1bc7ec122030693d9e 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2931,10 +2931,9 @@ static void memory_global_dirty_log_rollback(MemoryListener *listener,
     }
 }
 
-void memory_global_dirty_log_start(unsigned int flags)
+bool memory_global_dirty_log_start(unsigned int flags, Error **errp)
 {
     unsigned int old_flags;
-    Error *local_err = NULL;
 
     assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
 
@@ -2946,7 +2945,7 @@ void memory_global_dirty_log_start(unsigned int flags)
 
     flags &= ~global_dirty_tracking;
     if (!flags) {
-        return;
+        return true;
     }
 
     old_flags = global_dirty_tracking;
@@ -2959,7 +2958,7 @@ void memory_global_dirty_log_start(unsigned int flags)
 
         QTAILQ_FOREACH(listener, &memory_listeners, link) {
             if (listener->log_global_start) {
-                ret = listener->log_global_start(listener, &local_err);
+                ret = listener->log_global_start(listener, errp);
                 if (!ret) {
                     break;
                 }
@@ -2969,14 +2968,14 @@ void memory_global_dirty_log_start(unsigned int flags)
         if (!ret) {
             memory_global_dirty_log_rollback(QTAILQ_PREV(listener, link),
                                              flags);
-            error_report_err(local_err);
-            return;
+            return false;
         }
 
         memory_region_transaction_begin();
         memory_region_update_pending = true;
         memory_region_transaction_commit();
     }
+    return true;
 }
 
 static void memory_global_dirty_log_do_stop(unsigned int flags)
-- 
2.44.0




* [PATCH v4 15/25] migration: Modify ram_init_bitmaps() to report dirty tracking errors
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (13 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-06 13:34 ` [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler Cédric Le Goater
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

The .save_setup() handler now has an Error** argument that we can use
to propagate errors reported by the .log_global_start() handler. Do
that for RAM. The caller, qemu_savevm_state_setup(), will store the
error in the migration stream for later detection in the migration
sequence.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 migration/ram.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 397b4c0f218a66d194e44f9c5f9fe8e9885c48b6..1e48eee769d314321e31ea71855f4b49a78b6a13 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2834,9 +2834,8 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
     }
 }
 
-static void ram_init_bitmaps(RAMState *rs)
+static bool ram_init_bitmaps(RAMState *rs, Error **errp)
 {
-    Error *local_err = NULL;
     bool ret = true;
 
     qemu_mutex_lock_ramlist();
@@ -2845,10 +2844,8 @@ static void ram_init_bitmaps(RAMState *rs)
         ram_list_init_bitmaps();
         /* We don't use dirty log with background snapshots */
         if (!migrate_background_snapshot()) {
-            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
-                                                &local_err);
+            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
             if (!ret) {
-                error_report_err(local_err);
                 goto out_unlock;
             }
             migration_bitmap_sync_precopy(rs, false);
@@ -2858,7 +2855,7 @@ out_unlock:
     qemu_mutex_unlock_ramlist();
 
     if (!ret) {
-        return;
+        return false;
     }
 
     /*
@@ -2866,9 +2863,10 @@ out_unlock:
      * containing all 1s to exclude any discarded pages from migration.
      */
     migration_bitmap_clear_discarded_pages(rs);
+    return true;
 }
 
-static int ram_init_all(RAMState **rsp)
+static int ram_init_all(RAMState **rsp, Error **errp)
 {
     if (ram_state_init(rsp)) {
         return -1;
@@ -2879,7 +2877,9 @@ static int ram_init_all(RAMState **rsp)
         return -1;
     }
 
-    ram_init_bitmaps(*rsp);
+    if (!ram_init_bitmaps(*rsp, errp)) {
+        return -1;
+    }
 
     return 0;
 }
@@ -3077,8 +3077,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
 
     /* migration has already setup the bitmap, reuse it. */
     if (!migration_in_colo_state()) {
-        if (ram_init_all(rsp) != 0) {
-            error_setg(errp, "%s: failed to setup RAM for migration", __func__);
+        if (ram_init_all(rsp, errp) != 0) {
             compress_threads_save_cleanup();
             return -1;
         }
-- 
2.44.0




* [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (14 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 15/25] migration: Modify ram_init_bitmaps() to report dirty tracking errors Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  8:09   ` Eric Auger
  2024-03-06 13:34 ` [PATCH v4 17/25] vfio: Add Error** argument to vfio_devices_dma_logging_start() Cédric Le Goater
                   ` (9 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

We will use the Error object to improve error reporting in the
.log_global*() handlers of VFIO. Add documentation while at it.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v3:

 - Use error_setg_errno() in vfio_legacy_set_dirty_page_tracking()
 
 include/hw/vfio/vfio-container-base.h | 18 ++++++++++++++++--
 hw/vfio/common.c                      |  4 ++--
 hw/vfio/container-base.c              |  4 ++--
 hw/vfio/container.c                   |  6 +++---
 4 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 3582d5f97a37877b2adfc0d0b06996c82403f8b7..c76984654a596e3016a8cf833e10143eb872e102 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -82,7 +82,7 @@ int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
 void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
                                        MemoryRegionSection *section);
 int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
-                                           bool start);
+                                           bool start, Error **errp);
 int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                       VFIOBitmap *vbmap,
                                       hwaddr iova, hwaddr size);
@@ -121,9 +121,23 @@ struct VFIOIOMMUClass {
     int (*attach_device)(const char *name, VFIODevice *vbasedev,
                          AddressSpace *as, Error **errp);
     void (*detach_device)(VFIODevice *vbasedev);
+
     /* migration feature */
+
+    /**
+     * @set_dirty_page_tracking
+     *
+     * Start or stop dirty pages tracking on VFIO container
+     *
+     * @bcontainer: #VFIOContainerBase on which to de/activate dirty
+     *              pages tracking
+     * @start: indicates whether to start or stop dirty pages tracking
+     * @errp: pointer to Error*, to store an error if it happens.
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
-                                   bool start);
+                                   bool start, Error **errp);
     int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
                               VFIOBitmap *vbmap,
                               hwaddr iova, hwaddr size);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 800ba0aeac84b8dcc83b042bb70c37b4bf78d3f4..5598a508399a6c0b3a20ba17311cbe83d84250c5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1085,7 +1085,7 @@ static bool vfio_listener_log_global_start(MemoryListener *listener,
     if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
         ret = vfio_devices_dma_logging_start(bcontainer);
     } else {
-        ret = vfio_container_set_dirty_page_tracking(bcontainer, true);
+        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, NULL);
     }
 
     if (ret) {
@@ -1105,7 +1105,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
         vfio_devices_dma_logging_stop(bcontainer);
     } else {
-        ret = vfio_container_set_dirty_page_tracking(bcontainer, false);
+        ret = vfio_container_set_dirty_page_tracking(bcontainer, false, NULL);
     }
 
     if (ret) {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 913ae49077c4f09b7b27517c1231cfbe4befb7fb..7c0764121d24b02b6c4e66e368d7dff78a6d65aa 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -53,14 +53,14 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
 }
 
 int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
-                                           bool start)
+                                           bool start, Error **errp)
 {
     if (!bcontainer->dirty_pages_supported) {
         return 0;
     }
 
     g_assert(bcontainer->ops->set_dirty_page_tracking);
-    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
+    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start, errp);
 }
 
 int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 096d77eac3946a9c38fc2a98116b93353f71f06e..6524575aeddcea8470b5fd10caf57475088d1813 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -210,7 +210,7 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
 
 static int
 vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
-                                    bool start)
+                                    bool start, Error **errp)
 {
     const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
                                                   bcontainer);
@@ -228,8 +228,8 @@ vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
     if (ret) {
         ret = -errno;
-        error_report("Failed to set dirty tracking flag 0x%x errno: %d",
-                     dirty.flags, errno);
+        error_setg_errno(errp, errno, "Failed to set dirty tracking flag 0x%x",
+                         dirty.flags);
     }
 
     return ret;
-- 
2.44.0




* [PATCH v4 17/25] vfio: Add Error** argument to vfio_devices_dma_logging_start()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (15 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  8:15   ` Eric Auger
  2024-03-06 13:34 ` [PATCH v4 18/25] vfio: Add Error** argument to vfio_devices_dma_logging_stop() Cédric Le Goater
                   ` (8 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

This allows updating the Error argument of the VFIO log_global_start()
handler. Errors detected when device-level logging is started will be
propagated up to qemu_savevm_state_setup() when the ram save_setup()
handler is executed.

The vfio_set_migration_error() call becomes redundant. Remove it.
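
The propagation path described above can be sketched outside of QEMU as
follows. This is only an illustration under stated assumptions: the
Error type and the sketch_*() helpers are hypothetical stand-ins, not
the real qapi/error.h API, and the failing ioctl() is simulated with a
flag. The low-level function sets the error once, and the listener-level
function only prepends context before returning a bool.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for QEMU's Error object; NOT the real qapi/error.h type. */
typedef struct Error {
    char msg[256];
} Error;

/* Hypothetical analogue of error_setg_errno(): message plus strerror(). */
static void sketch_error_set_errno(Error **errp, int os_errno, const char *msg)
{
    if (!errp) {
        return;                     /* caller does not want the details */
    }
    assert(*errp == NULL);          /* an error may only be set once */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             msg, strerror(os_errno));
}

/* Hypothetical analogue of error_prepend(): adds context in front. */
static void sketch_error_prepend(Error **errp, const char *prefix)
{
    char tmp[256];

    if (!errp || !*errp) {
        return;
    }
    snprintf(tmp, sizeof(tmp), "%s%s", prefix, (*errp)->msg);
    memcpy((*errp)->msg, tmp, sizeof(tmp));
}

/* Device level: behaves as if an ioctl() had failed with ENOSPC. */
static int sketch_logging_start(bool fail, Error **errp)
{
    if (fail) {
        sketch_error_set_errno(errp, ENOSPC,
                               "dev0: Failed to start DMA logging");
        return -ENOSPC;
    }
    return 0;
}

/* Listener level: adds context and converts to the bool convention. */
static bool sketch_log_global_start(bool fail, Error **errp)
{
    int ret = sketch_logging_start(fail, errp);

    if (ret) {
        sketch_error_prepend(errp,
                             "vfio: Could not start dirty page tracking - ");
    }
    return !ret;
}
```

In this sketch the prepend helper simply does nothing when the caller
passes NULL; the patch relies on ERRP_GUARD() for that case instead.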

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Dropped log_global_stop() and log_global_sync() changes
   
 Changes in v3:

 - Use error_setg_errno() in vfio_devices_dma_logging_start() 
 - ERRP_GUARD() because of error_prepend use in
   vfio_listener_log_global_start()
   
 hw/vfio/common.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5598a508399a6c0b3a20ba17311cbe83d84250c5..d6790557da2f2890398fa03dbbef18129cd2c1bb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1036,7 +1036,8 @@ static void vfio_device_feature_dma_logging_start_destroy(
     g_free(feature);
 }
 
-static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
+static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
+                                          Error **errp)
 {
     struct vfio_device_feature *feature;
     VFIODirtyRanges ranges;
@@ -1058,8 +1059,8 @@ static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
         ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
         if (ret) {
             ret = -errno;
-            error_report("%s: Failed to start DMA logging, err %d (%s)",
-                         vbasedev->name, ret, strerror(errno));
+            error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
+                             vbasedev->name);
             goto out;
         }
         vbasedev->dirty_tracking = true;
@@ -1078,20 +1079,19 @@ out:
 static bool vfio_listener_log_global_start(MemoryListener *listener,
                                            Error **errp)
 {
+    ERRP_GUARD(); /* error_prepend use */
     VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
                                                  listener);
     int ret;
 
     if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
-        ret = vfio_devices_dma_logging_start(bcontainer);
+        ret = vfio_devices_dma_logging_start(bcontainer, errp);
     } else {
-        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, NULL);
+        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, errp);
     }
 
     if (ret) {
-        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
-                     ret, strerror(-ret));
-        vfio_set_migration_error(ret);
+        error_prepend(errp, "vfio: Could not start dirty page tracking - ");
     }
     return !ret;
 }
@@ -1100,17 +1100,20 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
 {
     VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
                                                  listener);
+    Error *local_err = NULL;
     int ret = 0;
 
     if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
         vfio_devices_dma_logging_stop(bcontainer);
     } else {
-        ret = vfio_container_set_dirty_page_tracking(bcontainer, false, NULL);
+        ret = vfio_container_set_dirty_page_tracking(bcontainer, false,
+                                                     &local_err);
     }
 
     if (ret) {
-        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
-                     ret, strerror(-ret));
+        error_prepend(&local_err,
+                      "vfio: Could not stop dirty page tracking - ");
+        error_report_err(local_err);
         vfio_set_migration_error(ret);
     }
 }
-- 
2.44.0




* [PATCH v4 18/25] vfio: Add Error** argument to vfio_devices_dma_logging_stop()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (16 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 17/25] vfio: Add Error** argument to vfio_devices_dma_logging_start() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  8:53   ` Eric Auger
  2024-03-06 13:34 ` [PATCH v4 19/25] vfio: Use new Error** argument in vfio_save_setup() Cédric Le Goater
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

This improves error reporting in the log_global_stop() VFIO handler.
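
The "keep first error" behaviour used when stopping logging on several
devices can be sketched as below. This is a minimal illustration, not
the QEMU code: struct sketch_dev, sketch_logging_stop_all() and the
plain character buffer used for the message are hypothetical, and the
per-device failure is simulated with a stored errno value. Every device
is still processed after a failure, and only the first failure is
recorded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device record; stands in for VFIODevice. */
struct sketch_dev {
    const char *name;
    int stop_errno;         /* 0: stop succeeds, else errno to simulate */
    int dirty_tracking;
};

/*
 * Stop logging on every device. The loop never exits early: tracking is
 * cleared on all devices, and only the first failure is reported.
 * errbuf may be NULL when the caller ignores errors (rollback case).
 */
static int sketch_logging_stop_all(struct sketch_dev *devs, size_t n,
                                   char *errbuf, size_t errlen)
{
    int ret = 0;

    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (devs[i].stop_errno && !ret) {
            /* Keep first error */
            ret = -devs[i].stop_errno;
            if (errbuf) {
                snprintf(errbuf, errlen, "%s: Failed to stop DMA logging: %s",
                         devs[i].name, strerror(devs[i].stop_errno));
            }
        }
        devs[i].dirty_tracking = 0;     /* cleared even on failure */
    }

    return ret;
}
```

Passing a NULL buffer corresponds to the rollback call in
vfio_devices_dma_logging_start(), where potential errors are ignored.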

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Dropped log_global_stop() and log_global_sync() changes
   
 Changes in v3:

 - Use error_setg_errno() in vfio_devices_dma_logging_stop() 
 
 hw/vfio/common.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d6790557da2f2890398fa03dbbef18129cd2c1bb..5b2e6a179cdd5f8ca5be84b7097661e96b391456 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -938,12 +938,14 @@ static void vfio_dirty_tracking_init(VFIOContainerBase *bcontainer,
     memory_listener_unregister(&dirty.listener);
 }
 
-static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
+static int vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer,
+                                         Error **errp)
 {
     uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
                               sizeof(uint64_t))] = {};
     struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
     VFIODevice *vbasedev;
+    int ret = 0;
 
     feature->argsz = sizeof(buf);
     feature->flags = VFIO_DEVICE_FEATURE_SET |
@@ -955,11 +957,17 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
         }
 
         if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
-            warn_report("%s: Failed to stop DMA logging, err %d (%s)",
-                        vbasedev->name, -errno, strerror(errno));
+            /* Keep first error */
+            if (!ret) {
+                ret = -errno;
+                error_setg_errno(errp, errno, "%s: Failed to stop DMA logging",
+                                 vbasedev->name);
+            }
         }
         vbasedev->dirty_tracking = false;
     }
+
+    return ret;
 }
 
 static struct vfio_device_feature *
@@ -1068,7 +1076,8 @@ static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
 
 out:
     if (ret) {
-        vfio_devices_dma_logging_stop(bcontainer);
+        /* Ignore the potential errors when doing rollback */
+        vfio_devices_dma_logging_stop(bcontainer, NULL);
     }
 
     vfio_device_feature_dma_logging_start_destroy(feature);
@@ -1104,7 +1113,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     int ret = 0;
 
     if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
-        vfio_devices_dma_logging_stop(bcontainer);
+        ret = vfio_devices_dma_logging_stop(bcontainer, &local_err);
     } else {
         ret = vfio_container_set_dirty_page_tracking(bcontainer, false,
                                                      &local_err);
-- 
2.44.0




* [PATCH v4 19/25] vfio: Use new Error** argument in vfio_save_setup()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (17 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 18/25] vfio: Add Error** argument to vfio_devices_dma_logging_stop() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  9:04   ` Eric Auger
  2024-03-06 13:34 ` [PATCH v4 20/25] vfio: Add Error** argument to .vfio_save_config() handler Cédric Le Goater
                   ` (6 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

Add an Error** argument to vfio_migration_set_state() and adjust
callers, including vfio_save_setup(). The error will be propagated up
to qemu_savevm_state_setup() where the save_setup() handler is
executed.

Modify vfio_vmstate_change_prepare() and vfio_vmstate_change() to
store the reported error in the migration stream if a migration is in
progress.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v3:

 - Use error_setg_errno() in vfio_save_setup() 
 - Made sure an error is always set in case of failure in
   vfio_load_setup()
   
 hw/vfio/migration.c | 67 ++++++++++++++++++++++++++-------------------
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a3bb1a92ba0b9c2c585efe54cfda0b774a81dcb9..71ade14a7942358094371a86c00718f5979113ea 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -84,7 +84,8 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
 
 static int vfio_migration_set_state(VFIODevice *vbasedev,
                                     enum vfio_device_mig_state new_state,
-                                    enum vfio_device_mig_state recover_state)
+                                    enum vfio_device_mig_state recover_state,
+                                    Error **errp)
 {
     VFIOMigration *migration = vbasedev->migration;
     uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
@@ -104,15 +105,15 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
         ret = -errno;
 
         if (recover_state == VFIO_DEVICE_STATE_ERROR) {
-            error_report("%s: Failed setting device state to %s, err: %s. "
-                         "Recover state is ERROR. Resetting device",
-                         vbasedev->name, mig_state_to_str(new_state),
-                         strerror(errno));
+            error_setg(errp, "%s: Failed setting device state to %s, err: %s. "
+                       "Recover state is ERROR. Resetting device",
+                       vbasedev->name, mig_state_to_str(new_state),
+                       strerror(errno));
 
             goto reset_device;
         }
 
-        error_report(
+        error_setg(errp,
             "%s: Failed setting device state to %s, err: %s. Setting device in recover state %s",
                      vbasedev->name, mig_state_to_str(new_state),
                      strerror(errno), mig_state_to_str(recover_state));
@@ -120,7 +121,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
         mig_state->device_state = recover_state;
         if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
             ret = -errno;
-            error_report(
+            error_setg(errp,
                 "%s: Failed setting device in recover state, err: %s. Resetting device",
                          vbasedev->name, strerror(errno));
 
@@ -139,7 +140,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
              * This can happen if the device is asynchronously reset and
              * terminates a data transfer.
              */
-            error_report("%s: data_fd out of sync", vbasedev->name);
+            error_setg(errp, "%s: data_fd out of sync", vbasedev->name);
             close(mig_state->data_fd);
 
             return -EBADF;
@@ -170,10 +171,11 @@ reset_device:
  */
 static int
 vfio_migration_set_state_or_reset(VFIODevice *vbasedev,
-                                  enum vfio_device_mig_state new_state)
+                                  enum vfio_device_mig_state new_state,
+                                  Error **errp)
 {
     return vfio_migration_set_state(vbasedev, new_state,
-                                    VFIO_DEVICE_STATE_ERROR);
+                                    VFIO_DEVICE_STATE_ERROR, errp);
 }
 
 static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
@@ -401,10 +403,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
         switch (migration->device_state) {
         case VFIO_DEVICE_STATE_RUNNING:
             ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
-                                           VFIO_DEVICE_STATE_RUNNING);
+                                           VFIO_DEVICE_STATE_RUNNING, errp);
             if (ret) {
-                error_setg(errp, "%s: Failed to set new PRE_COPY state",
-                           vbasedev->name);
                 return ret;
             }
 
@@ -437,13 +437,20 @@ static void vfio_save_cleanup(void *opaque)
 {
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
+    Error *local_err = NULL;
+    int ret;
 
     /*
      * Changing device state from STOP_COPY to STOP can take time. Do it here,
      * after migration has completed, so it won't increase downtime.
      */
     if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
-        vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_STOP);
+        ret = vfio_migration_set_state_or_reset(vbasedev,
+                                                VFIO_DEVICE_STATE_STOP,
+                                                &local_err);
+        if (ret) {
+            error_report_err(local_err);
+        }
     }
 
     g_free(migration->data_buffer);
@@ -549,11 +556,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
     VFIODevice *vbasedev = opaque;
     ssize_t data_size;
     int ret;
+    Error *local_err = NULL;
 
     /* We reach here with device state STOP or STOP_COPY only */
     ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
-                                   VFIO_DEVICE_STATE_STOP);
+                                   VFIO_DEVICE_STATE_STOP, &local_err);
     if (ret) {
+        error_report_err(local_err);
         return ret;
     }
 
@@ -591,14 +600,9 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
 static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     VFIODevice *vbasedev = opaque;
-    int ret;
 
-    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
-                                   vbasedev->migration->device_state);
-    if (ret) {
-        error_setg(errp, "%s: Failed to set RESUMING state", vbasedev->name);
-    }
-    return ret;
+    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
+                                    vbasedev->migration->device_state, errp);
 }
 
 static int vfio_load_cleanup(void *opaque)
@@ -714,20 +718,22 @@ static void vfio_vmstate_change_prepare(void *opaque, bool running,
     VFIODevice *vbasedev = opaque;
     VFIOMigration *migration = vbasedev->migration;
     enum vfio_device_mig_state new_state;
+    Error *local_err = NULL;
     int ret;
 
     new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ?
                     VFIO_DEVICE_STATE_PRE_COPY_P2P :
                     VFIO_DEVICE_STATE_RUNNING_P2P;
 
-    ret = vfio_migration_set_state_or_reset(vbasedev, new_state);
+    ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
     if (ret) {
         /*
          * Migration should be aborted in this case, but vm_state_notify()
          * currently does not support reporting failures.
          */
         if (migrate_get_current()->to_dst_file) {
-            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+            qemu_file_set_error_obj(migrate_get_current()->to_dst_file, ret,
+                                    local_err);
         }
     }
 
@@ -740,6 +746,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
 {
     VFIODevice *vbasedev = opaque;
     enum vfio_device_mig_state new_state;
+    Error *local_err = NULL;
     int ret;
 
     if (running) {
@@ -752,14 +759,15 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
                 VFIO_DEVICE_STATE_STOP;
     }
 
-    ret = vfio_migration_set_state_or_reset(vbasedev, new_state);
+    ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
     if (ret) {
         /*
          * Migration should be aborted in this case, but vm_state_notify()
          * currently does not support reporting failures.
          */
         if (migrate_get_current()->to_dst_file) {
-            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+            qemu_file_set_error_obj(migrate_get_current()->to_dst_file, ret,
+                                    local_err);
         }
     }
 
@@ -773,13 +781,16 @@ static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
     VFIOMigration *migration = container_of(notifier, VFIOMigration,
                                             migration_state);
     VFIODevice *vbasedev = migration->vbasedev;
+    int ret = 0;
 
     trace_vfio_migration_state_notifier(vbasedev->name, e->type);
 
     if (e->type == MIG_EVENT_PRECOPY_FAILED) {
-        vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING);
+        ret = vfio_migration_set_state_or_reset(vbasedev,
+                                                VFIO_DEVICE_STATE_RUNNING,
+                                                errp);
     }
-    return 0;
+    return ret;
 }
 
 static void vfio_migration_free(VFIODevice *vbasedev)
-- 
2.44.0




* [PATCH v4 20/25] vfio: Add Error** argument to .vfio_save_config() handler
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (18 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 19/25] vfio: Use new Error** argument in vfio_save_setup() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  9:13   ` Eric Auger
  2024-03-06 13:34 ` [PATCH v4 21/25] vfio: Reverse test on vfio_get_dirty_bitmap() Cédric Le Goater
                   ` (5 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

Use vmstate_save_state_with_err() to improve error reporting in the
callers and store the reported error in the migration stream. Add
documentation while at it.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/vfio/vfio-common.h | 25 ++++++++++++++++++++++++-
 hw/vfio/migration.c           | 18 ++++++++++++------
 hw/vfio/pci.c                 |  5 +++--
 3 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9da6c08ef41174610eb92726c590309a53696a3..46f88493634b5634a9c14a5caa33a463fbf2c50d 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -133,7 +133,30 @@ struct VFIODeviceOps {
     int (*vfio_hot_reset_multi)(VFIODevice *vdev);
     void (*vfio_eoi)(VFIODevice *vdev);
     Object *(*vfio_get_object)(VFIODevice *vdev);
-    void (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f);
+
+    /**
+     * @vfio_save_config
+     *
+     * Save device config state
+     *
+     * @vdev: #VFIODevice for which to save the config
+     * @f: #QEMUFile where to send the data
+     * @errp: pointer to Error*, to store an error if it happens.
+     *
+     * Returns zero to indicate success and negative for error
+     */
+    int (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f, Error **errp);
+
+    /**
+     * @vfio_load_config
+     *
+     * Load device config state
+     *
+     * @vdev: #VFIODevice for which to load the config
+     * @f: #QEMUFile where to get the data
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
 };
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 71ade14a7942358094371a86c00718f5979113ea..bd48f2ee472a5230c2c84bff829dae1e217db33f 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -190,14 +190,19 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
     return ret;
 }
 
-static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
+static int vfio_save_device_config_state(QEMUFile *f, void *opaque,
+                                         Error **errp)
 {
     VFIODevice *vbasedev = opaque;
+    int ret;
 
     qemu_put_be64(f, VFIO_MIG_FLAG_DEV_CONFIG_STATE);
 
     if (vbasedev->ops && vbasedev->ops->vfio_save_config) {
-        vbasedev->ops->vfio_save_config(vbasedev, f);
+        ret = vbasedev->ops->vfio_save_config(vbasedev, f, errp);
+        if (ret) {
+            return ret;
+        }
     }
 
     qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
@@ -587,13 +592,14 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 static void vfio_save_state(QEMUFile *f, void *opaque)
 {
     VFIODevice *vbasedev = opaque;
+    Error *local_err = NULL;
     int ret;
 
-    ret = vfio_save_device_config_state(f, opaque);
+    ret = vfio_save_device_config_state(f, opaque, &local_err);
     if (ret) {
-        error_report("%s: Failed to save device config space",
-                     vbasedev->name);
-        qemu_file_set_error(f, ret);
+        error_prepend(&local_err, "%s: Failed to save device config space - ",
+                      vbasedev->name);
+        qemu_file_set_error_obj(f, ret, local_err);
     }
 }
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4fa387f0430d62ca2ba1b5ae5b7037f8f06b33f9..99d86e1d40ef25133fc76ad6e58294b07bd20843 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2585,11 +2585,12 @@ const VMStateDescription vmstate_vfio_pci_config = {
     }
 };
 
-static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
+static int vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f, Error **errp)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 
-    vmstate_save_state(f, &vmstate_vfio_pci_config, vdev, NULL);
+    return vmstate_save_state_with_err(f, &vmstate_vfio_pci_config, vdev, NULL,
+                                       errp);
 }
 
 static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
-- 
2.44.0




* [PATCH v4 21/25] vfio: Reverse test on vfio_get_dirty_bitmap()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (19 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 20/25] vfio: Add Error** argument to .vfio_save_config() handler Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-06 20:51   ` Philippe Mathieu-Daudé
  2024-03-06 13:34 ` [PATCH v4 22/25] memory: Add Error** argument to memory_get_xlat_addr() Cédric Le Goater
                   ` (4 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

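Exit early when vfio_get_xlat_addr() fails instead of nesting the
vfio_get_dirty_bitmap() call under the test. No functional change; this
simplifies the changes that follow.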
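
The resulting control flow can be sketched as below. This is a
simplified illustration only: sketch_notify() and its counters are
hypothetical, the RCU read-side section is replaced by a depth counter,
and the function returns a value (unlike the real notifier) so that the
paths can be observed. The point is that both the early-exit path and
the normal path go through a single unlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int sketch_lock_depth;   /* stands in for the RCU read-side section */
static int sketch_reported;     /* counts simulated error reports */

static void sketch_lock(void)
{
    sketch_lock_depth++;
}

static void sketch_unlock(void)
{
    sketch_lock_depth--;
    assert(sketch_lock_depth >= 0);
}

/*
 * xlat_ok: whether the address translation step succeeds.
 * bitmap_ret: simulated return value of the dirty bitmap query.
 */
static int sketch_notify(bool xlat_ok, int bitmap_ret)
{
    int ret = -EINVAL;

    sketch_lock();
    if (!xlat_ok) {
        goto out_lock;          /* early exit, lock is still released */
    }

    ret = bitmap_ret;
    if (ret) {
        sketch_reported++;      /* error reporting would go here */
    }

out_lock:
    sketch_unlock();
    return ret;
}
```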

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/common.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5b2e6a179cdd5f8ca5be84b7097661e96b391456..6820d2efe4923d5043da7eb8deecb6ff20e1fd16 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1241,16 +1241,20 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     }
 
     rcu_read_lock();
-    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
-        ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
-                                    translated_addr);
-        if (ret) {
-            error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%s)",
-                         bcontainer, iova, iotlb->addr_mask + 1, ret,
-                         strerror(-ret));
-        }
+    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+        goto out_lock;
     }
+
+    ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
+                                translated_addr);
+    if (ret) {
+        error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
+                     "0x%"HWADDR_PRIx") = %d (%s)",
+                     bcontainer, iova, iotlb->addr_mask + 1, ret,
+                     strerror(-ret));
+    }
+
+out_lock:
     rcu_read_unlock();
 
 out:
-- 
2.44.0




* [PATCH v4 22/25] memory: Add Error** argument to memory_get_xlat_addr()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (20 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 21/25] vfio: Reverse test on vfio_get_dirty_bitmap() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-15 15:06   ` Peter Xu
  2024-03-06 13:34 ` [PATCH v4 23/25] vfio: Add Error** argument to .get_dirty_bitmap() handler Cédric Le Goater
                   ` (3 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater,
	Michael S. Tsirkin, Paolo Bonzini, David Hildenbrand

Let the callers do the reporting. This will be useful in
vfio_iommu_map_dirty_notify().

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/exec/memory.h  | 15 ++++++++++++++-
 hw/vfio/common.c       | 13 +++++++++----
 hw/virtio/vhost-vdpa.c |  5 ++++-
 system/memory.c        | 10 +++++-----
 4 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c129ee6db7162504bd72d4cfc69b5affb2cd87e8..14b6c99765428ec399e4a9ed54ecc3e2f691d29c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -771,9 +771,22 @@ void ram_discard_manager_register_listener(RamDiscardManager *rdm,
 void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
                                              RamDiscardListener *rdl);
 
+/**
+ * memory_get_xlat_addr: Extract addresses from a TLB entry
+ *
+ * @iotlb: pointer to an #IOMMUTLBEntry
+ * @vaddr: virtual address
+ * @ram_addr: RAM address
+ * @read_only: indicates if writes are allowed
+ * @mr_has_discard_manager: indicates memory is controlled by a
+ *                          RamDiscardManager
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Return: true on success, else false setting @errp with error.
+ */
 bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
                           ram_addr_t *ram_addr, bool *read_only,
-                          bool *mr_has_discard_manager);
+                          bool *mr_has_discard_manager, Error **errp);
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6820d2efe4923d5043da7eb8deecb6ff20e1fd16..496e5adaf8f18e9ae7e86dd69be0b9e71e86404f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -262,12 +262,13 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 
 /* Called with rcu_read_lock held.  */
 static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
-                               ram_addr_t *ram_addr, bool *read_only)
+                               ram_addr_t *ram_addr, bool *read_only,
+                               Error **errp)
 {
     bool ret, mr_has_discard_manager;
 
     ret = memory_get_xlat_addr(iotlb, vaddr, ram_addr, read_only,
-                               &mr_has_discard_manager);
+                               &mr_has_discard_manager, errp);
     if (ret && mr_has_discard_manager) {
         /*
          * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
@@ -297,6 +298,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     void *vaddr;
     int ret;
+    Error *local_err = NULL;
 
     trace_vfio_iommu_map_notify(iotlb->perm == IOMMU_NONE ? "UNMAP" : "MAP",
                                 iova, iova + iotlb->addr_mask);
@@ -313,7 +315,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
         bool read_only;
 
-        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
+        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &local_err)) {
+            error_report_err(local_err);
             goto out;
         }
         /*
@@ -1230,6 +1233,7 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     VFIOContainerBase *bcontainer = giommu->bcontainer;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     ram_addr_t translated_addr;
+    Error *local_err = NULL;
     int ret = -EINVAL;
 
     trace_vfio_iommu_map_dirty_notify(iova, iova + iotlb->addr_mask);
@@ -1241,7 +1245,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     }
 
     rcu_read_lock();
-    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, &local_err)) {
+        error_report_err(local_err);
         goto out_lock;
     }
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index ddae494ca8e8154ce03b88bc781fe9f1e639aceb..a6f06266cfc798b20b98001fa97ce771722175ec 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -203,6 +203,7 @@ static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     void *vaddr;
     int ret;
     Int128 llend;
+    Error *local_err = NULL;
 
     if (iotlb->target_as != &address_space_memory) {
         error_report("Wrong target AS \"%s\", only system memory is allowed",
@@ -222,7 +223,9 @@ static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
         bool read_only;
 
-        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL)) {
+        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL,
+                                  &local_err)) {
+            error_report_err(local_err);
             return;
         }
         ret = vhost_vdpa_dma_map(s, VHOST_VDPA_GUEST_PA_ASID, iova,
diff --git a/system/memory.c b/system/memory.c
index cbc098216b789f50460f1d1bc7ec122030693d9e..9dfebf44a4ae4d4942353213b18d05199e95d681 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2174,7 +2174,7 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
 /* Called with rcu_read_lock held.  */
 bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
                           ram_addr_t *ram_addr, bool *read_only,
-                          bool *mr_has_discard_manager)
+                          bool *mr_has_discard_manager, Error **errp)
 {
     MemoryRegion *mr;
     hwaddr xlat;
@@ -2192,7 +2192,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
     mr = address_space_translate(&address_space_memory, iotlb->translated_addr,
                                  &xlat, &len, writable, MEMTXATTRS_UNSPECIFIED);
     if (!memory_region_is_ram(mr)) {
-        error_report("iommu map to non memory area %" HWADDR_PRIx "", xlat);
+        error_setg(errp, "iommu map to non memory area %" HWADDR_PRIx "", xlat);
         return false;
     } else if (memory_region_has_ram_discard_manager(mr)) {
         RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
@@ -2211,8 +2211,8 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
          * were already restored before IOMMUs are restored.
          */
         if (!ram_discard_manager_is_populated(rdm, &tmp)) {
-            error_report("iommu map to discarded memory (e.g., unplugged via"
-                         " virtio-mem): %" HWADDR_PRIx "",
+            error_setg(errp, "iommu map to discarded memory (e.g., unplugged"
+                         " via virtio-mem): %" HWADDR_PRIx "",
                          iotlb->translated_addr);
             return false;
         }
@@ -2223,7 +2223,7 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
      * check that it did not truncate too much.
      */
     if (len & iotlb->addr_mask) {
-        error_report("iommu has granularity incompatible with target AS");
+        error_setg(errp, "iommu has granularity incompatible with target AS");
         return false;
     }
 
-- 
2.44.0
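
The calling convention this patch moves to (the callee fills an error
object, the caller decides how to report it) can be sketched in a
self-contained way. This is only an illustration under stated
assumptions: MockError, mock_error_setg(), mock_error_report_err(),
translate() and notify() are hypothetical stand-ins and not QEMU's
Error API or the functions touched above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's Error. */
typedef struct MockError {
    char msg[128];
} MockError;

/* Store a formatted message in *errp. A NULL errp means the caller
 * is not interested in the error details. */
static void mock_error_setg(MockError **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;
    }
    assert(*errp == NULL); /* an error must not be set twice */
    *errp = calloc(1, sizeof(**errp));
    assert(*errp != NULL);
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Print the error and free it: reporting is the caller's decision. */
static void mock_error_report_err(MockError *err)
{
    fprintf(stderr, "%s\n", err->msg);
    free(err);
}

/* Callee: returns false and fills errp instead of printing itself. */
static bool translate(unsigned long len, unsigned long addr_mask,
                      MockError **errp)
{
    if (len & addr_mask) {
        mock_error_setg(errp, "granularity incompatible (len 0x%lx)", len);
        return false;
    }
    return true;
}

/* Caller: owns a local error and reports it on failure. */
static bool notify(unsigned long len, unsigned long addr_mask)
{
    MockError *local_err = NULL;

    if (!translate(len, addr_mask, &local_err)) {
        mock_error_report_err(local_err);
        return false;
    }
    return true;
}
```

In this sketch notify() plays the role of a notifier that still prints
the message, while another caller of translate() would be free to hand
the error to a different consumer instead.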




* [PATCH v4 23/25] vfio: Add Error** argument to .get_dirty_bitmap() handler
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (21 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 22/25] memory: Add Error** argument to memory_get_xlat_addr() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  9:23   ` Eric Auger
  2024-03-06 13:34 ` [PATCH v4 24/25] vfio: Also trace event failures in vfio_save_complete_precopy() Cédric Le Goater
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

Let the callers do the error reporting. Add documentation while at it.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/vfio/vfio-common.h         |  4 +-
 include/hw/vfio/vfio-container-base.h | 17 +++++++-
 hw/vfio/common.c                      | 59 ++++++++++++++++++---------
 hw/vfio/container-base.c              |  5 ++-
 hw/vfio/container.c                   | 13 +++---
 5 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 46f88493634b5634a9c14a5caa33a463fbf2c50d..68911d36676667352e94a97895828aff4b194b57 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -274,9 +274,9 @@ bool
 vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
 int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                     VFIOBitmap *vbmap, hwaddr iova,
-                                    hwaddr size);
+                                    hwaddr size, Error **errp);
 int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
-                          uint64_t size, ram_addr_t ram_addr);
+                          uint64_t size, ram_addr_t ram_addr, Error **errp);
 
 /* Returns 0 on success, or a negative errno. */
 int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index c76984654a596e3016a8cf833e10143eb872e102..ebc49ebfbe7de862450941b1129faad5d62b3769 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -85,7 +85,7 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
                                            bool start, Error **errp);
 int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                       VFIOBitmap *vbmap,
-                                      hwaddr iova, hwaddr size);
+                                      hwaddr iova, hwaddr size, Error **errp);
 
 void vfio_container_init(VFIOContainerBase *bcontainer,
                          VFIOAddressSpace *space,
@@ -138,9 +138,22 @@ struct VFIOIOMMUClass {
      */
     int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
                                    bool start, Error **errp);
+    /**
+     * @query_dirty_bitmap
+     *
+     * Get list of dirty pages from container
+     *
+     * @bcontainer: #VFIOContainerBase from which to get dirty pages
+     * @vbmap: #VFIOBitmap internal bitmap structure
+     * @iova: iova base address
+     * @size: size of iova range
+     * @errp: pointer to Error*, to store an error if it happens.
+     *
+     * Returns zero to indicate success and negative for error
+     */
     int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
                               VFIOBitmap *vbmap,
-                              hwaddr iova, hwaddr size);
+                              hwaddr iova, hwaddr size, Error **errp);
     /* PCI specific */
     int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 496e5adaf8f18e9ae7e86dd69be0b9e71e86404f..65a11dc088524647541db97b7b8d6f07e5044728 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1158,7 +1158,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
 
 int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                     VFIOBitmap *vbmap, hwaddr iova,
-                                    hwaddr size)
+                                    hwaddr size, Error **errp)
 {
     VFIODevice *vbasedev;
     int ret;
@@ -1167,10 +1167,10 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
         ret = vfio_device_dma_logging_report(vbasedev, iova, size,
                                              vbmap->bitmap);
         if (ret) {
-            error_report("%s: Failed to get DMA logging report, iova: "
-                         "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
-                         ", err: %d (%s)",
-                         vbasedev->name, iova, size, ret, strerror(-ret));
+            error_setg(errp, "%s: Failed to get DMA logging report, iova: "
+                       "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
+                       ", err: %d (%s)",
+                       vbasedev->name, iova, size, ret, strerror(-ret));
 
             return ret;
         }
@@ -1180,7 +1180,7 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
 }
 
 int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
-                          uint64_t size, ram_addr_t ram_addr)
+                          uint64_t size, ram_addr_t ram_addr, Error **errp)
 {
     bool all_device_dirty_tracking =
         vfio_devices_all_device_dirty_tracking(bcontainer);
@@ -1197,13 +1197,17 @@ int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
 
     ret = vfio_bitmap_alloc(&vbmap, size);
     if (ret) {
+        error_setg_errno(errp, -ret,
+                         "Failed to allocate dirty tracking bitmap");
         return ret;
     }
 
     if (all_device_dirty_tracking) {
-        ret = vfio_devices_query_dirty_bitmap(bcontainer, &vbmap, iova, size);
+        ret = vfio_devices_query_dirty_bitmap(bcontainer, &vbmap, iova, size,
+                                              errp);
     } else {
-        ret = vfio_container_query_dirty_bitmap(bcontainer, &vbmap, iova, size);
+        ret = vfio_container_query_dirty_bitmap(bcontainer, &vbmap, iova, size,
+                                                errp);
     }
 
     if (ret) {
@@ -1251,12 +1255,13 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     }
 
     ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
-                                translated_addr);
+                                translated_addr, &local_err);
     if (ret) {
-        error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
-                     "0x%"HWADDR_PRIx") = %d (%s)",
-                     bcontainer, iova, iotlb->addr_mask + 1, ret,
-                     strerror(-ret));
+        error_prepend(&local_err,
+                      "vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
+                      "0x%"HWADDR_PRIx") failed :", bcontainer, iova,
+                      iotlb->addr_mask + 1);
+        error_report_err(local_err);
     }
 
 out_lock:
@@ -1276,12 +1281,19 @@ static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
     const ram_addr_t ram_addr = memory_region_get_ram_addr(section->mr) +
                                 section->offset_within_region;
     VFIORamDiscardListener *vrdl = opaque;
+    Error *local_err = NULL;
+    int ret;
 
     /*
      * Sync the whole mapped region (spanning multiple individual mappings)
      * in one go.
      */
-    return vfio_get_dirty_bitmap(vrdl->bcontainer, iova, size, ram_addr);
+    ret = vfio_get_dirty_bitmap(vrdl->bcontainer, iova, size, ram_addr,
+                                &local_err);
+    if (ret) {
+        error_report_err(local_err);
+    }
+    return ret;
 }
 
 static int
@@ -1313,7 +1325,7 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
 }
 
 static int vfio_sync_dirty_bitmap(VFIOContainerBase *bcontainer,
-                                  MemoryRegionSection *section)
+                                  MemoryRegionSection *section, Error **errp)
 {
     ram_addr_t ram_addr;
 
@@ -1344,7 +1356,14 @@ static int vfio_sync_dirty_bitmap(VFIOContainerBase *bcontainer,
         }
         return 0;
     } else if (memory_region_has_ram_discard_manager(section->mr)) {
-        return vfio_sync_ram_discard_listener_dirty_bitmap(bcontainer, section);
+        int ret;
+
+        ret = vfio_sync_ram_discard_listener_dirty_bitmap(bcontainer, section);
+        if (ret) {
+            error_setg(errp,
+                       "Failed to sync dirty bitmap with RAM discard listener");
+            return ret;
+        }
     }
 
     ram_addr = memory_region_get_ram_addr(section->mr) +
@@ -1352,7 +1371,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainerBase *bcontainer,
 
     return vfio_get_dirty_bitmap(bcontainer,
                    REAL_HOST_PAGE_ALIGN(section->offset_within_address_space),
-                   int128_get64(section->size), ram_addr);
+                                 int128_get64(section->size), ram_addr, errp);
 }
 
 static void vfio_listener_log_sync(MemoryListener *listener,
@@ -1361,16 +1380,16 @@ static void vfio_listener_log_sync(MemoryListener *listener,
     VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
                                                  listener);
     int ret;
+    Error *local_err = NULL;
 
     if (vfio_listener_skipped_section(section)) {
         return;
     }
 
     if (vfio_devices_all_dirty_tracking(bcontainer)) {
-        ret = vfio_sync_dirty_bitmap(bcontainer, section);
+        ret = vfio_sync_dirty_bitmap(bcontainer, section, &local_err);
         if (ret) {
-            error_report("vfio: Failed to sync dirty bitmap, err: %d (%s)", ret,
-                         strerror(-ret));
+            error_report_err(local_err);
             vfio_set_migration_error(ret);
         }
     }
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 7c0764121d24b02b6c4e66e368d7dff78a6d65aa..8db59881873c3b1edee81104b966af737e5fa6f6 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -65,10 +65,11 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
 
 int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                       VFIOBitmap *vbmap,
-                                      hwaddr iova, hwaddr size)
+                                      hwaddr iova, hwaddr size, Error **errp)
 {
     g_assert(bcontainer->ops->query_dirty_bitmap);
-    return bcontainer->ops->query_dirty_bitmap(bcontainer, vbmap, iova, size);
+    return bcontainer->ops->query_dirty_bitmap(bcontainer, vbmap, iova, size,
+                                               errp);
 }
 
 void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 6524575aeddcea8470b5fd10caf57475088d1813..475d96eaaa927998c6aa8cc9aa9f2115f5a1efda 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -131,6 +131,7 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
     };
     bool need_dirty_sync = false;
     int ret;
+    Error *local_err = NULL;
 
     if (iotlb && vfio_devices_all_running_and_mig_active(bcontainer)) {
         if (!vfio_devices_all_device_dirty_tracking(bcontainer) &&
@@ -166,8 +167,9 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
 
     if (need_dirty_sync) {
         ret = vfio_get_dirty_bitmap(bcontainer, iova, size,
-                                    iotlb->translated_addr);
+                                    iotlb->translated_addr, &local_err);
         if (ret) {
+            error_report_err(local_err);
             return ret;
         }
     }
@@ -237,7 +239,8 @@ vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
 
 static int vfio_legacy_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                           VFIOBitmap *vbmap,
-                                          hwaddr iova, hwaddr size)
+                                          hwaddr iova, hwaddr size,
+                                          Error **errp)
 {
     const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
                                                   bcontainer);
@@ -265,9 +268,9 @@ static int vfio_legacy_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
     ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
     if (ret) {
         ret = -errno;
-        error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
-                " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
-                (uint64_t)range->size, errno);
+        error_setg(errp, "Failed to get dirty bitmap for iova: 0x%"PRIx64
+                   " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
+                   (uint64_t)range->size, errno);
     }
 
     g_free(dbitmap);
-- 
2.44.0
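
The "prepend context, then report" step used in
vfio_iommu_map_dirty_notify() above can be illustrated with a small
self-contained sketch. It assumes nothing about QEMU's implementation:
MockError, mock_error_new(), mock_error_prepend() and
mock_error_free() are hypothetical names introduced only for this
example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error object holding a heap-allocated message. */
typedef struct MockError {
    char *msg;
} MockError;

/* Create an error carrying a copy of msg. */
static MockError *mock_error_new(const char *msg)
{
    MockError *err = malloc(sizeof(*err));

    assert(err != NULL);
    err->msg = malloc(strlen(msg) + 1);
    assert(err->msg != NULL);
    strcpy(err->msg, msg);
    return err;
}

/* Put caller-side context in front of the callee's message. */
static void mock_error_prepend(MockError **errp, const char *prefix)
{
    size_t len;
    char *joined;

    if (!errp || !*errp) {
        return; /* nothing to decorate */
    }
    len = strlen(prefix) + strlen((*errp)->msg) + 1;
    joined = malloc(len);
    assert(joined != NULL);
    snprintf(joined, len, "%s%s", prefix, (*errp)->msg);
    free((*errp)->msg);
    (*errp)->msg = joined;
}

/* Release the error and its message. */
static void mock_error_free(MockError *err)
{
    free(err->msg);
    free(err);
}
```

The low-level message stays intact and the caller only adds where the
failure was observed, which is the point of passing the error up
rather than printing it at the lowest level.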




* [PATCH v4 24/25] vfio: Also trace event failures in vfio_save_complete_precopy()
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (22 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 23/25] vfio: Add Error** argument to .get_dirty_bitmap() handler Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  9:28   ` Eric Auger
  2024-03-06 13:34 ` [PATCH v4 25/25] vfio: Extend vfio_set_migration_error() with Error* argument Cédric Le Goater
  2024-03-08  8:15 ` [PATCH v4 00/25] migration: Improve error reporting Peter Xu
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

vfio_save_complete_precopy() currently returns early on error, before
emitting the trace event. Remove the early return so that failures are
traced too.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/migration.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index bd48f2ee472a5230c2c84bff829dae1e217db33f..c8aeb43b4249ec76ded2542d62792e8c469d5f97 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -580,9 +580,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 
     qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
     ret = qemu_file_get_error(f);
-    if (ret) {
-        return ret;
-    }
 
     trace_vfio_save_complete_precopy(vbasedev->name, ret);
 
-- 
2.44.0




* [PATCH v4 25/25] vfio: Extend vfio_set_migration_error() with Error* argument
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (23 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 24/25] vfio: Also trace event failures in vfio_save_complete_precopy() Cédric Le Goater
@ 2024-03-06 13:34 ` Cédric Le Goater
  2024-03-07  9:30   ` Eric Auger
  2024-03-08  8:15 ` [PATCH v4 00/25] migration: Improve error reporting Peter Xu
  25 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-06 13:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

vfio_set_migration_error() sets the 'return' error on the migration
stream if a migration is in progress. To improve error reporting, add
a new Error* argument to also set the Error object on the migration
stream, if a migration is in progress.

Signed-off-by: Cédric Le Goater <clg@redhat.com>
---

 Changes in v4:

 - Dropped log_global_stop() and log_global_sync() changes
   
 hw/vfio/common.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 65a11dc088524647541db97b7b8d6f07e5044728..e26574617e5ef75c27a84dc9bb13c8f040353b6c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -148,16 +148,18 @@ bool vfio_viommu_preset(VFIODevice *vbasedev)
     return vbasedev->bcontainer->space->as != &address_space_memory;
 }
 
-static void vfio_set_migration_error(int err)
+static void vfio_set_migration_error(int ret, Error *err)
 {
     MigrationState *ms = migrate_get_current();
 
     if (migration_is_setup_or_active(ms->state)) {
         WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
             if (ms->to_dst_file) {
-                qemu_file_set_error(ms->to_dst_file, err);
+                qemu_file_set_error_obj(ms->to_dst_file, ret, err);
             }
         }
+    } else {
+        error_report_err(err);
     }
 }
 
@@ -304,9 +306,10 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                                 iova, iova + iotlb->addr_mask);
 
     if (iotlb->target_as != &address_space_memory) {
-        error_report("Wrong target AS \"%s\", only system memory is allowed",
-                     iotlb->target_as->name ? iotlb->target_as->name : "none");
-        vfio_set_migration_error(-EINVAL);
+        error_setg(&local_err,
+                   "Wrong target AS \"%s\", only system memory is allowed",
+                   iotlb->target_as->name ? iotlb->target_as->name : "none");
+        vfio_set_migration_error(-EINVAL, local_err);
         return;
     }
 
@@ -339,11 +342,12 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
         ret = vfio_container_dma_unmap(bcontainer, iova,
                                        iotlb->addr_mask + 1, iotlb);
         if (ret) {
-            error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx") = %d (%s)",
-                         bcontainer, iova,
-                         iotlb->addr_mask + 1, ret, strerror(-ret));
-            vfio_set_migration_error(ret);
+            error_setg(&local_err,
+                       "vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                       "0x%"HWADDR_PRIx") = %d (%s)",
+                       bcontainer, iova,
+                       iotlb->addr_mask + 1, ret, strerror(-ret));
+            vfio_set_migration_error(ret, local_err);
         }
     }
 out:
@@ -1125,8 +1129,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     if (ret) {
         error_prepend(&local_err,
                       "vfio: Could not stop dirty page tracking - ");
-        error_report_err(local_err);
-        vfio_set_migration_error(ret);
+        vfio_set_migration_error(ret, local_err);
     }
 }
 
@@ -1243,14 +1246,14 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     trace_vfio_iommu_map_dirty_notify(iova, iova + iotlb->addr_mask);
 
     if (iotlb->target_as != &address_space_memory) {
-        error_report("Wrong target AS \"%s\", only system memory is allowed",
-                     iotlb->target_as->name ? iotlb->target_as->name : "none");
+        error_setg(&local_err,
+                   "Wrong target AS \"%s\", only system memory is allowed",
+                   iotlb->target_as->name ? iotlb->target_as->name : "none");
         goto out;
     }
 
     rcu_read_lock();
     if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, &local_err)) {
-        error_report_err(local_err);
         goto out_lock;
     }
 
@@ -1261,7 +1264,6 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                       "vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
                       "0x%"HWADDR_PRIx") failed :", bcontainer, iova,
                       iotlb->addr_mask + 1);
-        error_report_err(local_err);
     }
 
 out_lock:
@@ -1269,7 +1271,7 @@ out_lock:
 
 out:
     if (ret) {
-        vfio_set_migration_error(ret);
+        vfio_set_migration_error(ret, local_err);
     }
 }
 
@@ -1389,8 +1391,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
     if (vfio_devices_all_dirty_tracking(bcontainer)) {
         ret = vfio_sync_dirty_bitmap(bcontainer, section, &local_err);
         if (ret) {
-            error_report_err(local_err);
-            vfio_set_migration_error(ret);
+            vfio_set_migration_error(ret, local_err);
         }
     }
 }
-- 
2.44.0
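
The ownership rule implied by the new argument (the error is either
handed to the migration stream or reported and freed on the spot, so
callers no longer report it themselves) can be sketched as follows.
This is a simplified model and not the code above: MockError,
MockStream, mock_error_new() and set_migration_error() are
hypothetical, and the "migrating" flag stands in for the state check
done by the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error object; msg points to a string literal here. */
typedef struct MockError {
    const char *msg;
} MockError;

/* Hypothetical migration stream remembering one error. */
typedef struct MockStream {
    int last_error;
    MockError *last_error_obj;
} MockStream;

static MockError *mock_error_new(const char *msg)
{
    MockError *err = malloc(sizeof(*err));

    assert(err != NULL);
    err->msg = msg;
    return err;
}

/* Takes ownership of err in both branches. */
static void set_migration_error(MockStream *stream, bool migrating,
                                int ret, MockError *err)
{
    if (migrating && stream) {
        stream->last_error = ret;
        stream->last_error_obj = err; /* the stream owns err now */
    } else {
        fprintf(stderr, "%s\n", err->msg); /* report immediately */
        free(err);
    }
}
```

Because the helper consumes the error in both branches, a caller that
also reported and freed it would touch freed memory, which is
consistent with the error_report_err() calls being dropped at the call
sites in this patch.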




* Re: [PATCH v4 21/25] vfio: Reverse test on vfio_get_dirty_bitmap()
  2024-03-06 13:34 ` [PATCH v4 21/25] vfio: Reverse test on vfio_get_dirty_bitmap() Cédric Le Goater
@ 2024-03-06 20:51   ` Philippe Mathieu-Daudé
  2024-03-07  7:13     ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-06 20:51 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Markus Armbruster, Prasad Pandit

On 6/3/24 14:34, Cédric Le Goater wrote:
> It will simplify the changes coming after.
> 
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>   hw/vfio/common.c | 22 +++++++++++++---------
>   1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5b2e6a179cdd5f8ca5be84b7097661e96b391456..6820d2efe4923d5043da7eb8deecb6ff20e1fd16 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1241,16 +1241,20 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>       }
>   
>       rcu_read_lock();
> -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> -        ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
> -                                    translated_addr);
> -        if (ret) {
> -            error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
> -                         "0x%"HWADDR_PRIx") = %d (%s)",
> -                         bcontainer, iova, iotlb->addr_mask + 1, ret,
> -                         strerror(-ret));
> -        }
> +    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> +        goto out_lock;
>       }
> +
> +    ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
> +                                translated_addr);
> +    if (ret) {
> +        error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
> +                     "0x%"HWADDR_PRIx") = %d (%s)",
> +                     bcontainer, iova, iotlb->addr_mask + 1, ret,
> +                     strerror(-ret));
> +    }
> +
> +out_lock:

Alternatively use WITH_RCU_READ_LOCK_GUARD() to avoid label.

>       rcu_read_unlock();
>   
>   out:




* Re: [PATCH v4 21/25] vfio: Reverse test on vfio_get_dirty_bitmap()
  2024-03-06 20:51   ` Philippe Mathieu-Daudé
@ 2024-03-07  7:13     ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-07  7:13 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Markus Armbruster, Prasad Pandit

On 3/6/24 21:51, Philippe Mathieu-Daudé wrote:
> On 6/3/24 14:34, Cédric Le Goater wrote:
>> It will simplify the changes coming after.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>   hw/vfio/common.c | 22 +++++++++++++---------
>>   1 file changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 5b2e6a179cdd5f8ca5be84b7097661e96b391456..6820d2efe4923d5043da7eb8deecb6ff20e1fd16 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1241,16 +1241,20 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>>       }
>>       rcu_read_lock();
>> -    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
>> -        ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
>> -                                    translated_addr);
>> -        if (ret) {
>> -            error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
>> -                         "0x%"HWADDR_PRIx") = %d (%s)",
>> -                         bcontainer, iova, iotlb->addr_mask + 1, ret,
>> -                         strerror(-ret));
>> -        }
>> +    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
>> +        goto out_lock;
>>       }
>> +
>> +    ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
>> +                                translated_addr);
>> +    if (ret) {
>> +        error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
>> +                     "0x%"HWADDR_PRIx") = %d (%s)",
>> +                     bcontainer, iova, iotlb->addr_mask + 1, ret,
>> +                     strerror(-ret));
>> +    }
>> +
>> +out_lock:
> 
> Alternatively use WITH_RCU_READ_LOCK_GUARD() to avoid label.

Sure. I remember your patch. I will resend with your suggestion when
the first part of this series is addressed.

Thanks,

C.
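
For readers unfamiliar with the guard idea discussed above, here is a
rough, self-contained sketch of how a scoped lock guard can replace an
"out_lock:" label. It relies on the GCC/Clang cleanup variable
attribute and uses a plain counter instead of RCU. The names
lock_depth, mock_read_lock(), mock_read_unlock(),
WITH_MOCK_READ_LOCK_GUARD() and sync_one() are hypothetical and are
not QEMU's macros or functions.

```c
#include <assert.h>

static int lock_depth; /* stand-in for read-side lock nesting */

static void mock_read_lock(void)
{
    lock_depth++;
}

static void mock_read_unlock(void)
{
    lock_depth--;
}

/* Take the lock and return 1 so the guarded block runs once. */
static int mock_guard_acquire(void)
{
    mock_read_lock();
    return 1;
}

/* Runs when the guard variable leaves scope. It unlocks only if the
 * block was left early (return or break) and the lock is still held. */
static void mock_guard_release(int *held)
{
    if (*held) {
        mock_read_unlock();
    }
}

/* The for loop body executes exactly once with the lock held. */
#define WITH_MOCK_READ_LOCK_GUARD()                                    \
    for (int held_ __attribute__((cleanup(mock_guard_release))) =      \
             mock_guard_acquire();                                     \
         held_; mock_read_unlock(), held_ = 0)

/* Early return inside the block needs no unlock label. */
static int sync_one(int translate_ok)
{
    int ret = -1;

    WITH_MOCK_READ_LOCK_GUARD() {
        assert(lock_depth == 1);
        if (!translate_ok) {
            return ret; /* cleanup handler drops the lock */
        }
        ret = 0;
    }
    return ret;
}
```

In both the success path and the early-return path the counter is back
to zero afterwards, which is the property the label-based version has
to maintain by hand.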




* Re: [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler
  2024-03-06 13:34 ` [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler Cédric Le Goater
@ 2024-03-07  8:09   ` Eric Auger
  2024-03-07 12:06     ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Eric Auger @ 2024-03-07  8:09 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

Hi Cédric,

On 3/6/24 14:34, Cédric Le Goater wrote:
> We will use the Error object to improve error reporting in the
> .log_global*() handlers of VFIO. Add documentation while at it.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> 
>  Changes in v3:
> 
>  - Use error_setg_errno() in vfio_legacy_set_dirty_page_tracking()
>  
>  include/hw/vfio/vfio-container-base.h | 18 ++++++++++++++++--
>  hw/vfio/common.c                      |  4 ++--
>  hw/vfio/container-base.c              |  4 ++--
>  hw/vfio/container.c                   |  6 +++---
>  4 files changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 3582d5f97a37877b2adfc0d0b06996c82403f8b7..c76984654a596e3016a8cf833e10143eb872e102 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -82,7 +82,7 @@ int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>  void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>                                         MemoryRegionSection *section);
>  int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> -                                           bool start);
> +                                           bool start, Error **errp);
>  int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>                                        VFIOBitmap *vbmap,
>                                        hwaddr iova, hwaddr size);
> @@ -121,9 +121,23 @@ struct VFIOIOMMUClass {
>      int (*attach_device)(const char *name, VFIODevice *vbasedev,
>                           AddressSpace *as, Error **errp);
>      void (*detach_device)(VFIODevice *vbasedev);
> +
>      /* migration feature */
> +
> +    /**
> +     * @set_dirty_page_tracking
> +     *
> +     * Start or stop dirty pages tracking on VFIO container
> +     *
> +     * @bcontainer: #VFIOContainerBase on which to de/activate dirty
> +     *              pages tracking
s/pages/page?
For my education, is the "#VFIOContainerBase" notation formalized somewhere?
> +     * @start: indicates whether to start or stop dirty pages tracking
> +     * @errp: pointer to Error*, to store an error if it happens.
> +     *
> +     * Returns zero to indicate success and negative for error
> +     */
>      int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
> -                                   bool start);
> +                                   bool start, Error **errp);
>      int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
>                                VFIOBitmap *vbmap,
>                                hwaddr iova, hwaddr size);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 800ba0aeac84b8dcc83b042bb70c37b4bf78d3f4..5598a508399a6c0b3a20ba17311cbe83d84250c5 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1085,7 +1085,7 @@ static bool vfio_listener_log_global_start(MemoryListener *listener,
>      if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>          ret = vfio_devices_dma_logging_start(bcontainer);
>      } else {
> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, true);
> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, NULL);
It is not obvious why we don't pass errp here. Also, there is an
error_report() below. Why isn't the error propagated? (Not related to your
patch, though.)
>      }
>  
>      if (ret) {
> @@ -1105,7 +1105,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>      if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>          vfio_devices_dma_logging_stop(bcontainer);
>      } else {
> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, false);
> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, false, NULL);
>      }
>  
>      if (ret) {
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 913ae49077c4f09b7b27517c1231cfbe4befb7fb..7c0764121d24b02b6c4e66e368d7dff78a6d65aa 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -53,14 +53,14 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>  }
>  
>  int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> -                                           bool start)
> +                                           bool start, Error **errp)
>  {
>      if (!bcontainer->dirty_pages_supported) {
>          return 0;
>      }
>  
>      g_assert(bcontainer->ops->set_dirty_page_tracking);
> -    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
> +    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start, errp);
>  }
>  
>  int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 096d77eac3946a9c38fc2a98116b93353f71f06e..6524575aeddcea8470b5fd10caf57475088d1813 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -210,7 +210,7 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>  
>  static int
>  vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
> -                                    bool start)
> +                                    bool start, Error **errp)
>  {
>      const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>                                                    bcontainer);
> @@ -228,8 +228,8 @@ vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
>      ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>      if (ret) {
>          ret = -errno;
> -        error_report("Failed to set dirty tracking flag 0x%x errno: %d",
> -                     dirty.flags, errno);
> +        error_setg_errno(errp, errno, "Failed to set dirty tracking flag 0x%x",
> +                         dirty.flags);
>      }
>  
>      return ret;

Thanks

Eric
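
The conversion from error_report() to error_setg_errno() in this patch can
be sketched in isolation. This is a minimal illustration, not QEMU code:
DemoError and demo_error_setg_errno() are hypothetical stand-ins for the
Error object and error_setg_errno(), and the failing ioctl is simulated by
a parameter. It shows two properties relevant to the comments above: the
helper takes the positive errno and appends its description, and passing a
NULL errp silently drops the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an Error object. */
typedef struct DemoError {
    char msg[256];
} DemoError;

/* Formats the message and appends ": <strerror(os_errno)>". */
static void demo_error_setg_errno(DemoError **errp, int os_errno,
                                  const char *fmt, ...)
{
    DemoError *err;
    va_list ap;
    int len;

    if (!errp) {
        return;             /* caller is not interested in the error */
    }
    assert(*errp == NULL);  /* an error must not be set twice */

    err = calloc(1, sizeof(*err));
    assert(err);
    va_start(ap, fmt);
    len = vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    if (len >= 0 && (size_t)len < sizeof(err->msg)) {
        snprintf(err->msg + len, sizeof(err->msg) - len, ": %s",
                 strerror(os_errno));
    }
    *errp = err;
}

/* fake_errno simulates the errno left by a failing ioctl (0 = success). */
static int demo_set_dirty_tracking(int fake_errno, unsigned int flags,
                                   DemoError **errp)
{
    if (fake_errno) {
        /* Pass the positive errno; the return value is its negation. */
        demo_error_setg_errno(errp, fake_errno,
                              "Failed to set dirty tracking flag 0x%x",
                              flags);
        return -fake_errno;
    }
    return 0;
}
```

With a NULL errp, as in the log_global_start() and log_global_stop()
callers at this point in the series, the caller only sees the negative
return value.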




* Re: [PATCH v4 17/25] vfio: Add Error** argument to vfio_devices_dma_logging_start()
  2024-03-06 13:34 ` [PATCH v4 17/25] vfio: Add Error** argument to vfio_devices_dma_logging_start() Cédric Le Goater
@ 2024-03-07  8:15   ` Eric Auger
  2024-03-07 13:15     ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Eric Auger @ 2024-03-07  8:15 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

Hi Cédric,

On 3/6/24 14:34, Cédric Le Goater wrote:
> This allows to update the Error argument of the VFIO log_global_start()
> handler. Errors detected when device level logging is started will be
> propagated up to qemu_savevm_state_setup() when the ram save_setup()
> handler is executed.
> 
> The vfio_set_migration_error() call becomes redundant. Remove it.
You may want to specify that it becomes redundant only in
vfio_listener_log_global_start(), because it is kept in
vfio_listener_log_global_stop().
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> 
>  Changes in v4:
> 
>  - Dropped log_global_stop() and log_global_sync() changes
>    
>  Changes in v3:
> 
>  - Use error_setg_errno() in vfio_devices_dma_logging_start() 
>  - ERRP_GUARD() because of error_prepend use in
>    vfio_listener_log_global_start()
>    
>  hw/vfio/common.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5598a508399a6c0b3a20ba17311cbe83d84250c5..d6790557da2f2890398fa03dbbef18129cd2c1bb 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1036,7 +1036,8 @@ static void vfio_device_feature_dma_logging_start_destroy(
>      g_free(feature);
>  }
>  
> -static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
> +static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
> +                                          Error **errp)
>  {
>      struct vfio_device_feature *feature;
>      VFIODirtyRanges ranges;
> @@ -1058,8 +1059,8 @@ static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
>          ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
>          if (ret) {
>              ret = -errno;
There is another error case if !feature. Don't we want to set errp
in that case as well?
I think in general we should try to keep the return value and errp
consistent, because the caller may try to use errp whenever a negative
value is returned.
> -            error_report("%s: Failed to start DMA logging, err %d (%s)",
> -                         vbasedev->name, ret, strerror(errno));
> +            error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
> +                             vbasedev->name);
>              goto out;
>          }
>          vbasedev->dirty_tracking = true;
> @@ -1078,20 +1079,19 @@ out:
>  static bool vfio_listener_log_global_start(MemoryListener *listener,
>                                             Error **errp)
>  {
> +    ERRP_GUARD(); /* error_prepend use */
>      VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>                                                   listener);
>      int ret;
>  
>      if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
> -        ret = vfio_devices_dma_logging_start(bcontainer);
> +        ret = vfio_devices_dma_logging_start(bcontainer, errp);
>      } else {
> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, NULL);
> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, errp);
>      }
>  
>      if (ret) {
> -        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
> -                     ret, strerror(-ret));
> -        vfio_set_migration_error(ret);
> +        error_prepend(errp, "vfio: Could not start dirty page tracking - ");
>      }
>      return !ret;
>  }
> @@ -1100,17 +1100,20 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>  {
>      VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>                                                   listener);
> +    Error *local_err = NULL;
>      int ret = 0;
>  
>      if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>          vfio_devices_dma_logging_stop(bcontainer);
>      } else {
> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, false, NULL);
> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, false,
> +                                                     &local_err);
>      }
>  
>      if (ret) {
> -        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
> -                     ret, strerror(-ret));
> +        error_prepend(&local_err,
> +                      "vfio: Could not stop dirty page tracking - ");
> +        error_report_err(local_err);
>          vfio_set_migration_error(ret);
>      }
>  }
Eric
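
The consistency point raised above can be sketched in isolation. This is a
minimal illustration, not QEMU code: DemoError, demo_error_set() and
demo_logging_start() are hypothetical, the two failure paths are simulated
by parameters, and -ENOMEM is only a placeholder for whatever the !feature
path would return. It shows a callee that sets the error on every failing
path, so that a caller can dereference it whenever the return value is
negative.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an Error object. */
typedef struct DemoError {
    char msg[128];
} DemoError;

static void demo_error_set(DemoError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    assert(*errp);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/*
 * Two failure paths: feature creation and the ioctl. Both set the error
 * before returning a negative value.
 */
static int demo_logging_start(int have_feature, int ioctl_errno,
                              DemoError **errp)
{
    if (!have_feature) {
        demo_error_set(errp, "Failed to create DMA logging feature");
        return -ENOMEM;   /* placeholder errno for this sketch */
    }
    if (ioctl_errno) {
        demo_error_set(errp, "Failed to start DMA logging");
        return -ioctl_errno;
    }
    return 0;
}

/* Caller relying on the contract "negative return implies error set". */
static int demo_caller(int have_feature, int ioctl_errno,
                       char *out, size_t out_len)
{
    DemoError *local_err = NULL;
    int ret = demo_logging_start(have_feature, ioctl_errno, &local_err);

    if (ret) {
        /* Would dereference NULL if a failing path forgot to set it. */
        snprintf(out, out_len,
                 "vfio: Could not start dirty page tracking - %s",
                 local_err->msg);
        free(local_err);
    }
    return ret;
}
```

If the !have_feature branch returned without calling demo_error_set(), the
prefixing step in demo_caller() would dereference a NULL pointer.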




* Re: [PATCH v4 18/25] vfio: Add Error** argument to vfio_devices_dma_logging_stop()
  2024-03-06 13:34 ` [PATCH v4 18/25] vfio: Add Error** argument to vfio_devices_dma_logging_stop() Cédric Le Goater
@ 2024-03-07  8:53   ` Eric Auger
  2024-03-07 14:05     ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Eric Auger @ 2024-03-07  8:53 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit



On 3/6/24 14:34, Cédric Le Goater wrote:
> This improves error reporting in the log_global_stop() VFIO handler.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> 
>  Changes in v4:
> 
>  - Dropped log_global_stop() and log_global_sync() changes
>    
>  Changes in v3:
> 
>  - Use error_setg_errno() in vfio_devices_dma_logging_stop() 
>  
>  hw/vfio/common.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index d6790557da2f2890398fa03dbbef18129cd2c1bb..5b2e6a179cdd5f8ca5be84b7097661e96b391456 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -938,12 +938,14 @@ static void vfio_dirty_tracking_init(VFIOContainerBase *bcontainer,
>      memory_listener_unregister(&dirty.listener);
>  }
>  
> -static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
> +static int vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer,
> +                                          Error **errp)
>  {
>      uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>                                sizeof(uint64_t))] = {};
>      struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>      VFIODevice *vbasedev;
> +    int ret = 0;
>  
>      feature->argsz = sizeof(buf);
>      feature->flags = VFIO_DEVICE_FEATURE_SET |
> @@ -955,11 +957,17 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
>          }
>  
>          if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -            warn_report("%s: Failed to stop DMA logging, err %d (%s)",
> -                        vbasedev->name, -errno, strerror(errno));
> +            /* Keep first error */
> +            if (!ret) {
> +                ret = -errno;
> +                error_setg_errno(errp, errno, "%s: Failed to stop DMA logging",
> +                                 vbasedev->name);
Maybe you could keep the previous warn_report() for the case where errp is
NULL (rollback) and for subsequent failures?

Eric
> +            }
>          }
>          vbasedev->dirty_tracking = false;
>      }
> +
> +    return ret;
>  }
>  
>  static struct vfio_device_feature *
> @@ -1068,7 +1076,8 @@ static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
>  
>  out:
>      if (ret) {
> -        vfio_devices_dma_logging_stop(bcontainer);
> +        /* Ignore the potential errors when doing rollback */
> +        vfio_devices_dma_logging_stop(bcontainer, NULL);
>      }
>  
>      vfio_device_feature_dma_logging_start_destroy(feature);
> @@ -1104,7 +1113,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>      int ret = 0;
>  
>      if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
> -        vfio_devices_dma_logging_stop(bcontainer);
> +        ret = vfio_devices_dma_logging_stop(bcontainer, &local_err);
>      } else {
>          ret = vfio_container_set_dirty_page_tracking(bcontainer, false,
>                                                       &local_err);
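
The suggestion to keep a warning for the rollback case and for subsequent
failures can be sketched in isolation. This is a minimal illustration, not
QEMU code: DemoError, demo_warn() and demo_logging_stop() are hypothetical,
and per-device ioctl failures are simulated by an array of positive errno
values. The first failure is stored in the error object when the caller
provides one; every other failure is still reported as a warning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an Error object. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Counts messages sent through the stand-in for warn_report(). */
static int demo_warnings;

static void demo_warn(const char *name, int os_errno)
{
    fprintf(stderr, "warning: %s: Failed to stop DMA logging, err %d (%s)\n",
            name, -os_errno, strerror(os_errno));
    demo_warnings++;
}

/* errs[i] is the positive errno of device i's simulated ioctl, 0 if OK. */
static int demo_logging_stop(const char *const *names, const int *errs,
                             size_t n, DemoError **errp)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        if (!errs[i]) {
            continue;
        }
        if (!ret && errp) {
            /* First failure and the caller wants it: store it. */
            *errp = calloc(1, sizeof(**errp));
            assert(*errp);
            snprintf((*errp)->msg, sizeof((*errp)->msg),
                     "%s: Failed to stop DMA logging: %s",
                     names[i], strerror(errs[i]));
        } else {
            /* Rollback (errp == NULL) or a later failure: warn instead. */
            demo_warn(names[i], errs[i]);
        }
        if (!ret) {
            ret = -errs[i];   /* keep the first error code */
        }
    }
    return ret;
}
```

With this shape no failure is silently dropped, whichever way the function
is called.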




* Re: [PATCH v4 19/25] vfio: Use new Error** argument in vfio_save_setup()
  2024-03-06 13:34 ` [PATCH v4 19/25] vfio: Use new Error** argument in vfio_save_setup() Cédric Le Goater
@ 2024-03-07  9:04   ` Eric Auger
  2024-03-07 13:35     ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Eric Auger @ 2024-03-07  9:04 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

Hi Cédric,

On 3/6/24 14:34, Cédric Le Goater wrote:
> Add an Error** argument to vfio_migration_set_state() and adjust
> callers, including vfio_save_setup(). The error will be propagated up
> to qemu_savevm_state_setup() where the save_setup() handler is
> executed.
> 
> Modify vfio_vmstate_change_prepare() and vfio_vmstate_change() to
> store a reported error under the migration stream if a migration is in
> progress.
> 
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> 
>  Changes in v3:
> 
>  - Use error_setg_errno() in vfio_save_setup() 
>  - Made sure an error is always set in case of failure in
>    vfio_load_setup()
>    
>  hw/vfio/migration.c | 67 ++++++++++++++++++++++++++-------------------
>  1 file changed, 39 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index a3bb1a92ba0b9c2c585efe54cfda0b774a81dcb9..71ade14a7942358094371a86c00718f5979113ea 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -84,7 +84,8 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
>  
>  static int vfio_migration_set_state(VFIODevice *vbasedev,
>                                      enum vfio_device_mig_state new_state,
> -                                    enum vfio_device_mig_state recover_state)
> +                                    enum vfio_device_mig_state recover_state,
> +                                    Error **errp)
>  {
>      VFIOMigration *migration = vbasedev->migration;
>      uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
> @@ -104,15 +105,15 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>          ret = -errno;
>  
>          if (recover_state == VFIO_DEVICE_STATE_ERROR) {
> -            error_report("%s: Failed setting device state to %s, err: %s. "
> -                         "Recover state is ERROR. Resetting device",
> -                         vbasedev->name, mig_state_to_str(new_state),
> -                         strerror(errno));
> +            error_setg(errp, "%s: Failed setting device state to %s, err: %s. "
> +                       "Recover state is ERROR. Resetting device",
> +                       vbasedev->name, mig_state_to_str(new_state),
> +                       strerror(errno));
You can use the error_setg_errno() variant here and below.
>  
>              goto reset_device;
>          }
>  
> -        error_report(
> +        error_setg(errp,
>              "%s: Failed setting device state to %s, err: %s. Setting device in recover state %s",
>                       vbasedev->name, mig_state_to_str(new_state),
>                       strerror(errno), mig_state_to_str(recover_state));
> @@ -120,7 +121,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>          mig_state->device_state = recover_state;
>          if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>              ret = -errno;
> -            error_report(
> +            error_setg(errp,
>                  "%s: Failed setting device in recover state, err: %s. Resetting device",
>                           vbasedev->name, strerror(errno));
>  
> @@ -139,7 +140,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>               * This can happen if the device is asynchronously reset and
>               * terminates a data transfer.
>               */
> -            error_report("%s: data_fd out of sync", vbasedev->name);
> +            error_setg(errp, "%s: data_fd out of sync", vbasedev->name);
>              close(mig_state->data_fd);
>  
>              return -EBADF;
> @@ -170,10 +171,11 @@ reset_device:
>   */
>  static int
>  vfio_migration_set_state_or_reset(VFIODevice *vbasedev,
> -                                  enum vfio_device_mig_state new_state)
> +                                  enum vfio_device_mig_state new_state,
> +                                  Error **errp)
>  {
>      return vfio_migration_set_state(vbasedev, new_state,
> -                                    VFIO_DEVICE_STATE_ERROR);
> +                                    VFIO_DEVICE_STATE_ERROR, errp);
>  }
>  
>  static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
> @@ -401,10 +403,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
>          switch (migration->device_state) {
>          case VFIO_DEVICE_STATE_RUNNING:
>              ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
> -                                           VFIO_DEVICE_STATE_RUNNING);
> +                                           VFIO_DEVICE_STATE_RUNNING, errp);
>              if (ret) {
> -                error_setg(errp, "%s: Failed to set new PRE_COPY state",
> -                           vbasedev->name);
>                  return ret;
>              }
>  
> @@ -437,13 +437,20 @@ static void vfio_save_cleanup(void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
>      VFIOMigration *migration = vbasedev->migration;
> +    Error *local_err = NULL;
> +    int ret;
>  
>      /*
>       * Changing device state from STOP_COPY to STOP can take time. Do it here,
>       * after migration has completed, so it won't increase downtime.
>       */
>      if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
> -        vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_STOP);
> +        ret = vfio_migration_set_state_or_reset(vbasedev,
> +                                                VFIO_DEVICE_STATE_STOP,
> +                                                &local_err);
> +        if (ret) {
> +            error_report_err(local_err);
> +        }
>      }
>  
>      g_free(migration->data_buffer);
> @@ -549,11 +556,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>      VFIODevice *vbasedev = opaque;
>      ssize_t data_size;
>      int ret;
> +    Error *local_err = NULL;
>  
>      /* We reach here with device state STOP or STOP_COPY only */
>      ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
> -                                   VFIO_DEVICE_STATE_STOP);
> +                                   VFIO_DEVICE_STATE_STOP, &local_err);
>      if (ret) {
> +        error_report_err(local_err);
>          return ret;
>      }
>  
> @@ -591,14 +600,9 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
>  static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
>  {
>      VFIODevice *vbasedev = opaque;
> -    int ret;
>  
> -    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
> -                                   vbasedev->migration->device_state);
> -    if (ret) {
> -        error_setg(errp, "%s: Failed to set RESUMING state", vbasedev->name);
> -    }
> -    return ret;
> +    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
> +                                    vbasedev->migration->device_state, errp);
>  }
>  
>  static int vfio_load_cleanup(void *opaque)
> @@ -714,20 +718,22 @@ static void vfio_vmstate_change_prepare(void *opaque, bool running,
>      VFIODevice *vbasedev = opaque;
>      VFIOMigration *migration = vbasedev->migration;
>      enum vfio_device_mig_state new_state;
> +    Error *local_err = NULL;
>      int ret;
>  
>      new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ?
>                      VFIO_DEVICE_STATE_PRE_COPY_P2P :
>                      VFIO_DEVICE_STATE_RUNNING_P2P;
>  
> -    ret = vfio_migration_set_state_or_reset(vbasedev, new_state);
> +    ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
>      if (ret) {
>          /*
>           * Migration should be aborted in this case, but vm_state_notify()
>           * currently does not support reporting failures.
>           */
If ret is set and we do not enter the condition below, we may leak
local_err. Same below.

Eric
>          if (migrate_get_current()->to_dst_file) {
> -            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> +            qemu_file_set_error_obj(migrate_get_current()->to_dst_file, ret,
> +                                    local_err);
>          }
>      }
>  
> @@ -740,6 +746,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>  {
>      VFIODevice *vbasedev = opaque;
>      enum vfio_device_mig_state new_state;
> +    Error *local_err = NULL;
>      int ret;
>  
>      if (running) {
> @@ -752,14 +759,15 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>                  VFIO_DEVICE_STATE_STOP;
>      }
>  
> -    ret = vfio_migration_set_state_or_reset(vbasedev, new_state);
> +    ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
>      if (ret) {
>          /*
>           * Migration should be aborted in this case, but vm_state_notify()
>           * currently does not support reporting failures.
>           */
>          if (migrate_get_current()->to_dst_file) {
> -            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
> +            qemu_file_set_error_obj(migrate_get_current()->to_dst_file, ret,
> +                                    local_err);
>          }
>      }
>  
> @@ -773,13 +781,16 @@ static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
>      VFIOMigration *migration = container_of(notifier, VFIOMigration,
>                                              migration_state);
>      VFIODevice *vbasedev = migration->vbasedev;
> +    int ret = 0;
>  
>      trace_vfio_migration_state_notifier(vbasedev->name, e->type);
>  
>      if (e->type == MIG_EVENT_PRECOPY_FAILED) {
> -        vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING);
> +        ret = vfio_migration_set_state_or_reset(vbasedev,
> +                                                VFIO_DEVICE_STATE_RUNNING,
> +                                                errp);
>      }
> -    return 0;
> +    return ret;
>  }
>  
>  static void vfio_migration_free(VFIODevice *vbasedev)
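
The possible leak of local_err noted above can be sketched in isolation.
This is a minimal illustration, not QEMU code: DemoError, DemoFile and the
demo_* helpers are hypothetical, and the sketch assumes that the function
attaching the error to the stream takes ownership of it. When there is no
stream to attach the error to, the error has to be released some other way;
in QEMU, reporting it with error_report_err() would also free it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-ins for an Error object and a migration stream. */
typedef struct DemoError {
    char msg[64];
} DemoError;

typedef struct DemoFile {
    int last_error;
    DemoError *last_error_obj;
} DemoFile;

/* Number of error objects allocated and not yet freed. */
static int demo_live_errors;

static DemoError *demo_error_new(const char *msg)
{
    DemoError *err = calloc(1, sizeof(*err));

    assert(err);
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    demo_live_errors++;
    return err;
}

static void demo_error_free(DemoError *err)
{
    if (err) {
        free(err);
        demo_live_errors--;
    }
}

/* Takes ownership of err: stores the first error, frees later ones. */
static void demo_file_set_error_obj(DemoFile *f, int ret, DemoError *err)
{
    if (f->last_error == 0) {
        f->last_error = ret;
        f->last_error_obj = err;
    } else {
        demo_error_free(err);
    }
}

/* ret simulates the result of the state change (0 = success). */
static void demo_vmstate_change(DemoFile *to_dst_file, int ret)
{
    DemoError *local_err = NULL;

    if (ret) {
        local_err = demo_error_new("Failed setting device state");
        if (to_dst_file) {
            demo_file_set_error_obj(to_dst_file, ret, local_err);
        } else {
            /* No stream to hand the error to: release it here. */
            demo_error_free(local_err);
        }
    }
}
```

Without the else branch, a failure while no migration is in progress would
leave one error object allocated on every call.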




* Re: [PATCH v4 20/25] vfio: Add Error** argument to .vfio_save_config() handler
  2024-03-06 13:34 ` [PATCH v4 20/25] vfio: Add Error** argument to .vfio_save_config() handler Cédric Le Goater
@ 2024-03-07  9:13   ` Eric Auger
  2024-03-07 13:55     ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Eric Auger @ 2024-03-07  9:13 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit



On 3/6/24 14:34, Cédric Le Goater wrote:
> Use vmstate_save_state_with_err() to improve error reporting in the
> callers and store a reported error under the migration stream. Add
> documentation while at it.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>  include/hw/vfio/vfio-common.h | 25 ++++++++++++++++++++++++-
>  hw/vfio/migration.c           | 18 ++++++++++++------
>  hw/vfio/pci.c                 |  5 +++--
>  3 files changed, 39 insertions(+), 9 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index b9da6c08ef41174610eb92726c590309a53696a3..46f88493634b5634a9c14a5caa33a463fbf2c50d 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -133,7 +133,30 @@ struct VFIODeviceOps {
>      int (*vfio_hot_reset_multi)(VFIODevice *vdev);
>      void (*vfio_eoi)(VFIODevice *vdev);
>      Object *(*vfio_get_object)(VFIODevice *vdev);
> -    void (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f);
> +
> +    /**
> +     * @vfio_save_config
> +     *
> +     * Save device config state
> +     *
> +     * @vdev: #VFIODevice for which to save the config
> +     * @f: #QEMUFile where to send the data
> +     * @errp: pointer to Error*, to store an error if it happens.
> +     *
> +     * Returns zero to indicate success and negative for error
> +     */
> +    int (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f, Error **errp);
> +
> +    /**
> +     * @vfio_load_config
> +     *
> +     * Load device config state
> +     *
> +     * @vdev: #VFIODevice for which to load the config
> +     * @f: #QEMUFile where to get the data
> +     *
> +     * Returns zero to indicate success and negative for error
> +     */
>      int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
>  };
>  
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 71ade14a7942358094371a86c00718f5979113ea..bd48f2ee472a5230c2c84bff829dae1e217db33f 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -190,14 +190,19 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>      return ret;
>  }
>  
> -static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
> +static int vfio_save_device_config_state(QEMUFile *f, void *opaque,
> +                                         Error **errp)
>  {
>      VFIODevice *vbasedev = opaque;
> +    int ret;
>  
>      qemu_put_be64(f, VFIO_MIG_FLAG_DEV_CONFIG_STATE);
>  
>      if (vbasedev->ops && vbasedev->ops->vfio_save_config) {
> -        vbasedev->ops->vfio_save_config(vbasedev, f);
> +        ret = vbasedev->ops->vfio_save_config(vbasedev, f, errp);
> +        if (ret) {
I am not familiar enough with that case, but don't you still want to set
VFIO_MIG_FLAG_END_OF_STATE to "close" the state?

Eric
> +            return ret;
> +        }
>      }
>  
>      qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> @@ -587,13 +592,14 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>  static void vfio_save_state(QEMUFile *f, void *opaque)
>  {
>      VFIODevice *vbasedev = opaque;
> +    Error *local_err = NULL;
>      int ret;
>  
> -    ret = vfio_save_device_config_state(f, opaque);
> +    ret = vfio_save_device_config_state(f, opaque, &local_err);
>      if (ret) {
> -        error_report("%s: Failed to save device config space",
> -                     vbasedev->name);
> -        qemu_file_set_error(f, ret);
> +        error_prepend(&local_err, "%s: Failed to save device config space",
> +                      vbasedev->name);
> +        qemu_file_set_error_obj(f, ret, local_err);
>      }
>  }
>  
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 4fa387f0430d62ca2ba1b5ae5b7037f8f06b33f9..99d86e1d40ef25133fc76ad6e58294b07bd20843 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2585,11 +2585,12 @@ const VMStateDescription vmstate_vfio_pci_config = {
>      }
>  };
>  
> -static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
> +static int vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f, Error **errp)
>  {
>      VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>  
> -    vmstate_save_state(f, &vmstate_vfio_pci_config, vdev, NULL);
> +    return vmstate_save_state_with_err(f, &vmstate_vfio_pci_config, vdev, NULL,
> +                                       errp);
>  }
>  
>  static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
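
The option raised in the question above, closing the section even when
saving the config fails, can be sketched in isolation. This is a minimal
illustration, not QEMU code: DemoStream, demo_put_u64() and the DEMO_FLAG_*
values are hypothetical placeholders, and the save_config result is
simulated by a parameter. Whether the destination needs the closing marker
after a failure is exactly the open question, so this only shows what that
variant would look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder marker values for this sketch only. */
#define DEMO_FLAG_END_OF_STATE      0x1ULL
#define DEMO_FLAG_DEV_CONFIG_STATE  0x2ULL

/* Stand-in for the migration stream: records the 64-bit words written. */
typedef struct DemoStream {
    uint64_t words[8];
    size_t len;
} DemoStream;

static void demo_put_u64(DemoStream *s, uint64_t v)
{
    assert(s->len < sizeof(s->words) / sizeof(s->words[0]));
    s->words[s->len++] = v;
}

/* save_config_ret simulates the device's save_config result. */
static int demo_save_device_config_state(DemoStream *s, int save_config_ret)
{
    demo_put_u64(s, DEMO_FLAG_DEV_CONFIG_STATE);

    /* The device payload would be written here. */

    /* Written on success and on failure, so the section is always closed. */
    demo_put_u64(s, DEMO_FLAG_END_OF_STATE);

    return save_config_ret;
}
```

The patch as posted returns before the closing marker on failure; the
variant above differs only in that ordering.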




* Re: [PATCH v4 23/25] vfio: Add Error** argument to .get_dirty_bitmap() handler
  2024-03-06 13:34 ` [PATCH v4 23/25] vfio: Add Error** argument to .get_dirty_bitmap() handler Cédric Le Goater
@ 2024-03-07  9:23   ` Eric Auger
  0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2024-03-07  9:23 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit



On 3/6/24 14:34, Cédric Le Goater wrote:
> Let the callers do the error reporting. Add documentation while at it.
> 
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>  include/hw/vfio/vfio-common.h         |  4 +-
>  include/hw/vfio/vfio-container-base.h | 17 +++++++-
>  hw/vfio/common.c                      | 59 ++++++++++++++++++---------
>  hw/vfio/container-base.c              |  5 ++-
>  hw/vfio/container.c                   | 13 +++---
>  5 files changed, 67 insertions(+), 31 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 46f88493634b5634a9c14a5caa33a463fbf2c50d..68911d36676667352e94a97895828aff4b194b57 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -274,9 +274,9 @@ bool
>  vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
>  int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>                                      VFIOBitmap *vbmap, hwaddr iova,
> -                                    hwaddr size);
> +                                    hwaddr size, Error **errp);
>  int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
> -                          uint64_t size, ram_addr_t ram_addr);
> +                          uint64_t size, ram_addr_t ram_addr, Error **errp);
>  
>  /* Returns 0 on success, or a negative errno. */
>  int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index c76984654a596e3016a8cf833e10143eb872e102..ebc49ebfbe7de862450941b1129faad5d62b3769 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -85,7 +85,7 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>                                             bool start, Error **errp);
>  int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>                                        VFIOBitmap *vbmap,
> -                                      hwaddr iova, hwaddr size);
> +                                      hwaddr iova, hwaddr size, Error **errp);
>  
>  void vfio_container_init(VFIOContainerBase *bcontainer,
>                           VFIOAddressSpace *space,
> @@ -138,9 +138,22 @@ struct VFIOIOMMUClass {
>       */
>      int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
>                                     bool start, Error **errp);
> +    /**
> +     * @query_dirty_bitmap
> +     *
> +     * Get list of dirty pages from container
> +     *
> +     * @bcontainer: #VFIOContainerBase from which to get dirty pages
> +     * @vbmap: #VFIOBitmap internal bitmap structure
> +     * @iova: iova base address
> +     * @size: size of iova range
> +     * @errp: pointer to Error*, to store an error if it happens.
> +     *
> +     * Returns zero to indicate success and negative for error
> +     */
>      int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
>                                VFIOBitmap *vbmap,
> -                              hwaddr iova, hwaddr size);
> +                              hwaddr iova, hwaddr size, Error **errp);
>      /* PCI specific */
>      int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
>  
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 496e5adaf8f18e9ae7e86dd69be0b9e71e86404f..65a11dc088524647541db97b7b8d6f07e5044728 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1158,7 +1158,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
>  
>  int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>                                      VFIOBitmap *vbmap, hwaddr iova,
> -                                    hwaddr size)
> +                                    hwaddr size, Error **errp)
>  {
>      VFIODevice *vbasedev;
>      int ret;
> @@ -1167,10 +1167,10 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>          ret = vfio_device_dma_logging_report(vbasedev, iova, size,
>                                               vbmap->bitmap);
>          if (ret) {
> -            error_report("%s: Failed to get DMA logging report, iova: "
> -                         "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
> -                         ", err: %d (%s)",
> -                         vbasedev->name, iova, size, ret, strerror(-ret));
> +            error_setg(errp, "%s: Failed to get DMA logging report, iova: "
> +                       "0x%" HWADDR_PRIx ", size: 0x%" HWADDR_PRIx
> +                       ", err: %d (%s)",
> +                       vbasedev->name, iova, size, ret, strerror(-ret));
use error_setg_errno as below?
>  
>              return ret;
>          }
> @@ -1180,7 +1180,7 @@ int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>  }
>  
>  int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
> -                          uint64_t size, ram_addr_t ram_addr)
> +                          uint64_t size, ram_addr_t ram_addr, Error **errp)
>  {
>      bool all_device_dirty_tracking =
>          vfio_devices_all_device_dirty_tracking(bcontainer);
> @@ -1197,13 +1197,17 @@ int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
>  
>      ret = vfio_bitmap_alloc(&vbmap, size);
>      if (ret) {
> +        error_setg_errno(errp, -ret,
> +                         "Failed to allocate dirty tracking bitmap");
>          return ret;
>      }
>  
>      if (all_device_dirty_tracking) {
> -        ret = vfio_devices_query_dirty_bitmap(bcontainer, &vbmap, iova, size);
> +        ret = vfio_devices_query_dirty_bitmap(bcontainer, &vbmap, iova, size,
> +                                              errp);
>      } else {
> -        ret = vfio_container_query_dirty_bitmap(bcontainer, &vbmap, iova, size);
> +        ret = vfio_container_query_dirty_bitmap(bcontainer, &vbmap, iova, size,
> +                                                errp);
>      }
>  
>      if (ret) {
> @@ -1251,12 +1255,13 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>      }
>  
>      ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
> -                                translated_addr);
> +                                translated_addr, &local_err);
>      if (ret) {
> -        error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
> -                     "0x%"HWADDR_PRIx") = %d (%s)",
> -                     bcontainer, iova, iotlb->addr_mask + 1, ret,
> -                     strerror(-ret));
> +        error_prepend(&local_err,
> +                      "vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
> +                      "0x%"HWADDR_PRIx") failed :", bcontainer, iova,
nit: in previous prepends you used "-" instead of ":" at the end. Maybe
align them.
> +                      iotlb->addr_mask + 1);
> +        error_report_err(local_err);
>      }
>  
>  out_lock:
> @@ -1276,12 +1281,19 @@ static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
>      const ram_addr_t ram_addr = memory_region_get_ram_addr(section->mr) +
>                                  section->offset_within_region;
>      VFIORamDiscardListener *vrdl = opaque;
> +    Error *local_err = NULL;
> +    int ret;
>  
>      /*
>       * Sync the whole mapped region (spanning multiple individual mappings)
>       * in one go.
>       */
> -    return vfio_get_dirty_bitmap(vrdl->bcontainer, iova, size, ram_addr);
> +    ret = vfio_get_dirty_bitmap(vrdl->bcontainer, iova, size, ram_addr,
> +                                &local_err);
> +    if (ret) {
> +        error_report_err(local_err);
> +    }
> +    return ret;
>  }
>  
>  static int
> @@ -1313,7 +1325,7 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
>  }
>  
>  static int vfio_sync_dirty_bitmap(VFIOContainerBase *bcontainer,
> -                                  MemoryRegionSection *section)
> +                                  MemoryRegionSection *section, Error **errp)
>  {
>      ram_addr_t ram_addr;
>  
> @@ -1344,7 +1356,14 @@ static int vfio_sync_dirty_bitmap(VFIOContainerBase *bcontainer,
>          }
>          return 0;
>      } else if (memory_region_has_ram_discard_manager(section->mr)) {
> -        return vfio_sync_ram_discard_listener_dirty_bitmap(bcontainer, section);
> +        int ret;
> +
> +        ret = vfio_sync_ram_discard_listener_dirty_bitmap(bcontainer, section);
> +        if (ret) {
> +            error_setg(errp,
> +                       "Failed to sync dirty bitmap with RAM discard listener");
> +            return ret;
> +        }
>      }
>  
>      ram_addr = memory_region_get_ram_addr(section->mr) +
> @@ -1352,7 +1371,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainerBase *bcontainer,
>  
>      return vfio_get_dirty_bitmap(bcontainer,
>                     REAL_HOST_PAGE_ALIGN(section->offset_within_address_space),
> -                   int128_get64(section->size), ram_addr);
> +                                 int128_get64(section->size), ram_addr, errp);
>  }
>  
>  static void vfio_listener_log_sync(MemoryListener *listener,
> @@ -1361,16 +1380,16 @@ static void vfio_listener_log_sync(MemoryListener *listener,
>      VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>                                                   listener);
>      int ret;
> +    Error *local_err = NULL;
>  
>      if (vfio_listener_skipped_section(section)) {
>          return;
>      }
>  
>      if (vfio_devices_all_dirty_tracking(bcontainer)) {
> -        ret = vfio_sync_dirty_bitmap(bcontainer, section);
> +        ret = vfio_sync_dirty_bitmap(bcontainer, section, &local_err);
>          if (ret) {
> -            error_report("vfio: Failed to sync dirty bitmap, err: %d (%s)", ret,
> -                         strerror(-ret));
> +            error_report_err(local_err);
>              vfio_set_migration_error(ret);
>          }
>      }
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 7c0764121d24b02b6c4e66e368d7dff78a6d65aa..8db59881873c3b1edee81104b966af737e5fa6f6 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -65,10 +65,11 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>  
>  int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>                                        VFIOBitmap *vbmap,
> -                                      hwaddr iova, hwaddr size)
> +                                      hwaddr iova, hwaddr size, Error **errp)
>  {
>      g_assert(bcontainer->ops->query_dirty_bitmap);
> -    return bcontainer->ops->query_dirty_bitmap(bcontainer, vbmap, iova, size);
> +    return bcontainer->ops->query_dirty_bitmap(bcontainer, vbmap, iova, size,
> +                                               errp);
>  }
>  
>  void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 6524575aeddcea8470b5fd10caf57475088d1813..475d96eaaa927998c6aa8cc9aa9f2115f5a1efda 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -131,6 +131,7 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
>      };
>      bool need_dirty_sync = false;
>      int ret;
> +    Error *local_err = NULL;
>  
>      if (iotlb && vfio_devices_all_running_and_mig_active(bcontainer)) {
>          if (!vfio_devices_all_device_dirty_tracking(bcontainer) &&
> @@ -166,8 +167,9 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
>  
>      if (need_dirty_sync) {
>          ret = vfio_get_dirty_bitmap(bcontainer, iova, size,
> -                                    iotlb->translated_addr);
> +                                    iotlb->translated_addr, &local_err);
>          if (ret) {
> +            error_report_err(local_err);
>              return ret;
>          }
>      }
> @@ -237,7 +239,8 @@ vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
>  
>  static int vfio_legacy_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>                                            VFIOBitmap *vbmap,
> -                                          hwaddr iova, hwaddr size)
> +                                          hwaddr iova, hwaddr size,
> +                                          Error **errp)
>  {
>      const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>                                                    bcontainer);
> @@ -265,9 +268,9 @@ static int vfio_legacy_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>      ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
>      if (ret) {
>          ret = -errno;
> -        error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64
> -                " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
> -                (uint64_t)range->size, errno);
> +        error_setg(errp, "Failed to get dirty bitmap for iova: 0x%"PRIx64
> +                   " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova,
> +                   (uint64_t)range->size, errno);
use errno flavour?

Eric
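
For reference, the "errno flavour" asked for here and in the earlier hunk is error_setg_errno(), which takes a positive errno value and appends its strerror() text, so the format string would no longer need a hand-written "err: %d (%s)" tail. The standalone sketch below only mimics that behaviour with a hypothetical helper, set_msg_errno(), writing into a plain buffer. It is not QEMU code, and the helper name and signature are assumptions made up for illustration.

```c
#include <errno.h>   /* errno constants that callers pass in */
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an errno-flavoured error setter: format the
 * caller's message, then append ": <strerror(os_errno)>" when os_errno
 * is non-zero. os_errno is expected to be a positive errno value.
 */
void set_msg_errno(char *buf, size_t len, int os_errno, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);

    /* Append the errno text only if the message itself fit in the buffer. */
    if (os_errno != 0 && n >= 0 && (size_t)n < len) {
        snprintf(buf + n, len - (size_t)n, ": %s", strerror(os_errno));
    }
}
```

A caller holding a negative return code would pass -ret, as the vfio_bitmap_alloc() hunk in this patch already does.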
>      }
>  
>      g_free(dbitmap);



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 24/25] vfio: Also trace event failures in vfio_save_complete_precopy()
  2024-03-06 13:34 ` [PATCH v4 24/25] vfio: Also trace event failures in vfio_save_complete_precopy() Cédric Le Goater
@ 2024-03-07  9:28   ` Eric Auger
  2024-03-07 13:36     ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Eric Auger @ 2024-03-07  9:28 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit



On 3/6/24 14:34, Cédric Le Goater wrote:
> vfio_save_complete_precopy() currently returns before doing the trace
> event. Change that.
> 
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>  hw/vfio/migration.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index bd48f2ee472a5230c2c84bff829dae1e217db33f..c8aeb43b4249ec76ded2542d62792e8c469d5f97 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -580,9 +580,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>  
>      qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>      ret = qemu_file_get_error(f);
> -    if (ret) {
> -        return ret;
> -    }
>  
>      trace_vfio_save_complete_precopy(vbasedev->name, ret);
It is arguable whether you want to trace when an error occurred. If you want to
unconditionally trace the function entry, why don't we put the trace at
the beginning of the function?

Eric
>  



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 25/25] vfio: Extend vfio_set_migration_error() with Error* argument
  2024-03-06 13:34 ` [PATCH v4 25/25] vfio: Extend vfio_set_migration_error() with Error* argument Cédric Le Goater
@ 2024-03-07  9:30   ` Eric Auger
  0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2024-03-07  9:30 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

Hi Cédric,

On 3/6/24 14:34, Cédric Le Goater wrote:
> vfio_set_migration_error() sets the 'return' error on the migration
> stream if a migration is in progress. To improve error reporting, add
> a new Error* argument to also set the Error object on the migration
> stream, if a migration is in progress.
> 
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> 
>  Changes in v4:
> 
>  - Dropped log_global_stop() and log_global_sync() changes
>    
>  hw/vfio/common.c | 39 ++++++++++++++++++++-------------------
>  1 file changed, 20 insertions(+), 19 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 65a11dc088524647541db97b7b8d6f07e5044728..e26574617e5ef75c27a84dc9bb13c8f040353b6c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -148,16 +148,18 @@ bool vfio_viommu_preset(VFIODevice *vbasedev)
>      return vbasedev->bcontainer->space->as != &address_space_memory;
>  }
>  
> -static void vfio_set_migration_error(int err)
> +static void vfio_set_migration_error(int ret, Error *err)
>  {
>      MigrationState *ms = migrate_get_current();
>  
>      if (migration_is_setup_or_active(ms->state)) {
>          WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
>              if (ms->to_dst_file) {
> -                qemu_file_set_error(ms->to_dst_file, err);
> +                qemu_file_set_error_obj(ms->to_dst_file, ret, err);
>              }
>          }
> +    } else {
does that case exist?

Eric
> +        error_report_err(err);
>      }
>  }
>  
> @@ -304,9 +306,10 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>                                  iova, iova + iotlb->addr_mask);
>  
>      if (iotlb->target_as != &address_space_memory) {
> -        error_report("Wrong target AS \"%s\", only system memory is allowed",
> -                     iotlb->target_as->name ? iotlb->target_as->name : "none");
> -        vfio_set_migration_error(-EINVAL);
> +        error_setg(&local_err,
> +                   "Wrong target AS \"%s\", only system memory is allowed",
> +                   iotlb->target_as->name ? iotlb->target_as->name : "none");
> +        vfio_set_migration_error(-EINVAL, local_err);
>          return;
>      }
>  
> @@ -339,11 +342,12 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>          ret = vfio_container_dma_unmap(bcontainer, iova,
>                                         iotlb->addr_mask + 1, iotlb);
>          if (ret) {
> -            error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> -                         "0x%"HWADDR_PRIx") = %d (%s)",
> -                         bcontainer, iova,
> -                         iotlb->addr_mask + 1, ret, strerror(-ret));
> -            vfio_set_migration_error(ret);
> +            error_setg(&local_err,
> +                       "vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> +                       "0x%"HWADDR_PRIx") = %d (%s)",
> +                       bcontainer, iova,
> +                       iotlb->addr_mask + 1, ret, strerror(-ret));
> +            vfio_set_migration_error(ret, local_err);
>          }
>      }
>  out:
> @@ -1125,8 +1129,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>      if (ret) {
>          error_prepend(&local_err,
>                        "vfio: Could not stop dirty page tracking - ");
> -        error_report_err(local_err);
> -        vfio_set_migration_error(ret);
> +        vfio_set_migration_error(ret, local_err);
>      }
>  }
>  
> @@ -1243,14 +1246,14 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>      trace_vfio_iommu_map_dirty_notify(iova, iova + iotlb->addr_mask);
>  
>      if (iotlb->target_as != &address_space_memory) {
> -        error_report("Wrong target AS \"%s\", only system memory is allowed",
> -                     iotlb->target_as->name ? iotlb->target_as->name : "none");
> +        error_setg(&local_err,
> +                   "Wrong target AS \"%s\", only system memory is allowed",
> +                   iotlb->target_as->name ? iotlb->target_as->name : "none");
>          goto out;
>      }
>  
>      rcu_read_lock();
>      if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, &local_err)) {
> -        error_report_err(local_err);
>          goto out_lock;
>      }
>  
> @@ -1261,7 +1264,6 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>                        "vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
>                        "0x%"HWADDR_PRIx") failed :", bcontainer, iova,
>                        iotlb->addr_mask + 1);
> -        error_report_err(local_err);
>      }
>  
>  out_lock:
> @@ -1269,7 +1271,7 @@ out_lock:
>  
>  out:
>      if (ret) {
> -        vfio_set_migration_error(ret);
> +        vfio_set_migration_error(ret, local_err);
>      }
>  }
>  
> @@ -1389,8 +1391,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
>      if (vfio_devices_all_dirty_tracking(bcontainer)) {
>          ret = vfio_sync_dirty_bitmap(bcontainer, section, &local_err);
>          if (ret) {
> -            error_report_err(local_err);
> -            vfio_set_migration_error(ret);
> +            vfio_set_migration_error(ret, local_err);
>          }
>      }
>  }



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 06/25] vfio: Always report an error in vfio_save_setup()
  2024-03-06 13:34 ` [PATCH v4 06/25] vfio: Always report an error in vfio_save_setup() Cédric Le Goater
@ 2024-03-07  9:36   ` Eric Auger
  0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2024-03-07  9:36 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit



On 3/6/24 14:34, Cédric Le Goater wrote:
> This will prepare ground for future changes adding an Error** argument
> to the save_setup() handler. We need to make sure that on failure,
> vfio_save_setup() always sets a new error.
> 
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric

> ---
> 
>  Changes in v4:
> 
>  - Fixed state name printed out in error returned by vfio_save_setup()
>  - Fixed test on error returned by qemu_file_get_error()
>  
>  hw/vfio/migration.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 2050ac8897231ff89cc223f0570d5c7a65dede9e..330b3a28548e32b0b3268072895bb5e4875766a2 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -383,6 +383,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>      VFIODevice *vbasedev = opaque;
>      VFIOMigration *migration = vbasedev->migration;
>      uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE;
> +    int ret;
>  
>      qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
>  
> @@ -397,13 +398,13 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>      }
>  
>      if (vfio_precopy_supported(vbasedev)) {
> -        int ret;
> -
>          switch (migration->device_state) {
>          case VFIO_DEVICE_STATE_RUNNING:
>              ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
>                                             VFIO_DEVICE_STATE_RUNNING);
>              if (ret) {
> +                error_report("%s: Failed to set new PRE_COPY state",
> +                             vbasedev->name);
>                  return ret;
>              }
>  
> @@ -414,6 +415,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>              /* vfio_save_complete_precopy() will go to STOP_COPY */
>              break;
>          default:
> +            error_report("%s: Invalid device state %d", vbasedev->name,
> +                         migration->device_state);
>              return -EINVAL;
>          }
>      }
> @@ -422,7 +425,13 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>  
>      qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>  
> -    return qemu_file_get_error(f);
> +    ret = qemu_file_get_error(f);
> +    if (ret < 0) {
> +        error_report("%s: save setup failed : %s", vbasedev->name,
> +                     strerror(-ret));
> +    }
> +
> +    return ret;
>  }
>  
>  static void vfio_save_cleanup(void *opaque)



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler
  2024-03-06 13:34 ` [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler Cédric Le Goater
@ 2024-03-07  9:53   ` Vladimir Sementsov-Ogievskiy
  2024-03-07 10:31     ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-03-07  9:53 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Nicholas Piggin,
	Harsh Prateek Bora, Halil Pasic, Thomas Huth, Eric Blake,
	John Snow, Stefan Hajnoczi

On 06.03.24 16:34, Cédric Le Goater wrote:
> The purpose is to record a potential error in the migration stream if
> qemu_savevm_state_setup() fails. Most of the current .save_setup()
> handlers can be modified to use the Error argument instead of managing
> their own and calling locally error_report().
> 
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Cc: John Snow <jsnow@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Still, if you resend, please add error_prepend in the case below:

> diff --git a/migration/savevm.c b/migration/savevm.c
> index 63fdbb5ad7d4dbfaef1d2094350bf302cc677602..52d35b2a72c6238bfe5dcb4d81c1af8d2bf73013 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1342,11 +1342,9 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>           }
>           save_section_header(f, se, QEMU_VM_SECTION_START);
>   
> -        ret = se->ops->save_setup(f, se->opaque);
> +        ret = se->ops->save_setup(f, se->opaque, errp);
>           save_section_footer(f, se);
>           if (ret < 0) {
> -            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
> -                       "%d(%s): %d", se->section_id, se->idstr, ret);

You drop a good bit of information, let's use error_prepend to save it.
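
To make the suggestion concrete: error_prepend() keeps the callee's message and puts the caller's context in front of it. Below is a minimal standalone sketch of that idea, using a hypothetical msg_prepend() helper on a plain heap string rather than QEMU's Error object. The helper and its return convention are assumptions for illustration only.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper mimicking the "prepend" idea: *msg is a
 * heap-allocated error string produced by a callee, and the caller puts
 * its own context in front of it. Returns 0 on success, -1 on allocation
 * or formatting failure (in which case *msg is left untouched).
 */
int msg_prepend(char **msg, const char *fmt, ...)
{
    va_list ap, ap2;
    int plen;
    size_t total;
    char *out;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    plen = vsnprintf(NULL, 0, fmt, ap);          /* length of the prefix */
    va_end(ap);
    if (plen < 0) {
        va_end(ap2);
        return -1;
    }

    total = (size_t)plen + strlen(*msg) + 1;
    out = malloc(total);
    if (!out) {
        va_end(ap2);
        return -1;
    }

    vsnprintf(out, (size_t)plen + 1, fmt, ap2);  /* write the prefix */
    va_end(ap2);
    strcpy(out + plen, *msg);                    /* then the original text */

    free(*msg);
    *msg = out;
    return 0;
}
```

With a prefix such as "failed to setup SaveStateEntry with id(name): %d(%s): ", the section id and name would survive in front of whatever the handler reported.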

>               qemu_file_set_error(f, ret);
>               break;

Not about this patch:

Better do explicit "return ret" instead of this "break" (and one more break above in that loop):

1. Making a jump just to do "return ret" seems overkill. It would make sense if we had more "cleanup" code than just a "return ret", and if so, the more classic and readable thing is "goto fail;".
2. "break" makes me think that there may be more logic after it, which will probably fail, and I should be careful, as errp is already set (and a second attempt to set it will crash). Again, "goto fail;" is better, as I don't expect more failures when I see it.
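
A minimal standalone sketch of the early-return shape, assuming nothing needs to be cleaned up after the loop. The handler type, the two sample handlers, and run_setup() are hypothetical names for illustration and do not match the real savevm code:

```c
#include <stddef.h>

typedef int (*setup_fn)(void);

/* Illustrative handlers: one succeeds, one fails with a negative code. */
int setup_ok(void)   { return 0; }
int setup_fail(void) { return -22; }

/*
 * Run each handler in order and return the first failure directly.
 * *ran counts how many handlers were invoked.
 */
int run_setup(const setup_fn *handlers, size_t n, size_t *ran)
{
    *ran = 0;
    for (size_t i = 0; i < n; i++) {
        int ret;

        (*ran)++;
        ret = handlers[i]();
        if (ret < 0) {
            return ret;   /* nothing to clean up, so no break/goto needed */
        }
    }
    return 0;
}
```

The reader sees at the failure site that no further code runs, which is the point of item 2.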

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler
  2024-03-07  9:53   ` Vladimir Sementsov-Ogievskiy
@ 2024-03-07 10:31     ` Cédric Le Goater
  2024-03-07 11:39       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-07 10:31 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Nicholas Piggin,
	Harsh Prateek Bora, Halil Pasic, Thomas Huth, Eric Blake,
	John Snow, Stefan Hajnoczi

On 3/7/24 10:53, Vladimir Sementsov-Ogievskiy wrote:
> On 06.03.24 16:34, Cédric Le Goater wrote:
>> The purpose is to record a potential error in the migration stream if
>> qemu_savevm_state_setup() fails. Most of the current .save_setup()
>> handlers can be modified to use the Error argument instead of managing
>> their own and calling locally error_report().
>>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> Cc: Halil Pasic <pasic@linux.ibm.com>
>> Cc: Thomas Huth <thuth@redhat.com>
>> Cc: Eric Blake <eblake@redhat.com>
>> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> Cc: John Snow <jsnow@redhat.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> Still, if you resend, please add error_prepend in the case below:
> 
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 63fdbb5ad7d4dbfaef1d2094350bf302cc677602..52d35b2a72c6238bfe5dcb4d81c1af8d2bf73013 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1342,11 +1342,9 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>>           }
>>           save_section_header(f, se, QEMU_VM_SECTION_START);
>> -        ret = se->ops->save_setup(f, se->opaque);
>> +        ret = se->ops->save_setup(f, se->opaque, errp);
>>           save_section_footer(f, se);
>>           if (ret < 0) {
>> -            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
>> -                       "%d(%s): %d", se->section_id, se->idstr, ret);
> 
> You drop a good bit of information, let's use error_prepend to save it.

I kind of agree but the call stack is quite deep and the callees also use
error_prepend. The error becomes quite long. Here's an example of what we
get today :

   (qemu) migrate -d tcp:10.8.3.15:1234
   (qemu)
   (qemu) qemu-system-x86_64: vfio: Could not start dirty page tracking - 0000:b1:00.2: Failed to start DMA logging: Invalid argument

If the subsystems implementing a .save_setup() handler use a component
identifier, the failure should be fairly easy to identify.

What's the best practice for such cases ?

Can we use multiline errors maybe ? Less practical for grep though.

Maybe a verbose error mode would help in getting more information?

Anyhow, I can add a new trace event for "failed to setup SaveStateEntry ... "
or use error_prepend() as you suggested.

Let's see what the others have to say.


> 
>>               qemu_file_set_error(f, ret);
>>               break;
> 
> Not about this patch:
> 
> Better do explicit "return ret" instead of this "break" (and one more break above in that loop):
>
> 1. Making a jump just to do "return ret" seems overkill. It would make sense if we had more "cleanup" code than just a "return ret", and if so, the more classic and readable thing is "goto fail;".
> 2. "break" makes me think that there may be more logic after it, which will probably fail, and I should be careful, as errp is already set (and a second attempt to set it will crash). Again, "goto fail;" is better, as I don't expect more failures when I see it.

Sure. If I respin, I can drop the break and simply return. Although,
I would be glad to have most of this series merged in QEMU 9.0. So,
unless there is something major, I will keep that for followups.


Thanks for the review,

C.








^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler
  2024-03-07 10:31     ` Cédric Le Goater
@ 2024-03-07 11:39       ` Vladimir Sementsov-Ogievskiy
  2024-03-08  7:11         ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-03-07 11:39 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Nicholas Piggin,
	Harsh Prateek Bora, Halil Pasic, Thomas Huth, Eric Blake,
	John Snow, Stefan Hajnoczi

On 07.03.24 13:31, Cédric Le Goater wrote:
> On 3/7/24 10:53, Vladimir Sementsov-Ogievskiy wrote:
>> On 06.03.24 16:34, Cédric Le Goater wrote:
>>> The purpose is to record a potential error in the migration stream if
>>> qemu_savevm_state_setup() fails. Most of the current .save_setup()
>>> handlers can be modified to use the Error argument instead of managing
>>> their own and calling locally error_report().
>>>
>>> Cc: Nicholas Piggin <npiggin@gmail.com>
>>> Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>> Cc: Halil Pasic <pasic@linux.ibm.com>
>>> Cc: Thomas Huth <thuth@redhat.com>
>>> Cc: Eric Blake <eblake@redhat.com>
>>> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>> Cc: John Snow <jsnow@redhat.com>
>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>
>> Still, if you resend, please add error_prepend in the case below:
>>
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index 63fdbb5ad7d4dbfaef1d2094350bf302cc677602..52d35b2a72c6238bfe5dcb4d81c1af8d2bf73013 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -1342,11 +1342,9 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>>>           }
>>>           save_section_header(f, se, QEMU_VM_SECTION_START);
>>> -        ret = se->ops->save_setup(f, se->opaque);
>>> +        ret = se->ops->save_setup(f, se->opaque, errp);
>>>           save_section_footer(f, se);
>>>           if (ret < 0) {
>>> -            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
>>> -                       "%d(%s): %d", se->section_id, se->idstr, ret);
>>
>> You drop a good bit of information, let's use error_prepend to save it.
> 
> I kind of agree but the call stack is quite deep and the callees also use
> error_prepend. The error becomes quite long. Here's an example of what we
> get today :
> 
>    (qemu) migrate -d tcp:10.8.3.15:1234
>    (qemu)
>    (qemu) qemu-system-x86_64: vfio: Could not start dirty page tracking - 0000:b1:00.2: Failed to start DMA logging: Invalid argument
> 
> If the subsystems implementing a .save_setup() handler use a component
> identifier, the failure should be fairly easy to identify.
> 
> What's the best practice for such cases ?
> 
> Can we use multiline errors maybe ? Less practical for grep though.
> 
> Maybe a verbose error mode would help to get more information ?
> 
> Anyhow, I can add a new trace event for "failed to setup SaveStateEntry ... "
> or use error_prepend() as you suggested.
> 
> Let's see what the others have to say.
> 
> 
>>
>>>               qemu_file_set_error(f, ret);
>>>               break;
>>
>> Not about this patch:
>>
>> Better to do an explicit "return ret" instead of this "break" (and one more break above in that loop):
>>
>> 1. making a jump just to do "return ret" seems overkill. It would make sense if we had more "cleanup" code than just a "return ret", and if so, the more classic and readable thing is "goto fail;".
>> 2. "break" makes me think that there may be more logic after it, which may also fail, and that I should be careful, as errp is already set (and a second attempt to set it will crash). Again, "goto fail;" is better, as I don't expect more failures when I see it.
> 
> Sure. If I respin, I can drop the break and simply return. 

If so, you should also use a simple return in place of the other break in the same loop, and drop the "if (ret < 0) { return ret }" after the loop.
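
A minimal sketch of that control flow, assuming hypothetical names (DemoEntry, demo_setup_all) rather than the real savevm.c code. Each failure returns immediately, so the loop needs neither a "break" nor a post-loop check:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry with a setup callback. */
typedef struct DemoEntry {
    int (*setup)(void);     /* 0 on success, negative value on failure */
} DemoEntry;

static int demo_calls;      /* counts how many setup callbacks ran */

static int demo_ok(void)   { demo_calls++; return 0; }
static int demo_fail(void) { demo_calls++; return -5; }

/*
 * Run the setup callbacks in order and return at the first failure.
 * Nothing after the loop can run once an error has been seen.
 */
static int demo_setup_all(const DemoEntry *entries, size_t n)
{
    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int ret = entries[i].setup();
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

With this shape, a reader does not have to check whether code after the loop could touch an already-set error.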

> Although,
> I would be glad to have most of this series merged in QEMU 9.0. So,
> unless there is something major, I will keep that for followups.
> 

Agree

> 
> Thanks for the review,
> 
> C.
> 
> 
> 
> 
> 
> 

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler
  2024-03-07  8:09   ` Eric Auger
@ 2024-03-07 12:06     ` Cédric Le Goater
  2024-03-08  7:39       ` Eric Auger
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-07 12:06 UTC (permalink / raw)
  To: Eric Auger, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/7/24 09:09, Eric Auger wrote:
> Hi Cédric,
> 
> On 3/6/24 14:34, Cédric Le Goater wrote:
>> We will use the Error object to improve error reporting in the
>> .log_global*() handlers of VFIO. Add documentation while at it.
>>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>
>>   Changes in v3:
>>
>>   - Use error_setg_errno() in vfio_legacy_set_dirty_page_tracking()
>>   
>>   include/hw/vfio/vfio-container-base.h | 18 ++++++++++++++++--
>>   hw/vfio/common.c                      |  4 ++--
>>   hw/vfio/container-base.c              |  4 ++--
>>   hw/vfio/container.c                   |  6 +++---
>>   4 files changed, 23 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
>> index 3582d5f97a37877b2adfc0d0b06996c82403f8b7..c76984654a596e3016a8cf833e10143eb872e102 100644
>> --- a/include/hw/vfio/vfio-container-base.h
>> +++ b/include/hw/vfio/vfio-container-base.h
>> @@ -82,7 +82,7 @@ int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>>   void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>>                                          MemoryRegionSection *section);
>>   int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>> -                                           bool start);
>> +                                           bool start, Error **errp);
>>   int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>>                                         VFIOBitmap *vbmap,
>>                                         hwaddr iova, hwaddr size);
>> @@ -121,9 +121,23 @@ struct VFIOIOMMUClass {
>>       int (*attach_device)(const char *name, VFIODevice *vbasedev,
>>                            AddressSpace *as, Error **errp);
>>       void (*detach_device)(VFIODevice *vbasedev);
>> +
>>       /* migration feature */
>> +
>> +    /**
>> +     * @set_dirty_page_tracking
>> +     *
>> +     * Start or stop dirty pages tracking on VFIO container
>> +     *
>> +     * @bcontainer: #VFIOContainerBase on which to de/activate dirty
>> +     *              pages tracking
> s/pages/page?

yep

> for my education is the "#"VFIOContainerBase formalized somewhere?

It's QEMU specific. See 4cf41794411f ("docs: tweak kernel-doc for QEMU
coding standards").

>> +     * @start: indicates whether to start or stop dirty pages tracking
>> +     * @errp: pointer to Error*, to store an error if it happens.
>> +     *
>> +     * Returns zero to indicate success and negative for error
>> +     */
>>       int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
>> -                                   bool start);
>> +                                   bool start, Error **errp);
>>       int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
>>                                 VFIOBitmap *vbmap,
>>                                 hwaddr iova, hwaddr size);
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 800ba0aeac84b8dcc83b042bb70c37b4bf78d3f4..5598a508399a6c0b3a20ba17311cbe83d84250c5 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1085,7 +1085,7 @@ static bool vfio_listener_log_global_start(MemoryListener *listener,
>>       if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>>           ret = vfio_devices_dma_logging_start(bcontainer);
>>       } else {
>> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, true);
>> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, NULL);
> It is not obvious why we don't pass errp here. Also there is an
> error_report below. Why isn't the error propagated? (not related to your
> patch though)

When I started this series, I was looking for a way to introduce the
changes progressively, and this patch prepares the ground for what
comes next. It could be merged with the following patch if you prefer.
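
For illustration only, a small sketch of what passing NULL means for the caller. DemoError, demo_error_set() and demo_set_tracking() are hypothetical stand-ins, not QEMU's Error API; they only mimic the convention that a NULL errp discards the error details while the return value still signals failure:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Store a message in *errp. A NULL errp means the details are discarded. */
static void demo_error_set(DemoError **errp, const char *msg)
{
    if (errp == NULL) {
        return;
    }
    assert(*errp == NULL);              /* an error must not be set twice */
    *errp = calloc(1, sizeof(**errp));
    assert(*errp != NULL);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Hypothetical callee that always fails, standing in for a tracking handler. */
static int demo_set_tracking(DemoError **errp)
{
    demo_error_set(errp, "Failed to set dirty tracking flag");
    return -22;
}
```

A caller passing NULL still gets the negative return value, but it has to report the failure by its own means.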


Thanks,

C.




>>       }
>>   
>>       if (ret) {
>> @@ -1105,7 +1105,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>>       if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>>           vfio_devices_dma_logging_stop(bcontainer);
>>       } else {
>> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, false);
>> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, false, NULL);
>>       }
>>   
>>       if (ret) {
>> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
>> index 913ae49077c4f09b7b27517c1231cfbe4befb7fb..7c0764121d24b02b6c4e66e368d7dff78a6d65aa 100644
>> --- a/hw/vfio/container-base.c
>> +++ b/hw/vfio/container-base.c
>> @@ -53,14 +53,14 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>>   }
>>   
>>   int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>> -                                           bool start)
>> +                                           bool start, Error **errp)
>>   {
>>       if (!bcontainer->dirty_pages_supported) {
>>           return 0;
>>       }
>>   
>>       g_assert(bcontainer->ops->set_dirty_page_tracking);
>> -    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
>> +    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start, errp);
>>   }
>>   
>>   int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 096d77eac3946a9c38fc2a98116b93353f71f06e..6524575aeddcea8470b5fd10caf57475088d1813 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -210,7 +210,7 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>>   
>>   static int
>>   vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
>> -                                    bool start)
>> +                                    bool start, Error **errp)
>>   {
>>       const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>>                                                     bcontainer);
>> @@ -228,8 +228,8 @@ vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
>>       ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>>       if (ret) {
>>           ret = -errno;
>> -        error_report("Failed to set dirty tracking flag 0x%x errno: %d",
>> -                     dirty.flags, errno);
>> +        error_setg_errno(errp, errno, "Failed to set dirty tracking flag 0x%x",
>> +                         dirty.flags);
>>       }
>>   
>>       return ret;
> 
> Thanks
> 
> Eric
> 



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler
  2024-03-06 13:34 ` [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler Cédric Le Goater
@ 2024-03-07 12:18   ` Fabiano Rosas
  2024-03-08  8:11   ` Peter Xu
  2024-03-08  8:45   ` Thomas Huth
  2 siblings, 0 replies; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-07 12:18 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater,
	Halil Pasic, Christian Borntraeger, Thomas Huth

Cédric Le Goater <clg@redhat.com> writes:

> This will prepare ground for future changes adding an Error** argument
> to the save_setup() handler. We need to make sure that on failure,
> set_migrationmode() always sets a new error. See the Rules section in
> qapi/error.h.
>
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 07/25] migration: Always report an error in block_save_setup()
  2024-03-06 13:34 ` [PATCH v4 07/25] migration: Always report an error in block_save_setup() Cédric Le Goater
@ 2024-03-07 12:28   ` Fabiano Rosas
  2024-03-08  6:59   ` Peter Xu
  1 sibling, 0 replies; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-07 12:28 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater,
	Stefan Hajnoczi

Cédric Le Goater <clg@redhat.com> writes:

> This will prepare ground for future changes adding an Error** argument
> to the save_setup() handler. We need to make sure that on failure,
> block_save_setup() always sets a new error.
>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 08/25] migration: Always report an error in ram_save_setup()
  2024-03-06 13:34 ` [PATCH v4 08/25] migration: Always report an error in ram_save_setup() Cédric Le Goater
@ 2024-03-07 12:28   ` Fabiano Rosas
  0 siblings, 0 replies; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-07 12:28 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

Cédric Le Goater <clg@redhat.com> writes:

> This will prepare ground for future changes adding an Error** argument
> to the save_setup() handler. We need to make sure that on failure,
> ram_save_setup() sets a new error.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-06 13:34 ` [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup() Cédric Le Goater
@ 2024-03-07 12:45   ` Fabiano Rosas
  2024-03-08 12:56   ` Peter Xu
  2024-03-08 14:36   ` Fabiano Rosas
  2 siblings, 0 replies; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-07 12:45 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

Cédric Le Goater <clg@redhat.com> writes:

> This prepares ground for the changes coming next which add an Error**
> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
> now handle the error and fail earlier setting the migration state from
> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>
> In qemu_savevm_state(), move the cleanup to preserve the error
> reported by .save_setup() handlers.
>
> Since the previous behavior was to ignore errors at this step of
> migration, this change should be examined closely to check that
> cleanups are still correctly done.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 17/25] vfio: Add Error** argument to vfio_devices_dma_logging_start()
  2024-03-07  8:15   ` Eric Auger
@ 2024-03-07 13:15     ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-07 13:15 UTC (permalink / raw)
  To: Eric Auger, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/7/24 09:15, Eric Auger wrote:
> Hi Cédric,
> 
> On 3/6/24 14:34, Cédric Le Goater wrote:
>> This allows to update the Error argument of the VFIO log_global_start()
>> handler. Errors detected when device level logging is started will be
>> propagated up to qemu_savevm_state_setup() when the ram save_setup()
>> handler is executed.
>>
>> The vfio_set_migration_error() call becomes redundant. Remove it.
> you may want to specify that it becomes redundant in
> vfio_listener_log_global_start(), because it is kept in
> vfio_listener_log_global_stop()

Yes. This is a leftover from v3 which still had the changes for
the .log_global_stop() handlers.


>>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>
>>   Changes in v4:
>>
>>   - Dropped log_global_stop() and log_global_sync() changes
>>     
>>   Changes in v3:
>>
>>   - Use error_setg_errno() in vfio_devices_dma_logging_start()
>>   - ERRP_GUARD() because of error_prepend use in
>>     vfio_listener_log_global_start()
>>     
>>   hw/vfio/common.c | 25 ++++++++++++++-----------
>>   1 file changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 5598a508399a6c0b3a20ba17311cbe83d84250c5..d6790557da2f2890398fa03dbbef18129cd2c1bb 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1036,7 +1036,8 @@ static void vfio_device_feature_dma_logging_start_destroy(
>>       g_free(feature);
>>   }
>>   
>> -static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
>> +static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
>> +                                          Error **errp)
>>   {
>>       struct vfio_device_feature *feature;
>>       VFIODirtyRanges ranges;
>> @@ -1058,8 +1059,8 @@ static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
>>           ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
>>           if (ret) {
>>               ret = -errno;
> there is another error case if !feature. Don't we want to set errp
> in that case as well?

arf. How did I miss that ... Will fix.

> I think in general we should try to make the return value and the errp
> consistent because the caller may try to exploit the errp in case of a
> negative return value.

yes. That's the goal.
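
As a rough sketch of that rule, with hypothetical names (DemoError, demo_logging_start) and made-up messages rather than the actual VFIO code: every path that returns a negative value also sets the error, and the success path leaves it untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    char msg[128];
} DemoError;

static void demo_error_set(DemoError **errp, const char *msg)
{
    if (errp == NULL) {
        return;
    }
    assert(*errp == NULL);              /* an error must not be set twice */
    *errp = calloc(1, sizeof(**errp));
    assert(*errp != NULL);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/*
 * Hypothetical two-step operation. Both failure paths set *errp, so a
 * negative return value always comes with an error object.
 */
static int demo_logging_start(int alloc_ok, int ioctl_ret, DemoError **errp)
{
    if (!alloc_ok) {
        demo_error_set(errp, "Failed to allocate the logging request");
        return -12;
    }
    if (ioctl_ret < 0) {
        demo_error_set(errp, "Failed to start DMA logging");
        return ioctl_ret;
    }
    return 0;
}
```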

Thanks,

C.


>> -            error_report("%s: Failed to start DMA logging, err %d (%s)",
>> -                         vbasedev->name, ret, strerror(errno));
>> +            error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
>> +                             vbasedev->name);
>>               goto out;
>>           }
>>           vbasedev->dirty_tracking = true;
>> @@ -1078,20 +1079,19 @@ out:
>>   static bool vfio_listener_log_global_start(MemoryListener *listener,
>>                                              Error **errp)
>>   {
>> +    ERRP_GUARD(); /* error_prepend use */
>>       VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>>                                                    listener);
>>       int ret;
>>   
>>       if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>> -        ret = vfio_devices_dma_logging_start(bcontainer);
>> +        ret = vfio_devices_dma_logging_start(bcontainer, errp);
>>       } else {
>> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, NULL);
>> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, true, errp);
>>       }
>>   
>>       if (ret) {
>> -        error_report("vfio: Could not start dirty page tracking, err: %d (%s)",
>> -                     ret, strerror(-ret));
>> -        vfio_set_migration_error(ret);
>> +        error_prepend(errp, "vfio: Could not start dirty page tracking - ");
>>       }
>>       return !ret;
>>   }
>> @@ -1100,17 +1100,20 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>>   {
>>       VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>>                                                    listener);
>> +    Error *local_err = NULL;
>>       int ret = 0;
>>   
>>       if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>>           vfio_devices_dma_logging_stop(bcontainer);
>>       } else {
>> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, false, NULL);
>> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, false,
>> +                                                     &local_err);
>>       }
>>   
>>       if (ret) {
>> -        error_report("vfio: Could not stop dirty page tracking, err: %d (%s)",
>> -                     ret, strerror(-ret));
>> +        error_prepend(&local_err,
>> +                      "vfio: Could not stop dirty page tracking - ");
>> +        error_report_err(local_err);
>>           vfio_set_migration_error(ret);
>>       }
>>   }
> Eric
> 



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 19/25] vfio: Use new Error** argument in vfio_save_setup()
  2024-03-07  9:04   ` Eric Auger
@ 2024-03-07 13:35     ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-07 13:35 UTC (permalink / raw)
  To: Eric Auger, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/7/24 10:04, Eric Auger wrote:
> Hi Cédric,
> 
> On 3/6/24 14:34, Cédric Le Goater wrote:
>> Add an Error** argument to vfio_migration_set_state() and adjust
>> callers, including vfio_save_setup(). The error will be propagated up
>> to qemu_savevm_state_setup() where the save_setup() handler is
>> executed.
>>
>> Modify vfio_vmstate_change_prepare() and vfio_vmstate_change() to
>> store a reported error under the migration stream if a migration is in
>> progress.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>
>>   Changes in v3:
>>
>>   - Use error_setg_errno() in vfio_save_setup()
>>   - Made sure an error is always set in case of failure in
>>     vfio_load_setup()
>>     
>>   hw/vfio/migration.c | 67 ++++++++++++++++++++++++++-------------------
>>   1 file changed, 39 insertions(+), 28 deletions(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index a3bb1a92ba0b9c2c585efe54cfda0b774a81dcb9..71ade14a7942358094371a86c00718f5979113ea 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -84,7 +84,8 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
>>   
>>   static int vfio_migration_set_state(VFIODevice *vbasedev,
>>                                       enum vfio_device_mig_state new_state,
>> -                                    enum vfio_device_mig_state recover_state)
>> +                                    enum vfio_device_mig_state recover_state,
>> +                                    Error **errp)
>>   {
>>       VFIOMigration *migration = vbasedev->migration;
>>       uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
>> @@ -104,15 +105,15 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>>           ret = -errno;
>>   
>>           if (recover_state == VFIO_DEVICE_STATE_ERROR) {
>> -            error_report("%s: Failed setting device state to %s, err: %s. "
>> -                         "Recover state is ERROR. Resetting device",
>> -                         vbasedev->name, mig_state_to_str(new_state),
>> -                         strerror(errno));
>> +            error_setg(errp, "%s: Failed setting device state to %s, err: %s. "
>> +                       "Recover state is ERROR. Resetting device",
>> +                       vbasedev->name, mig_state_to_str(new_state),
>> +                       strerror(errno));
> you can use the error_setg_errno variant here and below.

sure.
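
A sketch of the message shape this gives, assuming a hypothetical helper demo_format_errno() that only formats into a buffer (it is not error_setg_errno() itself). The errno text is appended once, after the message, instead of being spliced into the format string by hand:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper producing "message: strerror(os_errno)".
 * The caller passes the errno value instead of calling strerror() itself.
 */
static void demo_format_errno(char *buf, size_t len, int os_errno,
                              const char *msg)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s: %s", msg, strerror(os_errno));
}
```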


>>   
>>               goto reset_device;
>>           }
>>   
>> -        error_report(
>> +        error_setg(errp,
>>               "%s: Failed setting device state to %s, err: %s. Setting device in recover state %s",
>>                        vbasedev->name, mig_state_to_str(new_state),
>>                        strerror(errno), mig_state_to_str(recover_state));
>> @@ -120,7 +121,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>>           mig_state->device_state = recover_state;
>>           if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>>               ret = -errno;
>> -            error_report(
>> +            error_setg(errp,
>>                   "%s: Failed setting device in recover state, err: %s. Resetting device",
>>                            vbasedev->name, strerror(errno));
>>   
>> @@ -139,7 +140,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>>                * This can happen if the device is asynchronously reset and
>>                * terminates a data transfer.
>>                */
>> -            error_report("%s: data_fd out of sync", vbasedev->name);
>> +            error_setg(errp, "%s: data_fd out of sync", vbasedev->name);
>>               close(mig_state->data_fd);
>>   
>>               return -EBADF;
>> @@ -170,10 +171,11 @@ reset_device:
>>    */
>>   static int
>>   vfio_migration_set_state_or_reset(VFIODevice *vbasedev,
>> -                                  enum vfio_device_mig_state new_state)
>> +                                  enum vfio_device_mig_state new_state,
>> +                                  Error **errp)
>>   {
>>       return vfio_migration_set_state(vbasedev, new_state,
>> -                                    VFIO_DEVICE_STATE_ERROR);
>> +                                    VFIO_DEVICE_STATE_ERROR, errp);
>>   }
>>   
>>   static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>> @@ -401,10 +403,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
>>           switch (migration->device_state) {
>>           case VFIO_DEVICE_STATE_RUNNING:
>>               ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
>> -                                           VFIO_DEVICE_STATE_RUNNING);
>> +                                           VFIO_DEVICE_STATE_RUNNING, errp);
>>               if (ret) {
>> -                error_setg(errp, "%s: Failed to set new PRE_COPY state",
>> -                           vbasedev->name);
>>                   return ret;
>>               }
>>   
>> @@ -437,13 +437,20 @@ static void vfio_save_cleanup(void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>>       VFIOMigration *migration = vbasedev->migration;
>> +    Error *local_err = NULL;
>> +    int ret;
>>   
>>       /*
>>        * Changing device state from STOP_COPY to STOP can take time. Do it here,
>>        * after migration has completed, so it won't increase downtime.
>>        */
>>       if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
>> -        vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_STOP);
>> +        ret = vfio_migration_set_state_or_reset(vbasedev,
>> +                                                VFIO_DEVICE_STATE_STOP,
>> +                                                &local_err);
>> +        if (ret) {
>> +            error_report_err(local_err);
>> +        }
>>       }
>>   
>>       g_free(migration->data_buffer);
>> @@ -549,11 +556,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>       VFIODevice *vbasedev = opaque;
>>       ssize_t data_size;
>>       int ret;
>> +    Error *local_err = NULL;
>>   
>>       /* We reach here with device state STOP or STOP_COPY only */
>>       ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
>> -                                   VFIO_DEVICE_STATE_STOP);
>> +                                   VFIO_DEVICE_STATE_STOP, &local_err);
>>       if (ret) {
>> +        error_report_err(local_err);
>>           return ret;
>>       }
>>   
>> @@ -591,14 +600,9 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
>>   static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> -    int ret;
>>   
>> -    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
>> -                                   vbasedev->migration->device_state);
>> -    if (ret) {
>> -        error_setg(errp, "%s: Failed to set RESUMING state", vbasedev->name);
>> -    }
>> -    return ret;
>> +    return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
>> +                                    vbasedev->migration->device_state, errp);
>>   }
>>   
>>   static int vfio_load_cleanup(void *opaque)
>> @@ -714,20 +718,22 @@ static void vfio_vmstate_change_prepare(void *opaque, bool running,
>>       VFIODevice *vbasedev = opaque;
>>       VFIOMigration *migration = vbasedev->migration;
>>       enum vfio_device_mig_state new_state;
>> +    Error *local_err = NULL;
>>       int ret;
>>   
>>       new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ?
>>                       VFIO_DEVICE_STATE_PRE_COPY_P2P :
>>                       VFIO_DEVICE_STATE_RUNNING_P2P;
>>   
>> -    ret = vfio_migration_set_state_or_reset(vbasedev, new_state);
>> +    ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
>>       if (ret) {
>>           /*
>>            * Migration should be aborted in this case, but vm_state_notify()
>>            * currently does not support reporting failures.
>>            */
> if ret and we do not enter the condition below, we may leak the
> local_err. Same below.

yes. I will export and use vfio_set_migration_error() from common.c instead.
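
To make the leak concrete, a generic sketch with hypothetical names (DemoError, demo_state_change); it shows one possible remedy, not the vfio_set_migration_error() change mentioned above. The point is that a locally owned error must be handed to a consumer or freed on every failure path:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    int code;
} DemoError;

static int demo_live_errors;    /* number of error objects not yet freed */

static DemoError *demo_error_new(int code)
{
    DemoError *err = calloc(1, sizeof(*err));
    assert(err != NULL);
    err->code = code;
    demo_live_errors++;
    return err;
}

static void demo_error_free(DemoError *err)
{
    if (err) {
        demo_live_errors--;
        free(err);
    }
}

/*
 * Hypothetical state-change hook. On failure the local error is either
 * handed to the stream (when there is one) or freed, never dropped.
 */
static void demo_state_change(int ret, DemoError **stream_err)
{
    DemoError *local_err = NULL;

    if (ret) {
        local_err = demo_error_new(ret);    /* stands in for the failing callee */
        if (stream_err) {
            *stream_err = local_err;        /* ownership moves to the stream */
        } else {
            demo_error_free(local_err);     /* no consumer: free it here */
        }
    }
}
```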


Thanks,

C.




> 
> Eric
>>           if (migrate_get_current()->to_dst_file) {
>> -            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
>> +            qemu_file_set_error_obj(migrate_get_current()->to_dst_file, ret,
>> +                                    local_err);
>>           }
>>       }
>>   
>> @@ -740,6 +746,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>>   {
>>       VFIODevice *vbasedev = opaque;
>>       enum vfio_device_mig_state new_state;
>> +    Error *local_err = NULL;
>>       int ret;
>>   
>>       if (running) {
>> @@ -752,14 +759,15 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>>                   VFIO_DEVICE_STATE_STOP;
>>       }
>>   
>> -    ret = vfio_migration_set_state_or_reset(vbasedev, new_state);
>> +    ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
>>       if (ret) {
>>           /*
>>            * Migration should be aborted in this case, but vm_state_notify()
>>            * currently does not support reporting failures.
>>            */
>>           if (migrate_get_current()->to_dst_file) {
>> -            qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
>> +            qemu_file_set_error_obj(migrate_get_current()->to_dst_file, ret,
>> +                                    local_err);
>>           }
>>       }
>>   
>> @@ -773,13 +781,16 @@ static int vfio_migration_state_notifier(NotifierWithReturn *notifier,
>>       VFIOMigration *migration = container_of(notifier, VFIOMigration,
>>                                               migration_state);
>>       VFIODevice *vbasedev = migration->vbasedev;
>> +    int ret = 0;
>>   
>>       trace_vfio_migration_state_notifier(vbasedev->name, e->type);
>>   
>>       if (e->type == MIG_EVENT_PRECOPY_FAILED) {
>> -        vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING);
>> +        ret = vfio_migration_set_state_or_reset(vbasedev,
>> +                                                VFIO_DEVICE_STATE_RUNNING,
>> +                                                errp);
>>       }
>> -    return 0;
>> +    return ret;
>>   }
>>   
>>   static void vfio_migration_free(VFIODevice *vbasedev)
> 



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 24/25] vfio: Also trace event failures in vfio_save_complete_precopy()
  2024-03-07  9:28   ` Eric Auger
@ 2024-03-07 13:36     ` Cédric Le Goater
  2024-03-08  7:42       ` Eric Auger
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-07 13:36 UTC (permalink / raw)
  To: Eric Auger, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/7/24 10:28, Eric Auger wrote:
> 
> 
> On 3/6/24 14:34, Cédric Le Goater wrote:
>> vfio_save_complete_precopy() currently returns before doing the trace
>> event. Change that.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>   hw/vfio/migration.c | 3 ---
>>   1 file changed, 3 deletions(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index bd48f2ee472a5230c2c84bff829dae1e217db33f..c8aeb43b4249ec76ded2542d62792e8c469d5f97 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -580,9 +580,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>   
>>       qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>       ret = qemu_file_get_error(f);
>> -    if (ret) {
>> -        return ret;
>> -    }
>>   
>>       trace_vfio_save_complete_precopy(vbasedev->name, ret);
> It is arguable whether you want to trace when an error occurred. If you want
> to unconditionally trace the function entry, why don't we put the trace at
> the beginning of the function?

But, then, the 'ret' value is not set and the trace event is less useful.
I'd rather keep it that way.
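
To illustrate, here is a minimal sketch with hypothetical names:
trace_save_complete() stands in for the generated trace helper, and the
stream_error parameter stands in for the value read back from the file.
It is not the QEMU code, only the shape of it. With no early return, the
event fires on both the success and the failure path and carries 'ret'.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a generated trace helper; records the last event. */
static int last_traced_ret;
static int trace_count;

static void trace_save_complete(const char *name, int ret)
{
    last_traced_ret = ret;
    trace_count++;
    fprintf(stderr, "save_complete %s ret %d\n", name, ret);
}

/* Hypothetical handler: 'stream_error' models the error read from the file. */
static int save_complete(const char *name, int stream_error)
{
    int ret;

    assert(name != NULL);
    ret = stream_error;

    /* No early return on error: the event fires on both paths and carries ret. */
    trace_save_complete(name, ret);

    return ret;
}
```

A trace placed at function entry would fire just as often, but it could not
report the outcome.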

Thanks,

C.




* Re: [PATCH v4 20/25] vfio: Add Error** argument to .vfio_save_config() handler
  2024-03-07  9:13   ` Eric Auger
@ 2024-03-07 13:55     ` Cédric Le Goater
  2024-03-08  7:41       ` Eric Auger
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-07 13:55 UTC (permalink / raw)
  To: Eric Auger, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/7/24 10:13, Eric Auger wrote:
> 
> 
> On 3/6/24 14:34, Cédric Le Goater wrote:
>> Use vmstate_save_state_with_err() to improve error reporting in the
>> callers and store a reported error under the migration stream. Add
>> documentation while at it.
>>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>   include/hw/vfio/vfio-common.h | 25 ++++++++++++++++++++++++-
>>   hw/vfio/migration.c           | 18 ++++++++++++------
>>   hw/vfio/pci.c                 |  5 +++--
>>   3 files changed, 39 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index b9da6c08ef41174610eb92726c590309a53696a3..46f88493634b5634a9c14a5caa33a463fbf2c50d 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -133,7 +133,30 @@ struct VFIODeviceOps {
>>       int (*vfio_hot_reset_multi)(VFIODevice *vdev);
>>       void (*vfio_eoi)(VFIODevice *vdev);
>>       Object *(*vfio_get_object)(VFIODevice *vdev);
>> -    void (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f);
>> +
>> +    /**
>> +     * @vfio_save_config
>> +     *
>> +     * Save device config state
>> +     *
>> +     * @vdev: #VFIODevice for which to save the config
>> +     * @f: #QEMUFile where to send the data
>> +     * @errp: pointer to Error*, to store an error if it happens.
>> +     *
>> +     * Returns zero to indicate success and negative for error
>> +     */
>> +    int (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f, Error **errp);
>> +
>> +    /**
>> +     * @vfio_load_config
>> +     *
>> +     * Load device config state
>> +     *
>> +     * @vdev: #VFIODevice for which to load the config
>> +     * @f: #QEMUFile where to get the data
>> +     *
>> +     * Returns zero to indicate success and negative for error
>> +     */
>>       int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
>>   };
>>   
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 71ade14a7942358094371a86c00718f5979113ea..bd48f2ee472a5230c2c84bff829dae1e217db33f 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -190,14 +190,19 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
>>       return ret;
>>   }
>>   
>> -static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
>> +static int vfio_save_device_config_state(QEMUFile *f, void *opaque,
>> +                                         Error **errp)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> +    int ret;
>>   
>>       qemu_put_be64(f, VFIO_MIG_FLAG_DEV_CONFIG_STATE);
>>   
>>       if (vbasedev->ops && vbasedev->ops->vfio_save_config) {
>> -        vbasedev->ops->vfio_save_config(vbasedev, f);
>> +        ret = vbasedev->ops->vfio_save_config(vbasedev, f, errp);
>> +        if (ret) {
> I am not familiar enough with that case but don't you still want to set
> the VFIO_MIG_FLAG_END_OF_STATE to "close" the state?

This is a delimiter used on the target side when loading the state.

When QEMU fails to capture the device state, the whole migration is marked
as failed. There is no need to cleanly end the device state; it is bogus
anyhow.
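
A simplified model of that flow, assuming a toy Stream type and flag values
that are purely illustrative (none of these names exist in QEMU): the
section writer returns before the end delimiter on failure, and the caller
records a sticky error on the stream, which is what makes the partial
section irrelevant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values and stream type, for illustration only. */
#define FLAG_DEV_CONFIG_STATE 0x1ULL
#define FLAG_END_OF_STATE     0x2ULL

typedef struct {
    uint64_t words[8];
    size_t   len;
    int      error;   /* sticky: once set, the stream is considered failed */
} Stream;

static void stream_put(Stream *s, uint64_t v)
{
    assert(s->len < 8);
    s->words[s->len++] = v;
}

/* Write one config section; 'config_ret' models the config handler's result. */
static int save_config_section(Stream *s, int config_ret)
{
    stream_put(s, FLAG_DEV_CONFIG_STATE);
    if (config_ret) {
        /* Failure: return before the end delimiter is written. */
        return config_ret;
    }
    stream_put(s, FLAG_END_OF_STATE);
    return 0;
}

/* Caller: a failed section marks the whole stream as failed. */
static void save_state(Stream *s, int config_ret)
{
    int ret = save_config_section(s, config_ret);

    if (ret) {
        s->error = ret;
    }
}
```

Once the error is set, nothing downstream is expected to parse the section,
so the missing delimiter does no harm.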

Thanks,

C.


> 
> Eric
>> +            return ret;
>> +        }
>>       }
>>   
>>       qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> @@ -587,13 +592,14 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>   static void vfio_save_state(QEMUFile *f, void *opaque)
>>   {
>>       VFIODevice *vbasedev = opaque;
>> +    Error *local_err = NULL;
>>       int ret;
>>   
>> -    ret = vfio_save_device_config_state(f, opaque);
>> +    ret = vfio_save_device_config_state(f, opaque, &local_err);
>>       if (ret) {
>> -        error_report("%s: Failed to save device config space",
>> -                     vbasedev->name);
>> -        qemu_file_set_error(f, ret);
>> +        error_prepend(&local_err, "%s: Failed to save device config space",
>> +                      vbasedev->name);
>> +        qemu_file_set_error_obj(f, ret, local_err);
>>       }
>>   }
>>   
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 4fa387f0430d62ca2ba1b5ae5b7037f8f06b33f9..99d86e1d40ef25133fc76ad6e58294b07bd20843 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2585,11 +2585,12 @@ const VMStateDescription vmstate_vfio_pci_config = {
>>       }
>>   };
>>   
>> -static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
>> +static int vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f, Error **errp)
>>   {
>>       VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>   
>> -    vmstate_save_state(f, &vmstate_vfio_pci_config, vdev, NULL);
>> +    return vmstate_save_state_with_err(f, &vmstate_vfio_pci_config, vdev, NULL,
>> +                                       errp);
>>   }
>>   
>>   static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
> 




* Re: [PATCH v4 18/25] vfio: Add Error** argument to vfio_devices_dma_logging_stop()
  2024-03-07  8:53   ` Eric Auger
@ 2024-03-07 14:05     ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-07 14:05 UTC (permalink / raw)
  To: Eric Auger, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/7/24 09:53, Eric Auger wrote:
> 
> 
> On 3/6/24 14:34, Cédric Le Goater wrote:
>> This improves error reporting in the log_global_stop() VFIO handler.
>>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>
>>   Changes in v4:
>>
>>   - Dropped log_global_stop() and log_global_sync() changes
>>     
>>   Changes in v3:
>>
>>   - Use error_setg_errno() in vfio_devices_dma_logging_stop()
>>   
>>   hw/vfio/common.c | 19 ++++++++++++++-----
>>   1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index d6790557da2f2890398fa03dbbef18129cd2c1bb..5b2e6a179cdd5f8ca5be84b7097661e96b391456 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -938,12 +938,14 @@ static void vfio_dirty_tracking_init(VFIOContainerBase *bcontainer,
>>       memory_listener_unregister(&dirty.listener);
>>   }
>>   
>> -static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
>> +static int vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer,
>> +                                          Error **errp)
>>   {
>>       uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>>                                 sizeof(uint64_t))] = {};
>>       struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>>       VFIODevice *vbasedev;
>> +    int ret = 0;
>>   
>>       feature->argsz = sizeof(buf);
>>       feature->flags = VFIO_DEVICE_FEATURE_SET |
>> @@ -955,11 +957,17 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
>>           }
>>   
>>           if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
>> -            warn_report("%s: Failed to stop DMA logging, err %d (%s)",
>> -                        vbasedev->name, -errno, strerror(errno));
>> +            /* Keep first error */
>> +            if (!ret) {
>> +                ret = -errno;
>> +                error_setg_errno(errp, errno, "%s: Failed to stop DMA logging",
>> +                                 vbasedev->name);
> maybe you can keep the previous warn_report in case errp is NULL
> (rollback) or for subsequent failures?

Hmm, I wonder if we should keep this patch. It made sense when
vfio_listener_log_global_stop() had an Error ** parameter. We
dropped it in v4 for the sake of simplicity, so we might as
well keep the previous behavior and simply warn the user when
dirty tracking fails to stop. I will check for v5.
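
For reference, what Eric suggests could look roughly like the sketch below.
It is a hypothetical model, not the VFIO code: 'results' replaces the
per-device ioctl() outcome, and a plain int pointer replaces Error **. The
first failure is stored for the caller when an error slot is provided;
every other failure, and every failure when the slot is NULL (rollback),
still produces a warning.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

static int warnings_emitted;   /* test aid: counts warnings printed */

/*
 * 'results' holds one entry per device: 0 on success, or a positive errno
 * value as a failed ioctl() would leave in errno. 'err_out' may be NULL;
 * when it is not, the caller must initialize *err_out to 0.
 */
static int stop_all(const int *results, size_t n, int *err_out)
{
    int ret = 0;

    assert(results != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int err = results[i];

        if (!err) {
            continue;
        }
        if (!ret) {
            ret = -err;             /* keep the first error as return value */
        }
        if (err_out && *err_out == 0) {
            *err_out = err;         /* first failure goes to the caller */
        } else {
            /* rollback (no error slot) or subsequent failure: warn */
            fprintf(stderr, "device %zu: failed to stop DMA logging: %s\n",
                    i, strerror(err));
            warnings_emitted++;
        }
    }

    return ret;
}
```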

Thanks,

C.



> 
> Eric
>> +            }
>>           }
>>           vbasedev->dirty_tracking = false;
>>       }
>> +
>> +    return ret;
>>   }
>>   
>>   static struct vfio_device_feature *
>> @@ -1068,7 +1076,8 @@ static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
>>   
>>   out:
>>       if (ret) {
>> -        vfio_devices_dma_logging_stop(bcontainer);
>> +        /* Ignore the potential errors when doing rollback */
>> +        vfio_devices_dma_logging_stop(bcontainer, NULL);
>>       }
>>   
>>       vfio_device_feature_dma_logging_start_destroy(feature);
>> @@ -1104,7 +1113,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>>       int ret = 0;
>>   
>>       if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>> -        vfio_devices_dma_logging_stop(bcontainer);
>> +        ret = vfio_devices_dma_logging_stop(bcontainer, &local_err);
>>       } else {
>>           ret = vfio_container_set_dirty_page_tracking(bcontainer, false,
>>                                                        &local_err);
> 




* Re: [PATCH v4 07/25] migration: Always report an error in block_save_setup()
  2024-03-06 13:34 ` [PATCH v4 07/25] migration: Always report an error in block_save_setup() Cédric Le Goater
  2024-03-07 12:28   ` Fabiano Rosas
@ 2024-03-08  6:59   ` Peter Xu
  2024-03-11 15:22     ` Cédric Le Goater
  1 sibling, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-08  6:59 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefan Hajnoczi

On Wed, Mar 06, 2024 at 02:34:22PM +0100, Cédric Le Goater wrote:
> @@ -404,6 +403,10 @@ static int init_blk_migration(QEMUFile *f)
>          sectors = bdrv_nb_sectors(bs);
>          if (sectors <= 0) {

Not directly relevant to this patch, but this looks suspicious (even though I
know nothing about block migration): I am not sure whether any block driver
would return 0 here, but if one does, it still looks like a problem that we do
the cleanup, skip the rest and return success.

>              ret = sectors;
> +            if (ret < 0) {
> +                error_setg(errp, "Error getting length of block device %s",
> +                           bdrv_get_device_name(bs));
> +            }
>              bdrv_next_cleanup(&it);
>              goto out;
>          }

-- 
Peter Xu




* Re: [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler
  2024-03-07 11:39       ` Vladimir Sementsov-Ogievskiy
@ 2024-03-08  7:11         ` Peter Xu
  2024-03-08  8:08           ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-08  7:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Cédric Le Goater
  Cc: Cédric Le Goater, qemu-devel, Fabiano Rosas,
	Alex Williamson, Avihai Horon, Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Nicholas Piggin,
	Harsh Prateek Bora, Halil Pasic, Thomas Huth, Eric Blake,
	John Snow, Stefan Hajnoczi

On Thu, Mar 07, 2024 at 02:39:31PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > I would be glad to have most of this series merged in QEMU 9.0. So,
> > unless there is something major, I will keep that for followups.

Unfortunately I found this series won't apply to master, starting from
"migration: Always report an error in ram_save_setup()".  Perhaps you forgot to
pull before the repost?

It'll also be nice if we can get an ACK for the s390 patch from a
maintainer.

Cédric, would you prefer a repost before this weekend, or shall we just wait
for 9.1?  IMHO we don't need to rush this into 9.0 if it's still partially
done, so the latter option isn't that bad (I've already queued the initial four
regardless of that).

Thanks,

-- 
Peter Xu




* Re: [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler
  2024-03-07 12:06     ` Cédric Le Goater
@ 2024-03-08  7:39       ` Eric Auger
  2024-03-08 13:00         ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Eric Auger @ 2024-03-08  7:39 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit



On 3/7/24 13:06, Cédric Le Goater wrote:
> On 3/7/24 09:09, Eric Auger wrote:
>> Hi Cédric,
>>
>> On 3/6/24 14:34, Cédric Le Goater wrote:
>>> We will use the Error object to improve error reporting in the
>>> .log_global*() handlers of VFIO. Add documentation while at it.
>>>
>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>> ---
>>>
>>>   Changes in v3:
>>>
>>>   - Use error_setg_errno() in vfio_legacy_set_dirty_page_tracking()
>>>
>>>   include/hw/vfio/vfio-container-base.h | 18 ++++++++++++++++--
>>>   hw/vfio/common.c                      |  4 ++--
>>>   hw/vfio/container-base.c              |  4 ++--
>>>   hw/vfio/container.c                   |  6 +++---
>>>   4 files changed, 23 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-container-base.h
>>> b/include/hw/vfio/vfio-container-base.h
>>> index
>>> 3582d5f97a37877b2adfc0d0b06996c82403f8b7..c76984654a596e3016a8cf833e10143eb872e102 100644
>>> --- a/include/hw/vfio/vfio-container-base.h
>>> +++ b/include/hw/vfio/vfio-container-base.h
>>> @@ -82,7 +82,7 @@ int
>>> vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>>>   void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>>>                                          MemoryRegionSection *section);
>>>   int vfio_container_set_dirty_page_tracking(VFIOContainerBase
>>> *bcontainer,
>>> -                                           bool start);
>>> +                                           bool start, Error **errp);
>>>   int vfio_container_query_dirty_bitmap(const VFIOContainerBase
>>> *bcontainer,
>>>                                         VFIOBitmap *vbmap,
>>>                                         hwaddr iova, hwaddr size);
>>> @@ -121,9 +121,23 @@ struct VFIOIOMMUClass {
>>>       int (*attach_device)(const char *name, VFIODevice *vbasedev,
>>>                            AddressSpace *as, Error **errp);
>>>       void (*detach_device)(VFIODevice *vbasedev);
>>> +
>>>       /* migration feature */
>>> +
>>> +    /**
>>> +     * @set_dirty_page_tracking
>>> +     *
>>> +     * Start or stop dirty pages tracking on VFIO container
>>> +     *
>>> +     * @bcontainer: #VFIOContainerBase on which to de/activate dirty
>>> +     *              pages tracking
>> s/pages/page?
> 
> yep
> 
>> for my education is the "#"VFIOContainerBase formalized somewhere?
> 
> It's QEMU specific. See 4cf41794411f ("docs: tweak kernel-doc for QEMU
> coding standards").
OK thank you for the education!
> 
>>> +     * @start: indicates whether to start or stop dirty pages tracking
>>> +     * @errp: pointer to Error*, to store an error if it happens.
>>> +     *
>>> +     * Returns zero to indicate success and negative for error
>>> +     */
>>>       int (*set_dirty_page_tracking)(const VFIOContainerBase
>>> *bcontainer,
>>> -                                   bool start);
>>> +                                   bool start, Error **errp);
>>>       int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
>>>                                 VFIOBitmap *vbmap,
>>>                                 hwaddr iova, hwaddr size);
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index
>>> 800ba0aeac84b8dcc83b042bb70c37b4bf78d3f4..5598a508399a6c0b3a20ba17311cbe83d84250c5 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -1085,7 +1085,7 @@ static bool
>>> vfio_listener_log_global_start(MemoryListener *listener,
>>>       if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>>>           ret = vfio_devices_dma_logging_start(bcontainer);
>>>       } else {
>>> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, true);
>>> +        ret = vfio_container_set_dirty_page_tracking(bcontainer,
>>> true, NULL);
>> It is not obvious why we don't pass errp here. Also, there is an
>> error_report below. Why isn't the error propagated? (not related to your
>> patch though)
> 
> When I started this series, I was trying to find a way to introduce
> the changes progressively, and this patch prepares the ground for
> what comes next. It could be merged with the following one if you prefer.
Up to you, or tweak the commit message.

Eric
> 
> 
> Thanks,
> 
> C.
> 
> 
> 
> 
>>>       }
>>>         if (ret) {
>>> @@ -1105,7 +1105,7 @@ static void
>>> vfio_listener_log_global_stop(MemoryListener *listener)
>>>       if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>>>           vfio_devices_dma_logging_stop(bcontainer);
>>>       } else {
>>> -        ret = vfio_container_set_dirty_page_tracking(bcontainer,
>>> false);
>>> +        ret = vfio_container_set_dirty_page_tracking(bcontainer,
>>> false, NULL);
>>>       }
>>>         if (ret) {
>>> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
>>> index
>>> 913ae49077c4f09b7b27517c1231cfbe4befb7fb..7c0764121d24b02b6c4e66e368d7dff78a6d65aa 100644
>>> --- a/hw/vfio/container-base.c
>>> +++ b/hw/vfio/container-base.c
>>> @@ -53,14 +53,14 @@ void
>>> vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>>>   }
>>>     int vfio_container_set_dirty_page_tracking(VFIOContainerBase
>>> *bcontainer,
>>> -                                           bool start)
>>> +                                           bool start, Error **errp)
>>>   {
>>>       if (!bcontainer->dirty_pages_supported) {
>>>           return 0;
>>>       }
>>>         g_assert(bcontainer->ops->set_dirty_page_tracking);
>>> -    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
>>> +    return bcontainer->ops->set_dirty_page_tracking(bcontainer,
>>> start, errp);
>>>   }
>>>     int vfio_container_query_dirty_bitmap(const VFIOContainerBase
>>> *bcontainer,
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index
>>> 096d77eac3946a9c38fc2a98116b93353f71f06e..6524575aeddcea8470b5fd10caf57475088d1813 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -210,7 +210,7 @@ static int vfio_legacy_dma_map(const
>>> VFIOContainerBase *bcontainer, hwaddr iova,
>>>     static int
>>>   vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase
>>> *bcontainer,
>>> -                                    bool start)
>>> +                                    bool start, Error **errp)
>>>   {
>>>       const VFIOContainer *container = container_of(bcontainer,
>>> VFIOContainer,
>>>                                                     bcontainer);
>>> @@ -228,8 +228,8 @@ vfio_legacy_set_dirty_page_tracking(const
>>> VFIOContainerBase *bcontainer,
>>>       ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>>>       if (ret) {
>>>           ret = -errno;
>>> -        error_report("Failed to set dirty tracking flag 0x%x errno:
>>> %d",
>>> -                     dirty.flags, errno);
>>> +        error_setg_errno(errp, errno, "Failed to set dirty tracking
>>> flag 0x%x",
>>> +                         dirty.flags);
>>>       }
>>>         return ret;
>>
>> Thanks
>>
>> Eric
>>
> 




* Re: [PATCH v4 20/25] vfio: Add Error** argument to .vfio_save_config() handler
  2024-03-07 13:55     ` Cédric Le Goater
@ 2024-03-08  7:41       ` Eric Auger
  0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2024-03-08  7:41 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit



On 3/7/24 14:55, Cédric Le Goater wrote:
> On 3/7/24 10:13, Eric Auger wrote:
>>
>>
>> On 3/6/24 14:34, Cédric Le Goater wrote:
>>> Use vmstate_save_state_with_err() to improve error reporting in the
>>> callers and store a reported error under the migration stream. Add
>>> documentation while at it.
>>>
>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>> ---
>>>   include/hw/vfio/vfio-common.h | 25 ++++++++++++++++++++++++-
>>>   hw/vfio/migration.c           | 18 ++++++++++++------
>>>   hw/vfio/pci.c                 |  5 +++--
>>>   3 files changed, 39 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h
>>> b/include/hw/vfio/vfio-common.h
>>> index
>>> b9da6c08ef41174610eb92726c590309a53696a3..46f88493634b5634a9c14a5caa33a463fbf2c50d 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -133,7 +133,30 @@ struct VFIODeviceOps {
>>>       int (*vfio_hot_reset_multi)(VFIODevice *vdev);
>>>       void (*vfio_eoi)(VFIODevice *vdev);
>>>       Object *(*vfio_get_object)(VFIODevice *vdev);
>>> -    void (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f);
>>> +
>>> +    /**
>>> +     * @vfio_save_config
>>> +     *
>>> +     * Save device config state
>>> +     *
>>> +     * @vdev: #VFIODevice for which to save the config
>>> +     * @f: #QEMUFile where to send the data
>>> +     * @errp: pointer to Error*, to store an error if it happens.
>>> +     *
>>> +     * Returns zero to indicate success and negative for error
>>> +     */
>>> +    int (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f, Error
>>> **errp);
>>> +
>>> +    /**
>>> +     * @vfio_load_config
>>> +     *
>>> +     * Load device config state
>>> +     *
>>> +     * @vdev: #VFIODevice for which to load the config
>>> +     * @f: #QEMUFile where to get the data
>>> +     *
>>> +     * Returns zero to indicate success and negative for error
>>> +     */
>>>       int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
>>>   };
>>>   diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index
>>> 71ade14a7942358094371a86c00718f5979113ea..bd48f2ee472a5230c2c84bff829dae1e217db33f 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -190,14 +190,19 @@ static int vfio_load_buffer(QEMUFile *f,
>>> VFIODevice *vbasedev,
>>>       return ret;
>>>   }
>>>   -static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
>>> +static int vfio_save_device_config_state(QEMUFile *f, void *opaque,
>>> +                                         Error **errp)
>>>   {
>>>       VFIODevice *vbasedev = opaque;
>>> +    int ret;
>>>         qemu_put_be64(f, VFIO_MIG_FLAG_DEV_CONFIG_STATE);
>>>         if (vbasedev->ops && vbasedev->ops->vfio_save_config) {
>>> -        vbasedev->ops->vfio_save_config(vbasedev, f);
>>> +        ret = vbasedev->ops->vfio_save_config(vbasedev, f, errp);
>>> +        if (ret) {
>> I am not familiar enough with that case but don't you still want to set
>> the VFIO_MIG_FLAG_END_OF_STATE to "close" the state?
> 
> This is a delimiter used on the target side when loading the state.
> 
> When QEMU fails to capture the device state, the whole migration is marked
> as failed. There is no need to cleanly end the device state; it is bogus
> anyhow.

OK thanks

Eric
> 
> Thanks,
> 
> C.
> 
> 
>>
>> Eric
>>> +            return ret;
>>> +        }
>>>       }
>>>         qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>> @@ -587,13 +592,14 @@ static int vfio_save_complete_precopy(QEMUFile
>>> *f, void *opaque)
>>>   static void vfio_save_state(QEMUFile *f, void *opaque)
>>>   {
>>>       VFIODevice *vbasedev = opaque;
>>> +    Error *local_err = NULL;
>>>       int ret;
>>>   -    ret = vfio_save_device_config_state(f, opaque);
>>> +    ret = vfio_save_device_config_state(f, opaque, &local_err);
>>>       if (ret) {
>>> -        error_report("%s: Failed to save device config space",
>>> -                     vbasedev->name);
>>> -        qemu_file_set_error(f, ret);
>>> +        error_prepend(&local_err, "%s: Failed to save device config
>>> space",
>>> +                      vbasedev->name);
>>> +        qemu_file_set_error_obj(f, ret, local_err);
>>>       }
>>>   }
>>>   diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index
>>> 4fa387f0430d62ca2ba1b5ae5b7037f8f06b33f9..99d86e1d40ef25133fc76ad6e58294b07bd20843 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -2585,11 +2585,12 @@ const VMStateDescription
>>> vmstate_vfio_pci_config = {
>>>       }
>>>   };
>>>   -static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
>>> +static int vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f,
>>> Error **errp)
>>>   {
>>>       VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice,
>>> vbasedev);
>>>   -    vmstate_save_state(f, &vmstate_vfio_pci_config, vdev, NULL);
>>> +    return vmstate_save_state_with_err(f, &vmstate_vfio_pci_config,
>>> vdev, NULL,
>>> +                                       errp);
>>>   }
>>>     static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>>
> 




* Re: [PATCH v4 24/25] vfio: Also trace event failures in vfio_save_complete_precopy()
  2024-03-07 13:36     ` Cédric Le Goater
@ 2024-03-08  7:42       ` Eric Auger
  0 siblings, 0 replies; 111+ messages in thread
From: Eric Auger @ 2024-03-08  7:42 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit



On 3/7/24 14:36, Cédric Le Goater wrote:
> On 3/7/24 10:28, Eric Auger wrote:
>>
>>
>> On 3/6/24 14:34, Cédric Le Goater wrote:
>>> vfio_save_complete_precopy() currently returns before doing the trace
>>> event. Change that.
>>>
>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>> ---
>>>   hw/vfio/migration.c | 3 ---
>>>   1 file changed, 3 deletions(-)
>>>
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index
>>> bd48f2ee472a5230c2c84bff829dae1e217db33f..c8aeb43b4249ec76ded2542d62792e8c469d5f97 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -580,9 +580,6 @@ static int vfio_save_complete_precopy(QEMUFile
>>> *f, void *opaque)
>>>         qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>>       ret = qemu_file_get_error(f);
>>> -    if (ret) {
>>> -        return ret;
>>> -    }
>>>         trace_vfio_save_complete_precopy(vbasedev->name, ret);
>> It is arguable whether you want to trace when an error occurred. If you want
>> to unconditionally trace the function entry, why don't we put the trace at
>> the beginning of the function?
> 
> But, then, the 'ret' value is not set and the trace event is less useful.
> I'd rather keep it that way.
Ah, I did not notice the returned value was traced too. OK then.

Sorry for the noise

Eric
> 
> Thanks,
> 
> C.
> 




* Re: [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler
  2024-03-08  7:11         ` Peter Xu
@ 2024-03-08  8:08           ` Peter Xu
  0 siblings, 0 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-08  8:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Nicholas Piggin,
	Harsh Prateek Bora, Halil Pasic, Thomas Huth, Eric Blake,
	John Snow, Stefan Hajnoczi

On Fri, Mar 08, 2024 at 03:11:04PM +0800, Peter Xu wrote:
> On Thu, Mar 07, 2024 at 02:39:31PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > I would be glad to have most of this series merged in QEMU 9.0. So,
> > > unless there is something major, I will keep that for followups.
> 
> Unfortunately I found this series won't apply to master, starting from
> "migration: Always report an error in ram_save_setup()".  Perhaps you forgot to
> pull before the repost?

Scratch this.  It was me who forgot to pull... :-( It all applies fine.

> 
> It'll also be nice if we can get an ACK for the s390 patch from a
> maintainer.

I'll ping on the patch.

-- 
Peter Xu




* Re: [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler
  2024-03-06 13:34 ` [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler Cédric Le Goater
  2024-03-07 12:18   ` Fabiano Rosas
@ 2024-03-08  8:11   ` Peter Xu
  2024-03-08  8:45   ` Thomas Huth
  2 siblings, 0 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-08  8:11 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Halil Pasic,
	Christian Borntraeger, Thomas Huth

On Wed, Mar 06, 2024 at 02:34:20PM +0100, Cédric Le Goater wrote:
> This will prepare ground for future changes adding an Error** argument
> to the save_setup() handler. We need to make sure that on failure,
> set_migrationmode() always sets a new error. See the Rules section in
> qapi/error.h.
> 
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Since this whole set is mostly migration related, I plan to include this
patch in next Monday's pull.

S390 maintainers, please let me know if you have comments / objections
before that, thanks!

> ---
> 
>  Changes in v4:
> 
>  - Fixed state name printed out in error returned by vfio_save_setup()
>  - Fixed test on error returned by qemu_file_get_error()
> 
>  include/hw/s390x/storage-attributes.h |  2 +-
>  hw/s390x/s390-stattrib-kvm.c          | 12 ++++++++++--
>  hw/s390x/s390-stattrib.c              | 15 ++++++++++-----
>  3 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/include/hw/s390x/storage-attributes.h b/include/hw/s390x/storage-attributes.h
> index 5239eb538c1b087797867a247abfc14551af6a4d..8921a04d514bf64a3113255ee10ed33fc598ae06 100644
> --- a/include/hw/s390x/storage-attributes.h
> +++ b/include/hw/s390x/storage-attributes.h
> @@ -39,7 +39,7 @@ struct S390StAttribClass {
>      int (*set_stattr)(S390StAttribState *sa, uint64_t start_gfn,
>                        uint32_t count, uint8_t *values);
>      void (*synchronize)(S390StAttribState *sa);
> -    int (*set_migrationmode)(S390StAttribState *sa, bool value);
> +    int (*set_migrationmode)(S390StAttribState *sa, bool value, Error **errp);
>      int (*get_active)(S390StAttribState *sa);
>      long long (*get_dirtycount)(S390StAttribState *sa);
>  };
> diff --git a/hw/s390x/s390-stattrib-kvm.c b/hw/s390x/s390-stattrib-kvm.c
> index 24cd01382e2d74d62c2d7e980eb6aca1077d893d..eeaa8110981c970e91a8948f027e398c34637321 100644
> --- a/hw/s390x/s390-stattrib-kvm.c
> +++ b/hw/s390x/s390-stattrib-kvm.c
> @@ -17,6 +17,7 @@
>  #include "sysemu/kvm.h"
>  #include "exec/ram_addr.h"
>  #include "kvm/kvm_s390x.h"
> +#include "qapi/error.h"
>  
>  Object *kvm_s390_stattrib_create(void)
>  {
> @@ -137,14 +138,21 @@ static void kvm_s390_stattrib_synchronize(S390StAttribState *sa)
>      }
>  }
>  
> -static int kvm_s390_stattrib_set_migrationmode(S390StAttribState *sa, bool val)
> +static int kvm_s390_stattrib_set_migrationmode(S390StAttribState *sa, bool val,
> +                                               Error **errp)
>  {
>      struct kvm_device_attr attr = {
>          .group = KVM_S390_VM_MIGRATION,
>          .attr = val,
>          .addr = 0,
>      };
> -    return kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attr);
> +    int r;
> +
> +    r = kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attr);
> +    if (r) {
> +        error_setg_errno(errp, -r, "setting KVM_S390_VM_MIGRATION failed");
> +    }
> +    return r;
>  }
>  
>  static long long kvm_s390_stattrib_get_dirtycount(S390StAttribState *sa)
> diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
> index c483b62a9b5f71772639fc180bdad15ecb6711cb..b743e8a2fee84c7374460ccea6df1cf447cda44b 100644
> --- a/hw/s390x/s390-stattrib.c
> +++ b/hw/s390x/s390-stattrib.c
> @@ -60,11 +60,13 @@ void hmp_migrationmode(Monitor *mon, const QDict *qdict)
>      S390StAttribState *sas = s390_get_stattrib_device();
>      S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
>      uint64_t what = qdict_get_int(qdict, "mode");
> +    Error *local_err = NULL;
>      int r;
>  
> -    r = sac->set_migrationmode(sas, what);
> +    r = sac->set_migrationmode(sas, what, &local_err);
>      if (r < 0) {
> -        monitor_printf(mon, "Error: %s", strerror(-r));
> +        monitor_printf(mon, "Error: %s", error_get_pretty(local_err));
> +        error_free(local_err);
>      }
>  }
>  
> @@ -170,13 +172,15 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
>  {
>      S390StAttribState *sas = S390_STATTRIB(opaque);
>      S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
> +    Error *local_err = NULL;
>      int res;
>      /*
>       * Signal that we want to start a migration, thus needing PGSTE dirty
>       * tracking.
>       */
> -    res = sac->set_migrationmode(sas, 1);
> +    res = sac->set_migrationmode(sas, true, &local_err);
>      if (res) {
> +        error_report_err(local_err);
>          return res;
>      }
>      qemu_put_be64(f, STATTR_FLAG_EOS);
> @@ -260,7 +264,7 @@ static void cmma_save_cleanup(void *opaque)
>  {
>      S390StAttribState *sas = S390_STATTRIB(opaque);
>      S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
> -    sac->set_migrationmode(sas, 0);
> +    sac->set_migrationmode(sas, false, NULL);
>  }
>  
>  static bool cmma_active(void *opaque)
> @@ -293,7 +297,8 @@ static long long qemu_s390_get_dirtycount_stub(S390StAttribState *sa)
>  {
>      return 0;
>  }
> -static int qemu_s390_set_migrationmode_stub(S390StAttribState *sa, bool value)
> +static int qemu_s390_set_migrationmode_stub(S390StAttribState *sa, bool value,
> +                                            Error **errp)
>  {
>      return 0;
>  }
> -- 
> 2.44.0
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 00/25] migration: Improve error reporting
  2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
                   ` (24 preceding siblings ...)
  2024-03-06 13:34 ` [PATCH v4 25/25] vfio: Extend vfio_set_migration_error() with Error* argument Cédric Le Goater
@ 2024-03-08  8:15 ` Peter Xu
  2024-03-08 13:03   ` Cédric Le Goater
  2024-03-11 20:24   ` Peter Xu
  25 siblings, 2 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-08  8:15 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
> * [1-4] already queued in migration-next.
>   
>   migration: Report error when shutdown fails
>   migration: Remove SaveStateHandler and LoadStateHandler typedefs
>   migration: Add documentation for SaveVMHandlers
>   migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
>   
> * [5-9] are prerequisite changes in other components related to the
>   migration save_setup() handler. They make sure a failure is not
>   returned without setting an error.
>   
>   s390/stattrib: Add Error** argument to set_migrationmode() handler
>   vfio: Always report an error in vfio_save_setup()
>   migration: Always report an error in block_save_setup()
>   migration: Always report an error in ram_save_setup()
>   migration: Add Error** argument to vmstate_save()
> 
> * [10-15] are the core changes in migration and memory components to
>   propagate an error reported in a save_setup() handler.
> 
>   migration: Add Error** argument to qemu_savevm_state_setup()
>   migration: Add Error** argument to .save_setup() handler
>   migration: Add Error** argument to .load_setup() handler

Further queued 5-12 in migration-staging (until here), thanks.

>   memory: Add Error** argument to .log_global_start() handler
>   memory: Add Error** argument to the global_dirty_log routines
>   migration: Modify ram_init_bitmaps() to report dirty tracking errors
> 
> * [16-19] contains the VFIO changes we are interested in. Can go
>   through vfio-next.
> 
>   vfio: Add Error** argument to .set_dirty_page_tracking() handler
>   vfio: Add Error** argument to vfio_devices_dma_logging_start()
>   vfio: Add Error** argument to vfio_devices_dma_logging_stop()
>   vfio: Use new Error** argument in vfio_save_setup()
> 
> * [20-25] are followups for better error handling in VFIO. Good to
>   have but not necessary for the issue described in the intro. Can go
>   through vfio-next.
> 
>   vfio: Add Error** argument to .vfio_save_config() handler
>   vfio: Reverse test on vfio_get_dirty_bitmap()
>   memory: Add Error** argument to memory_get_xlat_addr()
>   vfio: Add Error** argument to .get_dirty_bitmap() handler
>   vfio: Also trace event failures in vfio_save_complete_precopy()
>   vfio: Extend vfio_set_migration_error() with Error* argument
> 
> Thanks,
> 
> C.
> 
> Changes in v4:
> 
>  - Fixed frenchism futur to future
>  - Fixed typo in set_migrationmode() handler
>  - Added error_free() in hmp_migrationmode()
>  - Fixed state name printed out in error returned by vfio_save_setup()
>  - Fixed test on error returned by qemu_file_get_error()
>  - Added an error when bdrv_nb_sectors() returns a negative value 
>  - Dropped log_global_stop() and log_global_sync() changes
>  - Dropped MEMORY_LISTENER_CALL_LOG_GLOBAL
>  - Modified memory_global_dirty_log_start() to loop on the list of
>    listeners and handle errors directly.
>  - Introduced memory_global_dirty_log_rollback() to revert operations
>    previously done
> 
> Changes in v3:
> 
>  - New changes to make sure an error is always set in case of failure.
>    This is the reason behind the 5/6 extra patches. (Markus)
>  - Documentation fixup (Peter + Avihai)
>  - Set migration state to MIGRATION_STATUS_FAILED always
>  - Fixed error handling in bg_migration_thread() (Peter)
>  - Fixed return value of vfio_listener_log_global_start/stop(). 
>    Went unnoticed because value is not tested. (Peter)
>  - Add ERRP_GUARD() when error_prepend is used 
>  - Use error_setg_errno() when possible
>     
> Changes in v2:
> 
> - Removed v1 patches addressing the return-path thread termination as
>   they are now superseded by :  
>   https://lore.kernel.org/qemu-devel/20240226203122.22894-1-farosas@suse.de/
> - Documentation updates of handlers
> - Removed call to PRECOPY_NOTIFY_SETUP notifiers in case of errors
> - Modified routines taking an Error** argument to return a bool when
>   possible and made adjustments in callers.
> - new MEMORY_LISTENER_CALL_LOG_GLOBAL macro for .log_global*()
>   handlers
> - Handled SETUP state when migration terminates
> - Modified memory_get_xlat_addr() to take an Error** argument
> - Various refinements on error handling
> 
> Cédric Le Goater (25):
>   migration: Report error when shutdown fails
>   migration: Remove SaveStateHandler and LoadStateHandler typedefs
>   migration: Add documentation for SaveVMHandlers
>   migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
>   s390/stattrib: Add Error** argument to set_migrationmode() handler
>   vfio: Always report an error in vfio_save_setup()
>   migration: Always report an error in block_save_setup()
>   migration: Always report an error in ram_save_setup()
>   migration: Add Error** argument to vmstate_save()
>   migration: Add Error** argument to qemu_savevm_state_setup()
>   migration: Add Error** argument to .save_setup() handler
>   migration: Add Error** argument to .load_setup() handler
>   memory: Add Error** argument to .log_global_start() handler
>   memory: Add Error** argument to the global_dirty_log routines
>   migration: Modify ram_init_bitmaps() to report dirty tracking errors
>   vfio: Add Error** argument to .set_dirty_page_tracking() handler
>   vfio: Add Error** argument to vfio_devices_dma_logging_start()
>   vfio: Add Error** argument to vfio_devices_dma_logging_stop()
>   vfio: Use new Error** argument in vfio_save_setup()
>   vfio: Add Error** argument to .vfio_save_config() handler
>   vfio: Reverse test on vfio_get_dirty_bitmap()
>   memory: Add Error** argument to memory_get_xlat_addr()
>   vfio: Add Error** argument to .get_dirty_bitmap() handler
>   vfio: Also trace event failures in vfio_save_complete_precopy()
>   vfio: Extend vfio_set_migration_error() with Error* argument
> 
>  include/exec/memory.h                 |  25 ++-
>  include/hw/s390x/storage-attributes.h |   2 +-
>  include/hw/vfio/vfio-common.h         |  29 ++-
>  include/hw/vfio/vfio-container-base.h |  35 +++-
>  include/migration/register.h          | 273 +++++++++++++++++++++++---
>  include/qemu/typedefs.h               |   2 -
>  migration/savevm.h                    |   2 +-
>  hw/i386/xen/xen-hvm.c                 |   5 +-
>  hw/ppc/spapr.c                        |   2 +-
>  hw/s390x/s390-stattrib-kvm.c          |  12 +-
>  hw/s390x/s390-stattrib.c              |  15 +-
>  hw/vfio/common.c                      | 161 +++++++++------
>  hw/vfio/container-base.c              |   9 +-
>  hw/vfio/container.c                   |  19 +-
>  hw/vfio/migration.c                   |  99 ++++++----
>  hw/vfio/pci.c                         |   5 +-
>  hw/virtio/vhost-vdpa.c                |   5 +-
>  hw/virtio/vhost.c                     |   3 +-
>  migration/block-dirty-bitmap.c        |   4 +-
>  migration/block.c                     |  19 +-
>  migration/dirtyrate.c                 |  13 +-
>  migration/migration.c                 |  27 ++-
>  migration/qemu-file.c                 |   5 +-
>  migration/ram.c                       |  46 ++++-
>  migration/savevm.c                    |  59 +++---
>  system/memory.c                       |  56 +++++-
>  26 files changed, 713 insertions(+), 219 deletions(-)
> 
> -- 
> 2.44.0
> 
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler
  2024-03-06 13:34 ` [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler Cédric Le Goater
  2024-03-07 12:18   ` Fabiano Rosas
  2024-03-08  8:11   ` Peter Xu
@ 2024-03-08  8:45   ` Thomas Huth
  2 siblings, 0 replies; 111+ messages in thread
From: Thomas Huth @ 2024-03-08  8:45 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Halil Pasic,
	Christian Borntraeger

On 06/03/2024 14.34, Cédric Le Goater wrote:
> This will prepare ground for future changes adding an Error** argument
> to the save_setup() handler. We need to make sure that on failure,
> set_migrationmode() always sets a new error. See the Rules section in
> qapi/error.h.
> 
> Cc: Halil Pasic <pasic@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---

Reviewed-by: Thomas Huth <thuth@redhat.com>
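
For readers following the thread, the convention the commit message relies on
(a handler that fails must also set an error through errp) can be sketched as
below. This is only an illustration under stated assumptions: the Error type
and the mock_* helpers are simplified stand-ins written for this note, not the
real qapi/error.h API, and the ioctl result is passed in as a parameter
instead of calling into KVM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object (NOT the real type). */
typedef struct Error {
    char msg[128];
} Error;

/* Mock of error_setg_errno(): builds "<text>: <strerror>" when errp != NULL. */
void mock_error_setg_errno(Error **errp, int os_errno, const char *text)
{
    if (!errp) {
        return;                 /* caller passed NULL: error is dropped */
    }
    assert(*errp == NULL);      /* an error must not be set twice */
    *errp = malloc(sizeof(Error));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             text, strerror(os_errno));
}

/* Mock of error_free(). */
void mock_error_free(Error *err)
{
    free(err);
}

/*
 * Hypothetical handler shaped like set_migrationmode(): ioctl_ret stands in
 * for the (negative errno) result of the real ioctl. On failure it both
 * returns the code and sets an error, so callers never see a failure
 * without a message.
 */
int mock_set_migrationmode(bool value, int ioctl_ret, Error **errp)
{
    (void)value;
    if (ioctl_ret) {
        mock_error_setg_errno(errp, -ioctl_ret,
                              "setting migration mode failed");
    }
    return ioctl_ret;
}
```

A caller that wants the message passes the address of a NULL-initialized
local Error pointer and frees it after reporting; a caller that does not care
(as in the cleanup path of the patch) passes NULL.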




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-06 13:34 ` [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup() Cédric Le Goater
  2024-03-07 12:45   ` Fabiano Rosas
@ 2024-03-08 12:56   ` Peter Xu
  2024-03-08 13:14     ` Cédric Le Goater
  2024-03-08 14:11     ` Fabiano Rosas
  2024-03-08 14:36   ` Fabiano Rosas
  2 siblings, 2 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-08 12:56 UTC (permalink / raw)
  To: Cédric Le Goater, Laurent Vivier
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
> This prepares ground for the changes coming next which add an Error**
> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
> now handle the error and fail earlier setting the migration state from
> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
> 
> In qemu_savevm_state(), move the cleanup to preserve the error
> reported by .save_setup() handlers.
> 
> Since the previous behavior was to ignore errors at this step of
> migration, this change should be examined closely to check that
> cleanups are still correctly done.
> 
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> 
>  Changes in v4:
>  
>  - Merged cleanup change in qemu_savevm_state()
>    
>  Changes in v3:
>  
>  - Set migration state to MIGRATION_STATUS_FAILED 
>  - Fixed error handling to be done under lock in bg_migration_thread()
>  - Made sure an error is always set in case of failure in
>    qemu_savevm_state_setup()
>    
>  migration/savevm.h    |  2 +-
>  migration/migration.c | 27 ++++++++++++++++++++++++---
>  migration/savevm.c    | 26 +++++++++++++++-----------
>  3 files changed, 40 insertions(+), 15 deletions(-)
> 
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -32,7 +32,7 @@
>  bool qemu_savevm_state_blocked(Error **errp);
>  void qemu_savevm_non_migratable_list(strList **reasons);
>  int qemu_savevm_state_prepare(Error **errp);
> -void qemu_savevm_state_setup(QEMUFile *f);
> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>  bool qemu_savevm_state_guest_unplug_pending(void);
>  int qemu_savevm_state_resume_prepare(MigrationState *s);
>  void qemu_savevm_state_header(QEMUFile *f);
> diff --git a/migration/migration.c b/migration/migration.c
> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      MigThrError thr_error;
>      bool urgent = false;
> +    Error *local_err = NULL;
> +    int ret;
>  
>      thread = migration_threads_add("live_migration", qemu_get_thread_id());
>  
> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>      }
>  
>      bql_lock();
> -    qemu_savevm_state_setup(s->to_dst_file);
> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>      bql_unlock();
>  
> +    if (ret) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> +                          MIGRATION_STATUS_FAILED);
> +        goto out;
> +     }

There's a small indent issue, I can fix it.

The bigger problem is I _think_ this will trigger a ci failure in the
virtio-net-failover test:

▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR         
121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
>>> PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-x86_64: ram_save_setup failed: Input/output error
**
ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

I am not familiar enough with the failover code, and may not have time
today to follow this up, copy Laurent.  Cedric, if you have time, please
have a look.  I'll give it a shot on Monday to find a solution, otherwise
we may need to postpone some of the patches to 9.1.
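
As a side note on the state handling involved, the sketch below shows how a
conditional state transition behaves. It assumes, without claiming this is
the cause of the failure above, that a call such as
migrate_set_state(&s->state, old, new) only applies the change when the
current state equals the expected old state. The enum values and
sketch_set_state() are hypothetical names for illustration, not QEMU code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical status values, loosely modelled on the names in the thread. */
enum mig_status {
    MIG_SETUP,
    MIG_ACTIVE,
    MIG_CANCELLING,
    MIG_CANCELLED,
    MIG_FAILED
};

/*
 * Move *state from old_state to new_state only if it currently holds
 * old_state. Returns true when the transition was applied, false when
 * another party (for example a cancel request) changed the state first.
 */
bool sketch_set_state(_Atomic int *state, int old_state, int new_state)
{
    int expected = old_state;

    return atomic_compare_exchange_strong(state, &expected, new_state);
}
```

With this shape, a SETUP to FAILED request is silently skipped if the state
has already been moved to CANCELLING, so the ordering between the new early
exit and a concurrent cancel decides which status an observer sees.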

Thanks,

> +
>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>                                 MIGRATION_STATUS_ACTIVE);
>  
> @@ -3530,6 +3540,9 @@ static void *bg_migration_thread(void *opaque)
>      MigThrError thr_error;
>      QEMUFile *fb;
>      bool early_fail = true;
> +    bool setup_fail = true;
> +    Error *local_err = NULL;
> +    int ret;
>  
>      rcu_register_thread();
>      object_ref(OBJECT(s));
> @@ -3563,9 +3576,16 @@ static void *bg_migration_thread(void *opaque)
>  
>      bql_lock();
>      qemu_savevm_state_header(s->to_dst_file);
> -    qemu_savevm_state_setup(s->to_dst_file);
> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
> +    if (ret) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +        goto fail;
> +    }
>      bql_unlock();
>  
> +    setup_fail = false;
> +
>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>                                 MIGRATION_STATUS_ACTIVE);
>  
> @@ -3632,7 +3652,8 @@ static void *bg_migration_thread(void *opaque)
>  
>  fail:
>      if (early_fail) {
> -        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> +        migrate_set_state(&s->state,
> +                setup_fail ? MIGRATION_STATUS_SETUP : MIGRATION_STATUS_ACTIVE,
>                  MIGRATION_STATUS_FAILED);
>          bql_unlock();
>      }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index ee31ffb5e88cea723039c754c30ce2c8a0ef35f3..63fdbb5ad7d4dbfaef1d2094350bf302cc677602 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1310,11 +1310,11 @@ int qemu_savevm_state_prepare(Error **errp)
>      return 0;
>  }
>  
> -void qemu_savevm_state_setup(QEMUFile *f)
> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>  {
> +    ERRP_GUARD();
>      MigrationState *ms = migrate_get_current();
>      SaveStateEntry *se;
> -    Error *local_err = NULL;
>      int ret = 0;
>  
>      json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
> @@ -1323,10 +1323,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
>      trace_savevm_state_setup();
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          if (se->vmsd && se->vmsd->early_setup) {
> -            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
> +            ret = vmstate_save(f, se, ms->vmdesc, errp);
>              if (ret) {
> -                migrate_set_error(ms, local_err);
> -                error_report_err(local_err);
> +                migrate_set_error(ms, *errp);
>                  qemu_file_set_error(f, ret);
>                  break;
>              }
> @@ -1346,18 +1345,19 @@ void qemu_savevm_state_setup(QEMUFile *f)
>          ret = se->ops->save_setup(f, se->opaque);
>          save_section_footer(f, se);
>          if (ret < 0) {
> +            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
> +                       "%d(%s): %d", se->section_id, se->idstr, ret);
>              qemu_file_set_error(f, ret);
>              break;
>          }
>      }
>  
>      if (ret) {
> -        return;
> +        return ret;
>      }
>  
> -    if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
> -        error_report_err(local_err);
> -    }
> +    /* TODO: Should we check that errp is set in case of failure ? */
> +    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>  }
>  
>  int qemu_savevm_state_resume_prepare(MigrationState *s)
> @@ -1728,7 +1728,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      ms->to_dst_file = f;
>  
>      qemu_savevm_state_header(f);
> -    qemu_savevm_state_setup(f);
> +    ret = qemu_savevm_state_setup(f, errp);
> +    if (ret) {
> +        goto cleanup;
> +    }
>  
>      while (qemu_file_get_error(f) == 0) {
>          if (qemu_savevm_state_iterate(f, false) > 0) {
> @@ -1741,10 +1744,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>          qemu_savevm_state_complete_precopy(f, false, false);
>          ret = qemu_file_get_error(f);
>      }
> -    qemu_savevm_state_cleanup();
>      if (ret != 0) {
>          error_setg_errno(errp, -ret, "Error while writing VM state");
>      }
> +cleanup:
> +    qemu_savevm_state_cleanup();
>  
>      if (ret != 0) {
>          status = MIGRATION_STATUS_FAILED;
> -- 
> 2.44.0
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler
  2024-03-08  7:39       ` Eric Auger
@ 2024-03-08 13:00         ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-08 13:00 UTC (permalink / raw)
  To: Eric Auger, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/8/24 08:39, Eric Auger wrote:
> 
> 
> On 3/7/24 13:06, Cédric Le Goater wrote:
>> On 3/7/24 09:09, Eric Auger wrote:
>>> Hi Cédric,
>>>
>>> On 3/6/24 14:34, Cédric Le Goater wrote:
>>>> We will use the Error object to improve error reporting in the
>>>> .log_global*() handlers of VFIO. Add documentation while at it.
>>>>
>>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>> ---
>>>>
>>>>    Changes in v3:
>>>>
>>>>    - Use error_setg_errno() in vfio_legacy_set_dirty_page_tracking()
>>>>      include/hw/vfio/vfio-container-base.h | 18 ++++++++++++++++--
>>>>    hw/vfio/common.c                      |  4 ++--
>>>>    hw/vfio/container-base.c              |  4 ++--
>>>>    hw/vfio/container.c                   |  6 +++---
>>>>    4 files changed, 23 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-container-base.h
>>>> b/include/hw/vfio/vfio-container-base.h
>>>> index
>>>> 3582d5f97a37877b2adfc0d0b06996c82403f8b7..c76984654a596e3016a8cf833e10143eb872e102 100644
>>>> --- a/include/hw/vfio/vfio-container-base.h
>>>> +++ b/include/hw/vfio/vfio-container-base.h
>>>> @@ -82,7 +82,7 @@ int
>>>> vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>>>>    void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>>>>                                           MemoryRegionSection *section);
>>>>    int vfio_container_set_dirty_page_tracking(VFIOContainerBase
>>>> *bcontainer,
>>>> -                                           bool start);
>>>> +                                           bool start, Error **errp);
>>>>    int vfio_container_query_dirty_bitmap(const VFIOContainerBase
>>>> *bcontainer,
>>>>                                          VFIOBitmap *vbmap,
>>>>                                          hwaddr iova, hwaddr size);
>>>> @@ -121,9 +121,23 @@ struct VFIOIOMMUClass {
>>>>        int (*attach_device)(const char *name, VFIODevice *vbasedev,
>>>>                             AddressSpace *as, Error **errp);
>>>>        void (*detach_device)(VFIODevice *vbasedev);
>>>> +
>>>>        /* migration feature */
>>>> +
>>>> +    /**
>>>> +     * @set_dirty_page_tracking
>>>> +     *
>>>> +     * Start or stop dirty pages tracking on VFIO container
>>>> +     *
>>>> +     * @bcontainer: #VFIOContainerBase on which to de/activate dirty
>>>> +     *              pages tracking
>>> s/pages/page?
>>
>> yep
>>
>>> for my education is the "#"VFIOContainerBase formalized somewhere?
>>
>> It's QEMU specific. See 4cf41794411f ("docs: tweak kernel-doc for QEMU
>> coding standards").
> OK thank you for the education!

It took me a while to understand where it came from. So you educated
me too :)

>>
>>> +     * @start: indicates whether to start or stop dirty pages tracking
>>>> +     * @errp: pointer to Error*, to store an error if it happens.
>>>> +     *
>>>> +     * Returns zero to indicate success and negative for error
>>>> +     */
>>>>        int (*set_dirty_page_tracking)(const VFIOContainerBase
>>>> *bcontainer,
>>>> -                                   bool start);
>>>> +                                   bool start, Error **errp);
>>>>        int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
>>>>                                  VFIOBitmap *vbmap,
>>>>                                  hwaddr iova, hwaddr size);
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index
>>>> 800ba0aeac84b8dcc83b042bb70c37b4bf78d3f4..5598a508399a6c0b3a20ba17311cbe83d84250c5 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -1085,7 +1085,7 @@ static bool
>>>> vfio_listener_log_global_start(MemoryListener *listener,
>>>>        if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
>>>>            ret = vfio_devices_dma_logging_start(bcontainer);
>>>>        } else {
>>>> -        ret = vfio_container_set_dirty_page_tracking(bcontainer, true);
>>>> +        ret = vfio_container_set_dirty_page_tracking(bcontainer,
>>>> true, NULL);
>>> It is not obvious why we don't pass errp here. Also there is an
>>> error_report below. Why isn't the error propagated? (not related to your
>>> patch though)
>>
>> When I started this series, I was trying to find a way to introduce
>> progressively the changes and this patch is preparing ground for
>> what is coming next. It could be merged with the following if you prefer.
> up to you or tweak the commit msg

ok. Let's get the initial migration part in first and then I will resend the
VFIO part.

Thanks for your time.

C.



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 00/25] migration: Improve error reporting
  2024-03-08  8:15 ` [PATCH v4 00/25] migration: Improve error reporting Peter Xu
@ 2024-03-08 13:03   ` Cédric Le Goater
  2024-03-11 20:24   ` Peter Xu
  1 sibling, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-08 13:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/8/24 09:15, Peter Xu wrote:
> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
>> * [1-4] already queued in migration-next.
>>    
>>    migration: Report error when shutdown fails
>>    migration: Remove SaveStateHandler and LoadStateHandler typedefs
>>    migration: Add documentation for SaveVMHandlers
>>    migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
>>    
>> * [5-9] are prerequisite changes in other components related to the
>>    migration save_setup() handler. They make sure a failure is not
>>    returned without setting an error.
>>    
>>    s390/stattrib: Add Error** argument to set_migrationmode() handler
>>    vfio: Always report an error in vfio_save_setup()
>>    migration: Always report an error in block_save_setup()
>>    migration: Always report an error in ram_save_setup()
>>    migration: Add Error** argument to vmstate_save()
>>
>> * [10-15] are the core changes in migration and memory components to
>>    propagate an error reported in a save_setup() handler.
>>
>>    migration: Add Error** argument to qemu_savevm_state_setup()
>>    migration: Add Error** argument to .save_setup() handler
>>    migration: Add Error** argument to .load_setup() handler
> 
> Further queued 5-12 in migration-staging (until here), thanks.

Thanks Peter. All the prereq changes should reach 9.0, which leaves
time to discuss the core changes for 9.1.

C.




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-08 12:56   ` Peter Xu
@ 2024-03-08 13:14     ` Cédric Le Goater
  2024-03-08 13:39       ` Cédric Le Goater
  2024-03-08 14:11     ` Fabiano Rosas
  1 sibling, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-08 13:14 UTC (permalink / raw)
  To: Peter Xu, Laurent Vivier
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/8/24 13:56, Peter Xu wrote:
> On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
>> This prepares ground for the changes coming next which add an Error**
>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>> now handle the error and fail earlier setting the migration state from
>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>
>> In qemu_savevm_state(), move the cleanup to preserve the error
>> reported by .save_setup() handlers.
>>
>> Since the previous behavior was to ignore errors at this step of
>> migration, this change should be examined closely to check that
>> cleanups are still correctly done.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>
>>   Changes in v4:
>>   
>>   - Merged cleanup change in qemu_savevm_state()
>>     
>>   Changes in v3:
>>   
>>   - Set migration state to MIGRATION_STATUS_FAILED
>>   - Fixed error handling to be done under lock in bg_migration_thread()
>>   - Made sure an error is always set in case of failure in
>>     qemu_savevm_state_setup()
>>     
>>   migration/savevm.h    |  2 +-
>>   migration/migration.c | 27 ++++++++++++++++++++++++---
>>   migration/savevm.c    | 26 +++++++++++++++-----------
>>   3 files changed, 40 insertions(+), 15 deletions(-)
>>
>> diff --git a/migration/savevm.h b/migration/savevm.h
>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>> --- a/migration/savevm.h
>> +++ b/migration/savevm.h
>> @@ -32,7 +32,7 @@
>>   bool qemu_savevm_state_blocked(Error **errp);
>>   void qemu_savevm_non_migratable_list(strList **reasons);
>>   int qemu_savevm_state_prepare(Error **errp);
>> -void qemu_savevm_state_setup(QEMUFile *f);
>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>   bool qemu_savevm_state_guest_unplug_pending(void);
>>   int qemu_savevm_state_resume_prepare(MigrationState *s);
>>   void qemu_savevm_state_header(QEMUFile *f);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>       int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>       MigThrError thr_error;
>>       bool urgent = false;
>> +    Error *local_err = NULL;
>> +    int ret;
>>   
>>       thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>   
>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>       }
>>   
>>       bql_lock();
>> -    qemu_savevm_state_setup(s->to_dst_file);
>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>       bql_unlock();
>>   
>> +    if (ret) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>> +                          MIGRATION_STATUS_FAILED);
>> +        goto out;
>> +     }
> 
> There's a small indent issue, I can fix it.

checkpatch did not report anything.

> 
> The bigger problem is I _think_ this will trigger a ci failure in the
> virtio-net-failover test:
> 
> ▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR
> 121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
>>>> PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> stderr:
> qemu-system-x86_64: ram_save_setup failed: Input/output error
> **
> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
> (test program exited with status code -6)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> 
> I am not familiar enough with the failover code, and may not have time
> today to follow this up, copy Laurent.  Cedric, if you have time, please
> have a look.  


Sure. Weird because I usually run make check on x86_64, s390x, ppc64 and
aarch64. Let me check again.


Thanks,

C.



> I'll give it a shot on Monday to find a solution, otherwise
> we may need to postpone some of the patches to 9.1.
> 
> Thanks,
> 
>> +
>>       qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>                                  MIGRATION_STATUS_ACTIVE);
>>   
>> @@ -3530,6 +3540,9 @@ static void *bg_migration_thread(void *opaque)
>>       MigThrError thr_error;
>>       QEMUFile *fb;
>>       bool early_fail = true;
>> +    bool setup_fail = true;
>> +    Error *local_err = NULL;
>> +    int ret;
>>   
>>       rcu_register_thread();
>>       object_ref(OBJECT(s));
>> @@ -3563,9 +3576,16 @@ static void *bg_migration_thread(void *opaque)
>>   
>>       bql_lock();
>>       qemu_savevm_state_header(s->to_dst_file);
>> -    qemu_savevm_state_setup(s->to_dst_file);
>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>> +    if (ret) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +        goto fail;
>> +    }
>>       bql_unlock();
>>   
>> +    setup_fail = false;
>> +
>>       qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>                                  MIGRATION_STATUS_ACTIVE);
>>   
>> @@ -3632,7 +3652,8 @@ static void *bg_migration_thread(void *opaque)
>>   
>>   fail:
>>       if (early_fail) {
>> -        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>> +        migrate_set_state(&s->state,
>> +                setup_fail ? MIGRATION_STATUS_SETUP : MIGRATION_STATUS_ACTIVE,
>>                   MIGRATION_STATUS_FAILED);
>>           bql_unlock();
>>       }
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index ee31ffb5e88cea723039c754c30ce2c8a0ef35f3..63fdbb5ad7d4dbfaef1d2094350bf302cc677602 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1310,11 +1310,11 @@ int qemu_savevm_state_prepare(Error **errp)
>>       return 0;
>>   }
>>   
>> -void qemu_savevm_state_setup(QEMUFile *f)
>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>>   {
>> +    ERRP_GUARD();
>>       MigrationState *ms = migrate_get_current();
>>       SaveStateEntry *se;
>> -    Error *local_err = NULL;
>>       int ret = 0;
>>   
>>       json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
>> @@ -1323,10 +1323,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
>>       trace_savevm_state_setup();
>>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>           if (se->vmsd && se->vmsd->early_setup) {
>> -            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
>> +            ret = vmstate_save(f, se, ms->vmdesc, errp);
>>               if (ret) {
>> -                migrate_set_error(ms, local_err);
>> -                error_report_err(local_err);
>> +                migrate_set_error(ms, *errp);
>>                   qemu_file_set_error(f, ret);
>>                   break;
>>               }
>> @@ -1346,18 +1345,19 @@ void qemu_savevm_state_setup(QEMUFile *f)
>>           ret = se->ops->save_setup(f, se->opaque);
>>           save_section_footer(f, se);
>>           if (ret < 0) {
>> +            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
>> +                       "%d(%s): %d", se->section_id, se->idstr, ret);
>>               qemu_file_set_error(f, ret);
>>               break;
>>           }
>>       }
>>   
>>       if (ret) {
>> -        return;
>> +        return ret;
>>       }
>>   
>> -    if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
>> -        error_report_err(local_err);
>> -    }
>> +    /* TODO: Should we check that errp is set in case of failure ? */
>> +    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>>   }
>>   
>>   int qemu_savevm_state_resume_prepare(MigrationState *s)
>> @@ -1728,7 +1728,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>       ms->to_dst_file = f;
>>   
>>       qemu_savevm_state_header(f);
>> -    qemu_savevm_state_setup(f);
>> +    ret = qemu_savevm_state_setup(f, errp);
>> +    if (ret) {
>> +        goto cleanup;
>> +    }
>>   
>>       while (qemu_file_get_error(f) == 0) {
>>           if (qemu_savevm_state_iterate(f, false) > 0) {
>> @@ -1741,10 +1744,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>           qemu_savevm_state_complete_precopy(f, false, false);
>>           ret = qemu_file_get_error(f);
>>       }
>> -    qemu_savevm_state_cleanup();
>>       if (ret != 0) {
>>           error_setg_errno(errp, -ret, "Error while writing VM state");
>>       }
>> +cleanup:
>> +    qemu_savevm_state_cleanup();
>>   
>>       if (ret != 0) {
>>           status = MIGRATION_STATUS_FAILED;
>> -- 
>> 2.44.0
>>
> 




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-08 13:14     ` Cédric Le Goater
@ 2024-03-08 13:39       ` Cédric Le Goater
  2024-03-08 13:55         ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-08 13:39 UTC (permalink / raw)
  To: Peter Xu, Laurent Vivier
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/8/24 14:14, Cédric Le Goater wrote:
> On 3/8/24 13:56, Peter Xu wrote:
>> On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
>>> This prepares ground for the changes coming next which add an Error**
>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>> now handle the error and fail earlier setting the migration state from
>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>
>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>> reported by .save_setup() handlers.
>>>
>>> Since the previous behavior was to ignore errors at this step of
>>> migration, this change should be examined closely to check that
>>> cleanups are still correctly done.
>>>
>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>> ---
>>>
>>>   Changes in v4:
>>>   - Merged cleanup change in qemu_savevm_state()
>>>   Changes in v3:
>>>   - Set migration state to MIGRATION_STATUS_FAILED
>>>   - Fixed error handling to be done under lock in bg_migration_thread()
>>>   - Made sure an error is always set in case of failure in
>>>     qemu_savevm_state_setup()
>>>   migration/savevm.h    |  2 +-
>>>   migration/migration.c | 27 ++++++++++++++++++++++++---
>>>   migration/savevm.c    | 26 +++++++++++++++-----------
>>>   3 files changed, 40 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>> --- a/migration/savevm.h
>>> +++ b/migration/savevm.h
>>> @@ -32,7 +32,7 @@
>>>   bool qemu_savevm_state_blocked(Error **errp);
>>>   void qemu_savevm_non_migratable_list(strList **reasons);
>>>   int qemu_savevm_state_prepare(Error **errp);
>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>   bool qemu_savevm_state_guest_unplug_pending(void);
>>>   int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>   void qemu_savevm_state_header(QEMUFile *f);
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>       int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>       MigThrError thr_error;
>>>       bool urgent = false;
>>> +    Error *local_err = NULL;
>>> +    int ret;
>>>       thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>       }
>>>       bql_lock();
>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>       bql_unlock();
>>> +    if (ret) {
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>> +                          MIGRATION_STATUS_FAILED);
>>> +        goto out;
>>> +     }
>>
>> There's a small indent issue, I can fix it.
> 
> checkpatch did not report anything.
> 
>>
>> The bigger problem is I _think_ this will trigger a ci failure in the
>> virtio-net-failover test:
>>
>> ▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR
>> 121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
>>>>> PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
>> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
>> stderr:
>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>> **
>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>> (test program exited with status code -6)
>> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>>
>> I am not familiar enough with the failover code, and may not have time
>> today to follow this up, copy Laurent.  Cedric, if you have time, please
>> have a look. 
> 
> 
> Sure. Weird because I usually run make check on x86_64, s390x, ppc64 and
> aarch64. Let me check again.

I see one timeout error on s390x, but not always. See below. It occurs with
or without this patchset. The other arches (x86_64, ppc64) run fine, apart
from one io test failing from time to time.

Thanks,

C.






# Start of compress tests
# Running /s390x/migration/postcopy/recovery/compress/plain
# Using machine type: s390-ccw-virtio-9.0
# starting QEMU: exec ./qemu-system-s390x -qtest unix:/tmp/qtest-3064311.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3064311.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine s390-ccw-virtio-9.0, -name source,debug-threads=on -m 128M -serial file:/tmp/migration-test-TO8BK2/src_serial -bios /tmp/migration-test-TO8BK2/bootsect    2>/dev/null -accel qtest
# starting QEMU: exec ./qemu-system-s390x -qtest unix:/tmp/qtest-3064311.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3064311.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg -machine s390-ccw-virtio-9.0, -name target,debug-threads=on -m 128M -serial file:/tmp/migration-test-TO8BK2/dest_serial -incoming defer -bios /tmp/migration-test-TO8BK2/bootsect    2>/dev/null -accel qtest

**
ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
Bail out! ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from signal 9 (Killed)
Aborted (core dumped)




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-08 13:39       ` Cédric Le Goater
@ 2024-03-08 13:55         ` Cédric Le Goater
  2024-03-08 14:17           ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-08 13:55 UTC (permalink / raw)
  To: Peter Xu, Laurent Vivier
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/8/24 14:39, Cédric Le Goater wrote:
> On 3/8/24 14:14, Cédric Le Goater wrote:
>> On 3/8/24 13:56, Peter Xu wrote:
>>> On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
>>>> This prepares ground for the changes coming next which add an Error**
>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>>> now handle the error and fail earlier setting the migration state from
>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>>
>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>>> reported by .save_setup() handlers.
>>>>
>>>> Since the previous behavior was to ignore errors at this step of
>>>> migration, this change should be examined closely to check that
>>>> cleanups are still correctly done.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>> ---
>>>>
>>>>   Changes in v4:
>>>>   - Merged cleanup change in qemu_savevm_state()
>>>>   Changes in v3:
>>>>   - Set migration state to MIGRATION_STATUS_FAILED
>>>>   - Fixed error handling to be done under lock in bg_migration_thread()
>>>>   - Made sure an error is always set in case of failure in
>>>>     qemu_savevm_state_setup()
>>>>   migration/savevm.h    |  2 +-
>>>>   migration/migration.c | 27 ++++++++++++++++++++++++---
>>>>   migration/savevm.c    | 26 +++++++++++++++-----------
>>>>   3 files changed, 40 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>>> --- a/migration/savevm.h
>>>> +++ b/migration/savevm.h
>>>> @@ -32,7 +32,7 @@
>>>>   bool qemu_savevm_state_blocked(Error **errp);
>>>>   void qemu_savevm_non_migratable_list(strList **reasons);
>>>>   int qemu_savevm_state_prepare(Error **errp);
>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>>   bool qemu_savevm_state_guest_unplug_pending(void);
>>>>   int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>>   void qemu_savevm_state_header(QEMUFile *f);
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>>       int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>       MigThrError thr_error;
>>>>       bool urgent = false;
>>>> +    Error *local_err = NULL;
>>>> +    int ret;
>>>>       thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>>       }
>>>>       bql_lock();
>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>>       bql_unlock();
>>>> +    if (ret) {
>>>> +        migrate_set_error(s, local_err);
>>>> +        error_free(local_err);
>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>> +                          MIGRATION_STATUS_FAILED);
>>>> +        goto out;
>>>> +     }
>>>
>>> There's a small indent issue, I can fix it.
>>
>> checkpatch did not report anything.
>>
>>>
>>> The bigger problem is I _think_ this will trigger a ci failure in the
>>> virtio-net-failover test:
>>>
>>> ▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR
>>> 121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
>>>>>> PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
>>> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
>>> stderr:
>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>>> **
>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>> (test program exited with status code -6)
>>> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>>>
>>> I am not familiar enough with the failover code, and may not have time
>>> today to follow this up, copy Laurent.  Cedric, if you have time, please
>>> have a look. 
>>
>>
>> Sure. Weird because I usually run make check on x86_64, s390x, ppc64 and
>> aarch64. Let me check again.
> 
> I see one timeout error on s390x, but not always. See below. It occurs with
> or without this patchset. The other arches (x86_64, ppc64) run fine, apart
> from one io test failing from time to time.

Ah! I got this once on aarch64:

  161/486 ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL ERROR
161/486 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover                  ERROR            5.98s   killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/home/legoater/work/qemu/qemu.git/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=119 QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img PYTHON=/home/legoater/work/qemu/qemu.git/build/pyvenv/bin/python3 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /home/legoater/work/qemu/qemu.git/build/tests/qtest/virtio-net-failover --tap -k
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-x86_64: ram_save_setup failed: Input/output error
**
ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL

(test program exited with status code -6)
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

I couldn't reproduce it yet :/

Thanks,

C.







* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-08 12:56   ` Peter Xu
  2024-03-08 13:14     ` Cédric Le Goater
@ 2024-03-08 14:11     ` Fabiano Rosas
  1 sibling, 0 replies; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-08 14:11 UTC (permalink / raw)
  To: Peter Xu, Cédric Le Goater, Laurent Vivier
  Cc: qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

Peter Xu <peterx@redhat.com> writes:

> On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
>> This prepares ground for the changes coming next which add an Error**
>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>> now handle the error and fail earlier setting the migration state from
>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>> 
>> In qemu_savevm_state(), move the cleanup to preserve the error
>> reported by .save_setup() handlers.
>> 
>> Since the previous behavior was to ignore errors at this step of
>> migration, this change should be examined closely to check that
>> cleanups are still correctly done.
>> 
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>> 
>>  Changes in v4:
>>  
>>  - Merged cleanup change in qemu_savevm_state()
>>    
>>  Changes in v3:
>>  
>>  - Set migration state to MIGRATION_STATUS_FAILED 
>>  - Fixed error handling to be done under lock in bg_migration_thread()
>>  - Made sure an error is always set in case of failure in
>>    qemu_savevm_state_setup()
>>    
>>  migration/savevm.h    |  2 +-
>>  migration/migration.c | 27 ++++++++++++++++++++++++---
>>  migration/savevm.c    | 26 +++++++++++++++-----------
>>  3 files changed, 40 insertions(+), 15 deletions(-)
>> 
>> diff --git a/migration/savevm.h b/migration/savevm.h
>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>> --- a/migration/savevm.h
>> +++ b/migration/savevm.h
>> @@ -32,7 +32,7 @@
>>  bool qemu_savevm_state_blocked(Error **errp);
>>  void qemu_savevm_non_migratable_list(strList **reasons);
>>  int qemu_savevm_state_prepare(Error **errp);
>> -void qemu_savevm_state_setup(QEMUFile *f);
>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>  bool qemu_savevm_state_guest_unplug_pending(void);
>>  int qemu_savevm_state_resume_prepare(MigrationState *s);
>>  void qemu_savevm_state_header(QEMUFile *f);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>      MigThrError thr_error;
>>      bool urgent = false;
>> +    Error *local_err = NULL;
>> +    int ret;
>>  
>>      thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>  
>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>      }
>>  
>>      bql_lock();
>> -    qemu_savevm_state_setup(s->to_dst_file);
>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>      bql_unlock();
>>  
>> +    if (ret) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>> +                          MIGRATION_STATUS_FAILED);
>> +        goto out;
>> +     }
>
> There's a small indent issue, I can fix it.
>
> The bigger problem is I _think_ this will trigger a ci failure in the
> virtio-net-failover test:
>
> ▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR         
> 121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
>>>> PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> stderr:
> qemu-system-x86_64: ram_save_setup failed: Input/output error
> **
> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")

I would say testing for the CANCELLING state is unreliable; there's even
code a few lines below in the test that does the proper thing and tests
for CANCELLED in a loop.

However, the comment: 

 /* while the card is not ejected, we must be in "cancelling" state */

seems to imply that after migrate_fd_cancel (state==CANCELLING), the
migrate_fd_cleanup (state==CANCELLED) would only be executed after
"unplugging the card". So there must be "some logic" that has the effect
of preventing cleanup.
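
To illustrate the difference between asserting on a transient state and
polling for the final one, here is a minimal standalone sketch. It is not
the qtest code: wait_for_final_status(), status_fn, struct fake_migration
and fake_query() are hypothetical names made up for this example, and the
status strings are simply the ones from the failing assertion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical callback returning the current status string. In a real
 * test this would be a status query; here it is only a stand-in. */
typedef const char *(*status_fn)(void *opaque);

/*
 * Poll until the status reaches `final`, tolerating `transient` on the
 * way. Returns true once `final` is seen, false if any other status
 * shows up or if `max_polls` queries pass without reaching `final`.
 */
static bool wait_for_final_status(status_fn query, void *opaque,
                                  const char *transient, const char *final,
                                  unsigned max_polls)
{
    assert(query && transient && final);

    for (unsigned i = 0; i < max_polls; i++) {
        const char *status = query(opaque);

        if (strcmp(status, final) == 0) {
            return true;
        }
        if (strcmp(status, transient) != 0) {
            return false; /* neither transient nor final: unexpected */
        }
    }
    return false; /* still transient after max_polls queries */
}

/* Fake status source: reports "cancelling" `remaining` times, then
 * "cancelled". With remaining == 0 the transient state is never seen. */
struct fake_migration {
    unsigned remaining;
};

static const char *fake_query(void *opaque)
{
    struct fake_migration *m = opaque;

    if (m->remaining > 0) {
        m->remaining--;
        return "cancelling";
    }
    return "cancelled";
}
```

A check written this way passes whether the first query sees "cancelling"
or already "cancelled", whereas a single assertion on "cancelling" depends
on how quickly the state moves on.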

> (test program exited with status code -6)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> I am not familiar enough with the failover code, and may not have time
> today to follow this up, copy Laurent.  Cedric, if you have time, please
> have a look.  I'll give it a shot on Monday to find a solution, otherwise
> we may need to postpone some of the patches to 9.1.
>
> Thanks,
>
>> +
>>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>                                 MIGRATION_STATUS_ACTIVE);
>>  
>> @@ -3530,6 +3540,9 @@ static void *bg_migration_thread(void *opaque)
>>      MigThrError thr_error;
>>      QEMUFile *fb;
>>      bool early_fail = true;
>> +    bool setup_fail = true;
>> +    Error *local_err = NULL;
>> +    int ret;
>>  
>>      rcu_register_thread();
>>      object_ref(OBJECT(s));
>> @@ -3563,9 +3576,16 @@ static void *bg_migration_thread(void *opaque)
>>  
>>      bql_lock();
>>      qemu_savevm_state_header(s->to_dst_file);
>> -    qemu_savevm_state_setup(s->to_dst_file);
>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>> +    if (ret) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +        goto fail;
>> +    }
>>      bql_unlock();
>>  
>> +    setup_fail = false;
>> +
>>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>                                 MIGRATION_STATUS_ACTIVE);
>>  
>> @@ -3632,7 +3652,8 @@ static void *bg_migration_thread(void *opaque)
>>  
>>  fail:
>>      if (early_fail) {
>> -        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>> +        migrate_set_state(&s->state,
>> +                setup_fail ? MIGRATION_STATUS_SETUP : MIGRATION_STATUS_ACTIVE,
>>                  MIGRATION_STATUS_FAILED);
>>          bql_unlock();
>>      }
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index ee31ffb5e88cea723039c754c30ce2c8a0ef35f3..63fdbb5ad7d4dbfaef1d2094350bf302cc677602 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1310,11 +1310,11 @@ int qemu_savevm_state_prepare(Error **errp)
>>      return 0;
>>  }
>>  
>> -void qemu_savevm_state_setup(QEMUFile *f)
>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>>  {
>> +    ERRP_GUARD();
>>      MigrationState *ms = migrate_get_current();
>>      SaveStateEntry *se;
>> -    Error *local_err = NULL;
>>      int ret = 0;
>>  
>>      json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
>> @@ -1323,10 +1323,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
>>      trace_savevm_state_setup();
>>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>          if (se->vmsd && se->vmsd->early_setup) {
>> -            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
>> +            ret = vmstate_save(f, se, ms->vmdesc, errp);
>>              if (ret) {
>> -                migrate_set_error(ms, local_err);
>> -                error_report_err(local_err);
>> +                migrate_set_error(ms, *errp);
>>                  qemu_file_set_error(f, ret);
>>                  break;
>>              }
>> @@ -1346,18 +1345,19 @@ void qemu_savevm_state_setup(QEMUFile *f)
>>          ret = se->ops->save_setup(f, se->opaque);
>>          save_section_footer(f, se);
>>          if (ret < 0) {
>> +            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
>> +                       "%d(%s): %d", se->section_id, se->idstr, ret);
>>              qemu_file_set_error(f, ret);
>>              break;
>>          }
>>      }
>>  
>>      if (ret) {
>> -        return;
>> +        return ret;
>>      }
>>  
>> -    if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
>> -        error_report_err(local_err);
>> -    }
>> +    /* TODO: Should we check that errp is set in case of failure ? */
>> +    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>>  }
>>  
>>  int qemu_savevm_state_resume_prepare(MigrationState *s)
>> @@ -1728,7 +1728,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>      ms->to_dst_file = f;
>>  
>>      qemu_savevm_state_header(f);
>> -    qemu_savevm_state_setup(f);
>> +    ret = qemu_savevm_state_setup(f, errp);
>> +    if (ret) {
>> +        goto cleanup;
>> +    }
>>  
>>      while (qemu_file_get_error(f) == 0) {
>>          if (qemu_savevm_state_iterate(f, false) > 0) {
>> @@ -1741,10 +1744,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>          qemu_savevm_state_complete_precopy(f, false, false);
>>          ret = qemu_file_get_error(f);
>>      }
>> -    qemu_savevm_state_cleanup();
>>      if (ret != 0) {
>>          error_setg_errno(errp, -ret, "Error while writing VM state");
>>      }
>> +cleanup:
>> +    qemu_savevm_state_cleanup();
>>  
>>      if (ret != 0) {
>>          status = MIGRATION_STATUS_FAILED;
>> -- 
>> 2.44.0
>> 



* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-08 13:55         ` Cédric Le Goater
@ 2024-03-08 14:17           ` Peter Xu
  2024-03-11 18:12             ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-08 14:17 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Laurent Vivier, qemu-devel, Fabiano Rosas, Alex Williamson,
	Avihai Horon, Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Fri, Mar 08, 2024 at 02:55:30PM +0100, Cédric Le Goater wrote:
> On 3/8/24 14:39, Cédric Le Goater wrote:
> > On 3/8/24 14:14, Cédric Le Goater wrote:
> > > On 3/8/24 13:56, Peter Xu wrote:
> > > > On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
> > > > > This prepares ground for the changes coming next which add an Error**
> > > > > argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
> > > > > now handle the error and fail earlier setting the migration state from
> > > > > MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
> > > > > 
> > > > > In qemu_savevm_state(), move the cleanup to preserve the error
> > > > > reported by .save_setup() handlers.
> > > > > 
> > > > > Since the previous behavior was to ignore errors at this step of
> > > > > migration, this change should be examined closely to check that
> > > > > cleanups are still correctly done.
> > > > > 
> > > > > Signed-off-by: Cédric Le Goater <clg@redhat.com>
> > > > > ---
> > > > > 
> > > > >   Changes in v4:
> > > > >   - Merged cleanup change in qemu_savevm_state()
> > > > >   Changes in v3:
> > > > >   - Set migration state to MIGRATION_STATUS_FAILED
> > > > >   - Fixed error handling to be done under lock in bg_migration_thread()
> > > > >   - Made sure an error is always set in case of failure in
> > > > >     qemu_savevm_state_setup()
> > > > >   migration/savevm.h    |  2 +-
> > > > >   migration/migration.c | 27 ++++++++++++++++++++++++---
> > > > >   migration/savevm.c    | 26 +++++++++++++++-----------
> > > > >   3 files changed, 40 insertions(+), 15 deletions(-)
> > > > > 
> > > > > diff --git a/migration/savevm.h b/migration/savevm.h
> > > > > index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
> > > > > --- a/migration/savevm.h
> > > > > +++ b/migration/savevm.h
> > > > > @@ -32,7 +32,7 @@
> > > > >   bool qemu_savevm_state_blocked(Error **errp);
> > > > >   void qemu_savevm_non_migratable_list(strList **reasons);
> > > > >   int qemu_savevm_state_prepare(Error **errp);
> > > > > -void qemu_savevm_state_setup(QEMUFile *f);
> > > > > +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
> > > > >   bool qemu_savevm_state_guest_unplug_pending(void);
> > > > >   int qemu_savevm_state_resume_prepare(MigrationState *s);
> > > > >   void qemu_savevm_state_header(QEMUFile *f);
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
> > > > >       int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > > > >       MigThrError thr_error;
> > > > >       bool urgent = false;
> > > > > +    Error *local_err = NULL;
> > > > > +    int ret;
> > > > >       thread = migration_threads_add("live_migration", qemu_get_thread_id());
> > > > > @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
> > > > >       }
> > > > >       bql_lock();
> > > > > -    qemu_savevm_state_setup(s->to_dst_file);
> > > > > +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
> > > > >       bql_unlock();
> > > > > +    if (ret) {
> > > > > +        migrate_set_error(s, local_err);
> > > > > +        error_free(local_err);
> > > > > +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > > > > +                          MIGRATION_STATUS_FAILED);
> > > > > +        goto out;
> > > > > +     }
> > > > 
> > > > There's a small indent issue, I can fix it.
> > > 
> > > checkpatch did report anything.
> > > 
> > > > 
> > > > The bigger problem is I _think_ this will trigger a ci failure in the
> > > > virtio-net-failover test:
> > > > 
> > > > ▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR
> > > > 121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
> > > > > > > PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
> > > > ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> > > > stderr:
> > > > qemu-system-x86_64: ram_save_setup failed: Input/output error
> > > > **
> > > > ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
> > > > (test program exited with status code -6)
> > > > ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> > > > 
> > > > I am not familiar enough with the failover code, and may not have time
> > > > today to follow this up, copy Laurent.  Cedric, if you have time, please
> > > > have a look.
> > > 
> > > 
> > > Sure. Weird because I usually run make check on x86_64, s390x, ppc64 and
> > > aarch64. Let me check again.
> > 
> > I see one timeout error on s390x but not always. See below. It occurs with
> > or without this patchset. the other x86_64, ppc64 arches run fine (a part
> > from one io  test failing from time to time)
> 
> Ah ! I got this once on aarch64 :
> 
>  161/486 ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL ERROR
> 161/486 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover                  ERROR            5.98s   killed by signal 6 SIGABRT
> > > > G_TEST_DBUS_DAEMON=/home/legoater/work/qemu/qemu.git/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=119 QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img PYTHON=/home/legoater/work/qemu/qemu.git/build/pyvenv/bin/python3 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /home/legoater/work/qemu/qemu.git/build/tests/qtest/virtio-net-failover --tap -k
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> stderr:
> qemu-system-x86_64: ram_save_setup failed: Input/output error
> **
> ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL
> 
> (test program exited with status code -6)
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

Hmm, this one seems different..

> 
> I couldn't reproduce yet :/

I never reproduced it locally on x86, and my failure is always in the
"cancelling" vs. "cancelled" check rather than the NULL check.  It's much
easier to trigger on CI in check-system-centos (I don't know why centos):

https://gitlab.com/peterx/qemu/-/jobs/6351020546

I think, at least for the error I hit, the problem is that the failover test
cancels the migration.  If it cancels fast enough to land during setup, the
migration can now fail right there (it could not before, when
qemu_savevm_state_setup() errors were ignored), and I think it'll then skip:

    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                               MIGRATION_STATUS_ACTIVE);

It seems the test wants the "cancelling" to hold until later:

    /* while the card is not ejected, we must be in "cancelling" state */
    ret = migrate_status(qts);

    status = qdict_get_str(ret, "status");
    g_assert_cmpstr(status, ==, "cancelling");
    qobject_unref(ret);

    /* OS unplugs the cards, QEMU can move from wait-unplug state */
    qtest_outl(qts, ACPI_PCIHP_ADDR_ICH9 + PCI_EJ_BASE, 1);

Again, I'd need to read the failover code first, so there's not much more I
can tell.  Laurent might have a clue.
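
To make the suspected race concrete, here is a small self-contained C
sketch.  It is not QEMU code: it assumes migrate_set_state() behaves like a
compare-and-swap on the status (the state only moves if the current value
still equals the expected old one), and the toy_* names and the enum are
made up for illustration.  It replays cancel, then the new SETUP -> FAILED
error path, then cleanup:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy status values; only the names mirror the states discussed above. */
enum toy_status {
    TOY_SETUP,
    TOY_CANCELLING,
    TOY_CANCELLED,
    TOY_FAILED,
};

/*
 * Assumed compare-and-swap semantics: move to new_state only if the
 * current state is still old_state.  Returns true if the move happened.
 */
static bool toy_set_state(atomic_int *state, int old_state, int new_state)
{
    int expected = old_state;
    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/*
 * Replay the suspected interleaving in a single thread:
 *  1. the test cancels while setup is still running,
 *  2. setup fails and the thread tries SETUP -> FAILED,
 *  3. cleanup finishes the cancel.
 * Returns the status a later query would observe.
 */
static int toy_cancel_during_setup(bool *failed_transition_applied)
{
    atomic_int state = TOY_SETUP;

    toy_set_state(&state, TOY_SETUP, TOY_CANCELLING);      /* cancel */
    *failed_transition_applied =
        toy_set_state(&state, TOY_SETUP, TOY_FAILED);      /* error path */
    toy_set_state(&state, TOY_CANCELLING, TOY_CANCELLED);  /* cleanup */

    return atomic_load(&state);
}
```

Under these assumptions the FAILED transition is dropped and the status ends
up as "cancelled" before the card is ejected, which is what the failing
assertion reports.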

/me disappears..

-- 
Peter Xu




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-06 13:34 ` [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup() Cédric Le Goater
  2024-03-07 12:45   ` Fabiano Rosas
  2024-03-08 12:56   ` Peter Xu
@ 2024-03-08 14:36   ` Fabiano Rosas
  2024-03-11 18:15     ` Cédric Le Goater
  2 siblings, 1 reply; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-08 14:36 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Cédric Le Goater

Cédric Le Goater <clg@redhat.com> writes:

> This prepares ground for the changes coming next which add an Error**
> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
> now handle the error and fail earlier setting the migration state from
> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>
> In qemu_savevm_state(), move the cleanup to preserve the error
> reported by .save_setup() handlers.
>
> Since the previous behavior was to ignore errors at this step of
> migration, this change should be examined closely to check that
> cleanups are still correctly done.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>
>  Changes in v4:
>  
>  - Merged cleanup change in qemu_savevm_state()
>    
>  Changes in v3:
>  
>  - Set migration state to MIGRATION_STATUS_FAILED 
>  - Fixed error handling to be done under lock in bg_migration_thread()
>  - Made sure an error is always set in case of failure in
>    qemu_savevm_state_setup()
>    
>  migration/savevm.h    |  2 +-
>  migration/migration.c | 27 ++++++++++++++++++++++++---
>  migration/savevm.c    | 26 +++++++++++++++-----------
>  3 files changed, 40 insertions(+), 15 deletions(-)
>
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -32,7 +32,7 @@
>  bool qemu_savevm_state_blocked(Error **errp);
>  void qemu_savevm_non_migratable_list(strList **reasons);
>  int qemu_savevm_state_prepare(Error **errp);
> -void qemu_savevm_state_setup(QEMUFile *f);
> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>  bool qemu_savevm_state_guest_unplug_pending(void);
>  int qemu_savevm_state_resume_prepare(MigrationState *s);
>  void qemu_savevm_state_header(QEMUFile *f);
> diff --git a/migration/migration.c b/migration/migration.c
> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      MigThrError thr_error;
>      bool urgent = false;
> +    Error *local_err = NULL;
> +    int ret;
>  
>      thread = migration_threads_add("live_migration", qemu_get_thread_id());
>  
> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>      }
>  
>      bql_lock();
> -    qemu_savevm_state_setup(s->to_dst_file);
> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>      bql_unlock();
>  
> +    if (ret) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> +                          MIGRATION_STATUS_FAILED);
> +        goto out;
> +     }
> +
>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>                                 MIGRATION_STATUS_ACTIVE);

This^ should be before the new block it seems:

GOOD:
migrate_set_state new state setup
migrate_set_state new state wait-unplug
migrate_fd_cancel 
migrate_set_state new state cancelling
migrate_fd_cleanup 
migrate_set_state new state cancelled
migrate_fd_cancel 
ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug

BAD:
migrate_set_state new state setup
migrate_fd_cancel 
migrate_set_state new state cancelling
migrate_fd_cleanup 
migrate_set_state new state cancelled
qemu-system-x86_64: ram_save_setup failed: Input/output error
**
ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
assertion failed (status == "cancelling"): ("cancelled" == "cancelling")

Otherwise migration_iteration_finish() will schedule the cleanup BH and
that will run concurrently with migrate_fd_cancel() issued by the test
and bad things happen.

=====
PS: I guess the next level in our Freestyle Concurrency video-game is to
make migrate_fd_cancel() stop setting state and poking files and only
set a flag that's tested in the other parts of the code.
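
A rough sketch of the flag idea from the PS, in plain C11.  Nothing here is
QEMU API: struct toy_migration, cancel_requested and the toy_* functions are
hypothetical.  The cancel side only records the request; the migration
thread polls the flag at points where it is safe to unwind and does its own
state changes from there.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_migration {
    atomic_bool cancel_requested;  /* hypothetical flag, not a QEMU field */
    int steps_done;
};

/* Cancel path: record the request and touch nothing else. */
static void toy_request_cancel(struct toy_migration *m)
{
    atomic_store(&m->cancel_requested, true);
}

/*
 * Migration thread: check the flag before each step.  Returns true if all
 * steps ran, false if the thread stopped because a cancel was requested.
 */
static bool toy_run_steps(struct toy_migration *m, int total_steps)
{
    for (int i = 0; i < total_steps; i++) {
        if (atomic_load(&m->cancel_requested)) {
            return false;  /* the thread itself decides how to unwind */
        }
        m->steps_done++;
    }
    return true;
}
```

The point of the design is that only one thread ever changes the migration
state or closes files, so the cancel path cannot race with cleanup.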

>  
> @@ -3530,6 +3540,9 @@ static void *bg_migration_thread(void *opaque)
>      MigThrError thr_error;
>      QEMUFile *fb;
>      bool early_fail = true;
> +    bool setup_fail = true;
> +    Error *local_err = NULL;
> +    int ret;
>  
>      rcu_register_thread();
>      object_ref(OBJECT(s));
> @@ -3563,9 +3576,16 @@ static void *bg_migration_thread(void *opaque)
>  
>      bql_lock();
>      qemu_savevm_state_header(s->to_dst_file);
> -    qemu_savevm_state_setup(s->to_dst_file);
> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
> +    if (ret) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +        goto fail;
> +    }
>      bql_unlock();
>  
> +    setup_fail = false;
> +
>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>                                 MIGRATION_STATUS_ACTIVE);
>  
> @@ -3632,7 +3652,8 @@ static void *bg_migration_thread(void *opaque)
>  
>  fail:
>      if (early_fail) {
> -        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> +        migrate_set_state(&s->state,
> +                setup_fail ? MIGRATION_STATUS_SETUP : MIGRATION_STATUS_ACTIVE,
>                  MIGRATION_STATUS_FAILED);
>          bql_unlock();
>      }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index ee31ffb5e88cea723039c754c30ce2c8a0ef35f3..63fdbb5ad7d4dbfaef1d2094350bf302cc677602 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1310,11 +1310,11 @@ int qemu_savevm_state_prepare(Error **errp)
>      return 0;
>  }
>  
> -void qemu_savevm_state_setup(QEMUFile *f)
> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>  {
> +    ERRP_GUARD();
>      MigrationState *ms = migrate_get_current();
>      SaveStateEntry *se;
> -    Error *local_err = NULL;
>      int ret = 0;
>  
>      json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
> @@ -1323,10 +1323,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
>      trace_savevm_state_setup();
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          if (se->vmsd && se->vmsd->early_setup) {
> -            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
> +            ret = vmstate_save(f, se, ms->vmdesc, errp);
>              if (ret) {
> -                migrate_set_error(ms, local_err);
> -                error_report_err(local_err);
> +                migrate_set_error(ms, *errp);
>                  qemu_file_set_error(f, ret);
>                  break;
>              }
> @@ -1346,18 +1345,19 @@ void qemu_savevm_state_setup(QEMUFile *f)
>          ret = se->ops->save_setup(f, se->opaque);
>          save_section_footer(f, se);
>          if (ret < 0) {
> +            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
> +                       "%d(%s): %d", se->section_id, se->idstr, ret);
>              qemu_file_set_error(f, ret);
>              break;
>          }
>      }
>  
>      if (ret) {
> -        return;
> +        return ret;
>      }
>  
> -    if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
> -        error_report_err(local_err);
> -    }
> +    /* TODO: Should we check that errp is set in case of failure ? */
> +    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>  }
>  
>  int qemu_savevm_state_resume_prepare(MigrationState *s)
> @@ -1728,7 +1728,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      ms->to_dst_file = f;
>  
>      qemu_savevm_state_header(f);
> -    qemu_savevm_state_setup(f);
> +    ret = qemu_savevm_state_setup(f, errp);
> +    if (ret) {
> +        goto cleanup;
> +    }
>  
>      while (qemu_file_get_error(f) == 0) {
>          if (qemu_savevm_state_iterate(f, false) > 0) {
> @@ -1741,10 +1744,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>          qemu_savevm_state_complete_precopy(f, false, false);
>          ret = qemu_file_get_error(f);
>      }
> -    qemu_savevm_state_cleanup();
>      if (ret != 0) {
>          error_setg_errno(errp, -ret, "Error while writing VM state");
>      }
> +cleanup:
> +    qemu_savevm_state_cleanup();
>  
>      if (ret != 0) {
>          status = MIGRATION_STATUS_FAILED;



* Re: [PATCH v4 07/25] migration: Always report an error in block_save_setup()
  2024-03-08  6:59   ` Peter Xu
@ 2024-03-11 15:22     ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-11 15:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefan Hajnoczi, Kevin Wolf

On 3/8/24 07:59, Peter Xu wrote:
> On Wed, Mar 06, 2024 at 02:34:22PM +0100, Cédric Le Goater wrote:
>> @@ -404,6 +403,10 @@ static int init_blk_migration(QEMUFile *f)
>>           sectors = bdrv_nb_sectors(bs);
>>           if (sectors <= 0) {
> 
> Not directly relevant to this patch, but just to mention that this looks
> suspicious (even if I know nothing about block migration..) - I am not sure
> whether any block drive would return 0 here, if so it looks still like a
> problem if we do the cleanup, ignoring the rest and return a success.

Yes, and it is not symmetric with block_load():

                 total_sectors = blk_nb_sectors(blk);
                 if (total_sectors <= 0) {
                     error_report("Error getting length of block device %s",
                                  device_name);
                     return -EINVAL;
                 }


> 
>>               ret = sectors;
>> +            if (ret < 0) {
>> +                error_setg(errp, "Error getting length of block device %s",
>> +                           bdrv_get_device_name(bs));
>> +            }
>>               bdrv_next_cleanup(&it);
>>               goto out;
>>           }
> 

Maybe Kevin could tell whether bdrv_nb_sectors(bs) == 0 should be considered
an error in the save_setup() context?
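
The asymmetry can be shown with a toy model in plain C.  The function below
is hypothetical and only mimics the shape of the check quoted above (it does
not call any block layer API): a negative length sets an error, while a zero
length leaves the loop early and still returns 0.  The zero_is_error switch
is an assumption added here to show the stricter behavior that would match
the load side.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Walk a list of device lengths (in sectors).  *n_setup counts the devices
 * that were set up before the function returned.
 */
static int toy_init_blk(const int64_t *sectors, int n, bool zero_is_error,
                        int *n_setup, char *err, size_t errlen)
{
    *n_setup = 0;
    for (int i = 0; i < n; i++) {
        if (sectors[i] <= 0) {
            int ret = (int)sectors[i];
            if (ret == 0 && zero_is_error) {
                ret = -EINVAL;  /* same code the load side returns */
            }
            if (ret < 0) {
                snprintf(err, errlen,
                         "Error getting length of block device %d", i);
            }
            /* With ret == 0 the remaining devices are skipped silently. */
            return ret;
        }
        (*n_setup)++;
    }
    return 0;
}
```

With lengths { 100, 0, 200 } and zero_is_error set to false, the function
returns 0 with only one device set up and no error message, which is the
case being questioned.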


Thanks,

C.





* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-08 14:17           ` Peter Xu
@ 2024-03-11 18:12             ` Cédric Le Goater
  2024-03-11 20:15               ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-11 18:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: Laurent Vivier, qemu-devel, Fabiano Rosas, Alex Williamson,
	Avihai Horon, Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/8/24 15:17, Peter Xu wrote:
> On Fri, Mar 08, 2024 at 02:55:30PM +0100, Cédric Le Goater wrote:
>> On 3/8/24 14:39, Cédric Le Goater wrote:
>>> On 3/8/24 14:14, Cédric Le Goater wrote:
>>>> On 3/8/24 13:56, Peter Xu wrote:
>>>>> On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
>>>>>> This prepares ground for the changes coming next which add an Error**
>>>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>>>>> now handle the error and fail earlier setting the migration state from
>>>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>>>>
>>>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>>>>> reported by .save_setup() handlers.
>>>>>>
>>>>>> Since the previous behavior was to ignore errors at this step of
>>>>>> migration, this change should be examined closely to check that
>>>>>> cleanups are still correctly done.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>>>> ---
>>>>>>
>>>>>>    Changes in v4:
>>>>>>    - Merged cleanup change in qemu_savevm_state()
>>>>>>    Changes in v3:
>>>>>>    - Set migration state to MIGRATION_STATUS_FAILED
>>>>>>    - Fixed error handling to be done under lock in bg_migration_thread()
>>>>>>    - Made sure an error is always set in case of failure in
>>>>>>      qemu_savevm_state_setup()
>>>>>>    migration/savevm.h    |  2 +-
>>>>>>    migration/migration.c | 27 ++++++++++++++++++++++++---
>>>>>>    migration/savevm.c    | 26 +++++++++++++++-----------
>>>>>>    3 files changed, 40 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>>>>> --- a/migration/savevm.h
>>>>>> +++ b/migration/savevm.h
>>>>>> @@ -32,7 +32,7 @@
>>>>>>    bool qemu_savevm_state_blocked(Error **errp);
>>>>>>    void qemu_savevm_non_migratable_list(strList **reasons);
>>>>>>    int qemu_savevm_state_prepare(Error **errp);
>>>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>>>>    bool qemu_savevm_state_guest_unplug_pending(void);
>>>>>>    int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>>>>    void qemu_savevm_state_header(QEMUFile *f);
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>>>>        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>>>        MigThrError thr_error;
>>>>>>        bool urgent = false;
>>>>>> +    Error *local_err = NULL;
>>>>>> +    int ret;
>>>>>>        thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>>>>        }
>>>>>>        bql_lock();
>>>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>>>>        bql_unlock();
>>>>>> +    if (ret) {
>>>>>> +        migrate_set_error(s, local_err);
>>>>>> +        error_free(local_err);
>>>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>>>> +                          MIGRATION_STATUS_FAILED);
>>>>>> +        goto out;
>>>>>> +     }
>>>>>
>>>>> There's a small indent issue, I can fix it.
>>>>
>>>> checkpatch did report anything.
>>>>
>>>>>
>>>>> The bigger problem is I _think_ this will trigger a ci failure in the
>>>>> virtio-net-failover test:
>>>>>
>>>>> ▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR
>>>>> 121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
>>>>>>>> PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
>>>>> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
>>>>> stderr:
>>>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>>>>> **
>>>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>>>> (test program exited with status code -6)
>>>>> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>>>>>
>>>>> I am not familiar enough with the failover code, and may not have time
>>>>> today to follow this up, copy Laurent.  Cedric, if you have time, please
>>>>> have a look.
>>>>
>>>>
>>>> Sure. Weird because I usually run make check on x86_64, s390x, ppc64 and
>>>> aarch64. Let me check again.
>>>
>>> I see one timeout error on s390x but not always. See below. It occurs with
>>> or without this patchset. the other x86_64, ppc64 arches run fine (a part
>>> from one io  test failing from time to time)
>>
>> Ah ! I got this once on aarch64 :
>>
>>   161/486 ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL ERROR
>> 161/486 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover                  ERROR            5.98s   killed by signal 6 SIGABRT
>>>>> G_TEST_DBUS_DAEMON=/home/legoater/work/qemu/qemu.git/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=119 QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img PYTHON=/home/legoater/work/qemu/qemu.git/build/pyvenv/bin/python3 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /home/legoater/work/qemu/qemu.git/build/tests/qtest/virtio-net-failover --tap -k
>> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>> stderr:
>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>> **
>> ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL
>>
>> (test program exited with status code -6)
>> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> 
> Hmm, this one seems different..
> 
>>
>> I couldn't reproduce yet :/
> 
> I never reproduced it locally on x86, and my failure is always at checking
> "cancelling" v.s. "cancelled" rather than the NULL check.  It's much easier
> to trigger on CI in check-system-centos (I don't know why centos..):
> 
> https://gitlab.com/peterx/qemu/-/jobs/6351020546
> 
> I think at least for the error I hit, the problem is the failover test will
> cancel the migration, but if it cancels too fast and during setup now it
> can already fail it (while it won't fail before when we ignore
> qemu_savevm_state_setup() errors), and I think it'll skip:
> 
>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>                                 MIGRATION_STATUS_ACTIVE);
> 
> It seems the test wants the "cancelling" to hold until later:
> 
>      /* while the card is not ejected, we must be in "cancelling" state */
>      ret = migrate_status(qts);
> 
>      status = qdict_get_str(ret, "status");
>      g_assert_cmpstr(status, ==, "cancelling");
>      qobject_unref(ret);
> 
>      /* OS unplugs the cards, QEMU can move from wait-unplug state */
>      qtest_outl(qts, ACPI_PCIHP_ADDR_ICH9 + PCI_EJ_BASE, 1);
> 
> Again, since I'll need to read the failover code, not much I can tell.
> Laurent might have a clue.

I guess we need to fix the test to handle failures and this looks
like a complex task.


C.




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-08 14:36   ` Fabiano Rosas
@ 2024-03-11 18:15     ` Cédric Le Goater
  2024-03-11 19:03       ` Fabiano Rosas
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-11 18:15 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/8/24 15:36, Fabiano Rosas wrote:
> Cédric Le Goater <clg@redhat.com> writes:
> 
>> This prepares ground for the changes coming next which add an Error**
>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>> now handle the error and fail earlier setting the migration state from
>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>
>> In qemu_savevm_state(), move the cleanup to preserve the error
>> reported by .save_setup() handlers.
>>
>> Since the previous behavior was to ignore errors at this step of
>> migration, this change should be examined closely to check that
>> cleanups are still correctly done.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>
>>   Changes in v4:
>>   
>>   - Merged cleanup change in qemu_savevm_state()
>>     
>>   Changes in v3:
>>   
>>   - Set migration state to MIGRATION_STATUS_FAILED
>>   - Fixed error handling to be done under lock in bg_migration_thread()
>>   - Made sure an error is always set in case of failure in
>>     qemu_savevm_state_setup()
>>     
>>   migration/savevm.h    |  2 +-
>>   migration/migration.c | 27 ++++++++++++++++++++++++---
>>   migration/savevm.c    | 26 +++++++++++++++-----------
>>   3 files changed, 40 insertions(+), 15 deletions(-)
>>
>> diff --git a/migration/savevm.h b/migration/savevm.h
>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>> --- a/migration/savevm.h
>> +++ b/migration/savevm.h
>> @@ -32,7 +32,7 @@
>>   bool qemu_savevm_state_blocked(Error **errp);
>>   void qemu_savevm_non_migratable_list(strList **reasons);
>>   int qemu_savevm_state_prepare(Error **errp);
>> -void qemu_savevm_state_setup(QEMUFile *f);
>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>   bool qemu_savevm_state_guest_unplug_pending(void);
>>   int qemu_savevm_state_resume_prepare(MigrationState *s);
>>   void qemu_savevm_state_header(QEMUFile *f);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>       int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>       MigThrError thr_error;
>>       bool urgent = false;
>> +    Error *local_err = NULL;
>> +    int ret;
>>   
>>       thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>   
>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>       }
>>   
>>       bql_lock();
>> -    qemu_savevm_state_setup(s->to_dst_file);
>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>       bql_unlock();
>>   
>> +    if (ret) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>> +                          MIGRATION_STATUS_FAILED);
>> +        goto out;
>> +     }
>> +
>>       qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>                                  MIGRATION_STATUS_ACTIVE);
> 
> This^ should be before the new block it seems:
> 
> GOOD:
> migrate_set_state new state setup
> migrate_set_state new state wait-unplug
> migrate_fd_cancel
> migrate_set_state new state cancelling
> migrate_fd_cleanup
> migrate_set_state new state cancelled
> migrate_fd_cancel
> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
> 
> BAD:
> migrate_set_state new state setup
> migrate_fd_cancel
> migrate_set_state new state cancelling
> migrate_fd_cleanup
> migrate_set_state new state cancelled
> qemu-system-x86_64: ram_save_setup failed: Input/output error
> **
> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
> 
> Otherwise migration_iteration_finish() will schedule the cleanup BH and
> that will run concurrently with migrate_fd_cancel() issued by the test
> and bad things happens.

This hack makes things work:

@@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
          qemu_savevm_send_colo_enable(s->to_dst_file);
      }
  
+    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
+                            MIGRATION_STATUS_SETUP);
+
      bql_lock();
      ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
      bql_unlock();

We should fix the test instead :) Unless waiting for failover devices
to unplug before the save_setup handlers, rather than after, is OK.
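
A single-threaded toy model of the two orderings, in plain C.  It is not
QEMU code: the toy_* names are invented, cancel is simplified to a plain
store, and "parked" stands in for the migration thread sitting in
qemu_savevm_wait_unplug() until the guest ejects the card.  The function
returns the status a query would see after the cancel while the device is
still plugged.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum toy_status {
    TOY_SETUP,
    TOY_WAIT_UNPLUG,
    TOY_CANCELLING,
    TOY_CANCELLED,
    TOY_FAILED,
};

/* Compare-and-swap transition: applied only if state is still old_state. */
static bool toy_set_state(atomic_int *state, int old_state, int new_state)
{
    int expected = old_state;
    return atomic_compare_exchange_strong(state, &expected, new_state);
}

static int toy_status_seen_by_test(bool wait_before_setup)
{
    atomic_int state = TOY_SETUP;
    bool parked = false;

    if (wait_before_setup) {
        /* The thread enters wait-unplug first and stays there. */
        parked = toy_set_state(&state, TOY_SETUP, TOY_WAIT_UNPLUG);
    }

    /* The test cancels (simplified to an unconditional store). */
    atomic_store(&state, TOY_CANCELLING);

    if (!parked) {
        /* Setup fails because of the cancel; the thread unwinds at once. */
        toy_set_state(&state, TOY_SETUP, TOY_FAILED);          /* no-op */
        toy_set_state(&state, TOY_CANCELLING, TOY_CANCELLED);  /* cleanup */
    }
    /* If parked, cleanup has not run yet: the card is still plugged. */

    return atomic_load(&state);
}
```

In this model, waiting before setup leaves the status at "cancelling" when
the test queries it, while waiting after setup lets cleanup run early and
the test sees "cancelled".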

commit c7e0acd5a3f8 ("migration: add new migration state wait-unplug")
is not clear about the justification:

     This patch adds a new migration state called wait-unplug.  It is entered
     after the SETUP state if failover devices are present. It will transition
     into ACTIVE once all devices were succesfully unplugged from the guest.


> =====
> PS: I guess the next level in our Freestyle Concurrency video-game is to
> make migrate_fd_cancel() stop setting state and poking files and only
> set a flag that's tested in the other parts of the code.

Is that a new item on the TODO list?

Thanks,

C.


> 
>>   
>> @@ -3530,6 +3540,9 @@ static void *bg_migration_thread(void *opaque)
>>       MigThrError thr_error;
>>       QEMUFile *fb;
>>       bool early_fail = true;
>> +    bool setup_fail = true;
>> +    Error *local_err = NULL;
>> +    int ret;
>>   
>>       rcu_register_thread();
>>       object_ref(OBJECT(s));
>> @@ -3563,9 +3576,16 @@ static void *bg_migration_thread(void *opaque)
>>   
>>       bql_lock();
>>       qemu_savevm_state_header(s->to_dst_file);
>> -    qemu_savevm_state_setup(s->to_dst_file);
>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>> +    if (ret) {
>> +        migrate_set_error(s, local_err);
>> +        error_free(local_err);
>> +        goto fail;
>> +    }
>>       bql_unlock();
>>   
>> +    setup_fail = false;
>> +
>>       qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>                                  MIGRATION_STATUS_ACTIVE);
>>   
>> @@ -3632,7 +3652,8 @@ static void *bg_migration_thread(void *opaque)
>>   
>>   fail:
>>       if (early_fail) {
>> -        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>> +        migrate_set_state(&s->state,
>> +                setup_fail ? MIGRATION_STATUS_SETUP : MIGRATION_STATUS_ACTIVE,
>>                   MIGRATION_STATUS_FAILED);
>>           bql_unlock();
>>       }
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index ee31ffb5e88cea723039c754c30ce2c8a0ef35f3..63fdbb5ad7d4dbfaef1d2094350bf302cc677602 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1310,11 +1310,11 @@ int qemu_savevm_state_prepare(Error **errp)
>>       return 0;
>>   }
>>   
>> -void qemu_savevm_state_setup(QEMUFile *f)
>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>>   {
>> +    ERRP_GUARD();
>>       MigrationState *ms = migrate_get_current();
>>       SaveStateEntry *se;
>> -    Error *local_err = NULL;
>>       int ret = 0;
>>   
>>       json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
>> @@ -1323,10 +1323,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
>>       trace_savevm_state_setup();
>>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>>           if (se->vmsd && se->vmsd->early_setup) {
>> -            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
>> +            ret = vmstate_save(f, se, ms->vmdesc, errp);
>>               if (ret) {
>> -                migrate_set_error(ms, local_err);
>> -                error_report_err(local_err);
>> +                migrate_set_error(ms, *errp);
>>                   qemu_file_set_error(f, ret);
>>                   break;
>>               }
>> @@ -1346,18 +1345,19 @@ void qemu_savevm_state_setup(QEMUFile *f)
>>           ret = se->ops->save_setup(f, se->opaque);
>>           save_section_footer(f, se);
>>           if (ret < 0) {
>> +            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
>> +                       "%d(%s): %d", se->section_id, se->idstr, ret);
>>               qemu_file_set_error(f, ret);
>>               break;
>>           }
>>       }
>>   
>>       if (ret) {
>> -        return;
>> +        return ret;
>>       }
>>   
>> -    if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
>> -        error_report_err(local_err);
>> -    }
>> +    /* TODO: Should we check that errp is set in case of failure ? */
>> +    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>>   }
>>   
>>   int qemu_savevm_state_resume_prepare(MigrationState *s)
>> @@ -1728,7 +1728,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>       ms->to_dst_file = f;
>>   
>>       qemu_savevm_state_header(f);
>> -    qemu_savevm_state_setup(f);
>> +    ret = qemu_savevm_state_setup(f, errp);
>> +    if (ret) {
>> +        goto cleanup;
>> +    }
>>   
>>       while (qemu_file_get_error(f) == 0) {
>>           if (qemu_savevm_state_iterate(f, false) > 0) {
>> @@ -1741,10 +1744,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>           qemu_savevm_state_complete_precopy(f, false, false);
>>           ret = qemu_file_get_error(f);
>>       }
>> -    qemu_savevm_state_cleanup();
>>       if (ret != 0) {
>>           error_setg_errno(errp, -ret, "Error while writing VM state");
>>       }
>> +cleanup:
>> +    qemu_savevm_state_cleanup();
>>   
>>       if (ret != 0) {
>>           status = MIGRATION_STATUS_FAILED;
> 




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-11 18:15     ` Cédric Le Goater
@ 2024-03-11 19:03       ` Fabiano Rosas
  2024-03-11 20:10         ` Peter Xu
  2024-03-12 12:32         ` Cédric Le Goater
  0 siblings, 2 replies; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-11 19:03 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

Cédric Le Goater <clg@redhat.com> writes:

> On 3/8/24 15:36, Fabiano Rosas wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>> 
>>> This prepares ground for the changes coming next which add an Error**
>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>> now handle the error and fail earlier setting the migration state from
>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>
>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>> reported by .save_setup() handlers.
>>>
>>> Since the previous behavior was to ignore errors at this step of
>>> migration, this change should be examined closely to check that
>>> cleanups are still correctly done.
>>>
>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>> ---
>>>
>>>   Changes in v4:
>>>   
>>>   - Merged cleanup change in qemu_savevm_state()
>>>     
>>>   Changes in v3:
>>>   
>>>   - Set migration state to MIGRATION_STATUS_FAILED
>>>   - Fixed error handling to be done under lock in bg_migration_thread()
>>>   - Made sure an error is always set in case of failure in
>>>     qemu_savevm_state_setup()
>>>     
>>>   migration/savevm.h    |  2 +-
>>>   migration/migration.c | 27 ++++++++++++++++++++++++---
>>>   migration/savevm.c    | 26 +++++++++++++++-----------
>>>   3 files changed, 40 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>> --- a/migration/savevm.h
>>> +++ b/migration/savevm.h
>>> @@ -32,7 +32,7 @@
>>>   bool qemu_savevm_state_blocked(Error **errp);
>>>   void qemu_savevm_non_migratable_list(strList **reasons);
>>>   int qemu_savevm_state_prepare(Error **errp);
>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>   bool qemu_savevm_state_guest_unplug_pending(void);
>>>   int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>   void qemu_savevm_state_header(QEMUFile *f);
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>       int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>       MigThrError thr_error;
>>>       bool urgent = false;
>>> +    Error *local_err = NULL;
>>> +    int ret;
>>>   
>>>       thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>   
>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>       }
>>>   
>>>       bql_lock();
>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>       bql_unlock();
>>>   
>>> +    if (ret) {
>>> +        migrate_set_error(s, local_err);
>>> +        error_free(local_err);
>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>> +                          MIGRATION_STATUS_FAILED);
>>> +        goto out;
>>> +     }
>>> +
>>>       qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>                                  MIGRATION_STATUS_ACTIVE);
>> 
>> This^ should be before the new block it seems:
>> 
>> GOOD:
>> migrate_set_state new state setup
>> migrate_set_state new state wait-unplug
>> migrate_fd_cancel
>> migrate_set_state new state cancelling
>> migrate_fd_cleanup
>> migrate_set_state new state cancelled
>> migrate_fd_cancel
>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>> 
>> BAD:
>> migrate_set_state new state setup
>> migrate_fd_cancel
>> migrate_set_state new state cancelling
>> migrate_fd_cleanup
>> migrate_set_state new state cancelled
>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>> **
>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>> 
>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>> that will run concurrently with migrate_fd_cancel() issued by the test
>> and bad things happens.
>
> This hack makes things work :
>
> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>           qemu_savevm_send_colo_enable(s->to_dst_file);
>       }
>   
> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> +                            MIGRATION_STATUS_SETUP);
> +

Why move it all the way up here? Has moving the wait_unplug before the
'if (ret)' block not worked for you?

>       bql_lock();
>       ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>       bql_unlock();
>
> We should fix the test instead :) Unless waiting for failover devices
> to unplug before the save_setup handlers and not after is ok.
>
> commit c7e0acd5a3f8 ("migration: add new migration state wait-unplug")
> is not clear about the justification.:
>
>      This patch adds a new migration state called wait-unplug.  It is entered
>      after the SETUP state if failover devices are present. It will transition
>      into ACTIVE once all devices were succesfully unplugged from the guest.

This is not clear indeed, but to me it seems having the wait-unplug
after setup was important.

>
>
>> =====
>> PS: I guess the next level in our Freestyle Concurrency video-game is to
>> make migrate_fd_cancel() stop setting state and poking files and only
>> set a flag that's tested in the other parts of the code.
>
> Is that a new item on the TODO list?

Yep, I'll add it to the wiki.




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-11 19:03       ` Fabiano Rosas
@ 2024-03-11 20:10         ` Peter Xu
  2024-03-12 13:01           ` Cédric Le Goater
  2024-03-12 12:32         ` Cédric Le Goater
  1 sibling, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-11 20:10 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Cédric Le Goater, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Mon, Mar 11, 2024 at 04:03:14PM -0300, Fabiano Rosas wrote:
> Cédric Le Goater <clg@redhat.com> writes:
> 
> > On 3/8/24 15:36, Fabiano Rosas wrote:
> >> Cédric Le Goater <clg@redhat.com> writes:
> >> 
> >>> This prepares ground for the changes coming next which add an Error**
> >>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
> >>> now handle the error and fail earlier setting the migration state from
> >>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
> >>>
> >>> In qemu_savevm_state(), move the cleanup to preserve the error
> >>> reported by .save_setup() handlers.
> >>>
> >>> Since the previous behavior was to ignore errors at this step of
> >>> migration, this change should be examined closely to check that
> >>> cleanups are still correctly done.
> >>>
> >>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> >>> ---
> >>>
> >>>   Changes in v4:
> >>>   
> >>>   - Merged cleanup change in qemu_savevm_state()
> >>>     
> >>>   Changes in v3:
> >>>   
> >>>   - Set migration state to MIGRATION_STATUS_FAILED
> >>>   - Fixed error handling to be done under lock in bg_migration_thread()
> >>>   - Made sure an error is always set in case of failure in
> >>>     qemu_savevm_state_setup()
> >>>     
> >>>   migration/savevm.h    |  2 +-
> >>>   migration/migration.c | 27 ++++++++++++++++++++++++---
> >>>   migration/savevm.c    | 26 +++++++++++++++-----------
> >>>   3 files changed, 40 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/migration/savevm.h b/migration/savevm.h
> >>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
> >>> --- a/migration/savevm.h
> >>> +++ b/migration/savevm.h
> >>> @@ -32,7 +32,7 @@
> >>>   bool qemu_savevm_state_blocked(Error **errp);
> >>>   void qemu_savevm_non_migratable_list(strList **reasons);
> >>>   int qemu_savevm_state_prepare(Error **errp);
> >>> -void qemu_savevm_state_setup(QEMUFile *f);
> >>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
> >>>   bool qemu_savevm_state_guest_unplug_pending(void);
> >>>   int qemu_savevm_state_resume_prepare(MigrationState *s);
> >>>   void qemu_savevm_state_header(QEMUFile *f);
> >>> diff --git a/migration/migration.c b/migration/migration.c
> >>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
> >>> --- a/migration/migration.c
> >>> +++ b/migration/migration.c
> >>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
> >>>       int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >>>       MigThrError thr_error;
> >>>       bool urgent = false;
> >>> +    Error *local_err = NULL;
> >>> +    int ret;
> >>>   
> >>>       thread = migration_threads_add("live_migration", qemu_get_thread_id());
> >>>   
> >>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
> >>>       }
> >>>   
> >>>       bql_lock();
> >>> -    qemu_savevm_state_setup(s->to_dst_file);
> >>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
> >>>       bql_unlock();
> >>>   
> >>> +    if (ret) {
> >>> +        migrate_set_error(s, local_err);
> >>> +        error_free(local_err);
> >>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> >>> +                          MIGRATION_STATUS_FAILED);
> >>> +        goto out;
> >>> +     }
> >>> +
> >>>       qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> >>>                                  MIGRATION_STATUS_ACTIVE);
> >> 
> >> This^ should be before the new block it seems:
> >> 
> >> GOOD:
> >> migrate_set_state new state setup
> >> migrate_set_state new state wait-unplug
> >> migrate_fd_cancel
> >> migrate_set_state new state cancelling
> >> migrate_fd_cleanup
> >> migrate_set_state new state cancelled
> >> migrate_fd_cancel
> >> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
> >> 
> >> BAD:
> >> migrate_set_state new state setup
> >> migrate_fd_cancel
> >> migrate_set_state new state cancelling
> >> migrate_fd_cleanup
> >> migrate_set_state new state cancelled
> >> qemu-system-x86_64: ram_save_setup failed: Input/output error
> >> **
> >> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
> >> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
> >> 
> >> Otherwise migration_iteration_finish() will schedule the cleanup BH and
> >> that will run concurrently with migrate_fd_cancel() issued by the test
> >> and bad things happens.
> >
> > This hack makes things work :
> >
> > @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
> >           qemu_savevm_send_colo_enable(s->to_dst_file);
> >       }
> >   
> > +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> > +                            MIGRATION_STATUS_SETUP);
> > +
> 
> Why move it all the way up here? Has moving the wait_unplug before the
> 'if (ret)' block not worked for you?
> 
> >       bql_lock();
> >       ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
> >       bql_unlock();
> >
> > We should fix the test instead :) Unless waiting for failover devices
> > to unplug before the save_setup handlers and not after is ok.
> >
> > commit c7e0acd5a3f8 ("migration: add new migration state wait-unplug")
> > is not clear about the justification.:
> >
> >      This patch adds a new migration state called wait-unplug.  It is entered
> >      after the SETUP state if failover devices are present. It will transition
> >      into ACTIVE once all devices were succesfully unplugged from the guest.
> 
> This is not clear indeed, but to me it seems having the wait-unplug
> after setup was important.

Finally got some time to read this code..

So far I don't see an issue if it's called before the setup hooks.
Actually, it looks to me like it would be better to do that before those hooks.

IIUC what qemu_savevm_wait_unplug() does is wait for all the primary
devices to be completely unplugged before moving on with the migration.

Here the setup() hooks, or to be explicit, the primary devices' VMSDs (if
they ever existed, and if that was the concern), should have zero impact on
such a wait, because the "unplug" should also contain one step to unregister
those VMSDs; see virtio_net_handle_migration_primary(), which has:

        if (failover_unplug_primary(n, dev)) {
            vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
            ...
        }

So qemu_savevm_wait_unplug() looks to me like a pure wait function that
returns once the guest OS has processed all the unplugs.  And it makes some
sense to me to avoid calling setup() (which can start to hold resources; for
RAM, for example, we create bitmaps etc. to prepare for migration) before
such possibly long halts.
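
The ordering argued for above can be sketched with a toy model. This is
only an illustration under stated assumptions, not QEMU code: struct mig,
wait_unplug(), do_setup() and start_migration() are hypothetical names, and
the "unplug" is simulated by a poll counter. It shows the wait finishing
before setup allocates anything, and a setup failure being recorded and
turned into a FAILED state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not QEMU's real types or functions. */
enum mig_state { MIG_SETUP, MIG_WAIT_UNPLUG, MIG_ACTIVE, MIG_FAILED };

struct mig {
    enum mig_state state;
    int unplug_polls_left;   /* polls until the guest has ejected the device */
    bool setup_should_fail;  /* knob for the example: make setup fail */
    bool resources_held;     /* set once setup has allocated something */
    const char *error;       /* first error recorded, or NULL */
};

/* Pure wait: loops until the unplug completes, allocates nothing. */
static void wait_unplug(struct mig *m)
{
    if (m->unplug_polls_left > 0) {
        m->state = MIG_WAIT_UNPLUG;
        while (m->unplug_polls_left > 0) {
            assert(!m->resources_held);  /* nothing is held while waiting */
            m->unplug_polls_left--;
        }
        m->state = MIG_SETUP;
    }
}

/* Setup step: on failure it always reports an error through errp. */
static int do_setup(struct mig *m, const char **errp)
{
    if (m->setup_should_fail) {
        *errp = "setup failed";
        return -1;
    }
    m->resources_held = true;
    return 0;
}

/* Wait first, then set up; a setup error moves the state to FAILED. */
int start_migration(struct mig *m)
{
    const char *err = NULL;

    m->state = MIG_SETUP;
    wait_unplug(m);
    if (do_setup(m, &err) < 0) {
        m->error = err;
        m->state = MIG_FAILED;
        return -1;
    }
    m->state = MIG_ACTIVE;
    return 0;
}
```

Calling start_migration() on a zero-initialized struct mig with
unplug_polls_left set to a small positive value walks through SETUP,
WAIT_UNPLUG and back to SETUP before reaching ACTIVE. Whether this ordering
is acceptable for the real failover code is exactly the open question in
this thread.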

In any case, I guess it's still too rushed to settle on a plan; meanwhile,
anything proposed for either test or code changes had better get some
review from either Laurent or other virtio-net folks.  I think I'll go
ahead with the pull without the 2nd batch of patches.

> 
> >
> >
> >> =====
> >> PS: I guess the next level in our Freestyle Concurrency video-game is to
> >> make migrate_fd_cancel() stop setting state and poking files and only
> >> set a flag that's tested in the other parts of the code.
> >
> > Is that a new item on the TODO list?
> 
> Yep, I'll add it to the wiki.

Sounds like a good thing; however, let's be aware of the devils (that are
always in the details..): there can be users/tests relying on that
"CANCELLING" state, so it can be part of the ABI.. :-(

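A minimal sketch of the "cancel only sets a flag" idea, as a toy model and
not QEMU code: struct job, job_request_cancel(), job_is_cancelling() and
job_step() are hypothetical names. The canceller only records a request;
the worker is the sole writer of the state and tests the flag at each step.
The window where the flag is set but the worker has not yet reacted is what
a status query could keep reporting as "cancelling", given the ABI concern
above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; these are not QEMU's real types or functions. */
enum job_state { JOB_ACTIVE, JOB_CANCELLED, JOB_COMPLETED };

struct job {
    atomic_bool cancel_requested;  /* written by canceller, read by worker */
    enum job_state state;          /* written only by the worker */
    int chunks_sent;
};

/* Canceller side: records the request, never touches state or files. */
void job_request_cancel(struct job *j)
{
    atomic_store(&j->cancel_requested, true);
}

/* Requested but not yet acted upon: the externally visible "cancelling". */
bool job_is_cancelling(struct job *j)
{
    return atomic_load(&j->cancel_requested) && j->state == JOB_ACTIVE;
}

/*
 * Worker side: one unit of work per call. Returns true while the job
 * should keep running, false once it reached a final state.
 */
bool job_step(struct job *j, int total_chunks)
{
    assert(j->state == JOB_ACTIVE);

    if (atomic_load(&j->cancel_requested)) {
        j->state = JOB_CANCELLED;  /* the only place this transition happens */
        return false;
    }

    j->chunks_sent++;
    if (j->chunks_sent == total_chunks) {
        j->state = JOB_COMPLETED;
        return false;
    }
    return true;
}
```

With this split there is a single writer for the state, so a cancel request
cannot race with the worker's own transitions; the cost is that the cancel
takes effect only at the next point where the worker tests the flag.
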
-- 
Peter Xu




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-11 18:12             ` Cédric Le Goater
@ 2024-03-11 20:15               ` Peter Xu
  0 siblings, 0 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-11 20:15 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Laurent Vivier, qemu-devel, Fabiano Rosas, Alex Williamson,
	Avihai Horon, Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Mon, Mar 11, 2024 at 07:12:11PM +0100, Cédric Le Goater wrote:
> On 3/8/24 15:17, Peter Xu wrote:
> > On Fri, Mar 08, 2024 at 02:55:30PM +0100, Cédric Le Goater wrote:
> > > On 3/8/24 14:39, Cédric Le Goater wrote:
> > > > On 3/8/24 14:14, Cédric Le Goater wrote:
> > > > > On 3/8/24 13:56, Peter Xu wrote:
> > > > > > On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
> > > > > > > This prepares ground for the changes coming next which add an Error**
> > > > > > > argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
> > > > > > > now handle the error and fail earlier setting the migration state from
> > > > > > > MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
> > > > > > > 
> > > > > > > In qemu_savevm_state(), move the cleanup to preserve the error
> > > > > > > reported by .save_setup() handlers.
> > > > > > > 
> > > > > > > Since the previous behavior was to ignore errors at this step of
> > > > > > > migration, this change should be examined closely to check that
> > > > > > > cleanups are still correctly done.
> > > > > > > 
> > > > > > > Signed-off-by: Cédric Le Goater <clg@redhat.com>
> > > > > > > ---
> > > > > > > 
> > > > > > >    Changes in v4:
> > > > > > >    - Merged cleanup change in qemu_savevm_state()
> > > > > > >    Changes in v3:
> > > > > > >    - Set migration state to MIGRATION_STATUS_FAILED
> > > > > > >    - Fixed error handling to be done under lock in bg_migration_thread()
> > > > > > >    - Made sure an error is always set in case of failure in
> > > > > > >      qemu_savevm_state_setup()
> > > > > > >    migration/savevm.h    |  2 +-
> > > > > > >    migration/migration.c | 27 ++++++++++++++++++++++++---
> > > > > > >    migration/savevm.c    | 26 +++++++++++++++-----------
> > > > > > >    3 files changed, 40 insertions(+), 15 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/migration/savevm.h b/migration/savevm.h
> > > > > > > index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
> > > > > > > --- a/migration/savevm.h
> > > > > > > +++ b/migration/savevm.h
> > > > > > > @@ -32,7 +32,7 @@
> > > > > > >    bool qemu_savevm_state_blocked(Error **errp);
> > > > > > >    void qemu_savevm_non_migratable_list(strList **reasons);
> > > > > > >    int qemu_savevm_state_prepare(Error **errp);
> > > > > > > -void qemu_savevm_state_setup(QEMUFile *f);
> > > > > > > +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
> > > > > > >    bool qemu_savevm_state_guest_unplug_pending(void);
> > > > > > >    int qemu_savevm_state_resume_prepare(MigrationState *s);
> > > > > > >    void qemu_savevm_state_header(QEMUFile *f);
> > > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
> > > > > > > --- a/migration/migration.c
> > > > > > > +++ b/migration/migration.c
> > > > > > > @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
> > > > > > >        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > > > > > >        MigThrError thr_error;
> > > > > > >        bool urgent = false;
> > > > > > > +    Error *local_err = NULL;
> > > > > > > +    int ret;
> > > > > > >        thread = migration_threads_add("live_migration", qemu_get_thread_id());
> > > > > > > @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
> > > > > > >        }
> > > > > > >        bql_lock();
> > > > > > > -    qemu_savevm_state_setup(s->to_dst_file);
> > > > > > > +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
> > > > > > >        bql_unlock();
> > > > > > > +    if (ret) {
> > > > > > > +        migrate_set_error(s, local_err);
> > > > > > > +        error_free(local_err);
> > > > > > > +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > > > > > > +                          MIGRATION_STATUS_FAILED);
> > > > > > > +        goto out;
> > > > > > > +     }
> > > > > > 
> > > > > > There's a small indent issue, I can fix it.
> > > > > 
> > > > > checkpatch did report anything.
> > > > > 
> > > > > > 
> > > > > > The bigger problem is I _think_ this will trigger a ci failure in the
> > > > > > virtio-net-failover test:
> > > > > > 
> > > > > > ▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR
> > > > > > 121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover    ERROR            4.77s   killed by signal 6 SIGABRT
> > > > > > > > > PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
> > > > > > ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> > > > > > stderr:
> > > > > > qemu-system-x86_64: ram_save_setup failed: Input/output error
> > > > > > **
> > > > > > ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
> > > > > > (test program exited with status code -6)
> > > > > > ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> > > > > > 
> > > > > > I am not familiar enough with the failover code, and may not have time
> > > > > > today to follow this up, copy Laurent.  Cedric, if you have time, please
> > > > > > have a look.
> > > > > 
> > > > > 
> > > > > Sure. Weird because I usually run make check on x86_64, s390x, ppc64 and
> > > > > aarch64. Let me check again.
> > > > 
> > > > I see one timeout error on s390x but not always. See below. It occurs with
> > > > or without this patchset. the other x86_64, ppc64 arches run fine (a part
> > > > from one io  test failing from time to time)
> > > 
> > > Ah ! I got this once on aarch64 :
> > > 
> > >   161/486 ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL ERROR
> > > 161/486 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover                  ERROR            5.98s   killed by signal 6 SIGABRT
> > > > > > G_TEST_DBUS_DAEMON=/home/legoater/work/qemu/qemu.git/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=119 QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img PYTHON=/home/legoater/work/qemu/qemu.git/build/pyvenv/bin/python3 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /home/legoater/work/qemu/qemu.git/build/tests/qtest/virtio-net-failover --tap -k
> > > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> > > stderr:
> > > qemu-system-x86_64: ram_save_setup failed: Input/output error
> > > **
> > > ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL
> > > 
> > > (test program exited with status code -6)
> > > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> > 
> > Hmm, this one seems different..
> > 
> > > 
> > > I couldn't reproduce yet :/
> > 
> > I never reproduced it locally on x86, and my failure is always at checking
> > "cancelling" v.s. "cancelled" rather than the NULL check.  It's much easier
> > to trigger on CI in check-system-centos (I don't know why centos..):
> > 
> > https://gitlab.com/peterx/qemu/-/jobs/6351020546
> > 
> > I think at least for the error I hit, the problem is the failover test will
> > cancel the migration, but if it cancels too fast and during setup now it
> > can already fail it (while it won't fail before when we ignore
> > qemu_savevm_state_setup() errors), and I think it'll skip:
> > 
> >      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> >                                 MIGRATION_STATUS_ACTIVE);
> > 
> > It seems the test wants the "cancelling" to hold until later:
> > 
> >      /* while the card is not ejected, we must be in "cancelling" state */
> >      ret = migrate_status(qts);
> > 
> >      status = qdict_get_str(ret, "status");
> >      g_assert_cmpstr(status, ==, "cancelling");
> >      qobject_unref(ret);
> > 
> >      /* OS unplugs the cards, QEMU can move from wait-unplug state */
> >      qtest_outl(qts, ACPI_PCIHP_ADDR_ICH9 + PCI_EJ_BASE, 1);
> > 
> > Again, since I'll need to read the failover code, not much I can tell.
> > Laurent might have a clue.
> 
> I guess we need to fix the test to handle failures and this looks
> like a complex task.

It may not be that complicated, but it'll need some review from people
outside migration, and someone will need to post a formal patch.  That'll
definitely take some time.

We can keep working on this issue during the 9.0-rc phase if you want.  Our
current plan for migration (at least what I have in mind for this
release... Fabiano will take over the 9.1 release pulls, so he has the
freedom to change this) is to allow patches to be queued even during RCs.
I'll prepare two trees, one for -next and one for -stable: -next
candidates will only be included in the first 9.1 pull, and -stable ones
in RC pulls if necessary.

Thanks,

-- 
Peter Xu




* Re: [PATCH v4 00/25] migration: Improve error reporting
  2024-03-08  8:15 ` [PATCH v4 00/25] migration: Improve error reporting Peter Xu
  2024-03-08 13:03   ` Cédric Le Goater
@ 2024-03-11 20:24   ` Peter Xu
  2024-03-12  7:16     ` Cédric Le Goater
  1 sibling, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-11 20:24 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
> > * [1-4] already queued in migration-next.
> >   
> >   migration: Report error when shutdown fails
> >   migration: Remove SaveStateHandler and LoadStateHandler typedefs
> >   migration: Add documentation for SaveVMHandlers
> >   migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
> >   
> > * [5-9] are prequisite changes in other components related to the
> >   migration save_setup() handler. They make sure a failure is not
> >   returned without setting an error.
> >   
> >   s390/stattrib: Add Error** argument to set_migrationmode() handler
> >   vfio: Always report an error in vfio_save_setup()
> >   migration: Always report an error in block_save_setup()
> >   migration: Always report an error in ram_save_setup()
> >   migration: Add Error** argument to vmstate_save()
> > 
> > * [10-15] are the core changes in migration and memory components to
> >   propagate an error reported in a save_setup() handler.
> > 
> >   migration: Add Error** argument to qemu_savevm_state_setup()
> >   migration: Add Error** argument to .save_setup() handler
> >   migration: Add Error** argument to .load_setup() handler
> 
> Further queued 5-12 in migration-staging (until here), thanks.

Just to keep a record: due to the virtio failover test failure and the
other block migration uncertainty in patch 7 (in which case we may want to
have a fix for the sectors==0 case), I unqueued this chunk for 9.0.

Thanks,

-- 
Peter Xu




* Re: [PATCH v4 00/25] migration: Improve error reporting
  2024-03-11 20:24   ` Peter Xu
@ 2024-03-12  7:16     ` Cédric Le Goater
  2024-03-12  9:58       ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-12  7:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/11/24 21:24, Peter Xu wrote:
> On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
>> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
>>> * [1-4] already queued in migration-next.
>>>    
>>>    migration: Report error when shutdown fails
>>>    migration: Remove SaveStateHandler and LoadStateHandler typedefs
>>>    migration: Add documentation for SaveVMHandlers
>>>    migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
>>>    
>>> * [5-9] are prequisite changes in other components related to the
>>>    migration save_setup() handler. They make sure a failure is not
>>>    returned without setting an error.
>>>    
>>>    s390/stattrib: Add Error** argument to set_migrationmode() handler
>>>    vfio: Always report an error in vfio_save_setup()
>>>    migration: Always report an error in block_save_setup()
>>>    migration: Always report an error in ram_save_setup()
>>>    migration: Add Error** argument to vmstate_save()
>>>
>>> * [10-15] are the core changes in migration and memory components to
>>>    propagate an error reported in a save_setup() handler.
>>>
>>>    migration: Add Error** argument to qemu_savevm_state_setup()
>>>    migration: Add Error** argument to .save_setup() handler
>>>    migration: Add Error** argument to .load_setup() handler
>>
>> Further queued 5-12 in migration-staging (until here), thanks.
> 
> Just to keep a record: due to the virtio failover test failure and the
> other block migration uncertainty in patch 7 (in which case we may want to
> have a fix on sectors==0 case), I unqueued this chunk for 9.0.

OK. I will ask the block folks for help to understand whether sectors==0
is also an error in the save_setup context. Maybe we can still
merge these in the 9.0 cycle.
  
Thanks,

C.




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 00/25] migration: Improve error reporting
  2024-03-12  7:16     ` Cédric Le Goater
@ 2024-03-12  9:58       ` Cédric Le Goater
  2024-03-12 11:50         ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-12  9:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/12/24 08:16, Cédric Le Goater wrote:
> On 3/11/24 21:24, Peter Xu wrote:
>> On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
>>> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
>>>> * [1-4] already queued in migration-next.
>>>>    migration: Report error when shutdown fails
>>>>    migration: Remove SaveStateHandler and LoadStateHandler typedefs
>>>>    migration: Add documentation for SaveVMHandlers
>>>>    migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
>>>> * [5-9] are prequisite changes in other components related to the
>>>>    migration save_setup() handler. They make sure a failure is not
>>>>    returned without setting an error.
>>>>    s390/stattrib: Add Error** argument to set_migrationmode() handler
>>>>    vfio: Always report an error in vfio_save_setup()
>>>>    migration: Always report an error in block_save_setup()
>>>>    migration: Always report an error in ram_save_setup()
>>>>    migration: Add Error** argument to vmstate_save()
>>>>
>>>> * [10-15] are the core changes in migration and memory components to
>>>>    propagate an error reported in a save_setup() handler.
>>>>
>>>>    migration: Add Error** argument to qemu_savevm_state_setup()
>>>>    migration: Add Error** argument to .save_setup() handler
>>>>    migration: Add Error** argument to .load_setup() handler
>>>
>>> Further queued 5-12 in migration-staging (until here), thanks.
>>
>> Just to keep a record: due to the virtio failover test failure and the
>> other block migration uncertainty in patch 7 (in which case we may want to
>> have a fix on sectors==0 case), I unqueued this chunk for 9.0.
> 
> ok. I will ask the block folks for help to understand if sectors==0
> is also an error in the save_setup context. May be  we can still
> merge these in 9.0 cycle.

I discussed this with Kevin: sectors==0 is not an error case, and the
loop should simply continue. That said, commit 66db46ca83b8 ("migration:
Deprecate block migration") would let us remove all that code in
the next cycle, which is even simpler.
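
A minimal sketch of that behaviour, assuming a per-device loop in the
setup path. FakeDevice and setup_devices() are hypothetical names used
for illustration only; this is not the actual block migration code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block device; not a QEMU type. */
typedef struct {
    const char *name;
    int64_t sectors;    /* < 0: error code, 0: empty, > 0: size */
} FakeDevice;

/*
 * Count the devices that would be tracked for migration.
 * Returns 0 on success, or the negative error code of the first
 * device whose size query failed.
 */
static int setup_devices(const FakeDevice *devs, size_t n,
                         size_t *nr_tracked)
{
    assert(nr_tracked);
    *nr_tracked = 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].sectors < 0) {
            return (int)devs[i].sectors;    /* real error: give up */
        }
        if (devs[i].sectors == 0) {
            continue;                       /* empty device: skip it */
        }
        (*nr_tracked)++;
    }
    return 0;
}
```

The point is only that an empty device is skipped without ending the
loop, while a negative size is still reported as a failure.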

Thanks,

C.






^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 00/25] migration: Improve error reporting
  2024-03-12  9:58       ` Cédric Le Goater
@ 2024-03-12 11:50         ` Peter Xu
  2024-03-12 12:09           ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-12 11:50 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Tue, Mar 12, 2024 at 10:58:51AM +0100, Cédric Le Goater wrote:
> On 3/12/24 08:16, Cédric Le Goater wrote:
> > On 3/11/24 21:24, Peter Xu wrote:
> > > On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
> > > > On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
> > > > > * [1-4] already queued in migration-next.
> > > > >    migration: Report error when shutdown fails
> > > > >    migration: Remove SaveStateHandler and LoadStateHandler typedefs
> > > > >    migration: Add documentation for SaveVMHandlers
> > > > >    migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
> > > > > * [5-9] are prequisite changes in other components related to the
> > > > >    migration save_setup() handler. They make sure a failure is not
> > > > >    returned without setting an error.
> > > > >    s390/stattrib: Add Error** argument to set_migrationmode() handler
> > > > >    vfio: Always report an error in vfio_save_setup()
> > > > >    migration: Always report an error in block_save_setup()
> > > > >    migration: Always report an error in ram_save_setup()
> > > > >    migration: Add Error** argument to vmstate_save()
> > > > > 
> > > > > * [10-15] are the core changes in migration and memory components to
> > > > >    propagate an error reported in a save_setup() handler.
> > > > > 
> > > > >    migration: Add Error** argument to qemu_savevm_state_setup()
> > > > >    migration: Add Error** argument to .save_setup() handler
> > > > >    migration: Add Error** argument to .load_setup() handler
> > > > 
> > > > Further queued 5-12 in migration-staging (until here), thanks.
> > > 
> > > Just to keep a record: due to the virtio failover test failure and the
> > > other block migration uncertainty in patch 7 (in which case we may want to
> > > have a fix on sectors==0 case), I unqueued this chunk for 9.0.
> > 
> > ok. I will ask the block folks for help to understand if sectors==0
> > is also an error in the save_setup context. May be  we can still
> > merge these in 9.0 cycle.
> 
> I discussed with Kevin and sectors==0 is not an error case, the loop
> should simply continue. That said, commit 66db46ca83b8 ("migration:
> Deprecate block migration") would let us remove all that code in
> the next cycle which is even simpler.

Thanks for taking a look.  I can try to have a look at removing block
migration in 9.1.

Regarding the failover failure - I still think what you posted as a
"hack" could be an official patch.  Do you plan to send it?  Or do you have
anything else in mind?

For 9.0, we're missing the soft freeze. IIUC we can only merge things like
regression fixes, documentation updates, some test changes, etc. into the rc
windows. With QEMU's heavy reliance on CI now, I don't even think most test
case changes would be applicable for RCs unless the test never runs in CI.  So
unless there's a strong need, it'll be easier if we wait for 9.1 (but then
again, we can still queue them earlier, so they will appear in the first 9.1
pull).

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 00/25] migration: Improve error reporting
  2024-03-12 11:50         ` Peter Xu
@ 2024-03-12 12:09           ` Cédric Le Goater
  2024-03-12 12:25             ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-12 12:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/12/24 12:50, Peter Xu wrote:
> On Tue, Mar 12, 2024 at 10:58:51AM +0100, Cédric Le Goater wrote:
>> On 3/12/24 08:16, Cédric Le Goater wrote:
>>> On 3/11/24 21:24, Peter Xu wrote:
>>>> On Fri, Mar 08, 2024 at 04:15:08PM +0800, Peter Xu wrote:
>>>>> On Wed, Mar 06, 2024 at 02:34:15PM +0100, Cédric Le Goater wrote:
>>>>>> * [1-4] already queued in migration-next.
>>>>>>     migration: Report error when shutdown fails
>>>>>>     migration: Remove SaveStateHandler and LoadStateHandler typedefs
>>>>>>     migration: Add documentation for SaveVMHandlers
>>>>>>     migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
>>>>>> * [5-9] are prequisite changes in other components related to the
>>>>>>     migration save_setup() handler. They make sure a failure is not
>>>>>>     returned without setting an error.
>>>>>>     s390/stattrib: Add Error** argument to set_migrationmode() handler
>>>>>>     vfio: Always report an error in vfio_save_setup()
>>>>>>     migration: Always report an error in block_save_setup()
>>>>>>     migration: Always report an error in ram_save_setup()
>>>>>>     migration: Add Error** argument to vmstate_save()
>>>>>>
>>>>>> * [10-15] are the core changes in migration and memory components to
>>>>>>     propagate an error reported in a save_setup() handler.
>>>>>>
>>>>>>     migration: Add Error** argument to qemu_savevm_state_setup()
>>>>>>     migration: Add Error** argument to .save_setup() handler
>>>>>>     migration: Add Error** argument to .load_setup() handler
>>>>>
>>>>> Further queued 5-12 in migration-staging (until here), thanks.
>>>>
>>>> Just to keep a record: due to the virtio failover test failure and the
>>>> other block migration uncertainty in patch 7 (in which case we may want to
>>>> have a fix on sectors==0 case), I unqueued this chunk for 9.0.
>>>
>>> ok. I will ask the block folks for help to understand if sectors==0
>>> is also an error in the save_setup context. May be  we can still
>>> merge these in 9.0 cycle.
>>
>> I discussed with Kevin and sectors==0 is not an error case, the loop
>> should simply continue. That said, commit 66db46ca83b8 ("migration:
>> Deprecate block migration") would let us remove all that code in
>> the next cycle which is even simpler.
> 
> Thanks for taking a look.  I can try to have a look at removing block
> migration in 9.1.

Just sent a 9.0 fix for the block part.

> Regarding to the failover failure - I still think what you posted as a
> "hack" could be an official patch.  Do you plan to send it?  
> Or do you have anything else in mind?

I was hoping to fix the test case instead. I can try to improve the hack
I sent this afternoon.

Thanks,

C.


> 
> For 9.0, we're missing softfreeze. IIUC we can only merge things like
> regression fixes, documentation updates, some test changess, etc.. into rc
> windows. With QEMU's heavy reliance on CI now I don't even think most test
> case changes would be applicable for RCs unless it's never run in a CI.  So
> unless there's a strong need, it'll be easier if we wait for 9.1 (but yet
> again, we can still queue them earlier, so they will appear in the 1st 9.1
> pull).
> 
> Thanks,
> 



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 00/25] migration: Improve error reporting
  2024-03-12 12:09           ` Cédric Le Goater
@ 2024-03-12 12:25             ` Peter Xu
  0 siblings, 0 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-12 12:25 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Tue, Mar 12, 2024 at 01:09:42PM +0100, Cédric Le Goater wrote:
> I was hoping to fix the test case instead. I can try to improve the hack
> I sent this afternoon.

Thanks, please go whichever way you think is the right approach.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-11 19:03       ` Fabiano Rosas
  2024-03-11 20:10         ` Peter Xu
@ 2024-03-12 12:32         ` Cédric Le Goater
  2024-03-12 13:34           ` Cédric Le Goater
  1 sibling, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-12 12:32 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/11/24 20:03, Fabiano Rosas wrote:
> Cédric Le Goater <clg@redhat.com> writes:
> 
>> On 3/8/24 15:36, Fabiano Rosas wrote:
>>> Cédric Le Goater <clg@redhat.com> writes:
>>>
>>>> This prepares ground for the changes coming next which add an Error**
>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>>> now handle the error and fail earlier setting the migration state from
>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>>
>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>>> reported by .save_setup() handlers.
>>>>
>>>> Since the previous behavior was to ignore errors at this step of
>>>> migration, this change should be examined closely to check that
>>>> cleanups are still correctly done.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>> ---
>>>>
>>>>    Changes in v4:
>>>>    
>>>>    - Merged cleanup change in qemu_savevm_state()
>>>>      
>>>>    Changes in v3:
>>>>    
>>>>    - Set migration state to MIGRATION_STATUS_FAILED
>>>>    - Fixed error handling to be done under lock in bg_migration_thread()
>>>>    - Made sure an error is always set in case of failure in
>>>>      qemu_savevm_state_setup()
>>>>      
>>>>    migration/savevm.h    |  2 +-
>>>>    migration/migration.c | 27 ++++++++++++++++++++++++---
>>>>    migration/savevm.c    | 26 +++++++++++++++-----------
>>>>    3 files changed, 40 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>>> --- a/migration/savevm.h
>>>> +++ b/migration/savevm.h
>>>> @@ -32,7 +32,7 @@
>>>>    bool qemu_savevm_state_blocked(Error **errp);
>>>>    void qemu_savevm_non_migratable_list(strList **reasons);
>>>>    int qemu_savevm_state_prepare(Error **errp);
>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>>    bool qemu_savevm_state_guest_unplug_pending(void);
>>>>    int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>>    void qemu_savevm_state_header(QEMUFile *f);
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>>        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>        MigThrError thr_error;
>>>>        bool urgent = false;
>>>> +    Error *local_err = NULL;
>>>> +    int ret;
>>>>    
>>>>        thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>>    
>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>>        }
>>>>    
>>>>        bql_lock();
>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>>        bql_unlock();
>>>>    
>>>> +    if (ret) {
>>>> +        migrate_set_error(s, local_err);
>>>> +        error_free(local_err);
>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>> +                          MIGRATION_STATUS_FAILED);
>>>> +        goto out;
>>>> +     }
>>>> +
>>>>        qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>>                                   MIGRATION_STATUS_ACTIVE);
>>>
>>> This^ should be before the new block it seems:
>>>
>>> GOOD:
>>> migrate_set_state new state setup
>>> migrate_set_state new state wait-unplug
>>> migrate_fd_cancel
>>> migrate_set_state new state cancelling
>>> migrate_fd_cleanup
>>> migrate_set_state new state cancelled
>>> migrate_fd_cancel
>>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>>>
>>> BAD:
>>> migrate_set_state new state setup
>>> migrate_fd_cancel
>>> migrate_set_state new state cancelling
>>> migrate_fd_cleanup
>>> migrate_set_state new state cancelled
>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>>> **
>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>>
>>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>>> that will run concurrently with migrate_fd_cancel() issued by the test
>>> and bad things happens.
>>
>> This hack makes things work :
>>
>> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>>            qemu_savevm_send_colo_enable(s->to_dst_file);
>>        }
>>    
>> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>> +                            MIGRATION_STATUS_SETUP);
>> +
> 
> Why move it all the way up here? Has moving the wait_unplug before the
> 'if (ret)' block not worked for you?

We could be sleeping while holding the BQL. It looked wrong.


Thanks,

C.


> 
>>        bql_lock();
>>        ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>        bql_unlock();
>>
>> We should fix the test instead :) Unless waiting for failover devices
>> to unplug before the save_setup handlers and not after is ok.
>>
>> commit c7e0acd5a3f8 ("migration: add new migration state wait-unplug")
>> is not clear about the justification.:
>>
>>       This patch adds a new migration state called wait-unplug.  It is entered
>>       after the SETUP state if failover devices are present. It will transition
>>       into ACTIVE once all devices were succesfully unplugged from the guest.
> 
> This is not clear indeed, but to me it seems having the wait-unplug
> after setup was important.
> 
>>
>>
>>> =====
>>> PS: I guess the next level in our Freestyle Concurrency video-game is to
>>> make migrate_fd_cancel() stop setting state and poking files and only
>>> set a flag that's tested in the other parts of the code.
>>
>> Is that a new item on the TODO list?
> 
> Yep, I'll add it to the wiki.
> 



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-11 20:10         ` Peter Xu
@ 2024-03-12 13:01           ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-12 13:01 UTC (permalink / raw)
  To: Peter Xu, Fabiano Rosas
  Cc: qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/11/24 21:10, Peter Xu wrote:
> On Mon, Mar 11, 2024 at 04:03:14PM -0300, Fabiano Rosas wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>>
>>> On 3/8/24 15:36, Fabiano Rosas wrote:
>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>
>>>>> This prepares ground for the changes coming next which add an Error**
>>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>>>> now handle the error and fail earlier setting the migration state from
>>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>>>
>>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>>>> reported by .save_setup() handlers.
>>>>>
>>>>> Since the previous behavior was to ignore errors at this step of
>>>>> migration, this change should be examined closely to check that
>>>>> cleanups are still correctly done.
>>>>>
>>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>>> ---
>>>>>
>>>>>    Changes in v4:
>>>>>    
>>>>>    - Merged cleanup change in qemu_savevm_state()
>>>>>      
>>>>>    Changes in v3:
>>>>>    
>>>>>    - Set migration state to MIGRATION_STATUS_FAILED
>>>>>    - Fixed error handling to be done under lock in bg_migration_thread()
>>>>>    - Made sure an error is always set in case of failure in
>>>>>      qemu_savevm_state_setup()
>>>>>      
>>>>>    migration/savevm.h    |  2 +-
>>>>>    migration/migration.c | 27 ++++++++++++++++++++++++---
>>>>>    migration/savevm.c    | 26 +++++++++++++++-----------
>>>>>    3 files changed, 40 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>>>> --- a/migration/savevm.h
>>>>> +++ b/migration/savevm.h
>>>>> @@ -32,7 +32,7 @@
>>>>>    bool qemu_savevm_state_blocked(Error **errp);
>>>>>    void qemu_savevm_non_migratable_list(strList **reasons);
>>>>>    int qemu_savevm_state_prepare(Error **errp);
>>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>>>    bool qemu_savevm_state_guest_unplug_pending(void);
>>>>>    int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>>>    void qemu_savevm_state_header(QEMUFile *f);
>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>>>        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>>        MigThrError thr_error;
>>>>>        bool urgent = false;
>>>>> +    Error *local_err = NULL;
>>>>> +    int ret;
>>>>>    
>>>>>        thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>>>    
>>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>>>        }
>>>>>    
>>>>>        bql_lock();
>>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>>>        bql_unlock();
>>>>>    
>>>>> +    if (ret) {
>>>>> +        migrate_set_error(s, local_err);
>>>>> +        error_free(local_err);
>>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>>> +                          MIGRATION_STATUS_FAILED);
>>>>> +        goto out;
>>>>> +     }
>>>>> +
>>>>>        qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>>>                                   MIGRATION_STATUS_ACTIVE);
>>>>
>>>> This^ should be before the new block it seems:
>>>>
>>>> GOOD:
>>>> migrate_set_state new state setup
>>>> migrate_set_state new state wait-unplug
>>>> migrate_fd_cancel
>>>> migrate_set_state new state cancelling
>>>> migrate_fd_cleanup
>>>> migrate_set_state new state cancelled
>>>> migrate_fd_cancel
>>>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>>>>
>>>> BAD:
>>>> migrate_set_state new state setup
>>>> migrate_fd_cancel
>>>> migrate_set_state new state cancelling
>>>> migrate_fd_cleanup
>>>> migrate_set_state new state cancelled
>>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>>>> **
>>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>>>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>>>
>>>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>>>> that will run concurrently with migrate_fd_cancel() issued by the test
>>>> and bad things happens.
>>>
>>> This hack makes things work :
>>>
>>> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>>>            qemu_savevm_send_colo_enable(s->to_dst_file);
>>>        }
>>>    
>>> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>> +                            MIGRATION_STATUS_SETUP);
>>> +
>>
>> Why move it all the way up here? Has moving the wait_unplug before the
>> 'if (ret)' block not worked for you?
>>
>>>        bql_lock();
>>>        ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>        bql_unlock();
>>>
>>> We should fix the test instead :) Unless waiting for failover devices
>>> to unplug before the save_setup handlers and not after is ok.
>>>
>>> commit c7e0acd5a3f8 ("migration: add new migration state wait-unplug")
>>> is not clear about the justification.:
>>>
>>>       This patch adds a new migration state called wait-unplug.  It is entered
>>>       after the SETUP state if failover devices are present. It will transition
>>>       into ACTIVE once all devices were succesfully unplugged from the guest.
>>
>> This is not clear indeed, but to me it seems having the wait-unplug
>> after setup was important.
> 
> Finally got some time to read this code..
> 
> So far I didn't see an issue if it's called before the setup hooks.
> Actually it looks to me it should better do that before those hooks.
>
> IIUC what that qemu_savevm_wait_unplug() does is waiting for all the
> primary devices to be completely unplugged before moving on the migration.
> 
> Here setup() hook, or to be explicit, the primary devices' VMSDs (if ever
> existed, and if that was the concern) should have zero impact on such wait,
> because the "unplug" should also contain one step to unregister those
> vmsds; see the virtio_net_handle_migration_primary() where it has:
> 
>          if (failover_unplug_primary(n, dev)) {
>              vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
>              ...
>          }
> 
> So qemu_savevm_wait_unplug() looks like a pure wait function to me until
> all the unplug is processed by the guest OS.  And it makes some sense to me
> to avoid calling setup() (which can start to hold resources, like in RAM we
> create bitmaps etc to prepare for migration) before such possible long halts.

I think so too now. VFIO is already sending state.
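
A toy model of the ordering argument, as a sketch only. ToyMigration
and the toy_*() helpers are hypothetical and do not correspond to QEMU
functions. The assumption is that the unplug wait can be interrupted by
a cancel and that setup is the step that starts holding resources:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int pending_unplugs;    /* primary devices not yet released */
    bool cancelled;         /* set when the wait is aborted */
    bool resources_held;    /* stands in for what setup allocates */
} ToyMigration;

/*
 * Pure wait: one pending unplug completes per poll. The wait is
 * aborted at the poll numbered cancel_after_polls (counting from 0);
 * pass a negative value to never cancel. Returns false on cancel.
 */
static bool toy_wait_unplug(ToyMigration *m, int cancel_after_polls)
{
    int polls = 0;

    while (m->pending_unplugs > 0) {
        if (polls++ == cancel_after_polls) {
            m->cancelled = true;
            return false;
        }
        m->pending_unplugs--;
    }
    return true;
}

/* Setup is where resources start being held. */
static void toy_setup(ToyMigration *m)
{
    m->resources_held = true;
}

/* Wait for the unplug first, and only then run setup. */
static bool toy_start(ToyMigration *m, int cancel_after_polls)
{
    assert(m);
    if (!toy_wait_unplug(m, cancel_after_polls)) {
        return false;
    }
    toy_setup(m);
    return true;
}
```

With this order, a migration cancelled during a long unplug wait never
reaches toy_setup(), so nothing is held across the wait.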

> 
> In all cases, I guess it's still too rush to figure out a plan, meanwhile
> anything proposed for either test/code changes would better get some
> reviews from either Laurent or other virtio-net guys.  I think I'll go
> ahead the pull without the 2nd batch of patches.
> 
>>
>>>
>>>
>>>> =====
>>>> PS: I guess the next level in our Freestyle Concurrency video-game is to
>>>> make migrate_fd_cancel() stop setting state and poking files and only
>>>> set a flag that's tested in the other parts of the code.
>>>
>>> Is that a new item on the TODO list?
>>
>> Yep, I'll add it to the wiki.
> 
> Sounds like a good thing, however let's be aware of the evils (that are
> always in the details..), where there can be users/tests relying on that
> "CANCELLING" state, so it can be part of the ABIs.. :-(

That's a good reason to move qemu_savevm_wait_unplug() and avoid breaking
the ABI.

Thanks,

C.




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-12 12:32         ` Cédric Le Goater
@ 2024-03-12 13:34           ` Cédric Le Goater
  2024-03-12 14:01             ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-12 13:34 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/12/24 13:32, Cédric Le Goater wrote:
> On 3/11/24 20:03, Fabiano Rosas wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>>
>>> On 3/8/24 15:36, Fabiano Rosas wrote:
>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>
>>>>> This prepares ground for the changes coming next which add an Error**
>>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>>>> now handle the error and fail earlier setting the migration state from
>>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>>>
>>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>>>> reported by .save_setup() handlers.
>>>>>
>>>>> Since the previous behavior was to ignore errors at this step of
>>>>> migration, this change should be examined closely to check that
>>>>> cleanups are still correctly done.
>>>>>
>>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>>> ---
>>>>>
>>>>>    Changes in v4:
>>>>>    - Merged cleanup change in qemu_savevm_state()
>>>>>    Changes in v3:
>>>>>    - Set migration state to MIGRATION_STATUS_FAILED
>>>>>    - Fixed error handling to be done under lock in bg_migration_thread()
>>>>>    - Made sure an error is always set in case of failure in
>>>>>      qemu_savevm_state_setup()
>>>>>    migration/savevm.h    |  2 +-
>>>>>    migration/migration.c | 27 ++++++++++++++++++++++++---
>>>>>    migration/savevm.c    | 26 +++++++++++++++-----------
>>>>>    3 files changed, 40 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>>>> --- a/migration/savevm.h
>>>>> +++ b/migration/savevm.h
>>>>> @@ -32,7 +32,7 @@
>>>>>    bool qemu_savevm_state_blocked(Error **errp);
>>>>>    void qemu_savevm_non_migratable_list(strList **reasons);
>>>>>    int qemu_savevm_state_prepare(Error **errp);
>>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>>>    bool qemu_savevm_state_guest_unplug_pending(void);
>>>>>    int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>>>    void qemu_savevm_state_header(QEMUFile *f);
>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>>>        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>>        MigThrError thr_error;
>>>>>        bool urgent = false;
>>>>> +    Error *local_err = NULL;
>>>>> +    int ret;
>>>>>        thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>>>        }
>>>>>        bql_lock();
>>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>>>        bql_unlock();
>>>>> +    if (ret) {
>>>>> +        migrate_set_error(s, local_err);
>>>>> +        error_free(local_err);
>>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>>> +                          MIGRATION_STATUS_FAILED);
>>>>> +        goto out;
>>>>> +     }
>>>>> +
>>>>>        qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>>>                                   MIGRATION_STATUS_ACTIVE);
>>>>
>>>> This^ should be before the new block it seems:
>>>>
>>>> GOOD:
>>>> migrate_set_state new state setup
>>>> migrate_set_state new state wait-unplug
>>>> migrate_fd_cancel
>>>> migrate_set_state new state cancelling
>>>> migrate_fd_cleanup
>>>> migrate_set_state new state cancelled
>>>> migrate_fd_cancel
>>>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>>>>
>>>> BAD:
>>>> migrate_set_state new state setup
>>>> migrate_fd_cancel
>>>> migrate_set_state new state cancelling
>>>> migrate_fd_cleanup
>>>> migrate_set_state new state cancelled
>>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>>>> **
>>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>>>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>>>
>>>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>>>> that will run concurrently with migrate_fd_cancel() issued by the test
>>>> and bad things happens.
>>>
>>> This hack makes things work :
>>>
>>> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>>>            qemu_savevm_send_colo_enable(s->to_dst_file);
>>>        }
>>> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>> +                            MIGRATION_STATUS_SETUP);
>>> +
>>
>> Why move it all the way up here? Has moving the wait_unplug before the
>> 'if (ret)' block not worked for you?
> 
> We could be sleeping while holding the BQL. It looked wrong.

Sorry, wrong answer. Yes, I can try moving it before the 'if (ret)' block.
I can reproduce the failure easily with an x86 guest running on PPC64.
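
For reference, a sketch of the conditional transition that the
'if (ret)' block relies on. It assumes the state update has
compare-and-swap semantics. The toy_* names are hypothetical and only
model the SETUP -> FAILED step, not the QEMU implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum toy_state { TOY_SETUP, TOY_CANCELLING, TOY_FAILED };

/* Move to new_state only if the state is still old_state. */
static bool toy_set_state(atomic_int *state, int old_state,
                          int new_state)
{
    int expected = old_state;

    return atomic_compare_exchange_strong(state, &expected,
                                          new_state);
}

/* Error path taken when the setup step returns non-zero. */
static bool toy_fail_setup(atomic_int *state)
{
    assert(state);
    return toy_set_state(state, TOY_SETUP, TOY_FAILED);
}
```

Under that assumption, a cancel that has already moved the state away
from SETUP is not overwritten by the later failure transition.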

Thanks,

C.



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-12 13:34           ` Cédric Le Goater
@ 2024-03-12 14:01             ` Cédric Le Goater
  2024-03-12 14:24               ` Fabiano Rosas
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-12 14:01 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/12/24 14:34, Cédric Le Goater wrote:
> On 3/12/24 13:32, Cédric Le Goater wrote:
>> On 3/11/24 20:03, Fabiano Rosas wrote:
>>> Cédric Le Goater <clg@redhat.com> writes:
>>>
>>>> On 3/8/24 15:36, Fabiano Rosas wrote:
>>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>>
>>>>>> This prepares ground for the changes coming next which add an Error**
>>>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>>>>> now handle the error and fail earlier setting the migration state from
>>>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>>>>
>>>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>>>>> reported by .save_setup() handlers.
>>>>>>
>>>>>> Since the previous behavior was to ignore errors at this step of
>>>>>> migration, this change should be examined closely to check that
>>>>>> cleanups are still correctly done.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>>>> ---
>>>>>>
>>>>>>    Changes in v4:
>>>>>>    - Merged cleanup change in qemu_savevm_state()
>>>>>>    Changes in v3:
>>>>>>    - Set migration state to MIGRATION_STATUS_FAILED
>>>>>>    - Fixed error handling to be done under lock in bg_migration_thread()
>>>>>>    - Made sure an error is always set in case of failure in
>>>>>>      qemu_savevm_state_setup()
>>>>>>    migration/savevm.h    |  2 +-
>>>>>>    migration/migration.c | 27 ++++++++++++++++++++++++---
>>>>>>    migration/savevm.c    | 26 +++++++++++++++-----------
>>>>>>    3 files changed, 40 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>>>>> --- a/migration/savevm.h
>>>>>> +++ b/migration/savevm.h
>>>>>> @@ -32,7 +32,7 @@
>>>>>>    bool qemu_savevm_state_blocked(Error **errp);
>>>>>>    void qemu_savevm_non_migratable_list(strList **reasons);
>>>>>>    int qemu_savevm_state_prepare(Error **errp);
>>>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>>>>    bool qemu_savevm_state_guest_unplug_pending(void);
>>>>>>    int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>>>>    void qemu_savevm_state_header(QEMUFile *f);
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>>>>        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>>>        MigThrError thr_error;
>>>>>>        bool urgent = false;
>>>>>> +    Error *local_err = NULL;
>>>>>> +    int ret;
>>>>>>        thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>>>>        }
>>>>>>        bql_lock();
>>>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>>>>        bql_unlock();
>>>>>> +    if (ret) {
>>>>>> +        migrate_set_error(s, local_err);
>>>>>> +        error_free(local_err);
>>>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>>>> +                          MIGRATION_STATUS_FAILED);
>>>>>> +        goto out;
>>>>>> +     }
>>>>>> +
>>>>>>        qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>>>>                                   MIGRATION_STATUS_ACTIVE);
>>>>>
>>>>> This^ should be before the new block it seems:
>>>>>
>>>>> GOOD:
>>>>> migrate_set_state new state setup
>>>>> migrate_set_state new state wait-unplug
>>>>> migrate_fd_cancel
>>>>> migrate_set_state new state cancelling
>>>>> migrate_fd_cleanup
>>>>> migrate_set_state new state cancelled
>>>>> migrate_fd_cancel
>>>>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>>>>>
>>>>> BAD:
>>>>> migrate_set_state new state setup
>>>>> migrate_fd_cancel
>>>>> migrate_set_state new state cancelling
>>>>> migrate_fd_cleanup
>>>>> migrate_set_state new state cancelled
>>>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>>>>> **
>>>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>>>>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>>>>
>>>>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>>>>> that will run concurrently with migrate_fd_cancel() issued by the test
>>>>> and bad things happens.
>>>>
>>>> This hack makes things work :
>>>>
>>>> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>>>>            qemu_savevm_send_colo_enable(s->to_dst_file);
>>>>        }
>>>> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>> +                            MIGRATION_STATUS_SETUP);
>>>> +
>>>
>>> Why move it all the way up here? Has moving the wait_unplug before the
>>> 'if (ret)' block not worked for you?
>>
>> We could be sleeping while holding the BQL. It looked wrong.
> 
> Sorry wrong answer. Yes I can try moving it before the 'if (ret)' block.
> I can reproduce easily with an x86 guest running on PPC64.

That works just the same.

Peter, Fabiano,

What would you prefer?

1. Move qemu_savevm_wait_unplug() before qemu_savevm_state_setup(),
    which means one new patch.

2. Leave qemu_savevm_wait_unplug() after qemu_savevm_state_setup()
    and handle state_setup() errors after waiting, which means an
    update of this patch.
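
For reference, option 2 can be sketched as a small standalone model. This
is only an illustration under stated assumptions: MState, set_state(),
state_setup() and start_migration() are hypothetical stand-ins, not QEMU
code, and the wait is reduced to the SETUP->ACTIVE transition it performs
when no unplug is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the migration states, for illustration only. */
typedef enum { M_SETUP, M_ACTIVE, M_FAILED } MState;

/* Stand-in for migrate_set_state(): applies only if the state matches. */
void set_state(MState *s, MState old_state, MState new_state)
{
    if (*s == old_state) {
        *s = new_state;
    }
}

/* Stand-in for qemu_savevm_state_setup(): 0 on success, -1 on failure. */
int state_setup(bool fail)
{
    return fail ? -1 : 0;
}

/* Option 2 ordering: run setup, wait, and only then look at the error. */
MState start_migration(bool setup_fails)
{
    MState state = M_SETUP;
    int ret = state_setup(setup_fails);

    /* Stand-in for the wait: it always moves SETUP to ACTIVE here. */
    set_state(&state, M_SETUP, M_ACTIVE);
    assert(state == M_ACTIVE);

    if (ret) {
        /* The wait already left SETUP, so the failure leaves from ACTIVE. */
        set_state(&state, M_ACTIVE, M_FAILED);
    }
    return state;
}
```

In this model the failure path has to leave from ACTIVE rather than SETUP,
since the wait has already moved the state forward by the time the error
is handled.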


Thanks,

C.


  




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-12 14:01             ` Cédric Le Goater
@ 2024-03-12 14:24               ` Fabiano Rosas
  2024-03-12 15:18                 ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-12 14:24 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: Peter Xu, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

Cédric Le Goater <clg@redhat.com> writes:

> On 3/12/24 14:34, Cédric Le Goater wrote:
>> On 3/12/24 13:32, Cédric Le Goater wrote:
>>> On 3/11/24 20:03, Fabiano Rosas wrote:
>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>
>>>>> On 3/8/24 15:36, Fabiano Rosas wrote:
>>>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>>>
>>>>>>> This prepares ground for the changes coming next which add an Error**
>>>>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>>>>>> now handle the error and fail earlier setting the migration state from
>>>>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>>>>>
>>>>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>>>>>> reported by .save_setup() handlers.
>>>>>>>
>>>>>>> Since the previous behavior was to ignore errors at this step of
>>>>>>> migration, this change should be examined closely to check that
>>>>>>> cleanups are still correctly done.
>>>>>>>
>>>>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>>>>> ---
>>>>>>>
>>>>>>>    Changes in v4:
>>>>>>>    - Merged cleanup change in qemu_savevm_state()
>>>>>>>    Changes in v3:
>>>>>>>    - Set migration state to MIGRATION_STATUS_FAILED
>>>>>>>    - Fixed error handling to be done under lock in bg_migration_thread()
>>>>>>>    - Made sure an error is always set in case of failure in
>>>>>>>      qemu_savevm_state_setup()
>>>>>>>    migration/savevm.h    |  2 +-
>>>>>>>    migration/migration.c | 27 ++++++++++++++++++++++++---
>>>>>>>    migration/savevm.c    | 26 +++++++++++++++-----------
>>>>>>>    3 files changed, 40 insertions(+), 15 deletions(-)
>>>>>>>
>>>>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>>>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>>>>>> --- a/migration/savevm.h
>>>>>>> +++ b/migration/savevm.h
>>>>>>> @@ -32,7 +32,7 @@
>>>>>>>    bool qemu_savevm_state_blocked(Error **errp);
>>>>>>>    void qemu_savevm_non_migratable_list(strList **reasons);
>>>>>>>    int qemu_savevm_state_prepare(Error **errp);
>>>>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>>>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>>>>>    bool qemu_savevm_state_guest_unplug_pending(void);
>>>>>>>    int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>>>>>    void qemu_savevm_state_header(QEMUFile *f);
>>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>>>>>> --- a/migration/migration.c
>>>>>>> +++ b/migration/migration.c
>>>>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>>>>>        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>>>>        MigThrError thr_error;
>>>>>>>        bool urgent = false;
>>>>>>> +    Error *local_err = NULL;
>>>>>>> +    int ret;
>>>>>>>        thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>>>>>        }
>>>>>>>        bql_lock();
>>>>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>>>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>>>>>        bql_unlock();
>>>>>>> +    if (ret) {
>>>>>>> +        migrate_set_error(s, local_err);
>>>>>>> +        error_free(local_err);
>>>>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>>>>> +                          MIGRATION_STATUS_FAILED);
>>>>>>> +        goto out;
>>>>>>> +     }
>>>>>>> +
>>>>>>>        qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>>>>>                                   MIGRATION_STATUS_ACTIVE);
>>>>>>
>>>>>> This^ should be before the new block it seems:
>>>>>>
>>>>>> GOOD:
>>>>>> migrate_set_state new state setup
>>>>>> migrate_set_state new state wait-unplug
>>>>>> migrate_fd_cancel
>>>>>> migrate_set_state new state cancelling
>>>>>> migrate_fd_cleanup
>>>>>> migrate_set_state new state cancelled
>>>>>> migrate_fd_cancel
>>>>>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>>>>>>
>>>>>> BAD:
>>>>>> migrate_set_state new state setup
>>>>>> migrate_fd_cancel
>>>>>> migrate_set_state new state cancelling
>>>>>> migrate_fd_cleanup
>>>>>> migrate_set_state new state cancelled
>>>>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>>>>>> **
>>>>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>>>>>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>>>>>
>>>>>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>>>>>> that will run concurrently with migrate_fd_cancel() issued by the test
>>>>>> and bad things happens.
>>>>>
>>>>> This hack makes things work :
>>>>>
>>>>> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>>>>>            qemu_savevm_send_colo_enable(s->to_dst_file);
>>>>>        }
>>>>> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>>> +                            MIGRATION_STATUS_SETUP);
>>>>> +
>>>>
>>>> Why move it all the way up here? Has moving the wait_unplug before the
>>>> 'if (ret)' block not worked for you?
>>>
>>> We could be sleeping while holding the BQL. It looked wrong.
>> 
>> Sorry wrong answer. Yes I can try moving it before the 'if (ret)' block.
>> I can reproduce easily with an x86 guest running on PPC64.
>
> That works just the same.
>
> Peter, Fabiano,
>
> What would you prefer  ?
>
> 1. move qemu_savevm_wait_unplug() before qemu_savevm_state_setup(),
>     means one new patch.

Is there a point to this except "because we can"? Honest question, I
might have missed the motivation.

Also a couple of points:

- It seems the current version of this proposal will lose the transition
from SETUP->ACTIVE, no? As in, after qemu_savevm_state_setup, there's
nothing changing the state to ACTIVE anymore.

- You also need to change the bg migration path.
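
The first point can be checked on a toy model. This is a sketch only,
assuming qemu_savevm_wait_unplug() boils down to two conditional
transitions around the wait; MigState, set_state(), wait_unplug() and
state_after_wait() are hypothetical names and the waiting itself is
elided.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the migration states, for illustration only. */
typedef enum { ST_SETUP, ST_WAIT_UNPLUG, ST_ACTIVE } MigState;

/* Stand-in for migrate_set_state(): applies only if the state matches. */
void set_state(MigState *state, MigState old_state, MigState new_state)
{
    if (*state == old_state) {
        *state = new_state;
    }
}

/* Simplified stand-in for qemu_savevm_wait_unplug(). */
void wait_unplug(MigState *state, bool unplug_pending,
                 MigState old_state, MigState new_state)
{
    if (unplug_pending) {
        set_state(state, old_state, ST_WAIT_UNPLUG);
        /* ... the real code would wait for the guest here ... */
        set_state(state, ST_WAIT_UNPLUG, new_state);
    } else {
        set_state(state, old_state, new_state);
    }
}

/* State reached after the wait when the migration starts in SETUP. */
MigState state_after_wait(bool unplug_pending, MigState new_state)
{
    MigState state = ST_SETUP;

    wait_unplug(&state, unplug_pending, ST_SETUP, new_state);
    /* In this model the wait never leaves the state in WAIT_UNPLUG. */
    assert(state != ST_WAIT_UNPLUG);
    return state;
}
```

With (SETUP, SETUP) the model ends in SETUP whether or not an unplug is
pending, so something else would have to set ACTIVE afterwards; with
(SETUP, ACTIVE) it ends in ACTIVE.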

>
> 2. leave qemu_savevm_wait_unplug() after qemu_savevm_state_setup()
>     and handle state_setup() errors after waiting. means an update
>     of this patch.

I vote for this. This failover feature is a pretty complex one, let's
not risk changing the behavior for no good reason. Just look at the
amount of head-banging going on in these threads:

https://patchwork.ozlabs.org/project/qemu-devel/cover/20181025140631.634922-1-sameeh@daynix.com/
https://www.mail-archive.com/qemu-devel@nongnu.org/msg609296.html

>
>
> Thanks,
>
> C.
>
>
>   



* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-12 14:24               ` Fabiano Rosas
@ 2024-03-12 15:18                 ` Peter Xu
  2024-03-12 18:06                   ` Cédric Le Goater
  2024-03-12 18:28                   ` Fabiano Rosas
  0 siblings, 2 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-12 15:18 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Cédric Le Goater, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Tue, Mar 12, 2024 at 11:24:39AM -0300, Fabiano Rosas wrote:
> Cédric Le Goater <clg@redhat.com> writes:
> 
> > On 3/12/24 14:34, Cédric Le Goater wrote:
> >> On 3/12/24 13:32, Cédric Le Goater wrote:
> >>> On 3/11/24 20:03, Fabiano Rosas wrote:
> >>>> Cédric Le Goater <clg@redhat.com> writes:
> >>>>
> >>>>> On 3/8/24 15:36, Fabiano Rosas wrote:
> >>>>>> Cédric Le Goater <clg@redhat.com> writes:
> >>>>>>
> >>>>>>> This prepares ground for the changes coming next which add an Error**
> >>>>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
> >>>>>>> now handle the error and fail earlier setting the migration state from
> >>>>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
> >>>>>>>
> >>>>>>> In qemu_savevm_state(), move the cleanup to preserve the error
> >>>>>>> reported by .save_setup() handlers.
> >>>>>>>
> >>>>>>> Since the previous behavior was to ignore errors at this step of
> >>>>>>> migration, this change should be examined closely to check that
> >>>>>>> cleanups are still correctly done.
> >>>>>>>
> >>>>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> >>>>>>> ---
> >>>>>>>
> >>>>>>>    Changes in v4:
> >>>>>>>    - Merged cleanup change in qemu_savevm_state()
> >>>>>>>    Changes in v3:
> >>>>>>>    - Set migration state to MIGRATION_STATUS_FAILED
> >>>>>>>    - Fixed error handling to be done under lock in bg_migration_thread()
> >>>>>>>    - Made sure an error is always set in case of failure in
> >>>>>>>      qemu_savevm_state_setup()
> >>>>>>>    migration/savevm.h    |  2 +-
> >>>>>>>    migration/migration.c | 27 ++++++++++++++++++++++++---
> >>>>>>>    migration/savevm.c    | 26 +++++++++++++++-----------
> >>>>>>>    3 files changed, 40 insertions(+), 15 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/migration/savevm.h b/migration/savevm.h
> >>>>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
> >>>>>>> --- a/migration/savevm.h
> >>>>>>> +++ b/migration/savevm.h
> >>>>>>> @@ -32,7 +32,7 @@
> >>>>>>>    bool qemu_savevm_state_blocked(Error **errp);
> >>>>>>>    void qemu_savevm_non_migratable_list(strList **reasons);
> >>>>>>>    int qemu_savevm_state_prepare(Error **errp);
> >>>>>>> -void qemu_savevm_state_setup(QEMUFile *f);
> >>>>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
> >>>>>>>    bool qemu_savevm_state_guest_unplug_pending(void);
> >>>>>>>    int qemu_savevm_state_resume_prepare(MigrationState *s);
> >>>>>>>    void qemu_savevm_state_header(QEMUFile *f);
> >>>>>>> diff --git a/migration/migration.c b/migration/migration.c
> >>>>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
> >>>>>>> --- a/migration/migration.c
> >>>>>>> +++ b/migration/migration.c
> >>>>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
> >>>>>>>        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >>>>>>>        MigThrError thr_error;
> >>>>>>>        bool urgent = false;
> >>>>>>> +    Error *local_err = NULL;
> >>>>>>> +    int ret;
> >>>>>>>        thread = migration_threads_add("live_migration", qemu_get_thread_id());
> >>>>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
> >>>>>>>        }
> >>>>>>>        bql_lock();
> >>>>>>> -    qemu_savevm_state_setup(s->to_dst_file);
> >>>>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
> >>>>>>>        bql_unlock();
> >>>>>>> +    if (ret) {
> >>>>>>> +        migrate_set_error(s, local_err);
> >>>>>>> +        error_free(local_err);
> >>>>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> >>>>>>> +                          MIGRATION_STATUS_FAILED);
> >>>>>>> +        goto out;
> >>>>>>> +     }
> >>>>>>> +
> >>>>>>>        qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> >>>>>>>                                   MIGRATION_STATUS_ACTIVE);
> >>>>>>
> >>>>>> This^ should be before the new block it seems:
> >>>>>>
> >>>>>> GOOD:
> >>>>>> migrate_set_state new state setup
> >>>>>> migrate_set_state new state wait-unplug
> >>>>>> migrate_fd_cancel
> >>>>>> migrate_set_state new state cancelling
> >>>>>> migrate_fd_cleanup
> >>>>>> migrate_set_state new state cancelled
> >>>>>> migrate_fd_cancel
> >>>>>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
> >>>>>>
> >>>>>> BAD:
> >>>>>> migrate_set_state new state setup
> >>>>>> migrate_fd_cancel
> >>>>>> migrate_set_state new state cancelling
> >>>>>> migrate_fd_cleanup
> >>>>>> migrate_set_state new state cancelled
> >>>>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
> >>>>>> **
> >>>>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
> >>>>>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
> >>>>>>
> >>>>>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
> >>>>>> that will run concurrently with migrate_fd_cancel() issued by the test
> >>>>>> and bad things happens.
> >>>>>
> >>>>> This hack makes things work :
> >>>>>
> >>>>> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
> >>>>>            qemu_savevm_send_colo_enable(s->to_dst_file);
> >>>>>        }
> >>>>> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> >>>>> +                            MIGRATION_STATUS_SETUP);
> >>>>> +
> >>>>
> >>>> Why move it all the way up here? Has moving the wait_unplug before the
> >>>> 'if (ret)' block not worked for you?
> >>>
> >>> We could be sleeping while holding the BQL. It looked wrong.
> >> 
> >> Sorry wrong answer. Yes I can try moving it before the 'if (ret)' block.
> >> I can reproduce easily with an x86 guest running on PPC64.
> >
> > That works just the same.
> >
> > Peter, Fabiano,
> >
> > What would you prefer  ?
> >
> > 1. move qemu_savevm_wait_unplug() before qemu_savevm_state_setup(),
> >     means one new patch.
> 
> Is there a point to this except "because we can"? Honest question, I
> might have missed the motivation.

My previous point was, it avoids holding the resources (that will be
allocated in setup() routines) while we know we can wait for a long time.

But then I found that the ordering is indeed needed at least if we don't
change migrate_set_state() first - it is the only place we set the status
to ACTIVE (which I overlooked, sorry)...

IMHO the function is not well designed; the state update of the next stage
should not reside in a function to wait for failover primary devices
conditionally. It's a bit of a mess.

> 
> Also a couple of points:
> 
> - The current version of this proposal seems it will lose the transition
> from SETUP->ACTIVE no? As in, after qemu_savevm_state_setup, there's
> nothing changing the state to ACTIVE anymore.
> 
> - You also need to change the bg migration path.
> 
> >
> > 2. leave qemu_savevm_wait_unplug() after qemu_savevm_state_setup()
> >     and handle state_setup() errors after waiting. means an update
> >     of this patch.
> 
> I vote for this. This failover feature is a pretty complex one, let's
> not risk changing the behavior for no good reason. Just look at the
> amount of head-banging going on in these threads:
> 
> https://patchwork.ozlabs.org/project/qemu-devel/cover/20181025140631.634922-1-sameeh@daynix.com/
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg609296.html

Do we know who is consuming this feature?

Now VFIO allows a migration to happen without this trick.  I'm wondering
whether all relevant NICs can also support VFIO migrations in the future,
then we can drop this tricky feature for good.

-- 
Peter Xu




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-12 15:18                 ` Peter Xu
@ 2024-03-12 18:06                   ` Cédric Le Goater
  2024-03-12 18:28                   ` Fabiano Rosas
  1 sibling, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-12 18:06 UTC (permalink / raw)
  To: Peter Xu, Fabiano Rosas
  Cc: qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

[ ...]

> Now VFIO allows a migration to happen without this trick.  I'm wondering
> whether all relevant NICs can also support VFIO migrations in the future,
> then we can drop this tricky feature for good.

Currently, VFIO migration requires a VFIO (PCI) variant driver implementing
the specific ops for migration. Only a few NICs are supported. The list is
growing, though, and we should expect more in the future, especially
enterprise-grade NICs.

Thanks,

C.






* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-12 15:18                 ` Peter Xu
  2024-03-12 18:06                   ` Cédric Le Goater
@ 2024-03-12 18:28                   ` Fabiano Rosas
  2024-03-15 10:17                     ` Cédric Le Goater
  1 sibling, 1 reply; 111+ messages in thread
From: Fabiano Rosas @ 2024-03-12 18:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Cédric Le Goater, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

Peter Xu <peterx@redhat.com> writes:

> On Tue, Mar 12, 2024 at 11:24:39AM -0300, Fabiano Rosas wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>> 
>> > On 3/12/24 14:34, Cédric Le Goater wrote:
>> >> On 3/12/24 13:32, Cédric Le Goater wrote:
>> >>> On 3/11/24 20:03, Fabiano Rosas wrote:
>> >>>> Cédric Le Goater <clg@redhat.com> writes:
>> >>>>
>> >>>>> On 3/8/24 15:36, Fabiano Rosas wrote:
>> >>>>>> Cédric Le Goater <clg@redhat.com> writes:
>> >>>>>>
>> >>>>>>> This prepares ground for the changes coming next which add an Error**
>> >>>>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>> >>>>>>> now handle the error and fail earlier setting the migration state from
>> >>>>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>> >>>>>>>
>> >>>>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>> >>>>>>> reported by .save_setup() handlers.
>> >>>>>>>
>> >>>>>>> Since the previous behavior was to ignore errors at this step of
>> >>>>>>> migration, this change should be examined closely to check that
>> >>>>>>> cleanups are still correctly done.
>> >>>>>>>
>> >>>>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> >>>>>>> ---
>> >>>>>>>
>> >>>>>>>    Changes in v4:
>> >>>>>>>    - Merged cleanup change in qemu_savevm_state()
>> >>>>>>>    Changes in v3:
>> >>>>>>>    - Set migration state to MIGRATION_STATUS_FAILED
>> >>>>>>>    - Fixed error handling to be done under lock in bg_migration_thread()
>> >>>>>>>    - Made sure an error is always set in case of failure in
>> >>>>>>>      qemu_savevm_state_setup()
>> >>>>>>>    migration/savevm.h    |  2 +-
>> >>>>>>>    migration/migration.c | 27 ++++++++++++++++++++++++---
>> >>>>>>>    migration/savevm.c    | 26 +++++++++++++++-----------
>> >>>>>>>    3 files changed, 40 insertions(+), 15 deletions(-)
>> >>>>>>>
>> >>>>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>> >>>>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>> >>>>>>> --- a/migration/savevm.h
>> >>>>>>> +++ b/migration/savevm.h
>> >>>>>>> @@ -32,7 +32,7 @@
>> >>>>>>>    bool qemu_savevm_state_blocked(Error **errp);
>> >>>>>>>    void qemu_savevm_non_migratable_list(strList **reasons);
>> >>>>>>>    int qemu_savevm_state_prepare(Error **errp);
>> >>>>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>> >>>>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>> >>>>>>>    bool qemu_savevm_state_guest_unplug_pending(void);
>> >>>>>>>    int qemu_savevm_state_resume_prepare(MigrationState *s);
>> >>>>>>>    void qemu_savevm_state_header(QEMUFile *f);
>> >>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>> >>>>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>> >>>>>>> --- a/migration/migration.c
>> >>>>>>> +++ b/migration/migration.c
>> >>>>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>> >>>>>>>        int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> >>>>>>>        MigThrError thr_error;
>> >>>>>>>        bool urgent = false;
>> >>>>>>> +    Error *local_err = NULL;
>> >>>>>>> +    int ret;
>> >>>>>>>        thread = migration_threads_add("live_migration", qemu_get_thread_id());
>> >>>>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>> >>>>>>>        }
>> >>>>>>>        bql_lock();
>> >>>>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>> >>>>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>> >>>>>>>        bql_unlock();
>> >>>>>>> +    if (ret) {
>> >>>>>>> +        migrate_set_error(s, local_err);
>> >>>>>>> +        error_free(local_err);
>> >>>>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>> >>>>>>> +                          MIGRATION_STATUS_FAILED);
>> >>>>>>> +        goto out;
>> >>>>>>> +     }
>> >>>>>>> +
>> >>>>>>>        qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>> >>>>>>>                                   MIGRATION_STATUS_ACTIVE);
>> >>>>>>
>> >>>>>> This^ should be before the new block it seems:
>> >>>>>>
>> >>>>>> GOOD:
>> >>>>>> migrate_set_state new state setup
>> >>>>>> migrate_set_state new state wait-unplug
>> >>>>>> migrate_fd_cancel
>> >>>>>> migrate_set_state new state cancelling
>> >>>>>> migrate_fd_cleanup
>> >>>>>> migrate_set_state new state cancelled
>> >>>>>> migrate_fd_cancel
>> >>>>>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>> >>>>>>
>> >>>>>> BAD:
>> >>>>>> migrate_set_state new state setup
>> >>>>>> migrate_fd_cancel
>> >>>>>> migrate_set_state new state cancelling
>> >>>>>> migrate_fd_cleanup
>> >>>>>> migrate_set_state new state cancelled
>> >>>>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>> >>>>>> **
>> >>>>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>> >>>>>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>> >>>>>>
>> >>>>>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>> >>>>>> that will run concurrently with migrate_fd_cancel() issued by the test
>> >>>>>> and bad things happens.
>> >>>>>
>> >>>>> This hack makes things work :
>> >>>>>
>> >>>>> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>> >>>>>            qemu_savevm_send_colo_enable(s->to_dst_file);
>> >>>>>        }
>> >>>>> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>> >>>>> +                            MIGRATION_STATUS_SETUP);
>> >>>>> +
>> >>>>
>> >>>> Why move it all the way up here? Has moving the wait_unplug before the
>> >>>> 'if (ret)' block not worked for you?
>> >>>
>> >>> We could be sleeping while holding the BQL. It looked wrong.
>> >> 
>> >> Sorry wrong answer. Yes I can try moving it before the 'if (ret)' block.
>> >> I can reproduce easily with an x86 guest running on PPC64.
>> >
>> > That works just the same.
>> >
>> > Peter, Fabiano,
>> >
>> > What would you prefer  ?
>> >
>> > 1. move qemu_savevm_wait_unplug() before qemu_savevm_state_setup(),
>> >     means one new patch.
>> 
>> Is there a point to this except "because we can"? Honest question, I
>> might have missed the motivation.
>
> My previous point was, it avoids holding the resources (that will be
> allocated in setup() routines) while we know we can wait for a long time.
>
> But then I found that the ordering is indeed needed at least if we don't
> change migrate_set_state() first - it is the only place we set the status
> to ACTIVE (which I overlooked, sorry)...
>
> IMHO the function is not well designed; the state update of the next stage
> should not reside in a function to wait for failover primary devices
> conditionally. It's a bit of a mess.
>

I agree. We can clean that up in 9.1.

migrate_set_state is also unintuitive because it ignores invalid state
transitions and we've been using that property to deal with special
states such as POSTCOPY_PAUSED and FAILED:

- After the migration goes into POSTCOPY_PAUSED, the resumed migration's
  migrate_init() will try to set the state NONE->SETUP, which is not
  valid.

- After save_setup fails, the migration goes into FAILED, but wait_unplug
  will try to transition SETUP->ACTIVE, which is also not valid.
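
The property described above can be modelled in a few lines. This is a
minimal sketch, assuming the setter is built on an atomic
compare-and-exchange; set_state(), the enum values and the bool return
value are hypothetical and only mirror the behaviour described here, they
are not the QEMU implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of the migration states, for illustration only. */
enum { ST_NONE, ST_SETUP, ST_ACTIVE, ST_POSTCOPY_PAUSED, ST_FAILED };

/*
 * Compare-and-exchange setter: the transition happens only if *state
 * currently equals old_state; otherwise the call is silently a no-op.
 * The bool return value is an addition of this sketch, so the demo
 * below can observe whether the transition was applied.
 */
bool set_state(atomic_int *state, int old_state, int new_state)
{
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}

/* The two cases from the list above, replayed on the model. */
void demo_ignored_transitions(void)
{
    atomic_int paused = ST_POSTCOPY_PAUSED;
    atomic_int failed = ST_FAILED;

    /* Resume after POSTCOPY_PAUSED: NONE->SETUP does not apply. */
    assert(!set_state(&paused, ST_NONE, ST_SETUP));
    assert(atomic_load(&paused) == ST_POSTCOPY_PAUSED);

    /* After a save_setup failure: SETUP->ACTIVE does not apply. */
    assert(!set_state(&failed, ST_SETUP, ST_ACTIVE));
    assert(atomic_load(&failed) == ST_FAILED);
}
```

In both cases the call returns without touching the state, which is why a
caller that does not check the current state cannot tell an applied
transition from an ignored one.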




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-12 18:28                   ` Fabiano Rosas
@ 2024-03-15 10:17                     ` Cédric Le Goater
  2024-03-15 11:01                       ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-15 10:17 UTC (permalink / raw)
  To: Fabiano Rosas, Peter Xu
  Cc: qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/12/24 19:28, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
>> On Tue, Mar 12, 2024 at 11:24:39AM -0300, Fabiano Rosas wrote:
>>> Cédric Le Goater <clg@redhat.com> writes:
>>>
>>>> On 3/12/24 14:34, Cédric Le Goater wrote:
>>>>> On 3/12/24 13:32, Cédric Le Goater wrote:
>>>>>> On 3/11/24 20:03, Fabiano Rosas wrote:
>>>>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>>>>
>>>>>>>> On 3/8/24 15:36, Fabiano Rosas wrote:
>>>>>>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>>>>>>
>>>>>>>>>> This prepares ground for the changes coming next which add an Error**
>>>>>>>>>> argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
>>>>>>>>>> now handle the error and fail earlier setting the migration state from
>>>>>>>>>> MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
>>>>>>>>>>
>>>>>>>>>> In qemu_savevm_state(), move the cleanup to preserve the error
>>>>>>>>>> reported by .save_setup() handlers.
>>>>>>>>>>
>>>>>>>>>> Since the previous behavior was to ignore errors at this step of
>>>>>>>>>> migration, this change should be examined closely to check that
>>>>>>>>>> cleanups are still correctly done.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>>     Changes in v4:
>>>>>>>>>>     - Merged cleanup change in qemu_savevm_state()
>>>>>>>>>>     Changes in v3:
>>>>>>>>>>     - Set migration state to MIGRATION_STATUS_FAILED
>>>>>>>>>>     - Fixed error handling to be done under lock in bg_migration_thread()
>>>>>>>>>>     - Made sure an error is always set in case of failure in
>>>>>>>>>>       qemu_savevm_state_setup()
>>>>>>>>>>     migration/savevm.h    |  2 +-
>>>>>>>>>>     migration/migration.c | 27 ++++++++++++++++++++++++---
>>>>>>>>>>     migration/savevm.c    | 26 +++++++++++++++-----------
>>>>>>>>>>     3 files changed, 40 insertions(+), 15 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/migration/savevm.h b/migration/savevm.h
>>>>>>>>>> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
>>>>>>>>>> --- a/migration/savevm.h
>>>>>>>>>> +++ b/migration/savevm.h
>>>>>>>>>> @@ -32,7 +32,7 @@
>>>>>>>>>>     bool qemu_savevm_state_blocked(Error **errp);
>>>>>>>>>>     void qemu_savevm_non_migratable_list(strList **reasons);
>>>>>>>>>>     int qemu_savevm_state_prepare(Error **errp);
>>>>>>>>>> -void qemu_savevm_state_setup(QEMUFile *f);
>>>>>>>>>> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>>>>>>>>>>     bool qemu_savevm_state_guest_unplug_pending(void);
>>>>>>>>>>     int qemu_savevm_state_resume_prepare(MigrationState *s);
>>>>>>>>>>     void qemu_savevm_state_header(QEMUFile *f);
>>>>>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>>>>>> index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
>>>>>>>>>> --- a/migration/migration.c
>>>>>>>>>> +++ b/migration/migration.c
>>>>>>>>>> @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
>>>>>>>>>>         int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>>>>>>>         MigThrError thr_error;
>>>>>>>>>>         bool urgent = false;
>>>>>>>>>> +    Error *local_err = NULL;
>>>>>>>>>> +    int ret;
>>>>>>>>>>         thread = migration_threads_add("live_migration", qemu_get_thread_id());
>>>>>>>>>> @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
>>>>>>>>>>         }
>>>>>>>>>>         bql_lock();
>>>>>>>>>> -    qemu_savevm_state_setup(s->to_dst_file);
>>>>>>>>>> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>>>>>>>>>>         bql_unlock();
>>>>>>>>>> +    if (ret) {
>>>>>>>>>> +        migrate_set_error(s, local_err);
>>>>>>>>>> +        error_free(local_err);
>>>>>>>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>>>>>>>> +                          MIGRATION_STATUS_FAILED);
>>>>>>>>>> +        goto out;
>>>>>>>>>> +     }
>>>>>>>>>> +
>>>>>>>>>>         qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>>>>>>>>                                    MIGRATION_STATUS_ACTIVE);
>>>>>>>>>
>>>>>>>>> This^ should be before the new block it seems:
>>>>>>>>>
>>>>>>>>> GOOD:
>>>>>>>>> migrate_set_state new state setup
>>>>>>>>> migrate_set_state new state wait-unplug
>>>>>>>>> migrate_fd_cancel
>>>>>>>>> migrate_set_state new state cancelling
>>>>>>>>> migrate_fd_cleanup
>>>>>>>>> migrate_set_state new state cancelled
>>>>>>>>> migrate_fd_cancel
>>>>>>>>> ok 1 /x86_64/failover-virtio-net/migrate/abort/wait-unplug
>>>>>>>>>
>>>>>>>>> BAD:
>>>>>>>>> migrate_set_state new state setup
>>>>>>>>> migrate_fd_cancel
>>>>>>>>> migrate_set_state new state cancelling
>>>>>>>>> migrate_fd_cleanup
>>>>>>>>> migrate_set_state new state cancelled
>>>>>>>>> qemu-system-x86_64: ram_save_setup failed: Input/output error
>>>>>>>>> **
>>>>>>>>> ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug:
>>>>>>>>> assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
>>>>>>>>>
>>>>>>>>> Otherwise migration_iteration_finish() will schedule the cleanup BH and
>>>>>>>>> that will run concurrently with migrate_fd_cancel() issued by the test
>>>>>>>>> and bad things happens.
>>>>>>>>
>>>>>>>> This hack makes things work :
>>>>>>>>
>>>>>>>> @@ -3452,6 +3452,9 @@ static void *migration_thread(void *opaq
>>>>>>>>             qemu_savevm_send_colo_enable(s->to_dst_file);
>>>>>>>>         }
>>>>>>>> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>>>>>>>> +                            MIGRATION_STATUS_SETUP);
>>>>>>>> +
>>>>>>>
>>>>>>> Why move it all the way up here? Has moving the wait_unplug before the
>>>>>>> 'if (ret)' block not worked for you?
>>>>>>
>>>>>> We could be sleeping while holding the BQL. It looked wrong.
>>>>>
>>>>> Sorry wrong answer. Yes I can try moving it before the 'if (ret)' block.
>>>>> I can reproduce easily with an x86 guest running on PPC64.
>>>>
>>>> That works just the same.
>>>>
>>>> Peter, Fabiano,
>>>>
>>>> What would you prefer  ?
>>>>
>>>> 1. move qemu_savevm_wait_unplug() before qemu_savevm_state_setup(),
>>>>      means one new patch.
>>>
>>> Is there a point to this except "because we can"? Honest question, I
>>> might have missed the motivation.
>>
>> My previous point was, it avoids holding the resources (that will be
>> allocated in setup() routines) while we know we can wait for a long time.
>>
>> But then I found that the ordering is indeed needed at least if we don't
>> change migrate_set_state() first - it is the only place we set the status
>> to START (which I overlooked, sorry)...
>>
>> IMHO the function is not well designed; the state update of the next stage
>> should not reside in a function to wait for failover primary devices
>> conditionally. It's a bit of a mess.
>>
> 
> I agree. We can clean that up in 9.1.
> 
> migrate_set_state is also unintuitive because it ignores invalid state
> transitions and we've been using that property to deal with special
> states such as POSTCOPY_PAUSED and FAILED:
> 
> - After the migration goes into POSTCOPY_PAUSED, the resumed migration's
>    migrate_init() will try to set the state NONE->SETUP, which is not
>    valid.
> 
> - After save_setup fails, the migration goes into FAILED, but wait_unplug
>    will try to transition SETUP->ACTIVE, which is also not valid.
> 

I am not sure I understand what the plan is. Both solutions are problematic
regarding the state transitions.

Should we consider that waiting for failover devices to unplug is an internal
step of the SETUP phase that does not transition to ACTIVE?

Thanks,

C.




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 10:17                     ` Cédric Le Goater
@ 2024-03-15 11:01                       ` Peter Xu
  2024-03-15 12:20                         ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-15 11:01 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Fri, Mar 15, 2024 at 11:17:45AM +0100, Cédric Le Goater wrote:
> > migrate_set_state is also unintuitive because it ignores invalid state
> > transitions and we've been using that property to deal with special
> > states such as POSTCOPY_PAUSED and FAILED:
> > 
> > - After the migration goes into POSTCOPY_PAUSED, the resumed migration's
> >    migrate_init() will try to set the state NONE->SETUP, which is not
> >    valid.
> > 
> > - After save_setup fails, the migration goes into FAILED, but wait_unplug
> >    will try to transition SETUP->ACTIVE, which is also not valid.
> > 
> 
> I am not sure I understand what the plan is. Both solutions are problematic
> regarding the state transitions.
> 
> Should we consider that waiting for failover devices to unplug is an internal
> step of the SETUP phase not transitioning to ACTIVE ?

To unblock this series, IIUC the simplest solution is to do what Fabiano
suggested: move qemu_savevm_wait_unplug() before the check of the setup()
return value.  In that case, the state change in qemu_savevm_wait_unplug()
should be benign: there will be a very small window where the state is
ACTIVE before it becomes FAILED (and IIUC the patch itself will then need to
use ACTIVE as "old_state", not SETUP anymore).
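
To make the "old_state" point concrete, here is a minimal standalone sketch.
It is not the QEMU code: the enum values and the model_* names are
hypothetical, and it only models the property discussed above, namely that a
state update is applied only when the current state matches old_state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MIGRATION_STATUS_* values. */
enum mig_status { ST_NONE, ST_SETUP, ST_WAIT_UNPLUG, ST_ACTIVE, ST_FAILED };

/*
 * Model of the compare-and-swap behaviour: the new state is installed only
 * if the current state equals old_state; otherwise the call is silently
 * ignored.  Returns true when the transition happened.
 */
static bool model_set_state(_Atomic int *state, int old_state, int new_state)
{
    int expected = old_state;
    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/* wait_unplug runs first, then the setup() return value is checked. */
static int model_setup_failure_path(void)
{
    _Atomic int state = ST_SETUP;

    /* No unplug pending: wait_unplug just moves SETUP -> ACTIVE. */
    model_set_state(&state, ST_SETUP, ST_ACTIVE);

    /* Using SETUP as old_state here would be a silent no-op ... */
    model_set_state(&state, ST_SETUP, ST_FAILED);

    /* ... so the failure path has to use ACTIVE as old_state. */
    model_set_state(&state, ST_ACTIVE, ST_FAILED);

    return atomic_load(&state);
}
```

In this model the migration ends up in ST_FAILED only because of the last
call; with SETUP as old_state alone it would have stayed in ST_ACTIVE.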

For the long term, maybe we can remove the WAIT_UNPLUG state?  The only
Libvirt support seems to be here:

commit 8a226ddb3602586a2ba2359afc4448c02f566a0e
Author: Laine Stump <laine@redhat.com>
Date:   Wed Jan 15 16:38:57 2020 -0500

    qemu: add wait-unplug to qemu migration status enum

Considering that qemu_savevm_wait_unplug() can be a noop if the device is
already unplugged, I think no upper layer app should rely on this state
being reported.

Thanks,

-- 
Peter Xu




* Re: [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler
  2024-03-06 13:34 ` [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler Cédric Le Goater
@ 2024-03-15 11:18   ` Peter Xu
  2024-03-18 14:33     ` Cédric Le Goater
  2024-03-18 14:54     ` Cédric Le Goater
  0 siblings, 2 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-15 11:18 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand

On Wed, Mar 06, 2024 at 02:34:28PM +0100, Cédric Le Goater wrote:
> diff --git a/system/memory.c b/system/memory.c
> index a229a79988fce2aa3cb77e3a130db4c694e8cd49..3600e716149407c10a1f6bf8f0a81c2611cf15ba 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2914,9 +2914,27 @@ static unsigned int postponed_stop_flags;
>  static VMChangeStateEntry *vmstate_change;
>  static void memory_global_dirty_log_stop_postponed_run(void);
>  
> +/*
> + * Stop dirty logging on all listeners where it was previously enabled.
> + */
> +static void memory_global_dirty_log_rollback(MemoryListener *listener,
> +                                             unsigned int flags)
> +{
> +    global_dirty_tracking &= ~flags;

Having a hook rollback function to touch the global_dirty_tracking flag is
IMHO tricky.

Can we instead provide a helper that calls all log_global_start() hooks but
is allowed to fail gracefully (so rollback will be called if it fails)?

  bool memory_global_dirty_log_start_hooks(...)

Or any better name.  The global_dirty_tracking rollback would then be left
to memory_global_dirty_log_start(), done when the helper returns false.

Would this be cleaner?
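
Roughly what I have in mind, as a standalone sketch rather than a patch.
The Listener type, the array-based list and the function names are
simplified and hypothetical, and the Error** argument is left out to keep
it short:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified listener: only the two hooks matter here. */
typedef struct Listener {
    bool (*log_global_start)(struct Listener *l);
    void (*log_global_stop)(struct Listener *l);
    int started;                /* bookkeeping for the example only */
} Listener;

/* Example hooks used to exercise the helpers. */
static bool start_ok(Listener *l)   { l->started = 1; return true; }
static bool start_fail(Listener *l) { (void)l; return false; }
static void stop(Listener *l)       { l->started = 0; }

/*
 * Call every log_global_start() hook in order.  On the first failure,
 * call log_global_stop() on the listeners already started, in reverse
 * order, and return false.  No global flag is touched here.
 */
static bool dirty_log_start_hooks(Listener *ls, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ls[i].log_global_start && !ls[i].log_global_start(&ls[i])) {
            while (i-- > 0) {
                if (ls[i].log_global_stop) {
                    ls[i].log_global_stop(&ls[i]);
                }
            }
            return false;
        }
    }
    return true;
}

static unsigned int dirty_tracking;  /* stands in for global_dirty_tracking */

/* The caller owns the flag and undoes its own update on failure. */
static bool dirty_log_start(Listener *ls, size_t n, unsigned int flags)
{
    unsigned int old_flags = dirty_tracking;

    dirty_tracking |= flags;
    if (!old_flags && !dirty_log_start_hooks(ls, n)) {
        dirty_tracking &= ~flags;
        return false;
    }
    return true;
}
```

This way the hook-level rollback and the flag-level rollback each live next
to the code that made the corresponding change.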

> +    trace_global_dirty_changed(global_dirty_tracking);
> +
> +    while (listener) {
> +        if (listener->log_global_stop) {
> +            listener->log_global_stop(listener);
> +        }
> +        listener = QTAILQ_PREV(listener, link);
> +    }
> +}
> +
>  void memory_global_dirty_log_start(unsigned int flags)
>  {
>      unsigned int old_flags;
> +    Error *local_err = NULL;
>  
>      assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
>  
> @@ -2936,7 +2954,25 @@ void memory_global_dirty_log_start(unsigned int flags)
>      trace_global_dirty_changed(global_dirty_tracking);
>  
>      if (!old_flags) {
> -        MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
> +        MemoryListener *listener;
> +        bool ret = true;
> +
> +        QTAILQ_FOREACH(listener, &memory_listeners, link) {
> +            if (listener->log_global_start) {
> +                ret = listener->log_global_start(listener, &local_err);
> +                if (!ret) {
> +                    break;
> +                }
> +            }
> +        }
> +
> +        if (!ret) {
> +            memory_global_dirty_log_rollback(QTAILQ_PREV(listener, link),
> +                                             flags);
> +            error_report_err(local_err);
> +            return;
> +        }
> +
>          memory_region_transaction_begin();
>          memory_region_update_pending = true;
>          memory_region_transaction_commit();
> @@ -3009,13 +3045,16 @@ static void listener_add_address_space(MemoryListener *listener,
>  {
>      FlatView *view;
>      FlatRange *fr;
> +    Error *local_err = NULL;
>  
>      if (listener->begin) {
>          listener->begin(listener);
>      }
>      if (global_dirty_tracking) {
>          if (listener->log_global_start) {
> -            listener->log_global_start(listener);
> +            if (!listener->log_global_start(listener, &local_err)) {
> +                error_report_err(local_err);
> +            }

IMHO we should assert here instead of reporting an error.  We have this to
guard against hot-plug during migration, so I think the assert is justified:

qdev_device_add_from_qdict():

    if (!migration_is_idle()) {
        error_setg(errp, "device_add not allowed while migrating");
        return NULL;
    }

If it really happens it's a bug, as listener_add_address_space() will still
keep everything else around even if the hook failed.  It would become a
total mess.

Thanks,

>          }
>      }
>  
> -- 
> 2.44.0
> 

-- 
Peter Xu




* Re: [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines
  2024-03-06 13:34 ` [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines Cédric Le Goater
@ 2024-03-15 11:34   ` Peter Xu
  2024-03-18 10:43     ` Cédric Le Goater
                       ` (2 more replies)
  2024-03-16  2:41   ` Yong Huang
  1 sibling, 3 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-15 11:34 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand, Hyman Huang

On Wed, Mar 06, 2024 at 02:34:29PM +0100, Cédric Le Goater wrote:
> Now that the log_global*() handlers take an Error** parameter and
> return a bool, do the same for memory_global_dirty_log_start() and
> memory_global_dirty_log_stop(). The error is reported in the callers
> for now and it will be propagated in the call stack in the next
> changes.
> 
> To be noted a functional change in ram_init_bitmaps(), if the dirty
> pages logger fails to start, there is no need to synchronize the dirty
> pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
> similar way.
> 
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Paul Durrant <paul@xen.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Hyman Huang <yong.huang@smartx.com>
> Reviewed-by: Hyman Huang <yong.huang@smartx.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> 
>  Changes in v4:
> 
>  - Dropped log_global_stop() and log_global_sync() changes
>  
>  include/exec/memory.h |  5 ++++-
>  hw/i386/xen/xen-hvm.c |  2 +-
>  migration/dirtyrate.c | 13 +++++++++++--
>  migration/ram.c       | 22 ++++++++++++++++++++--
>  system/memory.c       | 11 +++++------
>  5 files changed, 41 insertions(+), 12 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5555567bc4c9fdb53e8f63487f1400980275687d..c129ee6db7162504bd72d4cfc69b5affb2cd87e8 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2570,8 +2570,11 @@ void memory_listener_unregister(MemoryListener *listener);
>   * memory_global_dirty_log_start: begin dirty logging for all regions
>   *
>   * @flags: purpose of starting dirty log, migration or dirty rate
> + * @errp: pointer to Error*, to store an error if it happens.
> + *
> + * Return: true on success, else false setting @errp with error.
>   */
> -void memory_global_dirty_log_start(unsigned int flags);
> +bool memory_global_dirty_log_start(unsigned int flags, Error **errp);
>  
>  /**
>   * memory_global_dirty_log_stop: end dirty logging for all regions
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index 0608ca99f5166fd6379ee674442484e805eff9c0..57cb7df50788a6c31eff68c95e8eaa856fdebede 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -654,7 +654,7 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length)
>  void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
>  {
>      if (enable) {
> -        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
> +        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
>      } else {
>          memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
>      }
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index 1d2e85746fb7b10eb7f149976970f9a92125af8a..d02d70b7b4b86a29d4d5540ded416543536d8f98 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -90,9 +90,15 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
>  
>  void global_dirty_log_change(unsigned int flag, bool start)
>  {
> +    Error *local_err = NULL;
> +    bool ret;
> +
>      bql_lock();
>      if (start) {
> -        memory_global_dirty_log_start(flag);
> +        ret = memory_global_dirty_log_start(flag, &local_err);
> +        if (!ret) {
> +            error_report_err(local_err);
> +        }
>      } else {
>          memory_global_dirty_log_stop(flag);
>      }
> @@ -608,9 +614,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
>  {
>      int64_t start_time;
>      DirtyPageRecord dirty_pages;
> +    Error *local_err = NULL;
>  
>      bql_lock();
> -    memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
> +    if (!memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE, &local_err)) {
> +        error_report_err(local_err);
> +    }
>  
>      /*
>       * 1'round of log sync may return all 1 bits with
> diff --git a/migration/ram.c b/migration/ram.c
> index c5149b7d717aefad7f590422af0ea4a40e7507be..397b4c0f218a66d194e44f9c5f9fe8e9885c48b6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2836,18 +2836,31 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
>  
>  static void ram_init_bitmaps(RAMState *rs)
>  {
> +    Error *local_err = NULL;
> +    bool ret = true;
> +
>      qemu_mutex_lock_ramlist();
>  
>      WITH_RCU_READ_LOCK_GUARD() {
>          ram_list_init_bitmaps();
>          /* We don't use dirty log with background snapshots */
>          if (!migrate_background_snapshot()) {
> -            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
> +            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
> +                                                &local_err);
> +            if (!ret) {
> +                error_report_err(local_err);
> +                goto out_unlock;

Here we may need to free the bitmaps created in ram_list_init_bitmaps().

We can have a helper ram_bitmaps_destroy() for that.

One thing to be careful about: the new file_bmap can be created but is not
covered by ram_save_cleanup(), because it is freed earlier.  IMHO if we have
a new ram_bitmaps_destroy() we can unconditionally free file_bmap there
too, as g_free() is a noop if it was already freed (and the pointer reset
to NULL).
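
A standalone sketch of such a helper, under stated assumptions: Block is a
hypothetical, trimmed-down stand-in for RAMBlock, the blocks sit in a plain
array, and free() is used in place of g_free() so it builds without GLib.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down RAMBlock: only the two bitmaps. */
typedef struct Block {
    unsigned long *bmap;
    unsigned long *file_bmap;
} Block;

/*
 * Free both bitmaps of every block and reset the pointers.  free(NULL)
 * is a no-op (as is g_free(NULL) in GLib), so a bitmap that was released
 * earlier, and set to NULL at that point, is handled without a check.
 */
static void bitmaps_destroy(Block *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(blocks[i].bmap);
        blocks[i].bmap = NULL;
        free(blocks[i].file_bmap);
        blocks[i].file_bmap = NULL;
    }
}
```

Because the pointers are reset, calling it from both the error path and the
normal cleanup path is harmless.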

> +            }
>              migration_bitmap_sync_precopy(rs, false);
>          }
>      }
> +out_unlock:
>      qemu_mutex_unlock_ramlist();
>  
> +    if (!ret) {
> +        return;
> +    }
> +
>      /*
>       * After an eventual first bitmap sync, fixup the initial bitmap
>       * containing all 1s to exclude any discarded pages from migration.
> @@ -3631,6 +3644,8 @@ int colo_init_ram_cache(void)
>  void colo_incoming_start_dirty_log(void)
>  {
>      RAMBlock *block = NULL;
> +    Error *local_err = NULL;
> +
>      /* For memory_global_dirty_log_start below. */
>      bql_lock();
>      qemu_mutex_lock_ramlist();
> @@ -3642,7 +3657,10 @@ void colo_incoming_start_dirty_log(void)
>              /* Discard this dirty bitmap record */
>              bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
>          }
> -        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
> +        if (!memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
> +                                           &local_err)) {
> +            error_report_err(local_err);
> +        }
>      }
>      ram_state->migration_dirty_pages = 0;
>      qemu_mutex_unlock_ramlist();
> diff --git a/system/memory.c b/system/memory.c
> index 3600e716149407c10a1f6bf8f0a81c2611cf15ba..cbc098216b789f50460f1d1bc7ec122030693d9e 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2931,10 +2931,9 @@ static void memory_global_dirty_log_rollback(MemoryListener *listener,
>      }
>  }
>  
> -void memory_global_dirty_log_start(unsigned int flags)
> +bool memory_global_dirty_log_start(unsigned int flags, Error **errp)
>  {
>      unsigned int old_flags;
> -    Error *local_err = NULL;
>  
>      assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
>  
> @@ -2946,7 +2945,7 @@ void memory_global_dirty_log_start(unsigned int flags)
>  
>      flags &= ~global_dirty_tracking;
>      if (!flags) {
> -        return;
> +        return true;
>      }
>  
>      old_flags = global_dirty_tracking;
> @@ -2959,7 +2958,7 @@ void memory_global_dirty_log_start(unsigned int flags)
>  
>          QTAILQ_FOREACH(listener, &memory_listeners, link) {
>              if (listener->log_global_start) {
> -                ret = listener->log_global_start(listener, &local_err);
> +                ret = listener->log_global_start(listener, errp);
>                  if (!ret) {
>                      break;
>                  }
> @@ -2969,14 +2968,14 @@ void memory_global_dirty_log_start(unsigned int flags)
>          if (!ret) {
>              memory_global_dirty_log_rollback(QTAILQ_PREV(listener, link),
>                                               flags);
> -            error_report_err(local_err);
> -            return;
> +            return false;
>          }
>  
>          memory_region_transaction_begin();
>          memory_region_update_pending = true;
>          memory_region_transaction_commit();
>      }
> +    return true;
>  }
>  
>  static void memory_global_dirty_log_do_stop(unsigned int flags)
> -- 
> 2.44.0
> 

-- 
Peter Xu




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 11:01                       ` Peter Xu
@ 2024-03-15 12:20                         ` Cédric Le Goater
  2024-03-15 13:09                           ` Peter Xu
                                             ` (2 more replies)
  0 siblings, 3 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-15 12:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/15/24 12:01, Peter Xu wrote:
> On Fri, Mar 15, 2024 at 11:17:45AM +0100, Cédric Le Goater wrote:
>>> migrate_set_state is also unintuitive because it ignores invalid state
>>> transitions and we've been using that property to deal with special
>>> states such as POSTCOPY_PAUSED and FAILED:
>>>
>>> - After the migration goes into POSTCOPY_PAUSED, the resumed migration's
>>>     migrate_init() will try to set the state NONE->SETUP, which is not
>>>     valid.
>>>
>>> - After save_setup fails, the migration goes into FAILED, but wait_unplug
>>>     will try to transition SETUP->ACTIVE, which is also not valid.
>>>
>>
>> I am not sure I understand what the plan is. Both solutions are problematic
>> regarding the state transitions.
>>
>> Should we consider that waiting for failover devices to unplug is an internal
>> step of the SETUP phase not transitioning to ACTIVE ?
> 
> If to unblock this series, IIUC the simplest solution is to do what Fabiano
> suggested, that we move qemu_savevm_wait_unplug() to be before the check of
> setup() ret. 

The simplest is IMHO moving qemu_savevm_wait_unplug() before
qemu_savevm_state_setup() and leaving patch 10 unchanged. See
the extra patch below. It looks much cleaner than what we have
today.

> In that case, the state change in qemu_savevm_wait_unplug()
> should be benign and we should see a super small window it became ACTIVE
> but then it should be FAILED (and IIUC the patch itself will need to use
> ACTIVE as "old_state", not SETUP anymore).

OK. I will give it a try to compare.

> For the long term, maybe we can remove the WAIT_UNPLUG state?  

I hope so, it's an internal SETUP state for me.

> The only Libvirt support seems to be here:
> 
> commit 8a226ddb3602586a2ba2359afc4448c02f566a0e
> Author: Laine Stump <laine@redhat.com>
> Date:   Wed Jan 15 16:38:57 2020 -0500
> 
>      qemu: add wait-unplug to qemu migration status enum
> 
> Considering that qemu_savevm_wait_unplug() can be a noop if the device is
> already unplugged, I think it means no upper layer app should rely on this
> state to present.

Thanks,

C.


> 
@@ -3383,11 +3383,10 @@ bool migration_rate_limit(void)
   * unplugged
   */
  
-static void qemu_savevm_wait_unplug(MigrationState *s, int old_state,
-                                    int new_state)
+static void qemu_savevm_wait_unplug(MigrationState *s, int state)
  {
      if (qemu_savevm_state_guest_unplug_pending()) {
-        migrate_set_state(&s->state, old_state, MIGRATION_STATUS_WAIT_UNPLUG);
+        migrate_set_state(&s->state, state, MIGRATION_STATUS_WAIT_UNPLUG);
  
          while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
                 qemu_savevm_state_guest_unplug_pending()) {
@@ -3410,9 +3409,7 @@ static void qemu_savevm_wait_unplug(Migr
              }
          }
  
-        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state);
-    } else {
-        migrate_set_state(&s->state, old_state, new_state);
+        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, state);
      }
  }
  
@@ -3469,17 +3466,19 @@ static void *migration_thread(void *opaq
          qemu_savevm_send_colo_enable(s->to_dst_file);
      }
  
+    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP);
+
      bql_lock();
      qemu_savevm_state_setup(s->to_dst_file);
      bql_unlock();
  
-    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
-                               MIGRATION_STATUS_ACTIVE);
-
      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
  
      trace_migration_thread_setup_complete();
  
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_ACTIVE);
+
      while (migration_is_active()) {
          if (urgent || !migration_rate_exceeded(s->to_dst_file)) {
              MigIterateState iter_state = migration_iteration_run(s);
@@ -3580,18 +3579,20 @@ static void *bg_migration_thread(void *o
      ram_write_tracking_prepare();
  #endif
  
+    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP);
+
      bql_lock();
      qemu_savevm_state_header(s->to_dst_file);
      qemu_savevm_state_setup(s->to_dst_file);
      bql_unlock();
  
-    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
-                               MIGRATION_STATUS_ACTIVE);
-
      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
  
      trace_migration_thread_setup_complete();
  
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_ACTIVE);
+
      bql_lock();
  
      if (migration_stop_vm(s, RUN_STATE_PAUSED)) {




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 12:20                         ` Cédric Le Goater
@ 2024-03-15 13:09                           ` Peter Xu
  2024-03-15 14:30                             ` Cédric Le Goater
  2024-03-15 13:11                           ` Peter Xu
  2024-03-15 14:21                           ` Cédric Le Goater
  2 siblings, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-15 13:09 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Fri, Mar 15, 2024 at 01:20:49PM +0100, Cédric Le Goater wrote:
> On 3/15/24 12:01, Peter Xu wrote:
> > On Fri, Mar 15, 2024 at 11:17:45AM +0100, Cédric Le Goater wrote:
> > > > migrate_set_state is also unintuitive because it ignores invalid state
> > > > transitions and we've been using that property to deal with special
> > > > states such as POSTCOPY_PAUSED and FAILED:
> > > > 
> > > > - After the migration goes into POSTCOPY_PAUSED, the resumed migration's
> > > >     migrate_init() will try to set the state NONE->SETUP, which is not
> > > >     valid.
> > > > 
> > > > - After save_setup fails, the migration goes into FAILED, but wait_unplug
> > > >     will try to transition SETUP->ACTIVE, which is also not valid.
> > > > 
> > > 
> > > I am not sure I understand what the plan is. Both solutions are problematic
> > > regarding the state transitions.
> > > 
> > > Should we consider that waiting for failover devices to unplug is an internal
> > > step of the SETUP phase not transitioning to ACTIVE ?
> > 
> > If to unblock this series, IIUC the simplest solution is to do what Fabiano
> > suggested, that we move qemu_savevm_wait_unplug() to be before the check of
> > setup() ret.
> 
> The simplest is IMHO moving qemu_savevm_wait_unplug() before
> qemu_savevm_state_setup() and leave patch 10 is unchanged. See
> below the extra patch. It looks much cleaner than what we have
> today.

Yes it looks cleaner indeed, it's just that we'll then have one more
possible state transition sequence, SETUP->UNPLUG->SETUP.  I'd say it's fine,
but let's also copy Laurent and Laine if it's going to be posted formally.

Thanks,

> 
> > In that case, the state change in qemu_savevm_wait_unplug()
> > should be benign and we should see a super small window it became ACTIVE
> > but then it should be FAILED (and IIUC the patch itself will need to use
> > ACTIVE as "old_state", not SETUP anymore).
> 
> OK. I will give it a try to compare.
> 
> > For the long term, maybe we can remove the WAIT_UNPLUG state?
> 
> I hope so, it's an internal SETUP state for me.
> 
> > The only Libvirt support seems to be here:
> > 
> > commit 8a226ddb3602586a2ba2359afc4448c02f566a0e
> > Author: Laine Stump <laine@redhat.com>
> > Date:   Wed Jan 15 16:38:57 2020 -0500
> > 
> >      qemu: add wait-unplug to qemu migration status enum
> > 
> > Considering that qemu_savevm_wait_unplug() can be a noop if the device is
> > already unplugged, I think it means no upper layer app should rely on this
> > state to present.
> 
> Thanks,
> 
> C.
> 
> 
> > 
> @@ -3383,11 +3383,10 @@ bool migration_rate_limit(void)
>   * unplugged
>   */
> -static void qemu_savevm_wait_unplug(MigrationState *s, int old_state,
> -                                    int new_state)
> +static void qemu_savevm_wait_unplug(MigrationState *s, int state)
>  {
>      if (qemu_savevm_state_guest_unplug_pending()) {
> -        migrate_set_state(&s->state, old_state, MIGRATION_STATUS_WAIT_UNPLUG);
> +        migrate_set_state(&s->state, state, MIGRATION_STATUS_WAIT_UNPLUG);
>          while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
>                 qemu_savevm_state_guest_unplug_pending()) {
> @@ -3410,9 +3409,7 @@ static void qemu_savevm_wait_unplug(Migr
>              }
>          }
> -        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state);
> -    } else {
> -        migrate_set_state(&s->state, old_state, new_state);
> +        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, state);
>      }
>  }
> @@ -3469,17 +3466,19 @@ static void *migration_thread(void *opaq
>          qemu_savevm_send_colo_enable(s->to_dst_file);
>      }
> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP);
> +
>      bql_lock();
>      qemu_savevm_state_setup(s->to_dst_file);
>      bql_unlock();
> -    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> -                               MIGRATION_STATUS_ACTIVE);
> -
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>      trace_migration_thread_setup_complete();
> +    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> +                      MIGRATION_STATUS_ACTIVE);
> +
>      while (migration_is_active()) {
>          if (urgent || !migration_rate_exceeded(s->to_dst_file)) {
>              MigIterateState iter_state = migration_iteration_run(s);
> @@ -3580,18 +3579,20 @@ static void *bg_migration_thread(void *o
>      ram_write_tracking_prepare();
>  #endif
> +    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP);
> +
>      bql_lock();
>      qemu_savevm_state_header(s->to_dst_file);
>      qemu_savevm_state_setup(s->to_dst_file);
>      bql_unlock();
> -    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
> -                               MIGRATION_STATUS_ACTIVE);
> -
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>      trace_migration_thread_setup_complete();
> +    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> +                      MIGRATION_STATUS_ACTIVE);
> +
>      bql_lock();
>      if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 12:20                         ` Cédric Le Goater
  2024-03-15 13:09                           ` Peter Xu
@ 2024-03-15 13:11                           ` Peter Xu
  2024-03-15 14:31                             ` Cédric Le Goater
  2024-03-15 14:21                           ` Cédric Le Goater
  2 siblings, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-15 13:11 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Fri, Mar 15, 2024 at 01:20:49PM +0100, Cédric Le Goater wrote:
> +static void qemu_savevm_wait_unplug(MigrationState *s, int state)

One more trivial comment: I'd even consider dropping "state" altogether, as
this should be the only state this function should be invoked.  So we can
perhaps assert it instead of passing it over?
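
For illustration only, here is a minimal standalone sketch of that idea. It
assumes simplified stand-ins for MigrationState and migrate_set_state(); the
stubs and the pending_polls counter are hypothetical and are not QEMU code.
Whether the assertion is safe against concurrent state changes (for example
a cancel request) is not something this sketch settles.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for QEMU's types; not the real definitions. */
typedef enum {
    MIGRATION_STATUS_SETUP,
    MIGRATION_STATUS_WAIT_UNPLUG,
    MIGRATION_STATUS_ACTIVE,
    MIGRATION_STATUS_FAILED,
} MigrationStatus;

typedef struct {
    int state;
} MigrationState;

/* Hypothetical knob: how many polls until the guest has unplugged. */
static int pending_polls;

static bool qemu_savevm_state_guest_unplug_pending(void)
{
    return pending_polls > 0;
}

/* Stand-in: only switch state if the current state matches old_state. */
static void migrate_set_state(int *state, int old_state, int new_state)
{
    if (*state == old_state) {
        *state = new_state;
    }
}

/* No "state" argument: the caller must already be in SETUP. */
static void qemu_savevm_wait_unplug(MigrationState *s)
{
    assert(s->state == MIGRATION_STATUS_SETUP);

    if (qemu_savevm_state_guest_unplug_pending()) {
        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                          MIGRATION_STATUS_WAIT_UNPLUG);
        while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
               qemu_savevm_state_guest_unplug_pending()) {
            pending_polls--; /* the real loop waits before polling again */
        }
        /* Return to SETUP so the caller can move on to ACTIVE itself. */
        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
                          MIGRATION_STATUS_SETUP);
    }
}
```

With this shape, both callers would invoke qemu_savevm_wait_unplug(s) with
no state argument and keep the SETUP->ACTIVE transition for themselves.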

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 12:20                         ` Cédric Le Goater
  2024-03-15 13:09                           ` Peter Xu
  2024-03-15 13:11                           ` Peter Xu
@ 2024-03-15 14:21                           ` Cédric Le Goater
  2024-03-15 14:52                             ` Peter Xu
  2 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-15 14:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/15/24 13:20, Cédric Le Goater wrote:
> On 3/15/24 12:01, Peter Xu wrote:
>> On Fri, Mar 15, 2024 at 11:17:45AM +0100, Cédric Le Goater wrote:
>>>> migrate_set_state is also unintuitive because it ignores invalid state
>>>> transitions and we've been using that property to deal with special
>>>> states such as POSTCOPY_PAUSED and FAILED:
>>>>
>>>> - After the migration goes into POSTCOPY_PAUSED, the resumed migration's
>>>>     migrate_init() will try to set the state NONE->SETUP, which is not
>>>>     valid.
>>>>
>>>> - After save_setup fails, the migration goes into FAILED, but wait_unplug
>>>>     will try to transition SETUP->ACTIVE, which is also not valid.
>>>>
>>>
>>> I am not sure I understand what the plan is. Both solutions are problematic
>>> regarding the state transitions.
>>>
>>> Should we consider that waiting for failover devices to unplug is an internal
>>> step of the SETUP phase not transitioning to ACTIVE ?
>>
>> If to unblock this series, IIUC the simplest solution is to do what Fabiano
>> suggested, that we move qemu_savevm_wait_unplug() to be before the check of
>> setup() ret. 
> 
> The simplest is IMHO moving qemu_savevm_wait_unplug() before
> qemu_savevm_state_setup() and leave patch 10 is unchanged. See
> below the extra patch. It looks much cleaner than what we have
> today.
> 
>> In that case, the state change in qemu_savevm_wait_unplug()
>> should be benign and we should see a super small window it became ACTIVE
>> but then it should be FAILED (and IIUC the patch itself will need to use
>> ACTIVE as "old_state", not SETUP anymore).
> 
> OK. I will give it a try to compare.

Here's the alternative solution. SETUP state failures are handled after
transitioning to ACTIVE state, which is unfortunate but probably harmless.
I guess it's OK.
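
To make the state handling concrete, here is a small standalone sketch of the
compare-and-swap behaviour discussed above. It is an assumption-laden model
and not QEMU code: set_state_if() is a hypothetical name, and it returns a
bool purely so the outcome is visible. It shows why the failure path has to
use ACTIVE as the old state once wait_unplug has moved out of SETUP.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the migration status enum. */
typedef enum {
    MIGRATION_STATUS_SETUP,
    MIGRATION_STATUS_ACTIVE,
    MIGRATION_STATUS_FAILED,
} MigrationStatus;

/*
 * Hypothetical model of a state setter that ignores invalid transitions:
 * the new state is stored only if the current state equals old_state.
 * Returns true when the transition was applied.
 */
static bool set_state_if(atomic_int *state, int old_state, int new_state)
{
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}
```

In this model, once SETUP->ACTIVE has been applied, a later SETUP->FAILED
request is silently dropped, while ACTIVE->FAILED takes effect.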

Thanks,

C.



Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
  migration/savevm.h    |  2 +-
  migration/migration.c | 29 +++++++++++++++++++++++++++--
  migration/savevm.c    | 26 +++++++++++++++-----------
  3 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/migration/savevm.h b/migration/savevm.h
index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -32,7 +32,7 @@
  bool qemu_savevm_state_blocked(Error **errp);
  void qemu_savevm_non_migratable_list(strList **reasons);
  int qemu_savevm_state_prepare(Error **errp);
-void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
  bool qemu_savevm_state_guest_unplug_pending(void);
  int qemu_savevm_state_resume_prepare(MigrationState *s);
  void qemu_savevm_state_header(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index 644e073b7dcc70cb2bdaa9c975ba478952465ff4..0704ad6226df61f2f15bd81a2897f9946d601ca7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3427,6 +3427,8 @@ static void *migration_thread(void *opaque)
      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
      MigThrError thr_error;
      bool urgent = false;
+    Error *local_err = NULL;
+    int ret;
  
      thread = migration_threads_add("live_migration", qemu_get_thread_id());
  
@@ -3470,12 +3472,24 @@ static void *migration_thread(void *opaque)
      }
  
      bql_lock();
-    qemu_savevm_state_setup(s->to_dst_file);
+    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
      bql_unlock();
  
      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                                 MIGRATION_STATUS_ACTIVE);
  
+    /*
+     * Handle SETUP failures after waiting for virtio-net-failover
+     * devices to unplug. This to preserve migration state transitions.
+     */
+    if (ret) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
+        goto out;
+    }
+
      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
  
      trace_migration_thread_setup_complete();
@@ -3549,6 +3563,8 @@ static void *bg_migration_thread(void *opaque)
      MigThrError thr_error;
      QEMUFile *fb;
      bool early_fail = true;
+    Error *local_err = NULL;
+    int ret;
  
      rcu_register_thread();
      object_ref(OBJECT(s));
@@ -3582,12 +3598,20 @@ static void *bg_migration_thread(void *opaque)
  
      bql_lock();
      qemu_savevm_state_header(s->to_dst_file);
-    qemu_savevm_state_setup(s->to_dst_file);
+    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
      bql_unlock();
  
      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                                 MIGRATION_STATUS_ACTIVE);
  
+    if (ret) {
+        migrate_set_error(s, local_err);
+        error_free(local_err);
+        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
+        goto fail_setup;
+    }
+
      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
  
      trace_migration_thread_setup_complete();
@@ -3656,6 +3680,7 @@ fail:
          bql_unlock();
      }
  
+fail_setup:
      bg_migration_iteration_finish(s);
  
      qemu_fclose(fb);
diff --git a/migration/savevm.c b/migration/savevm.c
index 1a7b5cb78a912c36ae16db703afc90ef2906b61f..0eb94e61f888adba2c0732c2cb701b110814c455 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1310,11 +1310,11 @@ int qemu_savevm_state_prepare(Error **errp)
      return 0;
  }
  
-void qemu_savevm_state_setup(QEMUFile *f)
+int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
  {
+    ERRP_GUARD();
      MigrationState *ms = migrate_get_current();
      SaveStateEntry *se;
-    Error *local_err = NULL;
      int ret = 0;
  
      json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
@@ -1323,10 +1323,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
      trace_savevm_state_setup();
      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
          if (se->vmsd && se->vmsd->early_setup) {
-            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
+            ret = vmstate_save(f, se, ms->vmdesc, errp);
              if (ret) {
-                migrate_set_error(ms, local_err);
-                error_report_err(local_err);
+                migrate_set_error(ms, *errp);
                  qemu_file_set_error(f, ret);
                  break;
              }
@@ -1346,18 +1345,19 @@ void qemu_savevm_state_setup(QEMUFile *f)
          ret = se->ops->save_setup(f, se->opaque);
          save_section_footer(f, se);
          if (ret < 0) {
+            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
+                       "%d(%s): %d", se->section_id, se->idstr, ret);
              qemu_file_set_error(f, ret);
              break;
          }
      }
  
      if (ret) {
-        return;
+        return ret;
      }
  
-    if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
-        error_report_err(local_err);
-    }
+    /* TODO: Should we check that errp is set in case of failure ? */
+    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
  }
  
  int qemu_savevm_state_resume_prepare(MigrationState *s)
@@ -1725,7 +1725,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
      ms->to_dst_file = f;
  
      qemu_savevm_state_header(f);
-    qemu_savevm_state_setup(f);
+    ret = qemu_savevm_state_setup(f, errp);
+    if (ret) {
+        goto cleanup;
+    }
  
      while (qemu_file_get_error(f) == 0) {
          if (qemu_savevm_state_iterate(f, false) > 0) {
@@ -1738,10 +1741,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
          qemu_savevm_state_complete_precopy(f, false, false);
          ret = qemu_file_get_error(f);
      }
-    qemu_savevm_state_cleanup();
      if (ret != 0) {
          error_setg_errno(errp, -ret, "Error while writing VM state");
      }
+cleanup:
+    qemu_savevm_state_cleanup();
  
      if (ret != 0) {
          status = MIGRATION_STATUS_FAILED;
-- 
2.44.0




^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 13:09                           ` Peter Xu
@ 2024-03-15 14:30                             ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-15 14:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/15/24 14:09, Peter Xu wrote:
> On Fri, Mar 15, 2024 at 01:20:49PM +0100, Cédric Le Goater wrote:
>> On 3/15/24 12:01, Peter Xu wrote:
>>> On Fri, Mar 15, 2024 at 11:17:45AM +0100, Cédric Le Goater wrote:
>>>>> migrate_set_state is also unintuitive because it ignores invalid state
>>>>> transitions and we've been using that property to deal with special
>>>>> states such as POSTCOPY_PAUSED and FAILED:
>>>>>
>>>>> - After the migration goes into POSTCOPY_PAUSED, the resumed migration's
>>>>>      migrate_init() will try to set the state NONE->SETUP, which is not
>>>>>      valid.
>>>>>
>>>>> - After save_setup fails, the migration goes into FAILED, but wait_unplug
>>>>>      will try to transition SETUP->ACTIVE, which is also not valid.
>>>>>
>>>>
>>>> I am not sure I understand what the plan is. Both solutions are problematic
>>>> regarding the state transitions.
>>>>
>>>> Should we consider that waiting for failover devices to unplug is an internal
>>>> step of the SETUP phase not transitioning to ACTIVE ?
>>>
>>> If to unblock this series, IIUC the simplest solution is to do what Fabiano
>>> suggested, that we move qemu_savevm_wait_unplug() to be before the check of
>>> setup() ret.
>>
>> The simplest is IMHO moving qemu_savevm_wait_unplug() before
>> qemu_savevm_state_setup() and leave patch 10 is unchanged. See
>> below the extra patch. It looks much cleaner than what we have
>> today.
> 
> Yes it looks cleaner indeed, it's just that then we'll have one more
> possible state conversions like SETUP->UNPLUG->SETUP.  I'd say it's fine,
> but let's also copy Laruent and Laine if it's going to be posted formally.

OK. I just sent the alternative implementation. The code looks a little
ugly:

     bql_lock();                                                           |
     qemu_savevm_state_header(s->to_dst_file);                             |
     ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);            | in SETUP state
     bql_unlock();                                                         |
                                                                           |
     qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,                    | SETUP -> ACTIVE transition
                                MIGRATION_STATUS_ACTIVE);                  |
                                                                           |
     /*                                                                    |
      * Handle SETUP failures after waiting for virtio-net-failover        |
      * devices to unplug. This to preserve migration state transitions.   |
      */                                                                   |
     if (ret) {                                                            |
         migrate_set_error(s, local_err);                                  | handling SETUP errors in ACTIVE
         error_free(local_err);                                            |
         migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,             |
                           MIGRATION_STATUS_FAILED);                       |
         goto fail_setup;                                                  |
     }                                                                     |
                                                                           |
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;     | SETUP duration
                                                                           |
     trace_migration_thread_setup_complete();                              | SETUP trace event



Thanks,

C.



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 13:11                           ` Peter Xu
@ 2024-03-15 14:31                             ` Cédric Le Goater
  2024-03-15 14:57                               ` Peter Xu
  0 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-15 14:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/15/24 14:11, Peter Xu wrote:
> On Fri, Mar 15, 2024 at 01:20:49PM +0100, Cédric Le Goater wrote:
>> +static void qemu_savevm_wait_unplug(MigrationState *s, int state)
> 
> One more trivial comment: I'd even consider dropping "state" altogether, as
> this should be the only state this function should be invoked.  So we can
> perhaps assert it instead of passing it over?

Yes. If you prefer this implementation I will change.


Thanks,

C.





^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 14:21                           ` Cédric Le Goater
@ 2024-03-15 14:52                             ` Peter Xu
  2024-03-19 10:46                               ` Cédric Le Goater
  0 siblings, 1 reply; 111+ messages in thread
From: Peter Xu @ 2024-03-15 14:52 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Fri, Mar 15, 2024 at 03:21:27PM +0100, Cédric Le Goater wrote:
> On 3/15/24 13:20, Cédric Le Goater wrote:
> > On 3/15/24 12:01, Peter Xu wrote:
> > > On Fri, Mar 15, 2024 at 11:17:45AM +0100, Cédric Le Goater wrote:
> > > > > migrate_set_state is also unintuitive because it ignores invalid state
> > > > > transitions and we've been using that property to deal with special
> > > > > states such as POSTCOPY_PAUSED and FAILED:
> > > > > 
> > > > > - After the migration goes into POSTCOPY_PAUSED, the resumed migration's
> > > > >     migrate_init() will try to set the state NONE->SETUP, which is not
> > > > >     valid.
> > > > > 
> > > > > - After save_setup fails, the migration goes into FAILED, but wait_unplug
> > > > >     will try to transition SETUP->ACTIVE, which is also not valid.
> > > > > 
> > > > 
> > > > I am not sure I understand what the plan is. Both solutions are problematic
> > > > regarding the state transitions.
> > > > 
> > > > Should we consider that waiting for failover devices to unplug is an internal
> > > > step of the SETUP phase not transitioning to ACTIVE ?
> > > 
> > > If to unblock this series, IIUC the simplest solution is to do what Fabiano
> > > suggested, that we move qemu_savevm_wait_unplug() to be before the check of
> > > setup() ret.
> > 
> > The simplest is IMHO moving qemu_savevm_wait_unplug() before
> > qemu_savevm_state_setup() and leave patch 10 is unchanged. See
> > below the extra patch. It looks much cleaner than what we have
> > today.
> > 
> > > In that case, the state change in qemu_savevm_wait_unplug()
> > > should be benign and we should see a super small window it became ACTIVE
> > > but then it should be FAILED (and IIUC the patch itself will need to use
> > > ACTIVE as "old_state", not SETUP anymore).
> > 
> > OK. I will give it a try to compare.
> 
> Here's the alternative solution. SETUP state failures are handled after
> transitioning to ACTIVE state, which is unfortunate but probably harmless.
> I guess it's OK.

This also looks good to me, thanks.

One trivial early comment: in this case we can introduce a helper that
covers both the setup() call and the UNPLUG wait, to dedup the two paths.
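
Roughly, such a helper could look like the standalone sketch below. Everything
in it is a simplified assumption: migration_setup_and_wait_unplug() is a
hypothetical name, the setup call is a stub driven by a test knob, and the
locking, the header write and the Error object handling are left out.

```c
#include <stdbool.h>

/* Simplified stand-ins for QEMU's types; not the real definitions. */
typedef enum {
    MIGRATION_STATUS_SETUP,
    MIGRATION_STATUS_WAIT_UNPLUG,
    MIGRATION_STATUS_ACTIVE,
    MIGRATION_STATUS_FAILED,
} MigrationStatus;

typedef struct {
    int state;
} MigrationState;

/* Hypothetical knobs replacing the real setup and unplug machinery. */
static int setup_result;
static bool unplug_pending;

static int savevm_state_setup_stub(void)
{
    return setup_result;
}

/* Stand-in: only switch state if the current state matches old_state. */
static void migrate_set_state(int *state, int old_state, int new_state)
{
    if (*state == old_state) {
        *state = new_state;
    }
}

/* Three-argument form, as in the context lines of the patch below. */
static void qemu_savevm_wait_unplug(MigrationState *s, int old_state,
                                    int new_state)
{
    if (unplug_pending) {
        migrate_set_state(&s->state, old_state, MIGRATION_STATUS_WAIT_UNPLUG);
        while (s->state == MIGRATION_STATUS_WAIT_UNPLUG && unplug_pending) {
            unplug_pending = false; /* the real loop waits and polls again */
        }
        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state);
    } else {
        migrate_set_state(&s->state, old_state, new_state);
    }
}

/* Hypothetical helper shared by both migration threads. */
static int migration_setup_and_wait_unplug(MigrationState *s)
{
    int ret = savevm_state_setup_stub();

    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                            MIGRATION_STATUS_ACTIVE);
    if (ret) {
        /* SETUP failures are handled after the move to ACTIVE. */
        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                          MIGRATION_STATUS_FAILED);
    }
    return ret;
}
```

migration_thread() and bg_migration_thread() would then only differ in the
label they jump to when the helper returns non-zero (out vs fail_setup).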

> 
> Thanks,
> 
> C.
> 
> 
> 
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>  migration/savevm.h    |  2 +-
>  migration/migration.c | 29 +++++++++++++++++++++++++++--
>  migration/savevm.c    | 26 +++++++++++++++-----------
>  3 files changed, 43 insertions(+), 14 deletions(-)
> 
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -32,7 +32,7 @@
>  bool qemu_savevm_state_blocked(Error **errp);
>  void qemu_savevm_non_migratable_list(strList **reasons);
>  int qemu_savevm_state_prepare(Error **errp);
> -void qemu_savevm_state_setup(QEMUFile *f);
> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
>  bool qemu_savevm_state_guest_unplug_pending(void);
>  int qemu_savevm_state_resume_prepare(MigrationState *s);
>  void qemu_savevm_state_header(QEMUFile *f);
> diff --git a/migration/migration.c b/migration/migration.c
> index 644e073b7dcc70cb2bdaa9c975ba478952465ff4..0704ad6226df61f2f15bd81a2897f9946d601ca7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3427,6 +3427,8 @@ static void *migration_thread(void *opaque)
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      MigThrError thr_error;
>      bool urgent = false;
> +    Error *local_err = NULL;
> +    int ret;
>      thread = migration_threads_add("live_migration", qemu_get_thread_id());
> @@ -3470,12 +3472,24 @@ static void *migration_thread(void *opaque)
>      }
>      bql_lock();
> -    qemu_savevm_state_setup(s->to_dst_file);
> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>      bql_unlock();
>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>                                 MIGRATION_STATUS_ACTIVE);
> +    /*
> +     * Handle SETUP failures after waiting for virtio-net-failover
> +     * devices to unplug. This to preserve migration state transitions.
> +     */
> +    if (ret) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> +                          MIGRATION_STATUS_FAILED);
> +        goto out;
> +    }
> +
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>      trace_migration_thread_setup_complete();
> @@ -3549,6 +3563,8 @@ static void *bg_migration_thread(void *opaque)
>      MigThrError thr_error;
>      QEMUFile *fb;
>      bool early_fail = true;
> +    Error *local_err = NULL;
> +    int ret;
>      rcu_register_thread();
>      object_ref(OBJECT(s));
> @@ -3582,12 +3598,20 @@ static void *bg_migration_thread(void *opaque)
>      bql_lock();
>      qemu_savevm_state_header(s->to_dst_file);
> -    qemu_savevm_state_setup(s->to_dst_file);
> +    ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
>      bql_unlock();
>      qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
>                                 MIGRATION_STATUS_ACTIVE);
> +    if (ret) {
> +        migrate_set_error(s, local_err);
> +        error_free(local_err);
> +        migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
> +                          MIGRATION_STATUS_FAILED);
> +        goto fail_setup;
> +    }
> +
>      s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
>      trace_migration_thread_setup_complete();
> @@ -3656,6 +3680,7 @@ fail:
>          bql_unlock();
>      }
> +fail_setup:
>      bg_migration_iteration_finish(s);
>      qemu_fclose(fb);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1a7b5cb78a912c36ae16db703afc90ef2906b61f..0eb94e61f888adba2c0732c2cb701b110814c455 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1310,11 +1310,11 @@ int qemu_savevm_state_prepare(Error **errp)
>      return 0;
>  }
> -void qemu_savevm_state_setup(QEMUFile *f)
> +int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
>  {
> +    ERRP_GUARD();
>      MigrationState *ms = migrate_get_current();
>      SaveStateEntry *se;
> -    Error *local_err = NULL;
>      int ret = 0;
>      json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
> @@ -1323,10 +1323,9 @@ void qemu_savevm_state_setup(QEMUFile *f)
>      trace_savevm_state_setup();
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          if (se->vmsd && se->vmsd->early_setup) {
> -            ret = vmstate_save(f, se, ms->vmdesc, &local_err);
> +            ret = vmstate_save(f, se, ms->vmdesc, errp);
>              if (ret) {
> -                migrate_set_error(ms, local_err);
> -                error_report_err(local_err);
> +                migrate_set_error(ms, *errp);
>                  qemu_file_set_error(f, ret);
>                  break;
>              }
> @@ -1346,18 +1345,19 @@ void qemu_savevm_state_setup(QEMUFile *f)
>          ret = se->ops->save_setup(f, se->opaque);
>          save_section_footer(f, se);
>          if (ret < 0) {
> +            error_setg(errp, "failed to setup SaveStateEntry with id(name): "
> +                       "%d(%s): %d", se->section_id, se->idstr, ret);
>              qemu_file_set_error(f, ret);
>              break;
>          }
>      }
>      if (ret) {
> -        return;
> +        return ret;
>      }
> -    if (precopy_notify(PRECOPY_NOTIFY_SETUP, &local_err)) {
> -        error_report_err(local_err);
> -    }
> +    /* TODO: Should we check that errp is set in case of failure ? */
> +    return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>  }
>  int qemu_savevm_state_resume_prepare(MigrationState *s)
> @@ -1725,7 +1725,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      ms->to_dst_file = f;
>      qemu_savevm_state_header(f);
> -    qemu_savevm_state_setup(f);
> +    ret = qemu_savevm_state_setup(f, errp);
> +    if (ret) {
> +        goto cleanup;
> +    }
>      while (qemu_file_get_error(f) == 0) {
>          if (qemu_savevm_state_iterate(f, false) > 0) {
> @@ -1738,10 +1741,11 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>          qemu_savevm_state_complete_precopy(f, false, false);
>          ret = qemu_file_get_error(f);
>      }
> -    qemu_savevm_state_cleanup();
>      if (ret != 0) {
>          error_setg_errno(errp, -ret, "Error while writing VM state");
>      }
> +cleanup:
> +    qemu_savevm_state_cleanup();
>      if (ret != 0) {
>          status = MIGRATION_STATUS_FAILED;
> -- 
> 2.44.0
> 
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 14:31                             ` Cédric Le Goater
@ 2024-03-15 14:57                               ` Peter Xu
  0 siblings, 0 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-15 14:57 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On Fri, Mar 15, 2024 at 03:31:28PM +0100, Cédric Le Goater wrote:
> On 3/15/24 14:11, Peter Xu wrote:
> > On Fri, Mar 15, 2024 at 01:20:49PM +0100, Cédric Le Goater wrote:
> > > +static void qemu_savevm_wait_unplug(MigrationState *s, int state)
> > 
> > One more trivial comment: I'd even consider dropping "state" altogether, as
> > this should be the only state this function should be invoked.  So we can
> > perhaps assert it instead of passing it over?
> 
> Yes. If you prefer this implementation I will change.

I am fine with either approach; we can wait 1-2 days to see whether
others want to comment.  Otherwise the other approach actually looks better
to me, in that it avoids SETUP->UNPLUG->SETUP jumps.

Then, whichever way we go, we can see whether UNPLUG can be dropped,
perhaps starting by adding it to the deprecation list if there are no
objections from the relevant folks.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 22/25] memory: Add Error** argument to memory_get_xlat_addr()
  2024-03-06 13:34 ` [PATCH v4 22/25] memory: Add Error** argument to memory_get_xlat_addr() Cédric Le Goater
@ 2024-03-15 15:06   ` Peter Xu
  0 siblings, 0 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-15 15:06 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Michael S. Tsirkin,
	Paolo Bonzini, David Hildenbrand

On Wed, Mar 06, 2024 at 02:34:37PM +0100, Cédric Le Goater wrote:
> Let the callers do the reporting. This will be useful in
> vfio_iommu_map_dirty_notify().
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines
  2024-03-06 13:34 ` [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines Cédric Le Goater
  2024-03-15 11:34   ` Peter Xu
@ 2024-03-16  2:41   ` Yong Huang
  2024-03-18 16:19     ` Cédric Le Goater
  1 sibling, 1 reply; 111+ messages in thread
From: Yong Huang @ 2024-03-16  2:41 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Alex Williamson,
	Avihai Horon, Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand

[-- Attachment #1: Type: text/plain, Size: 8494 bytes --]

On Wed, Mar 6, 2024 at 9:35 PM Cédric Le Goater <clg@redhat.com> wrote:

> Now that the log_global*() handlers take an Error** parameter and
> return a bool, do the same for memory_global_dirty_log_start() and
> memory_global_dirty_log_stop(). The error is reported in the callers
> for now and it will be propagated in the call stack in the next
> changes.


> To be noted a functional change in ram_init_bitmaps(), if the dirty
>
Hi Cédric, could the functional change be split out into a separate
patch? My "Reviewed-by" applies to the first patch, the one that
refines the function declaration of memory_global_dirty_log_start().


> pages logger fails to start, there is no need to synchronize the dirty
> pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
> similar way.
>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Anthony Perard <anthony.perard@citrix.com>
> Cc: Paul Durrant <paul@xen.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Hyman Huang <yong.huang@smartx.com>
> Reviewed-by: Hyman Huang <yong.huang@smartx.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
>
>  Changes in v4:
>
>  - Dropped log_global_stop() and log_global_sync() changes
>
>  include/exec/memory.h |  5 ++++-
>  hw/i386/xen/xen-hvm.c |  2 +-
>  migration/dirtyrate.c | 13 +++++++++++--
>  migration/ram.c       | 22 ++++++++++++++++++++--
>  system/memory.c       | 11 +++++------
>  5 files changed, 41 insertions(+), 12 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 5555567bc4c9fdb53e8f63487f1400980275687d..c129ee6db7162504bd72d4cfc69b5affb2cd87e8 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2570,8 +2570,11 @@ void memory_listener_unregister(MemoryListener *listener);
>   * memory_global_dirty_log_start: begin dirty logging for all regions
>   *
>   * @flags: purpose of starting dirty log, migration or dirty rate
> + * @errp: pointer to Error*, to store an error if it happens.
> + *
> + * Return: true on success, else false setting @errp with error.
>   */
> -void memory_global_dirty_log_start(unsigned int flags);
> +bool memory_global_dirty_log_start(unsigned int flags, Error **errp);
>
>  /**
>   * memory_global_dirty_log_stop: end dirty logging for all regions
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index 0608ca99f5166fd6379ee674442484e805eff9c0..57cb7df50788a6c31eff68c95e8eaa856fdebede 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -654,7 +654,7 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length)
>  void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
>  {
>      if (enable) {
> -        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
> +        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
>      } else {
>          memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
>      }
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index 1d2e85746fb7b10eb7f149976970f9a92125af8a..d02d70b7b4b86a29d4d5540ded416543536d8f98 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -90,9 +90,15 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
>
>  void global_dirty_log_change(unsigned int flag, bool start)
>  {
> +    Error *local_err = NULL;
> +    bool ret;
> +
>      bql_lock();
>      if (start) {
> -        memory_global_dirty_log_start(flag);
> +        ret = memory_global_dirty_log_start(flag, &local_err);
> +        if (!ret) {
> +            error_report_err(local_err);
> +        }
>      } else {
>          memory_global_dirty_log_stop(flag);
>      }
> @@ -608,9 +614,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
>  {
>      int64_t start_time;
>      DirtyPageRecord dirty_pages;
> +    Error *local_err = NULL;
>
>      bql_lock();
> -    memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
> +    if (!memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE, &local_err)) {
> +        error_report_err(local_err);
> +    }
>
>      /*
>       * 1'round of log sync may return all 1 bits with
> diff --git a/migration/ram.c b/migration/ram.c
> index c5149b7d717aefad7f590422af0ea4a40e7507be..397b4c0f218a66d194e44f9c5f9fe8e9885c48b6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2836,18 +2836,31 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
>
>  static void ram_init_bitmaps(RAMState *rs)
>  {
> +    Error *local_err = NULL;
> +    bool ret = true;
> +
>      qemu_mutex_lock_ramlist();
>
>      WITH_RCU_READ_LOCK_GUARD() {
>          ram_list_init_bitmaps();
>          /* We don't use dirty log with background snapshots */
>          if (!migrate_background_snapshot()) {
> -            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
> +            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
> +                                                &local_err);
> +            if (!ret) {
> +                error_report_err(local_err);
> +                goto out_unlock;
> +            }
>              migration_bitmap_sync_precopy(rs, false);
>          }
>      }
> +out_unlock:
>      qemu_mutex_unlock_ramlist();
>
> +    if (!ret) {
> +        return;
> +    }
> +
>      /*
>       * After an eventual first bitmap sync, fixup the initial bitmap
>       * containing all 1s to exclude any discarded pages from migration.
> @@ -3631,6 +3644,8 @@ int colo_init_ram_cache(void)
>  void colo_incoming_start_dirty_log(void)
>  {
>      RAMBlock *block = NULL;
> +    Error *local_err = NULL;
> +
>      /* For memory_global_dirty_log_start below. */
>      bql_lock();
>      qemu_mutex_lock_ramlist();
> @@ -3642,7 +3657,10 @@ void colo_incoming_start_dirty_log(void)
>              /* Discard this dirty bitmap record */
>              bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
>          }
> -        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
> +        if (!memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
> +                                           &local_err)) {
> +            error_report_err(local_err);
> +        }
>      }
>      ram_state->migration_dirty_pages = 0;
>      qemu_mutex_unlock_ramlist();
> diff --git a/system/memory.c b/system/memory.c
> index 3600e716149407c10a1f6bf8f0a81c2611cf15ba..cbc098216b789f50460f1d1bc7ec122030693d9e 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2931,10 +2931,9 @@ static void memory_global_dirty_log_rollback(MemoryListener *listener,
>      }
>  }
>
> -void memory_global_dirty_log_start(unsigned int flags)
> +bool memory_global_dirty_log_start(unsigned int flags, Error **errp)
>  {
>      unsigned int old_flags;
> -    Error *local_err = NULL;
>
>      assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
>
> @@ -2946,7 +2945,7 @@ void memory_global_dirty_log_start(unsigned int flags)
>
>      flags &= ~global_dirty_tracking;
>      if (!flags) {
> -        return;
> +        return true;
>      }
>
>      old_flags = global_dirty_tracking;
> @@ -2959,7 +2958,7 @@ void memory_global_dirty_log_start(unsigned int flags)
>
>          QTAILQ_FOREACH(listener, &memory_listeners, link) {
>              if (listener->log_global_start) {
> -                ret = listener->log_global_start(listener, &local_err);
> +                ret = listener->log_global_start(listener, errp);
>                  if (!ret) {
>                      break;
>                  }
> @@ -2969,14 +2968,14 @@ void memory_global_dirty_log_start(unsigned int flags)
>          if (!ret) {
>              memory_global_dirty_log_rollback(QTAILQ_PREV(listener, link),
>                                               flags);
> -            error_report_err(local_err);
> -            return;
> +            return false;
>          }
>
>          memory_region_transaction_begin();
>          memory_region_update_pending = true;
>          memory_region_transaction_commit();
>      }
> +    return true;
>  }
>
>  static void memory_global_dirty_log_do_stop(unsigned int flags)
> --
> 2.44.0
>
>

-- 
Best regards


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines
  2024-03-15 11:34   ` Peter Xu
@ 2024-03-18 10:43     ` Cédric Le Goater
  2024-03-18 16:03     ` Cédric Le Goater
  2024-03-18 16:08     ` Cédric Le Goater
  2 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-18 10:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand, Hyman Huang

>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2836,18 +2836,31 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
>>   
>>   static void ram_init_bitmaps(RAMState *rs)
>>   {
>> +    Error *local_err = NULL;
>> +    bool ret = true;
>> +
>>       qemu_mutex_lock_ramlist();
>>   
>>       WITH_RCU_READ_LOCK_GUARD() {
>>           ram_list_init_bitmaps();
>>           /* We don't use dirty log with background snapshots */
>>           if (!migrate_background_snapshot()) {
>> -            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
>> +            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
>> +                                                &local_err);
>> +            if (!ret) {
>> +                error_report_err(local_err);
>> +                goto out_unlock;
> 
> Here we may need to free the bitmaps created in ram_list_init_bitmaps().
> 
> We can have a helper ram_bitmaps_destroy() for that.
> 
> One thing be careful is the new file_bmap can be created but missing in the
> ram_save_cleanup(), it's because it's freed earlier.  IMHO if we will have
> a new ram_bitmaps_destroy() we can unconditionally free file_bmap there
> too, as if it's freed early g_free() is noop.

OK. Let's do that in a new prereq patch. I will change ram_state_init()
and xbzrle_init() to take an Error ** argument while at it.
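
For illustration only, a minimal self-contained sketch of the cleanup
idea discussed above. MockBlock and the mock_* functions are
hypothetical stand-ins, not QEMU code, and plain free() is used in place
of g_free(); both accept NULL, which is what lets the helper free every
bitmap unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the bitmap pointers kept in a RAMBlock. */
typedef struct MockBlock {
    unsigned long *bmap;
    unsigned long *clear_bmap;
    unsigned long *file_bmap;
} MockBlock;

/*
 * Free every bitmap unconditionally and reset the pointers. free(NULL),
 * like g_free(NULL), is a no-op, so a bitmap that was released earlier
 * (or never allocated) needs no special case.
 */
void mock_bitmaps_destroy(MockBlock *blocks, size_t n)
{
    assert(blocks || n == 0);
    for (size_t i = 0; i < n; i++) {
        free(blocks[i].bmap);
        blocks[i].bmap = NULL;
        free(blocks[i].clear_bmap);
        blocks[i].clear_bmap = NULL;
        free(blocks[i].file_bmap);
        blocks[i].file_bmap = NULL;
    }
}

/* Count the bitmaps still allocated, to check that nothing is left. */
size_t mock_bitmaps_count(const MockBlock *blocks, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        count += (blocks[i].bmap != NULL);
        count += (blocks[i].clear_bmap != NULL);
        count += (blocks[i].file_bmap != NULL);
    }
    return count;
}
```

Resetting each pointer to NULL after freeing makes a second call
harmless, so the same helper could presumably serve both the error path
and the regular cleanup path.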


Thanks,

C.




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler
  2024-03-15 11:18   ` Peter Xu
@ 2024-03-18 14:33     ` Cédric Le Goater
  2024-03-18 14:54     ` Cédric Le Goater
  1 sibling, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-18 14:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand

On 3/15/24 12:18, Peter Xu wrote:
> On Wed, Mar 06, 2024 at 02:34:28PM +0100, Cédric Le Goater wrote:
>> diff --git a/system/memory.c b/system/memory.c
>> index a229a79988fce2aa3cb77e3a130db4c694e8cd49..3600e716149407c10a1f6bf8f0a81c2611cf15ba 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -2914,9 +2914,27 @@ static unsigned int postponed_stop_flags;
>>   static VMChangeStateEntry *vmstate_change;
>>   static void memory_global_dirty_log_stop_postponed_run(void);
>>   
>> +/*
>> + * Stop dirty logging on all listeners where it was previously enabled.
>> + */
>> +static void memory_global_dirty_log_rollback(MemoryListener *listener,
>> +                                             unsigned int flags)
>> +{
>> +    global_dirty_tracking &= ~flags;
> 
> Having a hook rollback function to touch the global_dirty_tracking flag is
> IMHO tricky.
> 
> Can we instead provide a helper to call all log_global_start() hooks, but
> allow a gracefully fail (so rollback will be called if it fails)?
> 
>    bool memory_global_dirty_log_start_hooks(...)
> 
> Or any better names..  Leaving global_dirty_tracking rollback to
> memory_global_dirty_log_start() when it returns false.
> 
> Would this be cleaner?

I will introduce a memory_global_dirty_log_do_start() helper to call
the log_global_start() handlers and to do the rollback in case of
error. Modification of the global_dirty_tracking flag will stay local
to memory_global_dirty_log_start() to avoid any future errors.
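
A rough, self-contained sketch of that split, under the assumption that
listeners live in a simple array and that an error is just a string.
MockListener and all mock_* names are hypothetical and only model the
control flow; this is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a MemoryListener with start/stop hooks. */
typedef struct MockListener {
    bool fail_on_start;   /* force the start hook to fail */
    bool logging;         /* true while dirty logging is enabled */
} MockListener;

/* Stand-in for the global_dirty_tracking flag. */
unsigned int mock_dirty_tracking;

bool mock_log_global_start(MockListener *l, const char **errp)
{
    if (l->fail_on_start) {
        *errp = "mock listener failed to start dirty logging";
        return false;
    }
    l->logging = true;
    return true;
}

void mock_log_global_stop(MockListener *l)
{
    l->logging = false;
}

/*
 * Call every start hook in order. On failure, stop the listeners that
 * were already started, in reverse order. The tracking flag is not
 * touched here.
 */
bool mock_dirty_log_do_start(MockListener *ls, size_t n, const char **errp)
{
    for (size_t i = 0; i < n; i++) {
        if (!mock_log_global_start(&ls[i], errp)) {
            while (i-- > 0) {
                mock_log_global_stop(&ls[i]);
            }
            return false;
        }
    }
    return true;
}

/* The flag is set and, on failure, restored only in this function. */
bool mock_dirty_log_start(MockListener *ls, size_t n, unsigned int flags,
                          const char **errp)
{
    unsigned int old_flags = mock_dirty_tracking;

    assert(flags);
    mock_dirty_tracking |= flags;
    if (!old_flags && !mock_dirty_log_do_start(ls, n, errp)) {
        mock_dirty_tracking &= ~flags;
        return false;
    }
    return true;
}
```

With this shape the rollback helper never sees the flag, which is the
point raised in the review.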


>> +    trace_global_dirty_changed(global_dirty_tracking);
>> +
>> +    while (listener) {
>> +        if (listener->log_global_stop) {
>> +            listener->log_global_stop(listener);
>> +        }
>> +        listener = QTAILQ_PREV(listener, link);
>> +    }
>> +}
>> +
>>   void memory_global_dirty_log_start(unsigned int flags)
>>   {
>>       unsigned int old_flags;
>> +    Error *local_err = NULL;
>>   
>>       assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
>>   
>> @@ -2936,7 +2954,25 @@ void memory_global_dirty_log_start(unsigned int flags)
>>       trace_global_dirty_changed(global_dirty_tracking);
>>   
>>       if (!old_flags) {
>> -        MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
>> +        MemoryListener *listener;
>> +        bool ret = true;
>> +
>> +        QTAILQ_FOREACH(listener, &memory_listeners, link) {
>> +            if (listener->log_global_start) {
>> +                ret = listener->log_global_start(listener, &local_err);
>> +                if (!ret) {
>> +                    break;
>> +                }
>> +            }
>> +        }
>> +
>> +        if (!ret) {
>> +            memory_global_dirty_log_rollback(QTAILQ_PREV(listener, link),
>> +                                             flags);
>> +            error_report_err(local_err);
>> +            return;
>> +        }
>> +
>>           memory_region_transaction_begin();
>>           memory_region_update_pending = true;
>>           memory_region_transaction_commit();
>> @@ -3009,13 +3045,16 @@ static void listener_add_address_space(MemoryListener *listener,
>>   {
>>       FlatView *view;
>>       FlatRange *fr;
>> +    Error *local_err = NULL;
>>   
>>       if (listener->begin) {
>>           listener->begin(listener);
>>       }
>>       if (global_dirty_tracking) {
>>           if (listener->log_global_start) {
>> -            listener->log_global_start(listener);
>> +            if (!listener->log_global_start(listener, &local_err)) {
>> +                error_report_err(local_err);
>> +            }
> 
> IMHO we should assert here instead of error report.  We have this to guard
> hot-plug during migration so I think the assert is justified:
> 
> qdev_device_add_from_qdict():
> 
>      if (!migration_is_idle()) {
>          error_setg(errp, "device_add not allowed while migrating");
>          return NULL;
>      }
> 
> If it really happens it's a bug, as listener_add_address_space() will still
> keep the rest things around even if the hook failed.  It'll start to be a
> total mess..

OK. I will change the Error parameter to error_abort in that case.
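
As a reminder of what that implies, here is a small self-contained model
of the "abort sentinel" idea, as I understand QEMU's &error_abort
convention: passing the sentinel as errp turns any reported error into
an immediate abort(). MockError and the mock_* names are hypothetical
and much simpler than the real Error API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct MockError {
    const char *msg;
} MockError;

/* Sentinel: passing &mock_error_abort as errp means "an error is a bug". */
MockError *mock_error_abort;

/* Store the error for the caller, or abort if the sentinel was passed. */
void mock_error_set(MockError **errp, MockError *err)
{
    if (errp == &mock_error_abort) {
        fprintf(stderr, "Unexpected error: %s\n", err->msg);
        abort();
    }
    if (errp) {
        assert(*errp == NULL);   /* do not overwrite a pending error */
        *errp = err;
    }
}

/* Mock hook: fails on request and reports the failure through errp. */
bool mock_log_global_start(bool fail, MockError **errp)
{
    static MockError err = { "mock: cannot start dirty logging" };

    if (fail) {
        mock_error_set(errp, &err);
        return false;
    }
    return true;
}
```

A caller that cannot recover would pass &mock_error_abort, while a
caller that can propagate the failure passes the address of a local
MockError pointer and checks the return value.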

However, it would be useful to catch errors of the .region_add() handler
for VFIO. Let's address that later.

Thanks,

C.





^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler
  2024-03-15 11:18   ` Peter Xu
  2024-03-18 14:33     ` Cédric Le Goater
@ 2024-03-18 14:54     ` Cédric Le Goater
  2024-03-18 16:27       ` Peter Xu
  1 sibling, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-18 14:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand

On 3/15/24 12:18, Peter Xu wrote:
>> @@ -3009,13 +3045,16 @@ static void listener_add_address_space(MemoryListener *listener,
>>   {
>>       FlatView *view;
>>       FlatRange *fr;
>> +    Error *local_err = NULL;
>>   
>>       if (listener->begin) {
>>           listener->begin(listener);
>>       }
>>       if (global_dirty_tracking) {
>>           if (listener->log_global_start) {
>> -            listener->log_global_start(listener);
>> +            if (!listener->log_global_start(listener, &local_err)) {
>> +                error_report_err(local_err);
>> +            }
> IMHO we should assert here instead of error report.  We have this to guard
> hot-plug during migration so I think the assert is justified:
> 
> qdev_device_add_from_qdict():
> 
>      if (!migration_is_idle()) {
>          error_setg(errp, "device_add not allowed while migrating");
>          return NULL;
>      }
> 
> If it really happens it's a bug, as listener_add_address_space() will still
> keep the rest things around even if the hook failed.  It'll start to be a
> total mess..

It seems that adding a region listener while logging is active has been
supported from the beginning, commit 7664e80c8470 ("memory: add API for
observing updates to the physical memory map"). Can it happen? If not,
we could simply remove the log_global_start() call.

Thanks,

C.




^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines
  2024-03-15 11:34   ` Peter Xu
  2024-03-18 10:43     ` Cédric Le Goater
@ 2024-03-18 16:03     ` Cédric Le Goater
  2024-03-18 16:08     ` Cédric Le Goater
  2 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-18 16:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand, Hyman Huang

On 3/15/24 12:34, Peter Xu wrote:
> On Wed, Mar 06, 2024 at 02:34:29PM +0100, Cédric Le Goater wrote:
>> Now that the log_global*() handlers take an Error** parameter and
>> return a bool, do the same for memory_global_dirty_log_start() and
>> memory_global_dirty_log_stop(). The error is reported in the callers
>> for now and it will be propagated in the call stack in the next
>> changes.
>>
>> To be noted a functional change in ram_init_bitmaps(), if the dirty
>> pages logger fails to start, there is no need to synchronize the dirty
>> pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
>> similar way.
>>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> Cc: Anthony Perard <anthony.perard@citrix.com>
>> Cc: Paul Durrant <paul@xen.org>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Hyman Huang <yong.huang@smartx.com>
>> Reviewed-by: Hyman Huang <yong.huang@smartx.com>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>>
>>   Changes in v4:
>>
>>   - Dropped log_global_stop() and log_global_sync() changes
>>   
>>   include/exec/memory.h |  5 ++++-
>>   hw/i386/xen/xen-hvm.c |  2 +-
>>   migration/dirtyrate.c | 13 +++++++++++--
>>   migration/ram.c       | 22 ++++++++++++++++++++--
>>   system/memory.c       | 11 +++++------
>>   5 files changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 5555567bc4c9fdb53e8f63487f1400980275687d..c129ee6db7162504bd72d4cfc69b5affb2cd87e8 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -2570,8 +2570,11 @@ void memory_listener_unregister(MemoryListener *listener);
>>    * memory_global_dirty_log_start: begin dirty logging for all regions
>>    *
>>    * @flags: purpose of starting dirty log, migration or dirty rate
>> + * @errp: pointer to Error*, to store an error if it happens.
>> + *
>> + * Return: true on success, else false setting @errp with error.
>>    */
>> -void memory_global_dirty_log_start(unsigned int flags);
>> +bool memory_global_dirty_log_start(unsigned int flags, Error **errp);
>>   
>>   /**
>>    * memory_global_dirty_log_stop: end dirty logging for all regions
>> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
>> index 0608ca99f5166fd6379ee674442484e805eff9c0..57cb7df50788a6c31eff68c95e8eaa856fdebede 100644
>> --- a/hw/i386/xen/xen-hvm.c
>> +++ b/hw/i386/xen/xen-hvm.c
>> @@ -654,7 +654,7 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length)
>>   void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
>>   {
>>       if (enable) {
>> -        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
>> +        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
>>       } else {
>>           memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
>>       }
>> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
>> index 1d2e85746fb7b10eb7f149976970f9a92125af8a..d02d70b7b4b86a29d4d5540ded416543536d8f98 100644
>> --- a/migration/dirtyrate.c
>> +++ b/migration/dirtyrate.c
>> @@ -90,9 +90,15 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
>>   
>>   void global_dirty_log_change(unsigned int flag, bool start)
>>   {
>> +    Error *local_err = NULL;
>> +    bool ret;
>> +
>>       bql_lock();
>>       if (start) {
>> -        memory_global_dirty_log_start(flag);
>> +        ret = memory_global_dirty_log_start(flag, &local_err);
>> +        if (!ret) {
>> +            error_report_err(local_err);
>> +        }
>>       } else {
>>           memory_global_dirty_log_stop(flag);
>>       }
>> @@ -608,9 +614,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
>>   {
>>       int64_t start_time;
>>       DirtyPageRecord dirty_pages;
>> +    Error *local_err = NULL;
>>   
>>       bql_lock();
>> -    memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
>> +    if (!memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE, &local_err)) {
>> +        error_report_err(local_err);
>> +    }
>>   
>>       /*
>>        * 1'round of log sync may return all 1 bits with
>> diff --git a/migration/ram.c b/migration/ram.c
>> index c5149b7d717aefad7f590422af0ea4a40e7507be..397b4c0f218a66d194e44f9c5f9fe8e9885c48b6 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2836,18 +2836,31 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
>>   
>>   static void ram_init_bitmaps(RAMState *rs)
>>   {
>> +    Error *local_err = NULL;
>> +    bool ret = true;
>> +
>>       qemu_mutex_lock_ramlist();
>>   
>>       WITH_RCU_READ_LOCK_GUARD() {
>>           ram_list_init_bitmaps();
>>           /* We don't use dirty log with background snapshots */
>>           if (!migrate_background_snapshot()) {
>> -            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
>> +            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
>> +                                                &local_err);
>> +            if (!ret) {
>> +                error_report_err(local_err);
>> +                goto out_unlock;
> 
> Here we may need to free the bitmaps created in ram_list_init_bitmaps().
> 
> We can have a helper ram_bitmaps_destroy() for that.
> 
> One thing be careful is the new file_bmap can be created but missing in the
> ram_save_cleanup(), it's because it's freed earlier.  IMHO if we will have
> a new ram_bitmaps_destroy() we can unconditionally free file_bmap there
> too, as if it's freed early g_free() is noop.

ok. will do.


Thanks,

C.



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines
  2024-03-15 11:34   ` Peter Xu
  2024-03-18 10:43     ` Cédric Le Goater
  2024-03-18 16:03     ` Cédric Le Goater
@ 2024-03-18 16:08     ` Cédric Le Goater
  2024-03-18 16:31       ` Peter Xu
  2 siblings, 1 reply; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-18 16:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand, Hyman Huang

>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2836,18 +2836,31 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
>>   
>>   static void ram_init_bitmaps(RAMState *rs)
>>   {
>> +    Error *local_err = NULL;
>> +    bool ret = true;
>> +
>>       qemu_mutex_lock_ramlist();
>>   
>>       WITH_RCU_READ_LOCK_GUARD() {
>>           ram_list_init_bitmaps();

btw, should we use bitmap_try_new() to create the bitmaps instead of
bitmap_new(), which can abort()?
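
To make the difference concrete, a minimal self-contained sketch of the
two allocation styles, assuming calloc() as the backing allocator. The
mock_* functions are hypothetical and only mirror the behaviour in
question (return NULL versus abort on failure); they are not the QEMU
helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define MOCK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* "try" flavour: return a zeroed bitmap, or NULL if allocation fails. */
unsigned long *mock_bitmap_try_new(size_t nbits)
{
    size_t nlongs = nbits / MOCK_BITS_PER_LONG +
                    (nbits % MOCK_BITS_PER_LONG != 0);

    if (nlongs == 0) {
        nlongs = 1;   /* avoid the implementation-defined calloc(0, ...) */
    }
    return calloc(nlongs, sizeof(unsigned long));
}

/* Aborting flavour: the caller never sees a failure. */
unsigned long *mock_bitmap_new(size_t nbits)
{
    unsigned long *map = mock_bitmap_try_new(nbits);

    if (!map) {
        abort();
    }
    return map;
}

/* A caller using the "try" flavour can report the failure instead. */
bool mock_bitmap_init(unsigned long **out, size_t nbits, const char **errp)
{
    assert(out);
    *out = mock_bitmap_try_new(nbits);
    if (!*out) {
        *errp = "mock: failed to allocate dirty bitmap";
        return false;
    }
    return true;
}
```

Only the "try" flavour lets an allocation failure travel up the call
stack as an error, which would fit the Error ** propagation this series
is adding.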


>>           /* We don't use dirty log with background snapshots */
>>           if (!migrate_background_snapshot()) {
>> -            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
>> +            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
>> +                                                &local_err);
>> +            if (!ret) {
>> +                error_report_err(local_err);
>> +                goto out_unlock;
> 
> Here we may need to free the bitmaps created in ram_list_init_bitmaps().


C.



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines
  2024-03-16  2:41   ` Yong Huang
@ 2024-03-18 16:19     ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-18 16:19 UTC (permalink / raw)
  To: Yong Huang
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Alex Williamson,
	Avihai Horon, Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand

On 3/16/24 03:41, Yong Huang wrote:
> 
> 
> On Wed, Mar 6, 2024 at 9:35 PM Cédric Le Goater <clg@redhat.com> wrote:
> 
>     Now that the log_global*() handlers take an Error** parameter and
>     return a bool, do the same for memory_global_dirty_log_start() and
>     memory_global_dirty_log_stop(). The error is reported in the callers
>     for now and it will be propagated in the call stack in the next
>     changes. 
> 
> 
>     To be noted a functional change in ram_init_bitmaps(), if the dirty
> 
> Hi, Cédric Le Goater. Could the functional modification be made
> separately from the patch? 

Are you suggesting one patch to add the Error ** parameter and a second
to report the error if there is a failure? From the moment the prototype
is modified, all handlers need to take the change into account to avoid
a build break. That looks difficult.

> And my "Reviewed-by" is attached
> to the first patch that refines memory_global_dirty_log_start's
> function declaration.

OK. I should resend a v5 anyhow, so I will remove your R-b and you can
reconsider.

Thanks,

C.


> 
>     pages logger fails to start, there is no need to synchronize the dirty
>     pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
>     similar way.
> 
>     Cc: Stefano Stabellini <sstabellini@kernel.org>
>     Cc: Anthony Perard <anthony.perard@citrix.com>
>     Cc: Paul Durrant <paul@xen.org>
>     Cc: "Michael S. Tsirkin" <mst@redhat.com>
>     Cc: Paolo Bonzini <pbonzini@redhat.com>
>     Cc: David Hildenbrand <david@redhat.com>
>     Cc: Hyman Huang <yong.huang@smartx.com>
>     Reviewed-by: Hyman Huang <yong.huang@smartx.com>
>     Signed-off-by: Cédric Le Goater <clg@redhat.com>
>     ---
> 
>       Changes in v4:
> 
>       - Dropped log_global_stop() and log_global_sync() changes
> 
>       include/exec/memory.h |  5 ++++-
>       hw/i386/xen/xen-hvm.c |  2 +-
>       migration/dirtyrate.c | 13 +++++++++++--
>       migration/ram.c       | 22 ++++++++++++++++++++--
>       system/memory.c       | 11 +++++------
>       5 files changed, 41 insertions(+), 12 deletions(-)
> 
>     diff --git a/include/exec/memory.h b/include/exec/memory.h
>     index 5555567bc4c9fdb53e8f63487f1400980275687d..c129ee6db7162504bd72d4cfc69b5affb2cd87e8 100644
>     --- a/include/exec/memory.h
>     +++ b/include/exec/memory.h
>     @@ -2570,8 +2570,11 @@ void memory_listener_unregister(MemoryListener *listener);
>        * memory_global_dirty_log_start: begin dirty logging for all regions
>        *
>        * @flags: purpose of starting dirty log, migration or dirty rate
>     + * @errp: pointer to Error*, to store an error if it happens.
>     + *
>     + * Return: true on success, else false setting @errp with error.
>        */
>     -void memory_global_dirty_log_start(unsigned int flags);
>     +bool memory_global_dirty_log_start(unsigned int flags, Error **errp);
> 
>       /**
>        * memory_global_dirty_log_stop: end dirty logging for all regions
>     diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
>     index 0608ca99f5166fd6379ee674442484e805eff9c0..57cb7df50788a6c31eff68c95e8eaa856fdebede 100644
>     --- a/hw/i386/xen/xen-hvm.c
>     +++ b/hw/i386/xen/xen-hvm.c
>     @@ -654,7 +654,7 @@ void xen_hvm_modified_memory(ram_addr_t start, ram_addr_t length)
>       void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
>       {
>           if (enable) {
>     -        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
>     +        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
>           } else {
>               memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
>           }
>     diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
>     index 1d2e85746fb7b10eb7f149976970f9a92125af8a..d02d70b7b4b86a29d4d5540ded416543536d8f98 100644
>     --- a/migration/dirtyrate.c
>     +++ b/migration/dirtyrate.c
>     @@ -90,9 +90,15 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
> 
>       void global_dirty_log_change(unsigned int flag, bool start)
>       {
>     +    Error *local_err = NULL;
>     +    bool ret;
>     +
>           bql_lock();
>           if (start) {
>     -        memory_global_dirty_log_start(flag);
>     +        ret = memory_global_dirty_log_start(flag, &local_err);
>     +        if (!ret) {
>     +            error_report_err(local_err);
>     +        }
>           } else {
>               memory_global_dirty_log_stop(flag);
>           }
>     @@ -608,9 +614,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
>       {
>           int64_t start_time;
>           DirtyPageRecord dirty_pages;
>     +    Error *local_err = NULL;
> 
>           bql_lock();
>     -    memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
>     +    if (!memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE, &local_err)) {
>     +        error_report_err(local_err);
>     +    }
> 
>           /*
>            * 1'round of log sync may return all 1 bits with
>     diff --git a/migration/ram.c b/migration/ram.c
>     index c5149b7d717aefad7f590422af0ea4a40e7507be..397b4c0f218a66d194e44f9c5f9fe8e9885c48b6 100644
>     --- a/migration/ram.c
>     +++ b/migration/ram.c
>     @@ -2836,18 +2836,31 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
> 
>       static void ram_init_bitmaps(RAMState *rs)
>       {
>     +    Error *local_err = NULL;
>     +    bool ret = true;
>     +
>           qemu_mutex_lock_ramlist();
> 
>           WITH_RCU_READ_LOCK_GUARD() {
>               ram_list_init_bitmaps();
>               /* We don't use dirty log with background snapshots */
>               if (!migrate_background_snapshot()) {
>     -            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
>     +            ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
>     +                                                &local_err);
>     +            if (!ret) {
>     +                error_report_err(local_err);
>     +                goto out_unlock;
>     +            }
>                   migration_bitmap_sync_precopy(rs, false);
>               }
>           }
>     +out_unlock:
>           qemu_mutex_unlock_ramlist();
> 
>     +    if (!ret) {
>     +        return;
>     +    }
>     +
>           /*
>            * After an eventual first bitmap sync, fixup the initial bitmap
>            * containing all 1s to exclude any discarded pages from migration.
>     @@ -3631,6 +3644,8 @@ int colo_init_ram_cache(void)
>       void colo_incoming_start_dirty_log(void)
>       {
>           RAMBlock *block = NULL;
>     +    Error *local_err = NULL;
>     +
>           /* For memory_global_dirty_log_start below. */
>           bql_lock();
>           qemu_mutex_lock_ramlist();
>     @@ -3642,7 +3657,10 @@ void colo_incoming_start_dirty_log(void)
>                   /* Discard this dirty bitmap record */
>                   bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
>               }
>     -        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
>     +        if (!memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION,
>     +                                           &local_err)) {
>     +            error_report_err(local_err);
>     +        }
>           }
>           ram_state->migration_dirty_pages = 0;
>           qemu_mutex_unlock_ramlist();
>     diff --git a/system/memory.c b/system/memory.c
>     index 3600e716149407c10a1f6bf8f0a81c2611cf15ba..cbc098216b789f50460f1d1bc7ec122030693d9e 100644
>     --- a/system/memory.c
>     +++ b/system/memory.c
>     @@ -2931,10 +2931,9 @@ static void memory_global_dirty_log_rollback(MemoryListener *listener,
>           }
>       }
> 
>     -void memory_global_dirty_log_start(unsigned int flags)
>     +bool memory_global_dirty_log_start(unsigned int flags, Error **errp)
>       {
>           unsigned int old_flags;
>     -    Error *local_err = NULL;
> 
>           assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
> 
>     @@ -2946,7 +2945,7 @@ void memory_global_dirty_log_start(unsigned int flags)
> 
>           flags &= ~global_dirty_tracking;
>           if (!flags) {
>     -        return;
>     +        return true;
>           }
> 
>           old_flags = global_dirty_tracking;
>     @@ -2959,7 +2958,7 @@ void memory_global_dirty_log_start(unsigned int flags)
> 
>               QTAILQ_FOREACH(listener, &memory_listeners, link) {
>                   if (listener->log_global_start) {
>     -                ret = listener->log_global_start(listener, &local_err);
>     +                ret = listener->log_global_start(listener, errp);
>                       if (!ret) {
>                           break;
>                       }
>     @@ -2969,14 +2968,14 @@ void memory_global_dirty_log_start(unsigned int flags)
>               if (!ret) {
>                   memory_global_dirty_log_rollback(QTAILQ_PREV(listener, link),
>                                                    flags);
>     -            error_report_err(local_err);
>     -            return;
>     +            return false;
>               }
> 
>               memory_region_transaction_begin();
>               memory_region_update_pending = true;
>               memory_region_transaction_commit();
>           }
>     +    return true;
>       }
> 
>       static void memory_global_dirty_log_do_stop(unsigned int flags)
>     -- 
>     2.44.0
> 
> 
> 
> -- 
> Best regards



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler
  2024-03-18 14:54     ` Cédric Le Goater
@ 2024-03-18 16:27       ` Peter Xu
  0 siblings, 0 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-18 16:27 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand

On Mon, Mar 18, 2024 at 03:54:28PM +0100, Cédric Le Goater wrote:
> On 3/15/24 12:18, Peter Xu wrote:
> > > @@ -3009,13 +3045,16 @@ static void listener_add_address_space(MemoryListener *listener,
> > >   {
> > >       FlatView *view;
> > >       FlatRange *fr;
> > > +    Error *local_err = NULL;
> > >       if (listener->begin) {
> > >           listener->begin(listener);
> > >       }
> > >       if (global_dirty_tracking) {
> > >           if (listener->log_global_start) {
> > > -            listener->log_global_start(listener);
> > > +            if (!listener->log_global_start(listener, &local_err)) {
> > > +                error_report_err(local_err);
> > > +            }
> > IMHO we should assert here instead of error report.  We have this to guard
> > hot-plug during migration so I think the assert is justified:
> > 
> > qdev_device_add_from_qdict():
> > 
> >      if (!migration_is_idle()) {
> >          error_setg(errp, "device_add not allowed while migrating");
> >          return NULL;
> >      }
> > 
> > If it really happens it's a bug, as listener_add_address_space() will still
> > keep the rest of the things around even if the hook failed.  It'll start to
> > be a total mess.
> 
> It seems that adding a region listener while logging is active has been
> supported from the beginning, commit 7664e80c8470 ("memory: add API for
> observing updates to the physical memory map"). Can it happen? If not,
> we could simply remove the log_global_start() call.

IMHO we'd better keep it for the sake of logical completeness, even though I
don't know when it'll be useful.

I think it's safe to assert because, with the current code base,
log_global_start() should only be triggered here by vhost or vfio.
That doesn't mean all future log_global_start() hooks will be based on a
device object. E.g., there's the other user, Xen, but afaict it won't trigger
here either.  So the assert should be safe.

In the future maybe we could allow things other than devices to trigger this
path, but obviously we're not ready to handle a failure here.  Instead of
adding failure handling that will never be used for now, IIUC it's simpler to
just provide an assert until someone adds a real user of it.

Thanks,

-- 
Peter Xu
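
The "assert instead of error report" suggestion above can be sketched in standalone C. This is only an illustration under stated assumptions: SketchError, SketchListener and listener_add_sketch() are hypothetical stand-ins for QEMU's Error, MemoryListener and listener_add_address_space(), and the hook signature simply mirrors the bool-returning form from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct SketchError {
    const char *msg;
} SketchError;

/* Hypothetical listener with a fallible log_global_start hook. */
typedef struct SketchListener SketchListener;
struct SketchListener {
    bool (*log_global_start)(SketchListener *l, SketchError **errp);
    bool registered;
};

/* Example hook that always succeeds. */
static bool start_ok(SketchListener *l, SketchError **errp)
{
    (void)l;
    (void)errp;
    return true;
}

/*
 * Assert variant: a hook failure while a listener is being added during
 * active dirty tracking is treated as a programming error, because the
 * rest of the registration would otherwise continue in an inconsistent
 * state.
 */
static void listener_add_sketch(SketchListener *l, bool dirty_tracking_active)
{
    SketchError *local_err = NULL;

    if (dirty_tracking_active && l->log_global_start) {
        bool ok = l->log_global_start(l, &local_err);

        if (!ok && local_err) {
            /* Print the reason before aborting, to help debugging. */
            fprintf(stderr, "log_global_start failed: %s\n", local_err->msg);
        }
        assert(ok);
    }
    l->registered = true;
}
```

With this shape no caller needs a failure path, which matches the argument that there is no real user of one yet.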




* Re: [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines
  2024-03-18 16:08     ` Cédric Le Goater
@ 2024-03-18 16:31       ` Peter Xu
  0 siblings, 0 replies; 111+ messages in thread
From: Peter Xu @ 2024-03-18 16:31 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Fabiano Rosas, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit, Stefano Stabellini,
	Anthony Perard, Paul Durrant, Michael S. Tsirkin, Paolo Bonzini,
	David Hildenbrand, Hyman Huang

On Mon, Mar 18, 2024 at 05:08:13PM +0100, Cédric Le Goater wrote:
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2836,18 +2836,31 @@ static void migration_bitmap_clear_discarded_pages(RAMState *rs)
> > >   static void ram_init_bitmaps(RAMState *rs)
> > >   {
> > > +    Error *local_err = NULL;
> > > +    bool ret = true;
> > > +
> > >       qemu_mutex_lock_ramlist();
> > >       WITH_RCU_READ_LOCK_GUARD() {
> > >           ram_list_init_bitmaps();
> 
> btw, should we use bitmap_try_new() to create the bitmaps instead of
> bitmap_new() which can abort() ?

I'm not sure how much it'll help in reality; if allocation can fail here I
would expect QEMU to crash sooner or later.  But I agree that try_new() seems
reasonable to use here if this path can now fail; after all, migration is an
extra feature on top of the VM's emulation functions, so it's optional.

Thanks,

-- 
Peter Xu
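
To illustrate the difference being discussed, here is a minimal standalone sketch of a "try" style bitmap allocation that returns NULL to the caller instead of aborting. The names sketch_bitmap_try_new() and sketch_init_bitmap() are hypothetical and only mimic the idea behind bitmap_try_new(); the error is a plain string rather than QEMU's Error object.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Allocate a zeroed bitmap large enough for nbits bits.
 * Returns NULL on allocation failure instead of aborting.
 */
static unsigned long *sketch_bitmap_try_new(size_t nbits)
{
    /* Round up without risking overflow in nbits + BITS - 1. */
    size_t nlongs = nbits / SKETCH_BITS_PER_LONG +
                    (nbits % SKETCH_BITS_PER_LONG != 0);

    /* calloc() zeroes the memory and rejects an overflowing product. */
    return calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
}

/*
 * Caller side: turn an allocation failure into a reported error so that
 * only the migration attempt fails, not the whole process.
 */
static bool sketch_init_bitmap(unsigned long **bmap, size_t nbits,
                               const char **errmsg)
{
    assert(bmap && errmsg);

    *bmap = sketch_bitmap_try_new(nbits);
    if (!*bmap) {
        *errmsg = "failed to allocate dirty bitmap";
        return false;
    }
    return true;
}
```

Whether this helps in practice is exactly the open question above: a host that cannot satisfy this allocation may well fail elsewhere soon after.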




* Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
  2024-03-15 14:52                             ` Peter Xu
@ 2024-03-19 10:46                               ` Cédric Le Goater
  0 siblings, 0 replies; 111+ messages in thread
From: Cédric Le Goater @ 2024-03-19 10:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, qemu-devel, Alex Williamson, Avihai Horon,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Prasad Pandit

On 3/15/24 15:52, Peter Xu wrote:
> On Fri, Mar 15, 2024 at 03:21:27PM +0100, Cédric Le Goater wrote:
>> On 3/15/24 13:20, Cédric Le Goater wrote:
>>> On 3/15/24 12:01, Peter Xu wrote:
>>>> On Fri, Mar 15, 2024 at 11:17:45AM +0100, Cédric Le Goater wrote:
>>>>>> migrate_set_state is also unintuitive because it ignores invalid state
>>>>>> transitions and we've been using that property to deal with special
>>>>>> states such as POSTCOPY_PAUSED and FAILED:
>>>>>>
>>>>>> - After the migration goes into POSTCOPY_PAUSED, the resumed migration's
>>>>>>      migrate_init() will try to set the state NONE->SETUP, which is not
>>>>>>      valid.
>>>>>>
>>>>>> - After save_setup fails, the migration goes into FAILED, but wait_unplug
>>>>>>      will try to transition SETUP->ACTIVE, which is also not valid.
>>>>>>
>>>>>
>>>>> I am not sure I understand what the plan is. Both solutions are problematic
>>>>> regarding the state transitions.
>>>>>
>>>>> Should we consider that waiting for failover devices to unplug is an internal
>>>>> step of the SETUP phase not transitioning to ACTIVE ?
>>>>
>>>> To unblock this series, IIUC the simplest solution is to do what Fabiano
>>>> suggested: move qemu_savevm_wait_unplug() before the check of the
>>>> setup() return value.
>>>
>>> The simplest is IMHO moving qemu_savevm_wait_unplug() before
>>> qemu_savevm_state_setup() and leaving patch 10 unchanged. See
>>> below the extra patch. It looks much cleaner than what we have
>>> today.
>>>
>>>> In that case, the state change in qemu_savevm_wait_unplug()
>>>> should be benign: we should see a very small window where the state is
>>>> ACTIVE before it becomes FAILED (and IIUC the patch itself will need to
>>>> use ACTIVE as "old_state", not SETUP anymore).
>>>
>>> OK. I will give it a try to compare.
>>
>> Here's the alternative solution. SETUP state failures are handled after
>> transitioning to ACTIVE state, which is unfortunate but probably harmless.
>> I guess it's OK.
> 
> This also looks good to me, thanks.
> 
> One trivial early comment is in this case we can introduce a helper to
> cover both setup() calls and UNPLUG waits and dedup the two paths.

There is one little difference: qemu_savevm_state_header() is called
earlier in the migration thread, before return-path, postcopy and COLO
are advertised on the target. I don't think it can be moved.

Thanks,

C.
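
The quoted remark that migrate_set_state ignores invalid state transitions can be illustrated with a compare-and-swap sketch. This is a simplified standalone model, not the QEMU function: the enum values and sketch_set_state() are hypothetical, and the sketch returns a bool purely so the ignored transition is observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of migration states, for illustration only. */
typedef enum {
    ST_NONE,
    ST_SETUP,
    ST_ACTIVE,
    ST_FAILED
} SketchState;

/*
 * Move *state to new_state only if it currently equals old_state.
 * If the state has already changed (for example to ST_FAILED), the
 * request is silently ignored and the current state is kept.
 */
static bool sketch_set_state(_Atomic int *state, int old_state, int new_state)
{
    assert(state);
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}
```

In this model, once a setup failure has moved the state from ST_SETUP to ST_FAILED, a later ST_SETUP to ST_ACTIVE request is a no-op. Conversely, if the unplug wait has already moved the state to ST_ACTIVE, the failure path has to pass ST_ACTIVE as the old state, as noted in the quoted discussion.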




end of thread, other threads:[~2024-03-19 10:47 UTC | newest]

Thread overview: 111+ messages
2024-03-06 13:34 [PATCH v4 00/25] migration: Improve error reporting Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 01/25] migration: Report error when shutdown fails Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 02/25] migration: Remove SaveStateHandler and LoadStateHandler typedefs Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 03/25] migration: Add documentation for SaveVMHandlers Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 04/25] migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 05/25] s390/stattrib: Add Error** argument to set_migrationmode() handler Cédric Le Goater
2024-03-07 12:18   ` Fabiano Rosas
2024-03-08  8:11   ` Peter Xu
2024-03-08  8:45   ` Thomas Huth
2024-03-06 13:34 ` [PATCH v4 06/25] vfio: Always report an error in vfio_save_setup() Cédric Le Goater
2024-03-07  9:36   ` Eric Auger
2024-03-06 13:34 ` [PATCH v4 07/25] migration: Always report an error in block_save_setup() Cédric Le Goater
2024-03-07 12:28   ` Fabiano Rosas
2024-03-08  6:59   ` Peter Xu
2024-03-11 15:22     ` Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 08/25] migration: Always report an error in ram_save_setup() Cédric Le Goater
2024-03-07 12:28   ` Fabiano Rosas
2024-03-06 13:34 ` [PATCH v4 09/25] migration: Add Error** argument to vmstate_save() Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup() Cédric Le Goater
2024-03-07 12:45   ` Fabiano Rosas
2024-03-08 12:56   ` Peter Xu
2024-03-08 13:14     ` Cédric Le Goater
2024-03-08 13:39       ` Cédric Le Goater
2024-03-08 13:55         ` Cédric Le Goater
2024-03-08 14:17           ` Peter Xu
2024-03-11 18:12             ` Cédric Le Goater
2024-03-11 20:15               ` Peter Xu
2024-03-08 14:11     ` Fabiano Rosas
2024-03-08 14:36   ` Fabiano Rosas
2024-03-11 18:15     ` Cédric Le Goater
2024-03-11 19:03       ` Fabiano Rosas
2024-03-11 20:10         ` Peter Xu
2024-03-12 13:01           ` Cédric Le Goater
2024-03-12 12:32         ` Cédric Le Goater
2024-03-12 13:34           ` Cédric Le Goater
2024-03-12 14:01             ` Cédric Le Goater
2024-03-12 14:24               ` Fabiano Rosas
2024-03-12 15:18                 ` Peter Xu
2024-03-12 18:06                   ` Cédric Le Goater
2024-03-12 18:28                   ` Fabiano Rosas
2024-03-15 10:17                     ` Cédric Le Goater
2024-03-15 11:01                       ` Peter Xu
2024-03-15 12:20                         ` Cédric Le Goater
2024-03-15 13:09                           ` Peter Xu
2024-03-15 14:30                             ` Cédric Le Goater
2024-03-15 13:11                           ` Peter Xu
2024-03-15 14:31                             ` Cédric Le Goater
2024-03-15 14:57                               ` Peter Xu
2024-03-15 14:21                           ` Cédric Le Goater
2024-03-15 14:52                             ` Peter Xu
2024-03-19 10:46                               ` Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 11/25] migration: Add Error** argument to .save_setup() handler Cédric Le Goater
2024-03-07  9:53   ` Vladimir Sementsov-Ogievskiy
2024-03-07 10:31     ` Cédric Le Goater
2024-03-07 11:39       ` Vladimir Sementsov-Ogievskiy
2024-03-08  7:11         ` Peter Xu
2024-03-08  8:08           ` Peter Xu
2024-03-06 13:34 ` [PATCH v4 12/25] migration: Add Error** argument to .load_setup() handler Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 13/25] memory: Add Error** argument to .log_global_start() handler Cédric Le Goater
2024-03-15 11:18   ` Peter Xu
2024-03-18 14:33     ` Cédric Le Goater
2024-03-18 14:54     ` Cédric Le Goater
2024-03-18 16:27       ` Peter Xu
2024-03-06 13:34 ` [PATCH v4 14/25] memory: Add Error** argument to the global_dirty_log routines Cédric Le Goater
2024-03-15 11:34   ` Peter Xu
2024-03-18 10:43     ` Cédric Le Goater
2024-03-18 16:03     ` Cédric Le Goater
2024-03-18 16:08     ` Cédric Le Goater
2024-03-18 16:31       ` Peter Xu
2024-03-16  2:41   ` Yong Huang
2024-03-18 16:19     ` Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 15/25] migration: Modify ram_init_bitmaps() to report dirty tracking errors Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 16/25] vfio: Add Error** argument to .set_dirty_page_tracking() handler Cédric Le Goater
2024-03-07  8:09   ` Eric Auger
2024-03-07 12:06     ` Cédric Le Goater
2024-03-08  7:39       ` Eric Auger
2024-03-08 13:00         ` Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 17/25] vfio: Add Error** argument to vfio_devices_dma_logging_start() Cédric Le Goater
2024-03-07  8:15   ` Eric Auger
2024-03-07 13:15     ` Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 18/25] vfio: Add Error** argument to vfio_devices_dma_logging_stop() Cédric Le Goater
2024-03-07  8:53   ` Eric Auger
2024-03-07 14:05     ` Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 19/25] vfio: Use new Error** argument in vfio_save_setup() Cédric Le Goater
2024-03-07  9:04   ` Eric Auger
2024-03-07 13:35     ` Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 20/25] vfio: Add Error** argument to .vfio_save_config() handler Cédric Le Goater
2024-03-07  9:13   ` Eric Auger
2024-03-07 13:55     ` Cédric Le Goater
2024-03-08  7:41       ` Eric Auger
2024-03-06 13:34 ` [PATCH v4 21/25] vfio: Reverse test on vfio_get_dirty_bitmap() Cédric Le Goater
2024-03-06 20:51   ` Philippe Mathieu-Daudé
2024-03-07  7:13     ` Cédric Le Goater
2024-03-06 13:34 ` [PATCH v4 22/25] memory: Add Error** argument to memory_get_xlat_addr() Cédric Le Goater
2024-03-15 15:06   ` Peter Xu
2024-03-06 13:34 ` [PATCH v4 23/25] vfio: Add Error** argument to .get_dirty_bitmap() handler Cédric Le Goater
2024-03-07  9:23   ` Eric Auger
2024-03-06 13:34 ` [PATCH v4 24/25] vfio: Also trace event failures in vfio_save_complete_precopy() Cédric Le Goater
2024-03-07  9:28   ` Eric Auger
2024-03-07 13:36     ` Cédric Le Goater
2024-03-08  7:42       ` Eric Auger
2024-03-06 13:34 ` [PATCH v4 25/25] vfio: Extend vfio_set_migration_error() with Error* argument Cédric Le Goater
2024-03-07  9:30   ` Eric Auger
2024-03-08  8:15 ` [PATCH v4 00/25] migration: Improve error reporting Peter Xu
2024-03-08 13:03   ` Cédric Le Goater
2024-03-11 20:24   ` Peter Xu
2024-03-12  7:16     ` Cédric Le Goater
2024-03-12  9:58       ` Cédric Le Goater
2024-03-12 11:50         ` Peter Xu
2024-03-12 12:09           ` Cédric Le Goater
2024-03-12 12:25             ` Peter Xu
