[PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover
@ 2023-08-29 21:42 Peter Xu
  2023-08-29 21:42 ` [PATCH 1/9] migration: Display error in query-migrate irrelevant of status Peter Xu
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

This patchset supersedes below:
[PATCH v2 0/7] migration: Better error handling in return path thread

Another note is that this might conflict with Fabiano's other patchset to
fix postcopy race conditions, but maybe not.  If collapse, I can rebase.
Let me send this out still for early reviews.

I dropped the last patch there (which wasn't clear on being beneficial)
from last version, meanwhile added three more patches to address an issue
reported from our QE team that one postcopy migration can stuck in RECOVER
stage and never got kicked out.  For more information of that problem, one
can refer to the last patch commit message.

Since this one covers more issues, I renamed the subject, and let me
version it from v1.  I still collected most of R-bs from Fabiano since last
version (patches 1-6).

Please have a look, thanks.

Peter Xu (9):
  migration: Display error in query-migrate irrelevant of status
  migration: Let migrate_set_error() take ownership
  migration: Introduce migrate_has_error()
  migration: Refactor error handling in source return path
  migration: Deliver return path file error to migrate state too
  qemufile: Always return a verbose error
  migration: Remember num of ramblocks to sync during recovery
  migration: Add migration_rp_wait|kick()
  migration/postcopy: Allow network to fail even during recovery

 qapi/migration.json      |   5 +-
 migration/migration.h    |  25 ++++-
 migration/qemu-file.h    |   1 +
 migration/ram.h          |   5 +-
 migration/channel.c      |   1 -
 migration/migration.c    | 230 +++++++++++++++++++++++++++------------
 migration/multifd.c      |  10 +-
 migration/postcopy-ram.c |   1 -
 migration/qemu-file.c    |  17 ++-
 migration/ram.c          |  77 +++++++------
 migration/trace-events   |   2 +-
 11 files changed, 248 insertions(+), 126 deletions(-)

-- 
2.41.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/9] migration: Display error in query-migrate irrelevant of status
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-08-29 21:42 ` [PATCH 2/9] migration: Let migrate_set_error() take ownership Peter Xu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

Display it as long as being set, irrelevant of FAILED status.  E.g., it may
also be applicable to PAUSED stage of postcopy, to provide hint on what has
gone wrong.

The error_mutex seems to be overlooked when referencing the error, add it
to be very safe.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json   | 5 ++---
 migration/migration.c | 8 +++++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..c241b6d318 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -230,9 +230,8 @@
 #     throttled during auto-converge.  This is only present when
 #     auto-converge has started throttling guest cpus.  (Since 2.7)
 #
-# @error-desc: the human readable error description string, when
-#     @status is 'failed'. Clients should not attempt to parse the
-#     error strings.  (Since 2.7)
+# @error-desc: the human readable error description string. Clients
+#     should not attempt to parse the error strings.  (Since 2.7)
 #
 # @postcopy-blocktime: total time when all vCPU were blocked during
 #     postcopy live migration.  This is only present when the
diff --git a/migration/migration.c b/migration/migration.c
index 5528acb65e..c60064d48e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1052,9 +1052,6 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     case MIGRATION_STATUS_FAILED:
         info->has_status = true;
-        if (s->error) {
-            info->error_desc = g_strdup(error_get_pretty(s->error));
-        }
         break;
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
@@ -1064,6 +1061,11 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     }
     info->status = state;
+
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    if (s->error) {
+        info->error_desc = g_strdup(error_get_pretty(s->error));
+    }
 }
 
 static void fill_destination_migration_info(MigrationInfo *info)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/9] migration: Let migrate_set_error() take ownership
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
  2023-08-29 21:42 ` [PATCH 1/9] migration: Display error in query-migrate irrelevant of status Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-09-12 19:40   ` Fabiano Rosas
  2023-08-29 21:42 ` [PATCH 3/9] migration: Introduce migrate_has_error() Peter Xu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

migrate_set_error() used one error_copy() so it always copy an error.
However that's not the major use case - the major use case is one would
like to pass the error to migrate_set_error() without further touching the
error.

It can be proved if we see most of the callers are freeing the error
explicitly right afterwards.  There're a few outliers (only if when the
caller) where we can use error_copy() explicitly there.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h    |  4 ++--
 migration/channel.c      |  1 -
 migration/migration.c    | 22 ++++++++++++++++------
 migration/multifd.c      | 10 ++++------
 migration/postcopy-ram.c |  1 -
 migration/ram.c          |  1 -
 6 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 6eea18db36..76e35a5ecf 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -465,7 +465,7 @@ bool  migration_has_all_channels(void);
 
 uint64_t migrate_max_downtime(void);
 
-void migrate_set_error(MigrationState *s, const Error *error);
+void migrate_set_error(MigrationState *s, Error *error);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
@@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
-void migration_cancel(const Error *error);
+void migration_cancel(Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
 void reset_vfio_bytes_transferred(void);
diff --git a/migration/channel.c b/migration/channel.c
index ca3319a309..48b3f6abd6 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
         }
     }
     migrate_fd_connect(s, error);
-    error_free(error);
 }
 
 
diff --git a/migration/migration.c b/migration/migration.c
index c60064d48e..0f3ca168ed 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -162,7 +162,7 @@ void migration_object_init(void)
     dirty_bitmap_mig_init();
 }
 
-void migration_cancel(const Error *error)
+void migration_cancel(Error *error)
 {
     if (error) {
         migrate_set_error(current_migration, error);
@@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
     object_unref(OBJECT(s));
 }
 
-void migrate_set_error(MigrationState *s, const Error *error)
+/*
+ * Set error for current migration state.  The `error' ownership will be
+ * moved from the caller to MigrationState, so the caller doesn't need to
+ * free the error.
+ *
+ * If the caller still needs to reference the `error' passed in, one should
+ * use error_copy() explicitly.
+ */
+void migrate_set_error(MigrationState *s, Error *error)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
     if (!s->error) {
-        s->error = error_copy(error);
+        /* Record the first error triggered */
+        s->error = error;
+    } else {
+        error_free(error);
     }
 }
 
@@ -1235,7 +1246,7 @@ static void migrate_error_free(MigrationState *s)
     }
 }
 
-static void migrate_fd_error(MigrationState *s, const Error *error)
+static void migrate_fd_error(MigrationState *s, Error *error)
 {
     trace_migrate_fd_error(error_get_pretty(error));
     assert(s->to_dst_file == NULL);
@@ -1703,7 +1714,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         if (!resume_requested) {
             yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         }
-        migrate_fd_error(s, local_err);
+        migrate_fd_error(s, error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
@@ -2626,7 +2637,6 @@ static MigThrError migration_detect_error(MigrationState *s)
 
     if (local_error) {
         migrate_set_error(s, local_error);
-        error_free(local_error);
     }
 
     if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..69d56104fb 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -551,7 +551,6 @@ void multifd_save_cleanup(void)
         multifd_send_state->ops->send_cleanup(p, &local_err);
         if (local_err) {
             migrate_set_error(migrate_get_current(), local_err);
-            error_free(local_err);
         }
     }
     qemu_sem_destroy(&multifd_send_state->channels_ready);
@@ -750,7 +749,6 @@ out:
     if (local_err) {
         trace_multifd_send_error(p->id);
         multifd_send_terminate_threads(local_err);
-        error_free(local_err);
     }
 
     /*
@@ -883,7 +881,6 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
       */
      p->quit = true;
      object_unref(OBJECT(ioc));
-     error_free(err);
 }
 
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
@@ -1148,7 +1145,6 @@ static void *multifd_recv_thread(void *opaque)
 
     if (local_err) {
         multifd_recv_terminate_threads(local_err);
-        error_free(local_err);
     }
     qemu_mutex_lock(&p->mutex);
     p->running = false;
@@ -1240,7 +1236,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate_prepend(errp, local_err,
                                 "failed to receive packet"
                                 " via multifd channel %d: ",
@@ -1253,7 +1250,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     if (p->c != NULL) {
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 29aea9456d..8a93b5504d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1594,7 +1594,6 @@ postcopy_preempt_send_channel_done(MigrationState *s,
 {
     if (local_err) {
         migrate_set_error(s, local_err);
-        error_free(local_err);
     } else {
         migration_ioc_register_yank(ioc);
         s->postcopy_qemufile_src = qemu_file_new_output(ioc);
diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..fc7fe0e6e8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4308,7 +4308,6 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
          */
         error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
         migration_cancel(err);
-        error_free(err);
     }
 
     switch (ps) {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/9] migration: Introduce migrate_has_error()
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
  2023-08-29 21:42 ` [PATCH 1/9] migration: Display error in query-migrate irrelevant of status Peter Xu
  2023-08-29 21:42 ` [PATCH 2/9] migration: Let migrate_set_error() take ownership Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-08-29 21:42 ` [PATCH 4/9] migration: Refactor error handling in source return path Peter Xu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

Introduce a helper to detect whether MigrationState.error is set for
whatever reason.  It is intended to not taking the error_mutex here because
neither do we reference the pointer, nor do we modify the pointer.  State
why it's safe to do so.

This is preparation work for any thread (e.g. source return path thread) to
setup errors in an unified way to MigrationState, rather than relying on
its own way to set errors (mark_source_rp_bad()).

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h | 1 +
 migration/migration.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index 76e35a5ecf..1ba38eecfa 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -466,6 +466,7 @@ bool  migration_has_all_channels(void);
 uint64_t migrate_max_downtime(void);
 
 void migrate_set_error(MigrationState *s, Error *error);
+bool migrate_has_error(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
diff --git a/migration/migration.c b/migration/migration.c
index 0f3ca168ed..84c17dfbbb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1237,6 +1237,13 @@ void migrate_set_error(MigrationState *s, Error *error)
     }
 }
 
+bool migrate_has_error(MigrationState *s)
+{
+    /* The lock is not helpful here, but still follow the rule */
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    return qatomic_read(&s->error);
+}
+
 static void migrate_error_free(MigrationState *s)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/9] migration: Refactor error handling in source return path
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
                   ` (2 preceding siblings ...)
  2023-08-29 21:42 ` [PATCH 3/9] migration: Introduce migrate_has_error() Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-08-29 21:42 ` [PATCH 5/9] migration: Deliver return path file error to migrate state too Peter Xu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

rp_state.error was a boolean used to show error happened in return path
thread.  That's not only duplicating error reporting (migrate_set_error),
but also not good enough in that we only do error_report() and set it to
true, we never can keep a history of the exact error and show it in
query-migrate.

To make this better, a few things done:

  - Use error_setg() rather than error_report() across the whole lifecycle
    of return path thread, keeping the error in an Error*.

  - Use migrate_set_error() to apply that captured error to the global
    migration object when error occured in this thread.

  - With above, no need to have mark_source_rp_bad(), remove it, alongside
    with rp_state.error itself.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h  |   1 -
 migration/ram.h        |   5 +-
 migration/migration.c  | 122 +++++++++++++++++++++--------------------
 migration/ram.c        |  41 +++++++-------
 migration/trace-events |   2 +-
 5 files changed, 89 insertions(+), 82 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 1ba38eecfa..a5c95e4d43 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -297,7 +297,6 @@ struct MigrationState {
         /* Protected by qemu_file_lock */
         QEMUFile     *from_dst_file;
         QemuThread    rp_thread;
-        bool          error;
         /*
          * We can also check non-zero of rp_thread, but there's no "official"
          * way to do this, so this bool makes it slightly more elegant.
diff --git a/migration/ram.h b/migration/ram.h
index 145c915ca7..14ed666d58 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -51,7 +51,8 @@ uint64_t ram_bytes_total(void);
 void mig_throttle_counter_reset(void);
 
 uint64_t ram_pagesize_summary(void);
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
+int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
+                         Error **errp);
 void ram_postcopy_migrated_memory_release(MigrationState *ms);
 /* For outgoing discard bitmap */
 void ram_postcopy_send_discard_bitmap(MigrationState *ms);
@@ -71,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
 void postcopy_preempt_shutdown_file(MigrationState *s);
 void *postcopy_preempt_thread(void *opaque);
diff --git a/migration/migration.c b/migration/migration.c
index 84c17dfbbb..def9d119b1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1424,7 +1424,6 @@ void migrate_init(MigrationState *s)
     s->to_dst_file = NULL;
     s->state = MIGRATION_STATUS_NONE;
     s->rp_state.from_dst_file = NULL;
-    s->rp_state.error = false;
     s->mbps = 0.0;
     s->pages_per_second = 0.0;
     s->downtime = 0;
@@ -1743,14 +1742,14 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
     qemu_sem_post(&s->pause_sem);
 }
 
-/* migration thread support */
-/*
- * Something bad happened to the RP stream, mark an error
- * The caller shall print or trace something to indicate why
- */
-static void mark_source_rp_bad(MigrationState *s)
+void migration_rp_wait(MigrationState *s)
 {
-    s->rp_state.error = true;
+    qemu_sem_wait(&s->rp_state.rp_sem);
+}
+
+void migration_rp_kick(MigrationState *s)
+{
+    qemu_sem_post(&s->rp_state.rp_sem);
 }
 
 static struct rp_cmd_args {
@@ -1774,7 +1773,7 @@ static struct rp_cmd_args {
  * and we don't need to send pages that have already been sent.
  */
 static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
-                                       ram_addr_t start, size_t len)
+                                        ram_addr_t start, size_t len, Error **errp)
 {
     long our_host_ps = qemu_real_host_page_size();
 
@@ -1786,15 +1785,12 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
      */
     if (!QEMU_IS_ALIGNED(start, our_host_ps) ||
         !QEMU_IS_ALIGNED(len, our_host_ps)) {
-        error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT
-                     " len: %zd", __func__, start, len);
-        mark_source_rp_bad(ms);
+        error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, start:"
+                   RAM_ADDR_FMT " len: %zd", start, len);
         return;
     }
 
-    if (ram_save_queue_pages(rbname, start, len)) {
-        mark_source_rp_bad(ms);
-    }
+    ram_save_queue_pages(rbname, start, len, errp);
 }
 
 /* Return true to retry, false to quit */
@@ -1809,26 +1805,28 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
     return true;
 }
 
-static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
+static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name,
+                                         Error **errp)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
 
     if (!block) {
-        error_report("%s: invalid block name '%s'", __func__, block_name);
+        error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'",
+                   block_name);
         return -EINVAL;
     }
 
     /* Fetch the received bitmap and refresh the dirty bitmap */
-    return ram_dirty_bitmap_reload(s, block);
+    return ram_dirty_bitmap_reload(s, block, errp);
 }
 
-static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
+static int migrate_handle_rp_resume_ack(MigrationState *s,
+                                        uint32_t value, Error **errp)
 {
     trace_source_return_path_thread_resume_ack(value);
 
     if (value != MIGRATION_RESUME_ACK_VALUE) {
-        error_report("%s: illegal resume_ack value %"PRIu32,
-                     __func__, value);
+        error_setg(errp, "illegal resume_ack value %"PRIu32, value);
         return -1;
     }
 
@@ -1887,49 +1885,47 @@ static void *source_return_path_thread(void *opaque)
     uint32_t tmp32, sibling_error;
     ram_addr_t start = 0; /* =0 to silence warning */
     size_t  len = 0, expected_len;
+    Error *err = NULL;
     int res;
 
     trace_source_return_path_thread_entry();
     rcu_register_thread();
 
 retry:
-    while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
+    while (!migrate_has_error(ms) && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
+
         header_type = qemu_get_be16(rp);
         header_len = qemu_get_be16(rp);
 
         if (qemu_file_get_error(rp)) {
-            mark_source_rp_bad(ms);
             goto out;
         }
 
         if (header_type >= MIG_RP_MSG_MAX ||
             header_type == MIG_RP_MSG_INVALID) {
-            error_report("RP: Received invalid message 0x%04x length 0x%04x",
-                         header_type, header_len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Received invalid message 0x%04x length 0x%04x",
+                       header_type, header_len);
             goto out;
         }
 
         if ((rp_cmd_args[header_type].len != -1 &&
             header_len != rp_cmd_args[header_type].len) ||
             header_len > sizeof(buf)) {
-            error_report("RP: Received '%s' message (0x%04x) with"
-                         "incorrect length %d expecting %zu",
-                         rp_cmd_args[header_type].name, header_type, header_len,
-                         (size_t)rp_cmd_args[header_type].len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Received '%s' message (0x%04x) with"
+                       "incorrect length %d expecting %zu",
+                       rp_cmd_args[header_type].name, header_type, header_len,
+                       (size_t)rp_cmd_args[header_type].len);
             goto out;
         }
 
         /* We know we've got a valid header by this point */
         res = qemu_get_buffer(rp, buf, header_len);
         if (res != header_len) {
-            error_report("RP: Failed reading data for message 0x%04x"
-                         " read %d expected %d",
-                         header_type, res, header_len);
-            mark_source_rp_bad(ms);
+            error_setg(&err, "Failed reading data for message 0x%04x"
+                       " read %d expected %d",
+                       header_type, res, header_len);
             goto out;
         }
 
@@ -1939,8 +1935,7 @@ retry:
             sibling_error = ldl_be_p(buf);
             trace_source_return_path_thread_shut(sibling_error);
             if (sibling_error) {
-                error_report("RP: Sibling indicated error %d", sibling_error);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "Sibling indicated error %d", sibling_error);
             }
             /*
              * We'll let the main thread deal with closing the RP
@@ -1958,7 +1953,10 @@ retry:
         case MIG_RP_MSG_REQ_PAGES:
             start = ldq_be_p(buf);
             len = ldl_be_p(buf + 8);
-            migrate_handle_rp_req_pages(ms, NULL, start, len);
+            migrate_handle_rp_req_pages(ms, NULL, start, len, &err);
+            if (err) {
+                goto out;
+            }
             break;
 
         case MIG_RP_MSG_REQ_PAGES_ID:
@@ -1973,32 +1971,32 @@ retry:
                 expected_len += tmp32;
             }
             if (header_len != expected_len) {
-                error_report("RP: Req_Page_id with length %d expecting %zd",
-                             header_len, expected_len);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "Req_Page_id with length %d expecting %zd",
+                           header_len, expected_len);
+                goto out;
+            }
+            migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len,
+                                        &err);
+            if (err) {
                 goto out;
             }
-            migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
             break;
 
         case MIG_RP_MSG_RECV_BITMAP:
             if (header_len < 1) {
-                error_report("%s: missing block name", __func__);
-                mark_source_rp_bad(ms);
+                error_setg(&err, "MIG_RP_MSG_RECV_BITMAP missing block name");
                 goto out;
             }
             /* Format: len (1B) + idstr (<255B). This ends the idstr. */
             buf[buf[0] + 1] = '\0';
-            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
-                mark_source_rp_bad(ms);
+            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) {
                 goto out;
             }
             break;
 
         case MIG_RP_MSG_RESUME_ACK:
             tmp32 = ldl_be_p(buf);
-            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
-                mark_source_rp_bad(ms);
+            if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) {
                 goto out;
             }
             break;
@@ -2014,6 +2012,19 @@ retry:
     }
 
 out:
+    if (err) {
+        /*
+         * Collect any error in return-path thread and report it to the
+         * migration state object.
+         */
+        migrate_set_error(ms, err);
+        /*
+         * We lost ownership to Error*, clear it, prepared to capture the
+         * next error.
+         */
+        err = NULL;
+    }
+
     res = qemu_file_get_error(rp);
     if (res) {
         if (res && migration_in_postcopy()) {
@@ -2029,13 +2040,11 @@ out:
                  * it's reset only by us above, or when migration completes
                  */
                 rp = ms->rp_state.from_dst_file;
-                ms->rp_state.error = false;
                 goto retry;
             }
         }
 
         trace_source_return_path_thread_bad_end();
-        mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
@@ -2068,8 +2077,7 @@ static int open_return_path_on_source(MigrationState *ms,
     return 0;
 }
 
-/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
-static int await_return_path_close_on_source(MigrationState *ms)
+static void await_return_path_close_on_source(MigrationState *ms)
 {
     /*
      * If this is a normal exit then the destination will send a SHUT and the
@@ -2082,13 +2090,11 @@ static int await_return_path_close_on_source(MigrationState *ms)
          * waiting for the destination.
          */
         qemu_file_shutdown(ms->rp_state.from_dst_file);
-        mark_source_rp_bad(ms);
     }
     trace_await_return_path_close_on_source_joining();
     qemu_thread_join(&ms->rp_state.rp_thread);
     ms->rp_state.rp_thread_created = false;
     trace_await_return_path_close_on_source_close();
-    return ms->rp_state.error;
 }
 
 static inline void
@@ -2391,11 +2397,11 @@ static void migration_completion(MigrationState *s)
      * a SHUT command).
      */
     if (s->rp_state.rp_thread_created) {
-        int rp_error;
         trace_migration_return_path_end_before();
-        rp_error = await_return_path_close_on_source(s);
-        trace_migration_return_path_end_after(rp_error);
-        if (rp_error) {
+        await_return_path_close_on_source(s);
+        trace_migration_return_path_end_after();
+        /* If return path has error, should have been set here */
+        if (migrate_has_error(s)) {
             goto fail;
         }
     }
diff --git a/migration/ram.c b/migration/ram.c
index fc7fe0e6e8..814c59c17b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1963,7 +1963,8 @@ static void migration_page_queue_free(RAMState *rs)
  * @start: starting address from the start of the RAMBlock
  * @len: length (in bytes) to send
  */
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
+int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
+                         Error **errp)
 {
     RAMBlock *ramblock;
     RAMState *rs = ram_state;
@@ -1980,7 +1981,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
              * Shouldn't happen, we can't reuse the last RAMBlock if
              * it's the 1st request.
              */
-            error_report("ram_save_queue_pages no previous block");
+            error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no previous block");
             return -1;
         }
     } else {
@@ -1988,16 +1989,17 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
 
         if (!ramblock) {
             /* We shouldn't be asked for a non-existent RAMBlock */
-            error_report("ram_save_queue_pages no block '%s'", rbname);
+            error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no block '%s'", rbname);
             return -1;
         }
         rs->last_req_rb = ramblock;
     }
     trace_ram_save_queue_pages(ramblock->idstr, start, len);
     if (!offset_in_ramblock(ramblock, start + len - 1)) {
-        error_report("%s request overrun start=" RAM_ADDR_FMT " len="
-                     RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
-                     __func__, start, len, ramblock->used_length);
+        error_setg(errp, "MIG_RP_MSG_REQ_PAGES request overrun, "
+                   "start=" RAM_ADDR_FMT " len="
+                   RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT,
+                   start, len, ramblock->used_length);
         return -1;
     }
 
@@ -2029,9 +2031,9 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len)
         assert(len % page_size == 0);
         while (len) {
             if (ram_save_host_page_urgent(pss)) {
-                error_report("%s: ram_save_host_page_urgent() failed: "
-                             "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
-                             __func__, ramblock->idstr, start);
+                error_setg(errp, "ram_save_host_page_urgent() failed: "
+                           "ramblock=%s, start_addr=0x"RAM_ADDR_FMT,
+                           ramblock->idstr, start);
                 ret = -1;
                 break;
             }
@@ -4165,7 +4167,7 @@ static void ram_dirty_bitmap_reload_notify(MigrationState *s)
  * This is only used when the postcopy migration is paused but wants
  * to resume from a middle point.
  */
-int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
 {
     int ret = -EINVAL;
     /* from_dst_file is always valid because we're within rp_thread */
@@ -4177,8 +4179,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
     trace_ram_dirty_bitmap_reload_begin(block->idstr);
 
     if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        error_report("%s: incorrect state %s", __func__,
-                     MigrationStatus_str(s->state));
+        error_setg(errp, "Reload bitmap in incorrect state %s",
+                   MigrationStatus_str(s->state));
         return -EINVAL;
     }
 
@@ -4195,9 +4197,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     /* The size of the bitmap should match with our ramblock */
     if (size != local_size) {
-        error_report("%s: ramblock '%s' bitmap size mismatch "
-                     "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
-                     block->idstr, size, local_size);
+        error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64
+                   " != 0x%"PRIx64")", block->idstr, size, local_size);
         ret = -EINVAL;
         goto out;
     }
@@ -4207,16 +4208,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     ret = qemu_file_get_error(file);
     if (ret || size != local_size) {
-        error_report("%s: read bitmap failed for ramblock '%s': %d"
-                     " (size 0x%"PRIx64", got: 0x%"PRIx64")",
-                     __func__, block->idstr, ret, local_size, size);
+        error_setg(errp, "read bitmap failed for ramblock '%s': %d"
+                   " (size 0x%"PRIx64", got: 0x%"PRIx64")",
+                   block->idstr, ret, local_size, size);
         ret = -EIO;
         goto out;
     }
 
     if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
-        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
-                     __func__, block->idstr, end_mark);
+        error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64,
+                   block->idstr, end_mark);
         ret = -EINVAL;
         goto out;
     }
diff --git a/migration/trace-events b/migration/trace-events
index 4666f19325..20cd17ffe8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -164,7 +164,7 @@ migration_completion_postcopy_end_after_complete(void) ""
 migration_rate_limit_pre(int ms) "%d ms"
 migration_rate_limit_post(int urgent) "urgent: %d"
 migration_return_path_end_before(void) ""
-migration_return_path_end_after(int rp_error) "%d"
+migration_return_path_end_after(void) ""
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/9] migration: Deliver return path file error to migrate state too
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
                   ` (3 preceding siblings ...)
  2023-08-29 21:42 ` [PATCH 4/9] migration: Refactor error handling in source return path Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-08-29 21:42 ` [PATCH 6/9] qemufile: Always return a verbose error Peter Xu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

We've already did this for most of the return path thread errors, but not
yet for the IO errors happened on the return path qemufile.  Do that too.

Remember to reset "err" always, because the ownership is not us anymore,
otherwise we're prone to use-after-free later after recovered.

Re-export qemu_file_get_error_obj().

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/qemu-file.h | 1 +
 migration/migration.c | 7 +++++++
 migration/qemu-file.c | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 47015f5201..bc6edc5c39 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -129,6 +129,7 @@ void qemu_file_skip(QEMUFile *f, int size);
 void qemu_file_credit_transfer(QEMUFile *f, size_t size);
 int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
 void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
+int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index def9d119b1..576e102319 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2027,6 +2027,13 @@ out:
 
     res = qemu_file_get_error(rp);
     if (res) {
+        /* We have forwarded any error in "err" already, reuse "error" */
+        assert(err == NULL);
+        /* Try to deliver this file error to migration state */
+        qemu_file_get_error_obj(rp, &err);
+        migrate_set_error(ms, err);
+        err = NULL;
+
         if (res && migration_in_postcopy()) {
             /*
              * Maybe there is something we can do: it looks like a
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 19c33c9985..eea7171192 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -146,7 +146,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
  * is not 0.
  *
  */
-static int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
+int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
 {
     if (errp) {
         *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/9] qemufile: Always return a verbose error
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
                   ` (4 preceding siblings ...)
  2023-08-29 21:42 ` [PATCH 5/9] migration: Deliver return path file error to migrate state too Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-08-29 21:42 ` [PATCH 7/9] migration: Remember num of ramblocks to sync during recovery Peter Xu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

There're a lot of cases where we only have an errno set in last_error but
without a detailed error description.  When this happens, try to generate
an error contains the errno as a descriptive error.

This will be helpful in cases where one relies on the Error*.  E.g.,
migration state only caches Error* in MigrationState.error.  With this,
we'll display correct error messages in e.g. query-migrate when the error
was only set by qemu_file_set_error().

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/qemu-file.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index eea7171192..3e64e900c9 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -142,15 +142,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
  *
  * Return negative error value if there has been an error on previous
  * operations, return 0 if no error happened.
- * Optional, it returns Error* in errp, but it may be NULL even if return value
- * is not 0.
  *
+ * If errp is specified, a verbose error message will be copied over.
  */
 int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
 {
+    if (!f->last_error) {
+        return 0;
+    }
+
+    /* There is an error */
     if (errp) {
-        *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
+        if (f->last_error_obj) {
+            *errp = error_copy(f->last_error_obj);
+        } else {
+            error_setg_errno(errp, -f->last_error, "Channel error");
+        }
     }
+
     return f->last_error;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/9] migration: Remember num of ramblocks to sync during recovery
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
                   ` (5 preceding siblings ...)
  2023-08-29 21:42 ` [PATCH 6/9] qemufile: Always return a verbose error Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-09-12  0:33   ` Fabiano Rosas
  2023-08-29 21:42 ` [PATCH 8/9] migration: Add migration_rp_wait|kick() Peter Xu
  2023-08-29 21:42 ` [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery Peter Xu
  8 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

Instead of only relying on the count of rp_sem, make the counter be part of
RAMState so it can be used in both threads to synchronize on the process.

rp_sem will be further reused as a way to kick the main thread, e.g., on
recovery failures.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 814c59c17b..a9541c60b4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -394,6 +394,14 @@ struct RAMState {
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
     QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+    /*
+     * This is only used when postcopy is in recovery phase, to communicate
+     * between the migration thread and the return path thread on dirty
+     * bitmap synchronizations.  This field is unused in other stages of
+     * RAM migration.
+     */
+    unsigned int postcopy_bmap_sync_requested;
 };
 typedef struct RAMState RAMState;
 
@@ -4135,20 +4143,20 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 {
     RAMBlock *block;
     QEMUFile *file = s->to_dst_file;
-    int ramblock_count = 0;
 
     trace_ram_dirty_bitmap_sync_start();
 
+    qatomic_set(&rs->postcopy_bmap_sync_requested, 0);
     RAMBLOCK_FOREACH_NOT_IGNORED(block) {
         qemu_savevm_send_recv_bitmap(file, block->idstr);
         trace_ram_dirty_bitmap_request(block->idstr);
-        ramblock_count++;
+        qatomic_inc(&rs->postcopy_bmap_sync_requested);
     }
 
     trace_ram_dirty_bitmap_sync_wait();
 
     /* Wait until all the ramblocks' dirty bitmap synced */
-    while (ramblock_count--) {
+    while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
         qemu_sem_wait(&s->rp_state.rp_sem);
     }
 
@@ -4175,6 +4183,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
     unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
     uint64_t local_size = DIV_ROUND_UP(nbits, 8);
     uint64_t size, end_mark;
+    RAMState *rs = ram_state;
 
     trace_ram_dirty_bitmap_reload_begin(block->idstr);
 
@@ -4240,6 +4249,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
     /* We'll recalculate migration_dirty_pages in ram_state_resume_prepare(). */
     trace_ram_dirty_bitmap_reload_complete(block->idstr);
 
+    qatomic_dec(&rs->postcopy_bmap_sync_requested);
+
     /*
      * We succeeded to sync bitmap for current ramblock. If this is
      * the last one to sync, we need to notify the main send thread.
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/9] migration: Add migration_rp_wait|kick()
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
                   ` (6 preceding siblings ...)
  2023-08-29 21:42 ` [PATCH 7/9] migration: Remember num of ramblocks to sync during recovery Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-09-12  0:32   ` Fabiano Rosas
  2023-08-29 21:42 ` [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery Peter Xu
  8 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela

It's just a simple wrapper for rp_sem on either wait() or kick(), make it
even clearer on how it is used.  Prepared to be used even for other things.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h | 15 +++++++++++++++
 migration/migration.c |  4 ++--
 migration/ram.c       | 16 +++++++---------
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index a5c95e4d43..b6de78dbdd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -304,6 +304,12 @@ struct MigrationState {
          * be cleared in the rp_thread!
          */
         bool          rp_thread_created;
+        /*
+         * Used to synchonize between migration main thread and return path
+         * thread.  The migration thread can wait() on this sem, while
+         * other threads (e.g., return path thread) can kick it using a
+         * post().
+         */
         QemuSemaphore rp_sem;
         /*
          * We post to this when we got one PONG from dest. So far it's an
@@ -516,4 +522,13 @@ void populate_vfio_info(MigrationInfo *info);
 void reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
+/* Migration thread waiting for return path thread. */
+void migration_rp_wait(MigrationState *s);
+/*
+ * Kick the migration thread waiting for return path messages.  NOTE: the
+ * name can be slightly confusing (when read as "kick the rp thread"), just
+ * to remember the target is always the migration thread.
+ */
+void migration_rp_kick(MigrationState *s);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 576e102319..3a5f324781 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1835,7 +1835,7 @@ static int migrate_handle_rp_resume_ack(MigrationState *s,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
     /* Notify send thread that time to continue send pages */
-    qemu_sem_post(&s->rp_state.rp_sem);
+    migration_rp_kick(s);
 
     return 0;
 }
@@ -2503,7 +2503,7 @@ static int postcopy_resume_handshake(MigrationState *s)
     qemu_savevm_send_postcopy_resume(s->to_dst_file);
 
     while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        qemu_sem_wait(&s->rp_state.rp_sem);
+        migration_rp_wait(s);
     }
 
     if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
diff --git a/migration/ram.c b/migration/ram.c
index a9541c60b4..b5f6d65d84 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4157,7 +4157,7 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 
     /* Wait until all the ramblocks' dirty bitmap synced */
     while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
-        qemu_sem_wait(&s->rp_state.rp_sem);
+        migration_rp_wait(s);
     }
 
     trace_ram_dirty_bitmap_sync_complete();
@@ -4165,11 +4165,6 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
     return 0;
 }
 
-static void ram_dirty_bitmap_reload_notify(MigrationState *s)
-{
-    qemu_sem_post(&s->rp_state.rp_sem);
-}
-
 /*
  * Read the received bitmap, revert it as the initial dirty bitmap.
  * This is only used when the postcopy migration is paused but wants
@@ -4252,10 +4247,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp)
     qatomic_dec(&rs->postcopy_bmap_sync_requested);
 
     /*
-     * We succeeded to sync bitmap for current ramblock. If this is
-     * the last one to sync, we need to notify the main send thread.
+     * We succeeded to sync bitmap for current ramblock. Always kick the
+     * migration thread to check whether all requested bitmaps are
+     * reloaded.  NOTE: it's racy to only kick when requested==0, because
+     * we don't know whether the migration thread may still be increasing
+     * it.
      */
-    ram_dirty_bitmap_reload_notify(s);
+    migration_rp_kick(s);
 
     ret = 0;
 out:
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery
  2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
                   ` (7 preceding siblings ...)
  2023-08-29 21:42 ` [PATCH 8/9] migration: Add migration_rp_wait|kick() Peter Xu
@ 2023-08-29 21:42 ` Peter Xu
  2023-09-12  0:31   ` Fabiano Rosas
  8 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2023-08-29 21:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Fabiano Rosas, peterx, Juan Quintela, Xiaohui Li

Normally the postcopy recover phase should only exist for a super short
period, that's the duration when QEMU is trying to recover from an
interrupted postcopy migration, during which handshake will be carried out
for continuing the procedure with state changes from PAUSED -> RECOVER ->
POSTCOPY_ACTIVE again.

Here RECOVER phase should be super small, that happens right after the
admin specified a new but working network link for QEMU to reconnect to
dest QEMU.

However there can still be case where the channel is broken in this small
RECOVER window.

If it happens, with current code there's no way the src QEMU can got kicked
out of RECOVER stage. No way either to retry the recover in another channel
when established.

This patch allows the RECOVER phase to fail itself too - we're mostly
ready, just some small things missing, e.g. properly kick the main
migration thread out when sleeping on rp_sem when we found that we're at
RECOVER stage.  When this happens, it fails the RECOVER itself, and
rollback to PAUSED stage.  Then the user can retry another round of
recovery.

To make it even stronger, teach QMP command migrate-pause to explicitly
kick src/dst QEMU out when needed, so even if for some reason the migration
thread didn't got kicked out already by a failing rethrn-path thread, the
admin can also kick it out.

This will be an super, super corner case, but still try to cover that.

One can try to test this with two proxy channels for migration:

  (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
  (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock

So the migration channel will be:

                      (a)          (b)
  src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst

Then to make QEMU hang at RECOVER stage, one can do below:

  (1) stop the postcopy using QMP command postcopy-pause
  (2) kill the 2nd proxy (b)
  (3) try to recover the postcopy using /tmp/src.sock on src
  (4) src QEMU will go into RECOVER stage but won't be able to continue
      from there, because the channel is actually broken at (b)

Before this patch, step (4) will make src QEMU stuck in RECOVER stage,
without a way to kick the QEMU out or continue the postcopy again.  After
this patch, (4) will quickly fail qemu and bounce back to PAUSED stage.

Admin can also kick QEMU from (4) into PAUSED when needed using
migrate-pause when needed.

After bouncing back to PAUSED stage, one can recover again.

Reported-by: Xiaohui Li <xiaohli@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h |  8 ++++--
 migration/migration.c | 64 +++++++++++++++++++++++++++++++++++++++----
 migration/ram.c       |  4 ++-
 3 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index b6de78dbdd..e86d9d098a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -482,6 +482,7 @@ void migrate_init(MigrationState *s);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
+bool migration_postcopy_is_alive(void);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_get_total_transferred_pages(void);
@@ -522,8 +523,11 @@ void populate_vfio_info(MigrationInfo *info);
 void reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
-/* Migration thread waiting for return path thread. */
-void migration_rp_wait(MigrationState *s);
+/*
+ * Migration thread waiting for return path thread.  Return non-zero if an
+ * error is detected.
+ */
+int migration_rp_wait(MigrationState *s);
 /*
  * Kick the migration thread waiting for return path messages.  NOTE: the
  * name can be slightly confusing (when read as "kick the rp thread"), just
diff --git a/migration/migration.c b/migration/migration.c
index 3a5f324781..85462ff1d7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1349,6 +1349,19 @@ bool migration_in_postcopy(void)
     }
 }
 
+bool migration_postcopy_is_alive(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    switch (s->state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
+}
+
 bool migration_in_postcopy_after_devices(MigrationState *s)
 {
     return migration_in_postcopy() && s->postcopy_after_devices;
@@ -1540,18 +1553,31 @@ void qmp_migrate_pause(Error **errp)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int ret;
 
-    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    if (migration_postcopy_is_alive()) {
         /* Source side, during postcopy */
+        Error *error = NULL;
+
+        /* Tell the core migration that we're pausing */
+        error_setg(&error, "Postcopy migration is paused by the user");
+        migrate_set_error(ms, error);
+
         qemu_mutex_lock(&ms->qemu_file_lock);
         ret = qemu_file_shutdown(ms->to_dst_file);
         qemu_mutex_unlock(&ms->qemu_file_lock);
         if (ret) {
             error_setg(errp, "Failed to pause source migration");
         }
+
+        /*
+         * Kick the migration thread out of any waiting windows (on behalf
+         * of the rp thread).
+         */
+        migration_rp_kick(ms);
+
         return;
     }
 
-    if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    if (migration_postcopy_is_alive()) {
         ret = qemu_file_shutdown(mis->from_src_file);
         if (ret) {
             error_setg(errp, "Failed to pause destination migration");
@@ -1560,7 +1586,7 @@ void qmp_migrate_pause(Error **errp)
     }
 
     error_setg(errp, "migrate-pause is currently only supported "
-               "during postcopy-active state");
+               "during postcopy-active or postcopy-recover state");
 }
 
 bool migration_is_blocked(Error **errp)
@@ -1742,9 +1768,21 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
     qemu_sem_post(&s->pause_sem);
 }
 
-void migration_rp_wait(MigrationState *s)
+int migration_rp_wait(MigrationState *s)
 {
+    /* If migration has failure already, ignore the wait */
+    if (migrate_has_error(s)) {
+        return -1;
+    }
+
     qemu_sem_wait(&s->rp_state.rp_sem);
+
+    /* After wait, double check that there's no failure */
+    if (migrate_has_error(s)) {
+        return -1;
+    }
+
+    return 0;
 }
 
 void migration_rp_kick(MigrationState *s)
@@ -1798,6 +1836,20 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
 {
     trace_postcopy_pause_return_path();
 
+    if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        /*
+         * this will be extremely unlikely: that we got yet another network
+         * issue during recovering of the 1st network failure.. during this
+         * period the main migration thread can be waiting on rp_sem for
+         * this thread to sync with the other side.
+         *
+         * When this happens, explicitly kick the migration thread out of
+         * RECOVER stage and back to PAUSED, so the admin can try
+         * everything again.
+         */
+        migration_rp_kick(s);
+    }
+
     qemu_sem_wait(&s->postcopy_pause_rp_sem);
 
     trace_postcopy_pause_return_path_continued();
@@ -2503,7 +2555,9 @@ static int postcopy_resume_handshake(MigrationState *s)
     qemu_savevm_send_postcopy_resume(s->to_dst_file);
 
     while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        migration_rp_wait(s);
+        if (migration_rp_wait(s)) {
+            return -1;
+        }
     }
 
     if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
diff --git a/migration/ram.c b/migration/ram.c
index b5f6d65d84..199fd3e117 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4157,7 +4157,9 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 
     /* Wait until all the ramblocks' dirty bitmap synced */
     while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
-        migration_rp_wait(s);
+        if (migration_rp_wait(s)) {
+            return -1;
+        }
     }
 
     trace_ram_dirty_bitmap_sync_complete();
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery
  2023-08-29 21:42 ` [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery Peter Xu
@ 2023-09-12  0:31   ` Fabiano Rosas
  2023-09-12 20:05     ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Fabiano Rosas @ 2023-09-12  0:31 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: peterx, Juan Quintela, Xiaohui Li

Peter Xu <peterx@redhat.com> writes:

Hi, sorry it took me so long to get to this.

> Normally the postcopy recover phase should only exist for a super short
> period, that's the duration when QEMU is trying to recover from an
> interrupted postcopy migration, during which handshake will be carried out
> for continuing the procedure with state changes from PAUSED -> RECOVER ->
> POSTCOPY_ACTIVE again.
>
> Here RECOVER phase should be super small, that happens right after the
> admin specified a new but working network link for QEMU to reconnect to
> dest QEMU.
>
> However there can still be case where the channel is broken in this small
> RECOVER window.
>
> If it happens, with current code there's no way the src QEMU can got kicked
> out of RECOVER stage. No way either to retry the recover in another channel
> when established.
>
> This patch allows the RECOVER phase to fail itself too - we're mostly
> ready, just some small things missing, e.g. properly kick the main
> migration thread out when sleeping on rp_sem when we found that we're at
> RECOVER stage.  When this happens, it fails the RECOVER itself, and
> rollback to PAUSED stage.  Then the user can retry another round of
> recovery.
>
> To make it even stronger, teach QMP command migrate-pause to explicitly
> kick src/dst QEMU out when needed, so even if for some reason the migration
> thread didn't got kicked out already by a failing rethrn-path thread, the
> admin can also kick it out.
>
> This will be an super, super corner case, but still try to cover that.

It would be nice to have a test for this. Being such a corner case, it
will be hard to keep this scenario working.

I wrote two tests[1] that do the recovery each using a different URI:
1) fd: using a freshly opened file,
2) fd: using a socketpair that simply has nothing on the other end.

These might be too far from the original bug, but it seems to exercise
some of the same paths:

Scenario 1:
/x86_64/migration/postcopy/recovery/fail-twice

the stacks are:

Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"):
 qemu_sem_wait
 ram_dirty_bitmap_sync_all
 ram_resume_prepare
 qemu_savevm_state_resume_prepare
 postcopy_do_resume
 postcopy_pause
 migration_detect_error
 migration_thread

Thread 7 (Thread 0x7fffd67ff700 (LWP 30281) "return path"):
 qemu_sem_wait
 postcopy_pause_return_path_thread
 source_return_path_thread

This patch seems to fix it, although we cannot call qmp_migrate_recover
a second time because the mis state is now in RECOVER:

  "Migrate recover can only be run when postcopy is paused."

Do we maybe need to return the state to PAUSED, or allow
qmp_migrate_recover to run in RECOVER, like you did on the src side?


Scenario 2:
/x86_64/migration/postcopy/recovery/fail-twice/rp

Thread 8 (Thread 0x7fffd5ffe700 (LWP 30456) "live_migration"):
 qemu_sem_wait
 ram_dirty_bitmap_sync_all
 ram_resume_prepare
 qemu_savevm_state_resume_prepare
 postcopy_do_resume
 postcopy_pause
 migration_detect_error
 migration_thread

Thread 7 (Thread 0x7fffd67ff700 (LWP 30455) "return path"):
 recvmsg
 qio_channel_socket_readv
 qio_channel_readv_full
 qio_channel_read
 qemu_fill_buffer
 qemu_peek_byte
 qemu_get_byte
 qemu_get_be16
 source_return_path_thread

Here, with this patch the migration gets stuck unless we call
migrate_pause() one more time. After another round of migrate_pause +
recover, it finishes properly.


1- hacked-together test:
-->8--
From a34685c8795799350665a880cd2ddddbf53c5812 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <farosas@suse.de>
Date: Mon, 11 Sep 2023 20:45:33 -0300
Subject: [PATCH] test patch

---
 tests/qtest/migration-test.c | 87 ++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 1b43df5ca7..4d9d2209c1 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -695,6 +695,7 @@ typedef struct {
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
+    int postcopy_recovery_method;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1357,6 +1358,61 @@ static void test_postcopy_preempt_tls_psk(void)
 }
 #endif
 
+static void postcopy_recover_fail(QTestState *from, QTestState *to, int method)
+{
+    int src, dst;
+
+    if (method == 1) {
+        /* give it some random fd to recover */
+        g_autofree char *uri = g_strdup_printf("%s/noop", tmpfs);
+        src = dst = open(uri, O_CREAT|O_RDWR);
+    } else if (method == 2) {
+        int ret, pair1[2], pair2[2];
+
+        /* create two unrelated socketpairs */
+        ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
+        g_assert_cmpint(ret, ==, 0);
+
+        ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
+        g_assert_cmpint(ret, ==, 0);
+
+        /* give the guests unpaired ends of the sockets */
+        src = pair1[0];
+        dst = pair2[0];
+    }
+
+    qtest_qmp_fds_assert_success(to, &src, 1,
+                                 "{ 'execute': 'getfd',"
+                                 "  'arguments': { 'fdname': 'fd-mig' }}");
+
+    qtest_qmp_fds_assert_success(from, &dst, 1,
+                                 "{ 'execute': 'getfd',"
+                                 "  'arguments': { 'fdname': 'fd-mig' }}");
+
+    migrate_recover(to, "fd:fd-mig");
+
+    wait_for_migration_status(from, "postcopy-paused",
+                              (const char * []) { "failed", "active",
+                                  "completed", NULL });
+    migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
+
+    printf("WAIT\n");
+    if (method == 2) {
+        /* This would be issued by the admin upon noticing the hang */
+        migrate_pause(from);
+    }
+
+    wait_for_migration_status(from, "postcopy-paused",
+                              (const char * []) { "failed", "active",
+                                  "completed", NULL });
+    printf("PAUSED\n");
+
+    close(src);
+    if (method == 2) {
+        close(dst);
+    }
+}
+
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
     QTestState *from, *to;
@@ -1396,6 +1452,13 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
                               (const char * []) { "failed", "active",
                                                   "completed", NULL });
 
+    if (args->postcopy_recovery_method) {
+        /* fail the recovery */
+        postcopy_recover_fail(from, to, args->postcopy_recovery_method);
+
+        /* continue with a good recovery */
+    }
+
     /*
      * Create a new socket to emulate a new channel that is different
      * from the broken migration channel; tell the destination to
@@ -1435,6 +1498,24 @@ static void test_postcopy_recovery_compress(void)
     test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_recovery_fail(void)
+{
+    MigrateCommon args = {
+        .postcopy_recovery_method = 1,
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
+static void test_postcopy_recovery_fail_rp(void)
+{
+    MigrateCommon args = {
+        .postcopy_recovery_method = 2,
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_recovery_tls_psk(void)
 {
@@ -2825,6 +2906,12 @@ int main(int argc, char **argv)
             qtest_add_func("/migration/postcopy/recovery/compress/plain",
                            test_postcopy_recovery_compress);
         }
+        qtest_add_func("/migration/postcopy/recovery/fail-twice",
+                       test_postcopy_recovery_fail);
+
+        qtest_add_func("/migration/postcopy/recovery/fail-twice/rp",
+                       test_postcopy_recovery_fail_rp);
+
     }
 
     qtest_add_func("/migration/bad_dest", test_baddest);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 8/9] migration: Add migration_rp_wait|kick()
  2023-08-29 21:42 ` [PATCH 8/9] migration: Add migration_rp_wait|kick() Peter Xu
@ 2023-09-12  0:32   ` Fabiano Rosas
  0 siblings, 0 replies; 19+ messages in thread
From: Fabiano Rosas @ 2023-09-12  0:32 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: peterx, Juan Quintela

Peter Xu <peterx@redhat.com> writes:

> It's just a simple wrapper for rp_sem on either wait() or kick(), make it
> even clearer on how it is used.  Prepared to be used even for other things.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.h | 15 +++++++++++++++
>  migration/migration.c |  4 ++--
>  migration/ram.c       | 16 +++++++---------
>  3 files changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index a5c95e4d43..b6de78dbdd 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -304,6 +304,12 @@ struct MigrationState {
>           * be cleared in the rp_thread!
>           */
>          bool          rp_thread_created;
> +        /*
> +         * Used to synchonize between migration main thread and return path

synchronize

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/9] migration: Remember num of ramblocks to sync during recovery
  2023-08-29 21:42 ` [PATCH 7/9] migration: Remember num of ramblocks to sync during recovery Peter Xu
@ 2023-09-12  0:33   ` Fabiano Rosas
  0 siblings, 0 replies; 19+ messages in thread
From: Fabiano Rosas @ 2023-09-12  0:33 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: peterx, Juan Quintela

Peter Xu <peterx@redhat.com> writes:

> Instead of only relying on the count of rp_sem, make the counter be part of
> RAMState so it can be used in both threads to synchronize on the process.
>
> rp_sem will be further reused as a way to kick the main thread, e.g., on
> recovery failures.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/9] migration: Let migrate_set_error() take ownership
  2023-08-29 21:42 ` [PATCH 2/9] migration: Let migrate_set_error() take ownership Peter Xu
@ 2023-09-12 19:40   ` Fabiano Rosas
  2023-09-12 20:14     ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Fabiano Rosas @ 2023-09-12 19:40 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: peterx, Juan Quintela

Peter Xu <peterx@redhat.com> writes:

> migrate_set_error() used one error_copy() so it always copy an error.
> However that's not the major use case - the major use case is one would
> like to pass the error to migrate_set_error() without further touching the
> error.
>
> It can be proved if we see most of the callers are freeing the error
> explicitly right afterwards.  There're a few outliers (only if when the
> caller) where we can use error_copy() explicitly there.
>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.h    |  4 ++--
>  migration/channel.c      |  1 -
>  migration/migration.c    | 22 ++++++++++++++++------
>  migration/multifd.c      | 10 ++++------
>  migration/postcopy-ram.c |  1 -
>  migration/ram.c          |  1 -
>  6 files changed, 22 insertions(+), 17 deletions(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index 6eea18db36..76e35a5ecf 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -465,7 +465,7 @@ bool  migration_has_all_channels(void);
>  
>  uint64_t migrate_max_downtime(void);
>  
> -void migrate_set_error(MigrationState *s, const Error *error);
> +void migrate_set_error(MigrationState *s, Error *error);
>  
>  void migrate_fd_connect(MigrationState *s, Error *error_in);
>  
> @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
>  void migration_make_urgent_request(void);
>  void migration_consume_urgent_request(void);
>  bool migration_rate_limit(void);
> -void migration_cancel(const Error *error);
> +void migration_cancel(Error *error);
>  
>  void populate_vfio_info(MigrationInfo *info);
>  void reset_vfio_bytes_transferred(void);
> diff --git a/migration/channel.c b/migration/channel.c
> index ca3319a309..48b3f6abd6 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
>          }
>      }
>      migrate_fd_connect(s, error);
> -    error_free(error);
>  }
>  
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index c60064d48e..0f3ca168ed 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -162,7 +162,7 @@ void migration_object_init(void)
>      dirty_bitmap_mig_init();
>  }
>  
> -void migration_cancel(const Error *error)
> +void migration_cancel(Error *error)
>  {
>      if (error) {
>          migrate_set_error(current_migration, error);
> @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
>      object_unref(OBJECT(s));
>  }
>  
> -void migrate_set_error(MigrationState *s, const Error *error)
> +/*
> + * Set error for current migration state.  The `error' ownership will be
> + * moved from the caller to MigrationState, so the caller doesn't need to
> + * free the error.
> + *
> + * If the caller still needs to reference the `error' passed in, one should
> + * use error_copy() explicitly.
> + */
> +void migrate_set_error(MigrationState *s, Error *error)
>  {
>      QEMU_LOCK_GUARD(&s->error_mutex);
>      if (!s->error) {
> -        s->error = error_copy(error);
> +        /* Record the first error triggered */
> +        s->error = error;
> +    } else {
> +        error_free(error);

This will conflict logically with 908927db28 ("migration: Update error
description whenever migration fails") which does:

+            migrate_set_error(s, local_err);
+            error_report_err(local_err);

both functions may now try to free the error.


I'm working on top of this series to try to get rid of all of those
qemu_file_set_error() we have. I'm trying to use migrate_set_error()
whenever possible and only set f->last_error at the very bottom IO
functions.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery
  2023-09-12  0:31   ` Fabiano Rosas
@ 2023-09-12 20:05     ` Peter Xu
  2023-09-12 22:16       ` Peter Xu
  2023-09-12 22:49       ` Fabiano Rosas
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Xu @ 2023-09-12 20:05 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Xiaohui Li

On Mon, Sep 11, 2023 at 09:31:51PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> Hi, sorry it took me so long to get to this.

Not a problem.

> 
> > Normally the postcopy recover phase should only exist for a super short
> > period, that's the duration when QEMU is trying to recover from an
> > interrupted postcopy migration, during which handshake will be carried out
> > for continuing the procedure with state changes from PAUSED -> RECOVER ->
> > POSTCOPY_ACTIVE again.
> >
> > Here RECOVER phase should be super small, that happens right after the
> > admin specified a new but working network link for QEMU to reconnect to
> > dest QEMU.
> >
> > However there can still be case where the channel is broken in this small
> > RECOVER window.
> >
> > If it happens, with current code there's no way the src QEMU can got kicked
> > out of RECOVER stage. No way either to retry the recover in another channel
> > when established.
> >
> > This patch allows the RECOVER phase to fail itself too - we're mostly
> > ready, just some small things missing, e.g. properly kick the main
> > migration thread out when sleeping on rp_sem when we found that we're at
> > RECOVER stage.  When this happens, it fails the RECOVER itself, and
> > rollback to PAUSED stage.  Then the user can retry another round of
> > recovery.
> >
> > To make it even stronger, teach QMP command migrate-pause to explicitly
> > kick src/dst QEMU out when needed, so even if for some reason the migration
> > thread didn't got kicked out already by a failing rethrn-path thread, the
> > admin can also kick it out.
> >
> > This will be an super, super corner case, but still try to cover that.
> 
> It would be nice to have a test for this. Being such a corner case, it
> will be hard to keep this scenario working.

Yes makes sense.

> 
> I wrote two tests[1] that do the recovery each using a different URI:
> 1) fd: using a freshly opened file,
> 2) fd: using a socketpair that simply has nothing on the other end.
> 
> These might be too far from the original bug, but it seems to exercise
> some of the same paths:
> 
> Scenario 1:
> /x86_64/migration/postcopy/recovery/fail-twice
> 
> the stacks are:
> 
> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"):
>  qemu_sem_wait
>  ram_dirty_bitmap_sync_all
>  ram_resume_prepare
>  qemu_savevm_state_resume_prepare
>  postcopy_do_resume
>  postcopy_pause
>  migration_detect_error
>  migration_thread
> 
> Thread 7 (Thread 0x7fffd67ff700 (LWP 30281) "return path"):
>  qemu_sem_wait
>  postcopy_pause_return_path_thread
>  source_return_path_thread

I guess this is because below path triggers:

    if (len > 0) {
        f->buf_size += len;
        f->total_transferred += len;
    } else if (len == 0) {
        qemu_file_set_error_obj(f, -EIO, local_error);     <-----------
    } else {
        qemu_file_set_error_obj(f, len, local_error);
    }

So the src can always write anything into the tmp file, but any read will
return 0 immediately because file offset is always pointing to the file
size.

> 
> This patch seems to fix it, although we cannot call qmp_migrate_recover
> a second time because the mis state is now in RECOVER:
> 
>   "Migrate recover can only be run when postcopy is paused."
> 
> Do we maybe need to return the state to PAUSED, or allow
> qmp_migrate_recover to run in RECOVER, like you did on the src side?

Ouch, I just noticed that my patch was wrong.

I probably need this:

===8<===
From 8c2fb7b4c7488002283c7fb6a5e2aae81b21e04b Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 12 Sep 2023 15:49:54 -0400
Subject: [PATCH] fixup! migration/postcopy: Allow network to fail even during
 recovery

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h | 2 +-
 migration/migration.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index e7f48e736e..7e61e2ece7 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -482,7 +482,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
-bool migration_postcopy_is_alive(void);
+bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_get_total_transferred_pages(void);
diff --git a/migration/migration.c b/migration/migration.c
index de2146c6fc..a9d381886c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1349,7 +1349,7 @@ bool migration_in_postcopy(void)
     }
 }
 
-bool migration_postcopy_is_alive(void)
+bool migration_postcopy_is_alive(int state)
 {
     MigrationState *s = migrate_get_current();
 
@@ -1569,7 +1569,7 @@ void qmp_migrate_pause(Error **errp)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int ret;
 
-    if (migration_postcopy_is_alive()) {
+    if (migration_postcopy_is_alive(ms->state)) {
         /* Source side, during postcopy */
         Error *error = NULL;
 
@@ -1593,7 +1593,7 @@ void qmp_migrate_pause(Error **errp)
         return;
     }
 
-    if (migration_postcopy_is_alive()) {
+    if (migration_postcopy_is_alive(mis->state)) {
         ret = qemu_file_shutdown(mis->from_src_file);
         if (ret) {
             error_setg(errp, "Failed to pause destination migration");
-- 
2.41.0
===8<===

> 
> 
> Scenario 2:
> /x86_64/migration/postcopy/recovery/fail-twice/rp
> 
> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30456) "live_migration"):
>  qemu_sem_wait
>  ram_dirty_bitmap_sync_all
>  ram_resume_prepare
>  qemu_savevm_state_resume_prepare
>  postcopy_do_resume
>  postcopy_pause
>  migration_detect_error
>  migration_thread
> 
> Thread 7 (Thread 0x7fffd67ff700 (LWP 30455) "return path"):
>  recvmsg
>  qio_channel_socket_readv
>  qio_channel_readv_full
>  qio_channel_read
>  qemu_fill_buffer
>  qemu_peek_byte
>  qemu_get_byte
>  qemu_get_be16
>  source_return_path_thread
> 
> Here, with this patch the migration gets stuck unless we call
> migrate_pause() one more time. After another round of migrate_pause +
> recover, it finishes properly.
> 
> 
> 1- hacked-together test:
> -->8--
> From a34685c8795799350665a880cd2ddddbf53c5812 Mon Sep 17 00:00:00 2001
> From: Fabiano Rosas <farosas@suse.de>
> Date: Mon, 11 Sep 2023 20:45:33 -0300
> Subject: [PATCH] test patch
> 
> ---
>  tests/qtest/migration-test.c | 87 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 1b43df5ca7..4d9d2209c1 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -695,6 +695,7 @@ typedef struct {
>      /* Postcopy specific fields */
>      void *postcopy_data;
>      bool postcopy_preempt;
> +    int postcopy_recovery_method;
>  } MigrateCommon;
>  
>  static int test_migrate_start(QTestState **from, QTestState **to,
> @@ -1357,6 +1358,61 @@ static void test_postcopy_preempt_tls_psk(void)
>  }
>  #endif
>  
> +static void postcopy_recover_fail(QTestState *from, QTestState *to, int method)
> +{
> +    int src, dst;
> +
> +    if (method == 1) {
> +        /* give it some random fd to recover */
> +        g_autofree char *uri = g_strdup_printf("%s/noop", tmpfs);
> +        src = dst = open(uri, O_CREAT|O_RDWR);

This is slightly weird.

We opened a file, making it RW and pass it over to both QEMUs.

I think the result can be unpredictable for the reader, that if the reader
reads before any prior writes it'll just quickly fail, or I think it can
also happen that the read is slower than the write so it can read
something..  until it reads to the EOF and fail that fd at some point.

Not sure whether it'll cause some behavior difference and uncertainty on
the test case.

Maybe we drop this method but only keep the below one?

> +    } else if (method == 2) {
> +        int ret, pair1[2], pair2[2];
> +
> +        /* create two unrelated socketpairs */
> +        ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
> +        g_assert_cmpint(ret, ==, 0);
> +
> +        ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
> +        g_assert_cmpint(ret, ==, 0);
> +
> +        /* give the guests unpaired ends of the sockets */
> +        src = pair1[0];
> +        dst = pair2[0];
> +    }
> +
> +    qtest_qmp_fds_assert_success(to, &src, 1,
> +                                 "{ 'execute': 'getfd',"
> +                                 "  'arguments': { 'fdname': 'fd-mig' }}");
> +
> +    qtest_qmp_fds_assert_success(from, &dst, 1,
> +                                 "{ 'execute': 'getfd',"
> +                                 "  'arguments': { 'fdname': 'fd-mig' }}");
> +
> +    migrate_recover(to, "fd:fd-mig");
> +
> +    wait_for_migration_status(from, "postcopy-paused",
> +                              (const char * []) { "failed", "active",
> +                                  "completed", NULL });
> +    migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
> +
> +    printf("WAIT\n");
> +    if (method == 2) {
> +        /* This would be issued by the admin upon noticing the hang */
> +        migrate_pause(from);
> +    }
> +
> +    wait_for_migration_status(from, "postcopy-paused",
> +                              (const char * []) { "failed", "active",
> +                                  "completed", NULL });
> +    printf("PAUSED\n");
> +
> +    close(src);
> +    if (method == 2) {
> +        close(dst);
> +    }
> +}
> +
>  static void test_postcopy_recovery_common(MigrateCommon *args)
>  {
>      QTestState *from, *to;
> @@ -1396,6 +1452,13 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
>                                (const char * []) { "failed", "active",
>                                                    "completed", NULL });
>  
> +    if (args->postcopy_recovery_method) {
> +        /* fail the recovery */
> +        postcopy_recover_fail(from, to, args->postcopy_recovery_method);
> +
> +        /* continue with a good recovery */
> +    }
> +
>      /*
>       * Create a new socket to emulate a new channel that is different
>       * from the broken migration channel; tell the destination to
> @@ -1435,6 +1498,24 @@ static void test_postcopy_recovery_compress(void)
>      test_postcopy_recovery_common(&args);
>  }
>  
> +static void test_postcopy_recovery_fail(void)
> +{
> +    MigrateCommon args = {
> +        .postcopy_recovery_method = 1,
> +    };
> +
> +    test_postcopy_recovery_common(&args);
> +}
> +
> +static void test_postcopy_recovery_fail_rp(void)
> +{
> +    MigrateCommon args = {
> +        .postcopy_recovery_method = 2,
> +    };
> +
> +    test_postcopy_recovery_common(&args);
> +}
> +
>  #ifdef CONFIG_GNUTLS
>  static void test_postcopy_recovery_tls_psk(void)
>  {
> @@ -2825,6 +2906,12 @@ int main(int argc, char **argv)
>              qtest_add_func("/migration/postcopy/recovery/compress/plain",
>                             test_postcopy_recovery_compress);
>          }
> +        qtest_add_func("/migration/postcopy/recovery/fail-twice",
> +                       test_postcopy_recovery_fail);
> +
> +        qtest_add_func("/migration/postcopy/recovery/fail-twice/rp",
> +                       test_postcopy_recovery_fail_rp);
> +
>      }
>  
>      qtest_add_func("/migration/bad_dest", test_baddest);

Thanks for contributing the test case!

Do you want me to pick this patch up (with modifications) and repost
together with this series?  It'll also work if you want to send a separate
test patch.  Let me know!

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/9] migration: Let migrate_set_error() take ownership
  2023-09-12 19:40   ` Fabiano Rosas
@ 2023-09-12 20:14     ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2023-09-12 20:14 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela

On Tue, Sep 12, 2023 at 04:40:14PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > migrate_set_error() used one error_copy() so it always copy an error.
> > However that's not the major use case - the major use case is one would
> > like to pass the error to migrate_set_error() without further touching the
> > error.
> >
> > It can be proved if we see most of the callers are freeing the error
> > explicitly right afterwards.  There're a few outliers (only if when the
> > caller) where we can use error_copy() explicitly there.
> >
> > Reviewed-by: Fabiano Rosas <farosas@suse.de>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.h    |  4 ++--
> >  migration/channel.c      |  1 -
> >  migration/migration.c    | 22 ++++++++++++++++------
> >  migration/multifd.c      | 10 ++++------
> >  migration/postcopy-ram.c |  1 -
> >  migration/ram.c          |  1 -
> >  6 files changed, 22 insertions(+), 17 deletions(-)
> >
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 6eea18db36..76e35a5ecf 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -465,7 +465,7 @@ bool  migration_has_all_channels(void);
> >  
> >  uint64_t migrate_max_downtime(void);
> >  
> > -void migrate_set_error(MigrationState *s, const Error *error);
> > +void migrate_set_error(MigrationState *s, Error *error);
> >  
> >  void migrate_fd_connect(MigrationState *s, Error *error_in);
> >  
> > @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> >  void migration_make_urgent_request(void);
> >  void migration_consume_urgent_request(void);
> >  bool migration_rate_limit(void);
> > -void migration_cancel(const Error *error);
> > +void migration_cancel(Error *error);
> >  
> >  void populate_vfio_info(MigrationInfo *info);
> >  void reset_vfio_bytes_transferred(void);
> > diff --git a/migration/channel.c b/migration/channel.c
> > index ca3319a309..48b3f6abd6 100644
> > --- a/migration/channel.c
> > +++ b/migration/channel.c
> > @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
> >          }
> >      }
> >      migrate_fd_connect(s, error);
> > -    error_free(error);
> >  }
> >  
> >  
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c60064d48e..0f3ca168ed 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -162,7 +162,7 @@ void migration_object_init(void)
> >      dirty_bitmap_mig_init();
> >  }
> >  
> > -void migration_cancel(const Error *error)
> > +void migration_cancel(Error *error)
> >  {
> >      if (error) {
> >          migrate_set_error(current_migration, error);
> > @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
> >      object_unref(OBJECT(s));
> >  }
> >  
> > -void migrate_set_error(MigrationState *s, const Error *error)
> > +/*
> > + * Set error for current migration state.  The `error' ownership will be
> > + * moved from the caller to MigrationState, so the caller doesn't need to
> > + * free the error.
> > + *
> > + * If the caller still needs to reference the `error' passed in, one should
> > + * use error_copy() explicitly.
> > + */
> > +void migrate_set_error(MigrationState *s, Error *error)
> >  {
> >      QEMU_LOCK_GUARD(&s->error_mutex);
> >      if (!s->error) {
> > -        s->error = error_copy(error);
> > +        /* Record the first error triggered */
> > +        s->error = error;
> > +    } else {
> > +        error_free(error);
> 
> This will conflict logically with 908927db28 ("migration: Update error
> description whenever migration fails") which does:
> 
> +            migrate_set_error(s, local_err);
> +            error_report_err(local_err);
> 
> both functions may now try to free the error.

Indeed, thanks for spotting this.  Perhaps I should just drop the
error_report_err() if we've set the error already anyway.

> 
> 
> I'm working on top of this series to try to get rid of all of those
> qemu_file_set_error() we have. I'm trying to use migrate_set_error()
> whenever possible and only set f->last_error at the very bottom IO
> functions.

I'll read when it comes.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery
  2023-09-12 20:05     ` Peter Xu
@ 2023-09-12 22:16       ` Peter Xu
  2023-09-12 22:49       ` Fabiano Rosas
  1 sibling, 0 replies; 19+ messages in thread
From: Peter Xu @ 2023-09-12 22:16 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Xiaohui Li

On Tue, Sep 12, 2023 at 04:05:27PM -0400, Peter Xu wrote:
> Thanks for contributing the test case!
> 
> Do you want me to pick this patch up (with modifications) and repost
> together with this series?  It'll also work if you want to send a separate
> test patch.  Let me know!

It turns out I found more bug when I was reworking that test case based on
yours.  E.g., currently we'll crash dest qemu if we really fail during
recovery, because we miss:

diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c..422406e0ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2723,7 +2723,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
         qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    /* Current state can be either ACTIVE or RECOVER */
+    migrate_set_state(&mis->state, mis->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Notify the fault thread for the invalidated file handle */

So in double failure case we'll not set RECOVER to PAUSED, and we'll crash
right afterwards, as we'll skip the semaphore:

    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {  <--- not true, continue
        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
    }

Now within the new test case I am 100% sure I can kick both sides into
RECOVER state (one trick still needed along the way; the test patch will
tell soon), then kick them back, then proceed with a successful migration.

Let me just repost everything with the new test case.

Thanks,

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery
  2023-09-12 20:05     ` Peter Xu
  2023-09-12 22:16       ` Peter Xu
@ 2023-09-12 22:49       ` Fabiano Rosas
  2023-09-13  0:38         ` Peter Xu
  1 sibling, 1 reply; 19+ messages in thread
From: Fabiano Rosas @ 2023-09-12 22:49 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Xiaohui Li

Peter Xu <peterx@redhat.com> writes:

>> Scenario 1:
>> /x86_64/migration/postcopy/recovery/fail-twice
>> 
>> the stacks are:
>> 
>> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30282) "live_migration"):
>>  qemu_sem_wait
>>  ram_dirty_bitmap_sync_all
>>  ram_resume_prepare
>>  qemu_savevm_state_resume_prepare
>>  postcopy_do_resume
>>  postcopy_pause
>>  migration_detect_error
>>  migration_thread
>> 
>> Thread 7 (Thread 0x7fffd67ff700 (LWP 30281) "return path"):
>>  qemu_sem_wait
>>  postcopy_pause_return_path_thread
>>  source_return_path_thread
>
> I guess this is because below path triggers:
>
>     if (len > 0) {
>         f->buf_size += len;
>         f->total_transferred += len;
>     } else if (len == 0) {
>         qemu_file_set_error_obj(f, -EIO, local_error);     <-----------
>     } else {
>         qemu_file_set_error_obj(f, len, local_error);
>     }
>
> So the src can always write anything into the tmp file, but any read will
> return 0 immediately because file offset is always pointing to the file
> size.

Yes, a 0 return would mean EOF indeed.

>> 
>> This patch seems to fix it, although we cannot call qmp_migrate_recover
>> a second time because the mis state is now in RECOVER:
>> 
>>   "Migrate recover can only be run when postcopy is paused."
>> 
>> Do we maybe need to return the state to PAUSED, or allow
>> qmp_migrate_recover to run in RECOVER, like you did on the src side?

I figured what is going on here (test #1). At postcopy_pause_incoming()
the state transition is ACTIVE -> PAUSED, but when the first recovery
fails on the incoming side, the transition would have to be RECOVER ->
PAUSED.

Could you add that change to this patch?

>
> Ouch, I just noticed that my patch was wrong.
>
> I probably need this:
>
> ===8<===
> From 8c2fb7b4c7488002283c7fb6a5e2aae81b21e04b Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Tue, 12 Sep 2023 15:49:54 -0400
> Subject: [PATCH] fixup! migration/postcopy: Allow network to fail even during
>  recovery
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.h | 2 +-
>  migration/migration.c | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.h b/migration/migration.h
> index e7f48e736e..7e61e2ece7 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -482,7 +482,7 @@ int migrate_init(MigrationState *s, Error **errp);
>  bool migration_is_blocked(Error **errp);
>  /* True if outgoing migration has entered postcopy phase */
>  bool migration_in_postcopy(void);
> -bool migration_postcopy_is_alive(void);
> +bool migration_postcopy_is_alive(int state);
>  MigrationState *migrate_get_current(void);
>  
>  uint64_t ram_get_total_transferred_pages(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index de2146c6fc..a9d381886c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1349,7 +1349,7 @@ bool migration_in_postcopy(void)
>      }
>  }
>  
> -bool migration_postcopy_is_alive(void)
> +bool migration_postcopy_is_alive(int state)
>  {
>      MigrationState *s = migrate_get_current();
>  

Note there's a missing hunk here to actually use the 'state'.

> @@ -1569,7 +1569,7 @@ void qmp_migrate_pause(Error **errp)
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      int ret;
>  
> -    if (migration_postcopy_is_alive()) {
> +    if (migration_postcopy_is_alive(ms->state)) {
>          /* Source side, during postcopy */
>          Error *error = NULL;
>  
> @@ -1593,7 +1593,7 @@ void qmp_migrate_pause(Error **errp)
>          return;
>      }
>  
> -    if (migration_postcopy_is_alive()) {
> +    if (migration_postcopy_is_alive(mis->state)) {
>          ret = qemu_file_shutdown(mis->from_src_file);
>          if (ret) {
>              error_setg(errp, "Failed to pause destination migration");
> -- 
> 2.41.0
> ===8<===
>> 
>> 
>> Scenario 2:
>> /x86_64/migration/postcopy/recovery/fail-twice/rp
>> 
>> Thread 8 (Thread 0x7fffd5ffe700 (LWP 30456) "live_migration"):
>>  qemu_sem_wait
>>  ram_dirty_bitmap_sync_all
>>  ram_resume_prepare
>>  qemu_savevm_state_resume_prepare
>>  postcopy_do_resume
>>  postcopy_pause
>>  migration_detect_error
>>  migration_thread
>> 
>> Thread 7 (Thread 0x7fffd67ff700 (LWP 30455) "return path"):
>>  recvmsg
>>  qio_channel_socket_readv
>>  qio_channel_readv_full
>>  qio_channel_read
>>  qemu_fill_buffer
>>  qemu_peek_byte
>>  qemu_get_byte
>>  qemu_get_be16
>>  source_return_path_thread
>> 
>> Here, with this patch the migration gets stuck unless we call
>> migrate_pause() one more time. After another round of migrate_pause +
>> recover, it finishes properly.

Here (test #2), the issue is that the sockets are unpaired, so there's
no G_IO_IN to trigger the qio_channel watch callback. The incoming side
never calls fd_accept_incoming_migration() and the RP hangs waiting for
something. I think there's no other way to unblock aside from the
explicit qmp_migrate_pause().

>> 
>> 
>> 1- hacked-together test:
>> -->8--
>> From a34685c8795799350665a880cd2ddddbf53c5812 Mon Sep 17 00:00:00 2001
>> From: Fabiano Rosas <farosas@suse.de>
>> Date: Mon, 11 Sep 2023 20:45:33 -0300
>> Subject: [PATCH] test patch
>> 
>> ---
>>  tests/qtest/migration-test.c | 87 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 87 insertions(+)
>> 
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 1b43df5ca7..4d9d2209c1 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -695,6 +695,7 @@ typedef struct {
>>      /* Postcopy specific fields */
>>      void *postcopy_data;
>>      bool postcopy_preempt;
>> +    int postcopy_recovery_method;
>>  } MigrateCommon;
>>  
>>  static int test_migrate_start(QTestState **from, QTestState **to,
>> @@ -1357,6 +1358,61 @@ static void test_postcopy_preempt_tls_psk(void)
>>  }
>>  #endif
>>  
>> +static void postcopy_recover_fail(QTestState *from, QTestState *to, int method)
>> +{
>> +    int src, dst;
>> +
>> +    if (method == 1) {
>> +        /* give it some random fd to recover */
>> +        g_autofree char *uri = g_strdup_printf("%s/noop", tmpfs);
>> +        src = dst = open(uri, O_CREAT|O_RDWR);
>
> This is slightly weird.

Yeah, I was just trying to give some sort of bad input that would still
be accepted.

>
> We opened a file, making it RW and pass it over to both QEMUs.
>
> I think the result can be unpredictable for the reader, that if the reader
> reads before any prior writes it'll just quickly fail, or I think it can
> also happen that the read is slower than the write so it can read
> something..  until it reads to the EOF and fail that fd at some point.
>
> Not sure whether it'll cause some behavior difference and uncertainty on
> the test case.

I see qmp_migrate_recover() failing consistently and since the src is
only resumed later, it sees an unresponsive dst and the RP goes into the
retry path.

We could give them both separate files and the result would be more
predictable.

>
> Maybe we drop this method but only keep the below one?
>

Yeah, maybe this is just too weird. I'm fine either way.

[snip]
>>      qtest_add_func("/migration/bad_dest", test_baddest);
>
> Thanks for contributing the test case!
>
> Do you want me to pick this patch up (with modifications) and repost
> together with this series?  It'll also work if you want to send a separate
> test patch.  Let me know!

You can take it. Or drop it if it ends being too artificial. I'll be
focusing on implementing some of the qemu-file.c improvements we
discussed in the past on top of this series.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery
  2023-09-12 22:49       ` Fabiano Rosas
@ 2023-09-13  0:38         ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2023-09-13  0:38 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Xiaohui Li

On Tue, Sep 12, 2023 at 07:49:37PM -0300, Fabiano Rosas wrote:
> I figured what is going on here (test #1). At postcopy_pause_incoming()
> the state transition is ACTIVE -> PAUSED, but when the first recovery
> fails on the incoming side, the transition would have to be RECOVER ->
> PAUSED.
> 
> Could you add that change to this patch?

Yes, and actually, see:

https://lore.kernel.org/qemu-devel/20230912222145.731099-11-peterx@redhat.com/

> > -bool migration_postcopy_is_alive(void)
> > +bool migration_postcopy_is_alive(int state)
> >  {
> >      MigrationState *s = migrate_get_current();
> >  
> 
> Note there's a missing hunk here to actually use the 'state'.

Yes.. I fixed it in the version I just posted, here:

https://lore.kernel.org/qemu-devel/20230912222145.731099-10-peterx@redhat.com/

+bool migration_postcopy_is_alive(int state)
+{
+    switch (state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
+}

[...]

> >> Here, with this patch the migration gets stuck unless we call
> >> migrate_pause() one more time. After another round of migrate_pause +
> >> recover, it finishes properly.
> 
> Here (test #2), the issue is that the sockets are unpaired, so there's
> no G_IO_IN to trigger the qio_channel watch callback. The incoming side
> never calls fd_accept_incoming_migration() and the RP hangs waiting for
> something. I think there's no other way to unblock aside from the
> explicit qmp_migrate_pause().

Exactly, that's the "trick" I mentioned. :)

Sorry when replying just now I seem to have jumped over some sections.
See:

https://lore.kernel.org/qemu-devel/20230912222145.731099-12-peterx@redhat.com/

I put a rich comment for that:

+    /*
+     * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
+     * emulate the 1st byte of a real recovery, but stops from there to
+     * keep dest QEMU in RECOVER.  This is needed so that we can kick off
+     * the recover process on dest QEMU (by triggering the G_IO_IN event).
+     *
+     * NOTE: this trick is not needed on src QEMUs, because src doesn't
+     * rely on an pre-existing G_IO_IN event, so it will always trigger the
+     * upcoming recovery anyway even if it can read nothing.
+     */
+#define QEMU_VM_COMMAND              0x08
+    c = QEMU_VM_COMMAND;
+    ret = send(pair2[1], &c, 1, 0);
+    g_assert_cmpint(ret, ==, 1);

> We could give them both separate files and the result would be more
> predictable.

Please have a look at the test patch I posted (note!  it's still under your
name but I changed it quite a lot with my sign-off).  I used your 2nd
method to create socket pairs, and hopefully that provides very reliable
way to put both src/dst sides into RECOVER state, then kick it out of it
using qmp migrate-pause on both sides.

> You can take it. Or drop it if it ends being too artificial.

I like your suggestion on having the test case, and I hope the new version
in above link I posted isn't so artificial; the only part I don't like
about that was the "write 1 byte" trick for dest qemu, but that seems still
okay.  Feel free to go and have a look.

Thanks a lot,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-09-13  0:39 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-29 21:42 [PATCH 0/9] migration: Better error handling in rp thread, allow failures in recover Peter Xu
2023-08-29 21:42 ` [PATCH 1/9] migration: Display error in query-migrate irrelevant of status Peter Xu
2023-08-29 21:42 ` [PATCH 2/9] migration: Let migrate_set_error() take ownership Peter Xu
2023-09-12 19:40   ` Fabiano Rosas
2023-09-12 20:14     ` Peter Xu
2023-08-29 21:42 ` [PATCH 3/9] migration: Introduce migrate_has_error() Peter Xu
2023-08-29 21:42 ` [PATCH 4/9] migration: Refactor error handling in source return path Peter Xu
2023-08-29 21:42 ` [PATCH 5/9] migration: Deliver return path file error to migrate state too Peter Xu
2023-08-29 21:42 ` [PATCH 6/9] qemufile: Always return a verbose error Peter Xu
2023-08-29 21:42 ` [PATCH 7/9] migration: Remember num of ramblocks to sync during recovery Peter Xu
2023-09-12  0:33   ` Fabiano Rosas
2023-08-29 21:42 ` [PATCH 8/9] migration: Add migration_rp_wait|kick() Peter Xu
2023-09-12  0:32   ` Fabiano Rosas
2023-08-29 21:42 ` [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery Peter Xu
2023-09-12  0:31   ` Fabiano Rosas
2023-09-12 20:05     ` Peter Xu
2023-09-12 22:16       ` Peter Xu
2023-09-12 22:49       ` Fabiano Rosas
2023-09-13  0:38         ` Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.