* [PATCH v2 0/3] migration: Fix disorder of channel creations
@ 2023-02-02 21:24 Peter Xu
  2023-02-02 21:24 ` [PATCH v2 1/3] migration: Rework multi-channel checks on URI Peter Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Peter Xu @ 2023-02-02 21:24 UTC
  To: qemu-devel
  Cc: Manish Mishra, Daniel P . Berrange, Juan Quintela,
	Dr . David Alan Gilbert, peterx, Leonardo Bras Soares Passos

This patchset is rebased to Juan's latest pull request:
Based-on: <20230202160640.2300-1-quintela@redhat.com>

I can trigger disordered connections with preempt mode postcopy (1 out of a
few attempts), which can cause migration to hang during the precopy phase,
if e.g. I set the NIC packet loss rate to 50%.

Patch 1 is IMHO a cleanup that I think is good to have even without patches
2/3.  Patch 3 actually fixes the ordering issue.  For each patch, please
refer to the commit message and the in-code comments.

Any comments are welcome, thanks.

Peter Xu (3):
  migration: Rework multi-channel checks on URI
  migration: Add a semaphore to count PONGs
  migration: Postpone postcopy preempt channel to be after main

 migration/migration.c    | 121 ++++++++++++++++++++++++++-------------
 migration/migration.h    |  15 ++++-
 migration/multifd.c      |  12 +---
 migration/postcopy-ram.c |  31 +++++-----
 migration/postcopy-ram.h |   4 +-
 migration/savevm.c       |   6 +-
 6 files changed, 117 insertions(+), 72 deletions(-)

-- 
2.37.3




* [PATCH v2 1/3] migration: Rework multi-channel checks on URI
  2023-02-02 21:24 [PATCH v2 0/3] migration: Fix disorder of channel creations Peter Xu
@ 2023-02-02 21:24 ` Peter Xu
  2023-02-08 19:19   ` Juan Quintela
  2023-02-02 21:24 ` [PATCH v2 2/3] migration: Add a semaphore to count PONGs Peter Xu
  2023-02-02 21:24 ` [PATCH v2 3/3] migration: Postpone postcopy preempt channel to be after main Peter Xu
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2023-02-02 21:24 UTC
  To: qemu-devel
  Cc: Manish Mishra, Daniel P . Berrange, Juan Quintela,
	Dr . David Alan Gilbert, peterx, Leonardo Bras Soares Passos

The whole idea of multi-channel checks was not properly done, IMHO.

Currently we check multi-channel in a lot of places, but actually that's
not needed because we only need to check it right after we get the URI and
that should be it.

If the URI check succeeds, we should never need to check it again because
multi-channel support is then guaranteed.  If the check fails, we should
fail immediately in either qmp_migrate or qmp_migrate_incoming, instead of
failing later, after the connection is established.

Neither should we fail when setting capabilities, as we have done since:

5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19)

Because logically the URI will only be set later after the capability is
set, so it doesn't make a lot of sense to check the URI type when setting
the capability, because we're checking the cap with an old URI passed in,
and that may not even be the URI we're going to use later.

This patch mostly reverts all such earlier checks, dropping the variable
migrate_allow_multi_channels and its helpers.  Instead, add a common helper
that checks the URI for multi-channel support in both qmp_migrate and
qmp_migrate_incoming, and that should do all the proper checks.  The failure
will only trigger with the "migrate" or "migrate_incoming" command, or when
the user specifies "-incoming xxx" where "xxx" is not "defer".

With that, make postcopy_preempt_setup() as simple as creating the channel.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
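Illustration only, not part of the patch: a minimal standalone sketch of the
"check the URI once, up front" idea described in the commit message.  The
names has_prefix() and uri_ok_for_migration() are hypothetical stand-ins (for
QEMU's strstart() and the helper added below), and needs_multi is assumed to
represent migrate_use_multifd() || migrate_postcopy_preempt().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strstart(): true if str begins with prefix. */
static bool has_prefix(const char *str, const char *prefix)
{
    return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Only socket-based transports are treated as multi-channel capable. */
static bool uri_supports_multi_channels(const char *uri)
{
    assert(uri != NULL);
    return has_prefix(uri, "tcp:") || has_prefix(uri, "unix:") ||
           has_prefix(uri, "vsock:");
}

/*
 * Single up-front check: reject the URI only when a multi-channel feature
 * is enabled and the transport cannot carry more than one channel.
 */
static bool uri_ok_for_migration(bool needs_multi, const char *uri)
{
    return !needs_multi || uri_supports_multi_channels(uri);
}
```

With this shape, a URI such as "exec:cat" is rejected only when a
multi-channel feature is enabled; without one, any URI passes this check.
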
 migration/migration.c    | 56 +++++++++++++++++++---------------------
 migration/migration.h    |  3 ---
 migration/multifd.c      | 12 ++-------
 migration/postcopy-ram.c | 14 +---------
 migration/postcopy-ram.h |  2 +-
 5 files changed, 31 insertions(+), 56 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f4f7d207f0..ef7fceb5d7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -182,16 +182,26 @@ static int migration_maybe_pause(MigrationState *s,
                                  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
 
-static bool migrate_allow_multi_channels = true;
+static bool migration_needs_multiple_sockets(void)
+{
+    return migrate_use_multifd() || migrate_postcopy_preempt();
+}
 
-void migrate_protocol_allow_multi_channels(bool allow)
+static bool uri_supports_multi_channels(const char *uri)
 {
-    migrate_allow_multi_channels = allow;
+    return strstart(uri, "tcp:", NULL) || strstart(uri, "unix:", NULL) ||
+        strstart(uri, "vsock:", NULL);
 }
 
-bool migrate_multi_channels_is_allowed(void)
+static bool migration_uri_validate(const char *uri, Error **errp)
 {
-    return migrate_allow_multi_channels;
+    if (migration_needs_multiple_sockets() &&
+        !uri_supports_multi_channels(uri)) {
+        error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)");
+        return false;
+    }
+
+    return true;
 }
 
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
@@ -491,12 +501,15 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;
 
-    migrate_protocol_allow_multi_channels(false); /* reset it anyway */
+    /* URI is not suitable for migration? */
+    if (!migration_uri_validate(uri, errp)) {
+        return;
+    }
+
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
-        migrate_protocol_allow_multi_channels(true);
         socket_start_incoming_migration(p ? p : uri, errp);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
@@ -721,11 +734,6 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
     migration_incoming_process();
 }
 
-static bool migration_needs_multiple_sockets(void)
-{
-    return migrate_use_multifd() || migrate_postcopy_preempt();
-}
-
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
@@ -1347,15 +1355,6 @@ static bool migrate_caps_check(bool *cap_list,
     }
 #endif
 
-
-    /* incoming side only */
-    if (runstate_check(RUN_STATE_INMIGRATE) &&
-        !migrate_multi_channels_is_allowed() &&
-        cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
-        error_setg(errp, "multifd is not supported by current protocol");
-        return false;
-    }
-
     if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
         if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
             error_setg(errp, "Postcopy preempt requires postcopy-ram");
@@ -2440,6 +2439,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     MigrationState *s = migrate_get_current();
     const char *p = NULL;
 
+    /* URI is not suitable for migration? */
+    if (!migration_uri_validate(uri, errp)) {
+        return;
+    }
+
     if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
                          has_resume && resume, errp)) {
         /* Error detected, put into errp */
@@ -2452,11 +2456,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         }
     }
 
-    migrate_protocol_allow_multi_channels(false);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
-        migrate_protocol_allow_multi_channels(true);
         socket_start_outgoing_migration(s, p ? p : uri, &local_err);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
@@ -4309,12 +4311,8 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
     }
 
     /* This needs to be done before resuming a postcopy */
-    if (postcopy_preempt_setup(s, &local_err)) {
-        error_report_err(local_err);
-        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-                          MIGRATION_STATUS_FAILED);
-        migrate_fd_cleanup(s);
-        return;
+    if (migrate_postcopy_preempt()) {
+        postcopy_preempt_setup(s);
     }
 
     if (resume) {
diff --git a/migration/migration.h b/migration/migration.h
index 66511ce532..c351872360 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -474,7 +474,4 @@ void migration_cancel(const Error *error);
 void populate_vfio_info(MigrationInfo *info);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
-bool migrate_multi_channels_is_allowed(void);
-void migrate_protocol_allow_multi_channels(bool allow);
-
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index eeb4fb87ee..dfe8eda5bf 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -512,7 +512,7 @@ void multifd_save_cleanup(void)
 {
     int i;
 
-    if (!migrate_use_multifd() || !migrate_multi_channels_is_allowed()) {
+    if (!migrate_use_multifd()) {
         return;
     }
     multifd_send_terminate_threads(NULL);
@@ -910,10 +910,6 @@ int multifd_save_setup(Error **errp)
     if (!migrate_use_multifd()) {
         return 0;
     }
-    if (!migrate_multi_channels_is_allowed()) {
-        error_setg(errp, "multifd is not supported by current protocol");
-        return -1;
-    }
 
     thread_count = migrate_multifd_channels();
     multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
@@ -1018,7 +1014,7 @@ int multifd_load_cleanup(Error **errp)
 {
     int i;
 
-    if (!migrate_use_multifd() || !migrate_multi_channels_is_allowed()) {
+    if (!migrate_use_multifd()) {
         return 0;
     }
     multifd_recv_terminate_threads(NULL);
@@ -1172,10 +1168,6 @@ int multifd_load_setup(Error **errp)
         return 0;
     }
 
-    if (!migrate_multi_channels_is_allowed()) {
-        error_setg(errp, "multifd is not supported by current protocol");
-        return -1;
-    }
     thread_count = migrate_multifd_channels();
     multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
     multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b98e95dab0..e2578dbd21 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1631,22 +1631,10 @@ int postcopy_preempt_wait_channel(MigrationState *s)
     return s->postcopy_qemufile_src ? 0 : -1;
 }
 
-int postcopy_preempt_setup(MigrationState *s, Error **errp)
+void postcopy_preempt_setup(MigrationState *s)
 {
-    if (!migrate_postcopy_preempt()) {
-        return 0;
-    }
-
-    if (!migrate_multi_channels_is_allowed()) {
-        error_setg(errp, "Postcopy preempt is not supported as current "
-                   "migration stream does not support multi-channels.");
-        return -1;
-    }
-
     /* Kick an async task to connect */
     socket_send_channel_create(postcopy_preempt_send_channel_new, s);
-
-    return 0;
 }
 
 static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 25881c4127..d5604cbcf1 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -191,7 +191,7 @@ enum PostcopyChannels {
 };
 
 void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
-int postcopy_preempt_setup(MigrationState *s, Error **errp);
+void postcopy_preempt_setup(MigrationState *s);
 int postcopy_preempt_wait_channel(MigrationState *s);
 
 #endif
-- 
2.37.3




* [PATCH v2 2/3] migration: Add a semaphore to count PONGs
  2023-02-02 21:24 [PATCH v2 0/3] migration: Fix disorder of channel creations Peter Xu
  2023-02-02 21:24 ` [PATCH v2 1/3] migration: Rework multi-channel checks on URI Peter Xu
@ 2023-02-02 21:24 ` Peter Xu
  2023-02-08 19:19   ` Juan Quintela
  2023-02-02 21:24 ` [PATCH v2 3/3] migration: Postpone postcopy preempt channel to be after main Peter Xu
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2023-02-02 21:24 UTC
  To: qemu-devel
  Cc: Manish Mishra, Daniel P . Berrange, Juan Quintela,
	Dr . David Alan Gilbert, peterx, Leonardo Bras Soares Passos

This semaphore does nothing else for now, but it lets us know whether the
main channel is correctly established without changing the migration
protocol.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
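Illustration only, not part of the patch: a small sketch of the counting
pattern, using POSIX semaphores and threads instead of QEMU's QemuSemaphore.
The struct rp_state here and both function names are hypothetical; the fake
return-path thread simply posts once, as if a single PONG had arrived.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical miniature of the source-side return-path state. */
struct rp_state {
    sem_t pong_acks;   /* posted once for every PONG that is received */
};

/* Models the return-path thread handling one PONG message. */
static void *return_path_thread(void *opaque)
{
    struct rp_state *rp = opaque;

    assert(rp != NULL);
    sem_post(&rp->pong_acks);
    return NULL;
}

/*
 * Start a fake return-path thread, then block until one PONG is counted.
 * Returns 0 once the PONG has been observed, -1 on setup failure.
 */
static int wait_for_first_pong(void)
{
    struct rp_state rp;
    pthread_t tid;

    if (sem_init(&rp.pong_acks, 0, 0) != 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, return_path_thread, &rp) != 0) {
        sem_destroy(&rp.pong_acks);
        return -1;
    }

    /* Retry if a signal interrupts the wait. */
    while (sem_wait(&rp.pong_acks) == -1 && errno == EINTR) {
        continue;
    }

    /* At this point the peer is known to have answered on the channel. */
    pthread_join(tid, NULL);
    sem_destroy(&rp.pong_acks);
    return 0;
}
```

A semaphore rather than a flag means each PONG can be consumed by exactly one
waiter, and a PONG that arrives before anyone waits is not lost.
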
 migration/migration.c | 3 +++
 migration/migration.h | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index ef7fceb5d7..d66f5cfcd7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2993,6 +2993,7 @@ retry:
         case MIG_RP_MSG_PONG:
             tmp32 = ldl_be_p(buf);
             trace_source_return_path_thread_pong(tmp32);
+            qemu_sem_post(&ms->rp_state.rp_pong_acks);
             break;
 
         case MIG_RP_MSG_REQ_PAGES:
@@ -4488,6 +4489,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->postcopy_pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_sem);
+    qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
     error_free(ms->error);
 }
@@ -4534,6 +4536,7 @@ static void migration_instance_init(Object *obj)
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
+    qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
     qemu_sem_init(&ms->rate_limit_sem, 0);
     qemu_sem_init(&ms->wait_unplug_sem, 0);
     qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
diff --git a/migration/migration.h b/migration/migration.h
index c351872360..4cb1cb6fa8 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -276,6 +276,12 @@ struct MigrationState {
          */
         bool          rp_thread_created;
         QemuSemaphore rp_sem;
+        /*
+         * We post to this when we got one PONG from dest. So far it's an
+         * easy way to know the main channel has successfully established
+         * on dest QEMU.
+         */
+        QemuSemaphore rp_pong_acks;
     } rp_state;
 
     double mbps;
-- 
2.37.3




* [PATCH v2 3/3] migration: Postpone postcopy preempt channel to be after main
  2023-02-02 21:24 [PATCH v2 0/3] migration: Fix disorder of channel creations Peter Xu
  2023-02-02 21:24 ` [PATCH v2 1/3] migration: Rework multi-channel checks on URI Peter Xu
  2023-02-02 21:24 ` [PATCH v2 2/3] migration: Add a semaphore to count PONGs Peter Xu
@ 2023-02-02 21:24 ` Peter Xu
  2 siblings, 0 replies; 7+ messages in thread
From: Peter Xu @ 2023-02-02 21:24 UTC
  To: qemu-devel
  Cc: Manish Mishra, Daniel P . Berrange, Juan Quintela,
	Dr . David Alan Gilbert, peterx, Leonardo Bras Soares Passos

Postcopy with preempt-mode enabled needs two channels to communicate.  The
order of channel establishment is not guaranteed.  It can happen that the
dest QEMU gets the preempt channel connection request before the main
channel is established; the migration may then make no progress, even
during precopy, because of the wrong order.

To fix it, create the preempt channel only once we know the main channel is
established.

For a general postcopy migration, we delay it until postcopy_start(), by
which point we have already gone through part of precopy on the main
channel.  To make sure dest QEMU has already established the channel, we
wait until the first PONG is received.  That's something we do at the
start of precopy when postcopy is enabled, so it's guaranteed to happen
sooner or later.

For a postcopy recovery, we delay it to qemu_savevm_state_resume_prepare()
where we'll have round trips of data on bitmap synchronizations, which
means the main channel must have been established.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
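Illustration only, not part of the patch: the destination-side rule can be
modelled as a pure function.  This is a simplified sketch; the struct
incoming_cfg and its fields are hypothetical and stand in for the global
capability and channel state that QEMU consults.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the destination consults. */
struct incoming_cfg {
    bool multifd;          /* multifd capability enabled */
    bool preempt;          /* postcopy-preempt capability enabled */
    bool all_channels_up;  /* every expected channel has connected */
};

/*
 * Decide whether the incoming migration should start now, given that a
 * channel has just connected and main_channel says which one it was.
 */
static bool should_start_incoming(const struct incoming_cfg *cfg,
                                  bool main_channel)
{
    /* Multifd starts only when all of its channels are established. */
    if (cfg->multifd) {
        return cfg->all_channels_up;
    }

    /* With preempt, only the main channel may start the migration. */
    if (cfg->preempt) {
        return main_channel;
    }

    /* Otherwise the main channel is the only one that can show up. */
    assert(main_channel);
    return true;
}
```

The preempt case is the one that matters here: a preempt channel that
connects first no longer starts the incoming migration by itself.
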
 migration/migration.c    | 72 ++++++++++++++++++++++++++++++----------
 migration/migration.h    |  6 ++++
 migration/postcopy-ram.c | 17 ++++++++--
 migration/postcopy-ram.h |  2 +-
 migration/savevm.c       |  6 +++-
 5 files changed, 82 insertions(+), 21 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d66f5cfcd7..0516aa35e5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -232,6 +232,8 @@ void migration_object_init(void)
     qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
+    qemu_sem_init(&current_incoming->postcopy_qemufile_dst_done, 0);
+
     qemu_mutex_init(&current_incoming->page_request_mutex);
     current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -734,6 +736,31 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
     migration_incoming_process();
 }
 
+/*
+ * Returns true when we want to start a new incoming migration process,
+ * false otherwise.
+ */
+static bool migration_should_start_incoming(bool main_channel)
+{
+    /* Multifd doesn't start unless all channels are established */
+    if (migrate_use_multifd()) {
+        return migration_has_all_channels();
+    }
+
+    /* Preempt channel only starts when the main channel is created */
+    if (migrate_postcopy_preempt()) {
+        return main_channel;
+    }
+
+    /*
+     * For all the rest types of migration, we should only reach here when
+     * it's the main channel that's being created, and we should always
+     * proceed with this channel.
+     */
+    assert(main_channel);
+    return true;
+}
+
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
@@ -795,7 +822,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
         }
     }
 
-    if (migration_has_all_channels()) {
+    if (migration_should_start_incoming(default_channel)) {
         /* If it's a recovery, we're done */
         if (postcopy_try_recover()) {
             return;
@@ -3127,6 +3154,13 @@ static int await_return_path_close_on_source(MigrationState *ms)
     return ms->rp_state.error;
 }
 
+static inline void
+migration_wait_main_channel(MigrationState *ms)
+{
+    /* Wait until one PONG message received */
+    qemu_sem_wait(&ms->rp_state.rp_pong_acks);
+}
+
 /*
  * Switch from normal iteration to postcopy
  * Returns non-0 on error
@@ -3141,9 +3175,12 @@ static int postcopy_start(MigrationState *ms)
     bool restart_block = false;
     int cur_state = MIGRATION_STATUS_ACTIVE;
 
-    if (postcopy_preempt_wait_channel(ms)) {
-        migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
-        return -1;
+    if (migrate_postcopy_preempt()) {
+        migration_wait_main_channel(ms);
+        if (postcopy_preempt_establish_channel(ms)) {
+            migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+            return -1;
+        }
     }
 
     if (!migrate_pause_before_switchover()) {
@@ -3554,6 +3591,20 @@ static int postcopy_do_resume(MigrationState *s)
         return ret;
     }
 
+    /*
+     * If preempt is enabled, re-establish the preempt channel.  Note that
+     * we do it after resume prepare to make sure the main channel will be
+     * created before the preempt channel.  E.g. with weak network, the
+     * dest QEMU may get messed up with the preempt and main channels on
+     * the order of connection setup.  This guarantees the correct order.
+     */
+    ret = postcopy_preempt_establish_channel(s);
+    if (ret) {
+        error_report("%s: postcopy_preempt_establish_channel(): %d",
+                     __func__, ret);
+        return ret;
+    }
+
     /*
      * Last handshake with destination on the resume (destination will
      * switch to postcopy-active afterwards)
@@ -3615,14 +3666,6 @@ static MigThrError postcopy_pause(MigrationState *s)
         if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
             /* Woken up by a recover procedure. Give it a shot */
 
-            if (postcopy_preempt_wait_channel(s)) {
-                /*
-                 * Preempt enabled, and new channel create failed; loop
-                 * back to wait for another recovery.
-                 */
-                continue;
-            }
-
             /*
              * Firstly, let's wake up the return path now, with a new
              * return path channel.
@@ -4311,11 +4354,6 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         }
     }
 
-    /* This needs to be done before resuming a postcopy */
-    if (migrate_postcopy_preempt()) {
-        postcopy_preempt_setup(s);
-    }
-
     if (resume) {
         /* Wakeup the main migration thread to do the recovery */
         migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
diff --git a/migration/migration.h b/migration/migration.h
index 4cb1cb6fa8..2da2f8a164 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -116,6 +116,12 @@ struct MigrationIncomingState {
     unsigned int postcopy_channels;
     /* QEMUFile for postcopy only; it'll be handled by a separate thread */
     QEMUFile *postcopy_qemufile_dst;
+    /*
+     * When postcopy_qemufile_dst is properly setup, this sem is posted.
+     * One can wait on this semaphore to wait until the preempt channel is
+     * properly setup.
+     */
+    QemuSemaphore postcopy_qemufile_dst_done;
     /* Postcopy priority thread is used to receive postcopy requested pages */
     QemuThread postcopy_prio_thread;
     bool postcopy_prio_thread_created;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e2578dbd21..e9cb949c51 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1199,6 +1199,11 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
     }
 
     if (migrate_postcopy_preempt()) {
+        /*
+         * The preempt channel is established in asynchronous way.  Wait
+         * for its completion.
+         */
+        qemu_sem_wait(&mis->postcopy_qemufile_dst_done);
         /*
          * This thread needs to be created after the temp pages because
          * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
@@ -1546,6 +1551,7 @@ void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
      */
     qemu_file_set_blocking(file, true);
     mis->postcopy_qemufile_dst = file;
+    qemu_sem_post(&mis->postcopy_qemufile_dst_done);
     trace_postcopy_preempt_new_channel();
 }
 
@@ -1614,14 +1620,21 @@ out:
     postcopy_preempt_send_channel_done(s, ioc, local_err);
 }
 
-/* Returns 0 if channel established, -1 for error. */
-int postcopy_preempt_wait_channel(MigrationState *s)
+/*
+ * This function will kick off an async task to establish the preempt
+ * channel, and wait until the connection setup completed.  Returns 0 if
+ * channel established, -1 for error.
+ */
+int postcopy_preempt_establish_channel(MigrationState *s)
 {
     /* If preempt not enabled, no need to wait */
     if (!migrate_postcopy_preempt()) {
         return 0;
     }
 
+    /* Kick off async task to establish preempt channel */
+    postcopy_preempt_setup(s);
+
     /*
      * We need the postcopy preempt channel to be established before
      * starting doing anything.
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index d5604cbcf1..b4867a32d5 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -192,6 +192,6 @@ enum PostcopyChannels {
 
 void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
 void postcopy_preempt_setup(MigrationState *s);
-int postcopy_preempt_wait_channel(MigrationState *s);
+int postcopy_preempt_establish_channel(MigrationState *s);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index e9cf4999ad..4ca45702c2 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2200,7 +2200,11 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
     qemu_sem_post(&mis->postcopy_pause_sem_fault);
 
     if (migrate_postcopy_preempt()) {
-        /* The channel should already be setup again; make sure of it */
+        /*
+         * The preempt channel will be created in async manner, now let's
+         * wait for it and make sure it's created.
+         */
+        qemu_sem_wait(&mis->postcopy_qemufile_dst_done);
         assert(mis->postcopy_qemufile_dst);
         /* Kick the fast ram load thread too */
         qemu_sem_post(&mis->postcopy_pause_sem_fast_load);
-- 
2.37.3




* Re: [PATCH v2 1/3] migration: Rework multi-channel checks on URI
  2023-02-02 21:24 ` [PATCH v2 1/3] migration: Rework multi-channel checks on URI Peter Xu
@ 2023-02-08 19:19   ` Juan Quintela
  2023-02-08 20:03     ` Peter Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Juan Quintela @ 2023-02-08 19:19 UTC
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Daniel P . Berrange,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

Peter Xu <peterx@redhat.com> wrote:
> The whole idea of multi-channel checks was not properly done, IMHO.
>
> Currently we check multi-channel in a lot of places, but actually that's
> not needed because we only need to check it right after we get the URI and
> that should be it.
>
> If the URI check succeeds, we should never need to check it again because
> multi-channel support is then guaranteed.  If the check fails, we should
> fail immediately in either qmp_migrate or qmp_migrate_incoming, instead of
> failing later, after the connection is established.
>
> Neither should we fail when setting capabilities, as we have done since:
>
> 5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19)
>
> Because logically the URI will only be set later after the capability is
> set, so it doesn't make a lot of sense to check the URI type when setting
> the capability, because we're checking the cap with an old URI passed in,
> and that may not even be the URI we're going to use later.
>
> This patch mostly reverts all such earlier checks, dropping the variable
> migrate_allow_multi_channels and its helpers.  Instead, add a common helper
> that checks the URI for multi-channel support in both qmp_migrate and
> qmp_migrate_incoming, and that should do all the proper checks.  The failure
> will only trigger with the "migrate" or "migrate_incoming" command, or when
> the user specifies "-incoming xxx" where "xxx" is not "defer".
>
> With that, make postcopy_preempt_setup() as simple as creating the channel.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

The idea is right.  But I think that changing everything in a single
patch is confusing.

> ---
>  migration/migration.c    | 56 +++++++++++++++++++---------------------
>  migration/migration.h    |  3 ---
>  migration/multifd.c      | 12 ++-------
>  migration/postcopy-ram.c | 14 +---------
>  migration/postcopy-ram.h |  2 +-
>  5 files changed, 31 insertions(+), 56 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index f4f7d207f0..ef7fceb5d7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -182,16 +182,26 @@ static int migration_maybe_pause(MigrationState *s,
>                                   int new_state);
>  static void migrate_fd_cancel(MigrationState *s);
>  
> -static bool migrate_allow_multi_channels = true;
> +static bool migration_needs_multiple_sockets(void)
> +{
> +    return migrate_use_multifd() || migrate_postcopy_preempt();
> +}

This function (and its use) makes sense.

> -void migrate_protocol_allow_multi_channels(bool allow)
> +static bool uri_supports_multi_channels(const char *uri)
>  {
> -    migrate_allow_multi_channels = allow;
> +    return strstart(uri, "tcp:", NULL) || strstart(uri, "unix:", NULL) ||
> +        strstart(uri, "vsock:", NULL);

Indentation is wrong.  Fixing it by hand.

This one is also ok with me.

>  }
>  
> -bool migrate_multi_channels_is_allowed(void)
> +static bool migration_uri_validate(const char *uri, Error **errp)
>  {
> -    return migrate_allow_multi_channels;
> +    if (migration_needs_multiple_sockets() &&
> +        !uri_supports_multi_channels(uri)) {
> +        error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)");
> +        return false;
> +    }
> +
> +    return true;
>  }

This name is not O:-)

What about:

migration_channels_and_uri_compatible()

No, it is not perfect, but I can't think of anything else.

But "validate" doesn't mean anything.  I can't tell what the result means
without looking at the function.

>  static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
> @@ -491,12 +501,15 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
>  {
>      const char *p = NULL;
>  
> -    migrate_protocol_allow_multi_channels(false); /* reset it anyway */
> +    /* URI is not suitable for migration? */
> +    if (!migration_uri_validate(uri, errp)) {
> +        return;
> +    }
> +
>      qapi_event_send_migration(MIGRATION_STATUS_SETUP);
>      if (strstart(uri, "tcp:", &p) ||
>          strstart(uri, "unix:", NULL) ||
>          strstart(uri, "vsock:", NULL)) {
> -        migrate_protocol_allow_multi_channels(true);
>          socket_start_incoming_migration(p ? p : uri, errp);
>  #ifdef CONFIG_RDMA
>      } else if (strstart(uri, "rdma:", &p)) {
> @@ -721,11 +734,6 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
>      migration_incoming_process();
>  }
>  
> -static bool migration_needs_multiple_sockets(void)
> -{
> -    return migrate_use_multifd() || migrate_postcopy_preempt();
> -}
> -
>  void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
> @@ -1347,15 +1355,6 @@ static bool migrate_caps_check(bool *cap_list,
>      }
>  #endif
>  
> -
> -    /* incoming side only */
> -    if (runstate_check(RUN_STATE_INMIGRATE) &&
> -        !migrate_multi_channels_is_allowed() &&
> -        cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
> -        error_setg(errp, "multifd is not supported by current protocol");
> -        return false;
> -    }
> -
>      if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
>          if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
>              error_setg(errp, "Postcopy preempt requires postcopy-ram");
> @@ -2440,6 +2439,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>      MigrationState *s = migrate_get_current();
>      const char *p = NULL;
>  
> +    /* URI is not suitable for migration? */
> +    if (!migration_uri_validate(uri, errp)) {
> +        return;
> +    }
> +
>      if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
>                           has_resume && resume, errp)) {
>          /* Error detected, put into errp */
> @@ -2452,11 +2456,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>          }
>      }
>  
> -    migrate_protocol_allow_multi_channels(false);
>      if (strstart(uri, "tcp:", &p) ||
>          strstart(uri, "unix:", NULL) ||
>          strstart(uri, "vsock:", NULL)) {
> -        migrate_protocol_allow_multi_channels(true);
>          socket_start_outgoing_migration(s, p ? p : uri, &local_err);
>  #ifdef CONFIG_RDMA
>      } else if (strstart(uri, "rdma:", &p)) {
> @@ -4309,12 +4311,8 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>      }
>  
>      /* This needs to be done before resuming a postcopy */
> -    if (postcopy_preempt_setup(s, &local_err)) {
> -        error_report_err(local_err);
> -        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> -                          MIGRATION_STATUS_FAILED);
> -        migrate_fd_cleanup(s);
> -        return;
> +    if (migrate_postcopy_preempt()) {
> +        postcopy_preempt_setup(s);
>      }

I think that this should go in a different patch.

Rest looks ok.

Thanks, Juan.




* Re: [PATCH v2 2/3] migration: Add a semaphore to count PONGs
  2023-02-02 21:24 ` [PATCH v2 2/3] migration: Add a semaphore to count PONGs Peter Xu
@ 2023-02-08 19:19   ` Juan Quintela
  0 siblings, 0 replies; 7+ messages in thread
From: Juan Quintela @ 2023-02-08 19:19 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Manish Mishra, Daniel P . Berrange,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

Peter Xu <peterx@redhat.com> wrote:
> This is mostly useless, but useful for us to know whether the main channel
> is correctly established without changing the migration protocol.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>




* Re: [PATCH v2 1/3] migration: Rework multi-channel checks on URI
  2023-02-08 19:19   ` Juan Quintela
@ 2023-02-08 20:03     ` Peter Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Xu @ 2023-02-08 20:03 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Manish Mishra, Daniel P . Berrange,
	Dr . David Alan Gilbert, Leonardo Bras Soares Passos

On Wed, Feb 08, 2023 at 08:19:11PM +0100, Juan Quintela wrote:
> Peter Xu <peterx@redhat.com> wrote:
> > The whole idea of multi-channel checks was not properly done, IMHO.
> >
> > Currently we check multi-channel in a lot of places, but actually that's
> > not needed because we only need to check it right after we get the URI and
> > that should be it.
> >
> > If the URI check succeeded, we should never need to check it again because
> > we must have it.  If the check fails, we should fail immediately in either
> > qmp_migrate or qmp_migrate_incoming, instead of failing it later after
> > the connection is established.
> >
> > Neither should we fail any set capabilities like what we used to do here:
> >
> > 5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19)
> >
> > Because logically the URI will only be set later after the capability is
> > set, so it doesn't make a lot of sense to check the URI type when setting
> > the capability, because we're checking the cap with an old URI passed in,
> > and that may not even be the URI we're going to use later.
> >
> > This patch mostly reverts all such earlier checks, dropping the
> > variable migrate_allow_multi_channels and its helpers.  Instead, add a common
> > helper to check the URI for multi-channel support in both qmp_migrate and
> > qmp_migrate_incoming, and that should do all the proper checks.  The failure
> > will only trigger with the "migrate" or "migrate_incoming" command, or when
> > the user specifies "-incoming xxx" where "xxx" is not "defer".
> >
> > With that, make postcopy_preempt_setup() as simple as creating the channel.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> The idea is right.  But I think that changing everything in a single
> patch is confusing.
> 
> > ---
> >  migration/migration.c    | 56 +++++++++++++++++++---------------------
> >  migration/migration.h    |  3 ---
> >  migration/multifd.c      | 12 ++-------
> >  migration/postcopy-ram.c | 14 +---------
> >  migration/postcopy-ram.h |  2 +-
> >  5 files changed, 31 insertions(+), 56 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index f4f7d207f0..ef7fceb5d7 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -182,16 +182,26 @@ static int migration_maybe_pause(MigrationState *s,
> >                                   int new_state);
> >  static void migrate_fd_cancel(MigrationState *s);
> >  
> > -static bool migrate_allow_multi_channels = true;
> > +static bool migration_needs_multiple_sockets(void)
> > +{
> > +    return migrate_use_multifd() || migrate_postcopy_preempt();
> > +}
> 
> > This function (and its use) makes sense.
> 
> > -void migrate_protocol_allow_multi_channels(bool allow)
> > +static bool uri_supports_multi_channels(const char *uri)
> >  {
> > -    migrate_allow_multi_channels = allow;
> > +    return strstart(uri, "tcp:", NULL) || strstart(uri, "unix:", NULL) ||
> > +        strstart(uri, "vsock:", NULL);
> 
> Indentation is wrong.  Fixing it by hand.

Will do.

> 
> > This other one is also ok with me.
> 
> >  }
> >  
> > -bool migrate_multi_channels_is_allowed(void)
> > +static bool migration_uri_validate(const char *uri, Error **errp)
> >  {
> > -    return migrate_allow_multi_channels;
> > +    if (migration_needs_multiple_sockets() &&
> > +        !uri_supports_multi_channels(uri)) {
> > +        error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)");
> > +        return false;
> > +    }
> > +
> > +    return true;
> >  }
> 
> This name is not O:-)
> 
> What about:
> 
> migration_channels_and_uri_compatible()
> 
> > No, it is not perfect, but I can't think of anything else.
> > 
> > But "validate" doesn't mean anything.  I can't know the meaning of the
> > result without looking at the function.

I don't have an obvious preference; I think it means I'll just go ahead and
rename it. :)
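
For reference, a minimal standalone sketch of how the check might read under
the suggested name.  This is not the code from the series: has_prefix() is a
stand-in for QEMU's strstart(), the needs_multiple_sockets flag replaces the
migrate_use_multifd() || migrate_postcopy_preempt() test, and errmsg replaces
the Error **errp plumbing.  All three are assumptions made only so the
example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for QEMU's strstart(): true when str begins with prefix. */
static bool has_prefix(const char *str, const char *prefix)
{
    return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* The socket-based transports listed in the patch. */
static bool uri_supports_multi_channels(const char *uri)
{
    return has_prefix(uri, "tcp:") || has_prefix(uri, "unix:") ||
           has_prefix(uri, "vsock:");
}

/*
 * Hypothetical signature: the capability state is passed in as a flag and
 * the error is reported through a plain string pointer.  Returns true when
 * the enabled capabilities and the URI can work together.
 */
static bool migration_channels_and_uri_compatible(const char *uri,
                                                  bool needs_multiple_sockets,
                                                  const char **errmsg)
{
    assert(uri != NULL);

    if (needs_multiple_sockets && !uri_supports_multi_channels(uri)) {
        if (errmsg) {
            *errmsg = "Migration requires multi-channel URIs (e.g. tcp)";
        }
        return false;
    }

    return true;
}
```

With this name the return value reads naturally at the call site: true means
"compatible, go ahead", false means "fail the command now".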

> 
> >  static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
> > @@ -491,12 +501,15 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
> >  {
> >      const char *p = NULL;
> >  
> > -    migrate_protocol_allow_multi_channels(false); /* reset it anyway */
> > +    /* URI is not suitable for migration? */
> > +    if (!migration_uri_validate(uri, errp)) {
> > +        return;
> > +    }
> > +
> >      qapi_event_send_migration(MIGRATION_STATUS_SETUP);
> >      if (strstart(uri, "tcp:", &p) ||
> >          strstart(uri, "unix:", NULL) ||
> >          strstart(uri, "vsock:", NULL)) {
> > -        migrate_protocol_allow_multi_channels(true);
> >          socket_start_incoming_migration(p ? p : uri, errp);
> >  #ifdef CONFIG_RDMA
> >      } else if (strstart(uri, "rdma:", &p)) {
> > @@ -721,11 +734,6 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
> >      migration_incoming_process();
> >  }
> >  
> > -static bool migration_needs_multiple_sockets(void)
> > -{
> > -    return migrate_use_multifd() || migrate_postcopy_preempt();
> > -}
> > -
> >  void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> >  {
> >      MigrationIncomingState *mis = migration_incoming_get_current();
> > @@ -1347,15 +1355,6 @@ static bool migrate_caps_check(bool *cap_list,
> >      }
> >  #endif
> >  
> > -
> > -    /* incoming side only */
> > -    if (runstate_check(RUN_STATE_INMIGRATE) &&
> > -        !migrate_multi_channels_is_allowed() &&
> > -        cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
> > -        error_setg(errp, "multifd is not supported by current protocol");
> > -        return false;
> > -    }
> > -
> >      if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
> >          if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
> >              error_setg(errp, "Postcopy preempt requires postcopy-ram");
> > @@ -2440,6 +2439,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
> >      MigrationState *s = migrate_get_current();
> >      const char *p = NULL;
> >  
> > +    /* URI is not suitable for migration? */
> > +    if (!migration_uri_validate(uri, errp)) {
> > +        return;
> > +    }
> > +
> >      if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
> >                           has_resume && resume, errp)) {
> >          /* Error detected, put into errp */
> > @@ -2452,11 +2456,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
> >          }
> >      }
> >  
> > -    migrate_protocol_allow_multi_channels(false);
> >      if (strstart(uri, "tcp:", &p) ||
> >          strstart(uri, "unix:", NULL) ||
> >          strstart(uri, "vsock:", NULL)) {
> > -        migrate_protocol_allow_multi_channels(true);
> >          socket_start_outgoing_migration(s, p ? p : uri, &local_err);
> >  #ifdef CONFIG_RDMA
> >      } else if (strstart(uri, "rdma:", &p)) {
> > @@ -4309,12 +4311,8 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> >      }
> >  
> >      /* This needs to be done before resuming a postcopy */
> > -    if (postcopy_preempt_setup(s, &local_err)) {
> > -        error_report_err(local_err);
> > -        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > -                          MIGRATION_STATUS_FAILED);
> > -        migrate_fd_cleanup(s);
> > -        return;
> > +    if (migrate_postcopy_preempt()) {
> > +        postcopy_preempt_setup(s);
> >      }
> 
> I think that this should go in a different patch.

It's so small and natural that I "hid" it in.  But I agree, I'll split.
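
To make the shape of that split-out change concrete, here is a rough
standalone sketch of the call-site pattern.  struct mig_state, preempt_setup()
and fd_connect() are hypothetical stand-ins for MigrationState,
postcopy_preempt_setup() and migrate_fd_connect(); the real functions do much
more.  The point is only that once the URI has been checked up front, the
setup helper has nothing left to fail on, so the caller gates on the
capability and drops the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MigrationState; only what the sketch needs. */
struct mig_state {
    bool preempt_channel_requested;
};

/* After the rework, setup only kicks off channel creation; it returns void. */
static void preempt_setup(struct mig_state *s)
{
    assert(s != NULL);
    s->preempt_channel_requested = true;
}

/* The caller, not the helper, decides whether preempt mode is in use. */
static void fd_connect(struct mig_state *s, bool postcopy_preempt_enabled)
{
    if (postcopy_preempt_enabled) {
        preempt_setup(s);
    }
}
```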

Thanks!

-- 
Peter Xu




end of thread, other threads:[~2023-02-08 20:04 UTC | newest]

Thread overview: 7+ messages
2023-02-02 21:24 [PATCH v2 0/3] migration: Fix disorder of channel creations Peter Xu
2023-02-02 21:24 ` [PATCH v2 1/3] migration: Rework multi-channel checks on URI Peter Xu
2023-02-08 19:19   ` Juan Quintela
2023-02-08 20:03     ` Peter Xu
2023-02-02 21:24 ` [PATCH v2 2/3] migration: Add a semaphore to count PONGs Peter Xu
2023-02-08 19:19   ` Juan Quintela
2023-02-02 21:24 ` [PATCH v2 3/3] migration: Postpone postcopy preempt channel to be after main Peter Xu
