* [PATCH 00/20] migration: Postcopy Preemption
@ 2022-02-16  6:27 Peter Xu
  2022-02-16  6:27 ` [PATCH 01/20] migration: Dump sub-cmd name in loadvm_process_command tp Peter Xu
                   ` (20 more replies)
  0 siblings, 21 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

This is v1 of postcopy preempt series.  It can also be found here:

  https://github.com/xzpeter/qemu/tree/postcopy-preempt

This series adds a new migration capability called "postcopy-preempt".  It can
be enabled when postcopy is enabled, and it greatly speeds up the handling of
postcopy page requests.

  |----------------+--------------+-----------------------|
  | Host page size | Vanilla (ms) | Postcopy Preempt (ms) |
  |----------------+--------------+-----------------------|
  | 2M             |        10.58 |                  4.96 |
  | 4K             |        10.68 |                  0.57 |
  |----------------+--------------+-----------------------|
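
For reference, enabling the new capability would look roughly like the HMP
sketch below.  This is only an illustration, not taken from the patches: the
destination URI is a placeholder, and the capabilities typically need to be
set on both sides before starting the migration.

  (qemu) migrate_set_capability postcopy-ram on
  (qemu) migrate_set_capability postcopy-preempt on
  (qemu) migrate -d tcp:<dst-host>:<port>
  (qemu) migrate_start_postcopy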

The major changes since the RFC are:

  - The very large patch is split into smaller ones
  - Added postcopy recovery support, and its unit test

The RFC series actually broke postcopy recovery on huge pages; this version
has that issue fixed as well.

Just a quick note: this series also partly prepares for future doublemap
support.  The channel separation speedup will benefit both current postcopy
and doublemap once it is ready.  The huge page preemption part may only
benefit current postcopy; it won't be enabled in the future doublemap support,
because in that new doublemap world no huge pages will be mapped at all.

The new patch layout:

Patches 1-3: Three leftover patches from the patchset "[PATCH v3 0/8]
migration: Postcopy cleanup on ram disgard" that I picked up here too.

  https://lore.kernel.org/qemu-devel/20211224065000.97572-1-peterx@redhat.com/

  migration: Dump sub-cmd name in loadvm_process_command tp
  migration: Finer grained tracepoints for POSTCOPY_LISTEN
  migration: Tracepoint change in postcopy-run bottom half

Patches 4-9: The original postcopy preempt RFC preparation patches (with
slight modifications).

  migration: Introduce postcopy channels on dest node
  migration: Dump ramblock and offset too when non-same-page detected
  migration: Add postcopy_thread_create()
  migration: Move static var in ram_block_from_stream() into global
  migration: Add pss.postcopy_requested status
  migration: Move migrate_allow_multifd and helpers into migration.c

Patches 10-15: Patches newly added while working on postcopy recovery support.
After these patches, the migrate-recover command will allow re-entrance, which
is a very nice side effect.

  migration: Enlarge postcopy recovery to capture !-EIO too
  migration: postcopy_pause_fault_thread() never fails
  migration: Export ram_load_postcopy()
  migration: Move channel setup out of postcopy_try_recover()
  migration: Add migration_incoming_transport_cleanup()
  migration: Allow migrate-recover to run multiple times

Patches 16-19: The major work of the postcopy preemption implementation, split
into four patches as suggested by Dave.

  migration: Add postcopy-preempt capability
  migration: Postcopy preemption preparation on channel creation
  migration: Postcopy preemption enablement
  migration: Postcopy recover with preempt enabled

Patch 20: the test case.

  tests: Add postcopy preempt test
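
As a usage note (an assumption about the standard qtest invocation, not taken
from the patch itself), the new test should be runnable through the usual
migration-test harness, roughly:

  $ QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test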

For more information, feel free to refer to the RFC series cover letter:

  https://lore.kernel.org/qemu-devel/20220119080929.39485-1-peterx@redhat.com/

Please review, thanks.

Peter Xu (20):
  migration: Dump sub-cmd name in loadvm_process_command tp
  migration: Finer grained tracepoints for POSTCOPY_LISTEN
  migration: Tracepoint change in postcopy-run bottom half
  migration: Introduce postcopy channels on dest node
  migration: Dump ramblock and offset too when non-same-page detected
  migration: Add postcopy_thread_create()
  migration: Move static var in ram_block_from_stream() into global
  migration: Add pss.postcopy_requested status
  migration: Move migrate_allow_multifd and helpers into migration.c
  migration: Enlarge postcopy recovery to capture !-EIO too
  migration: postcopy_pause_fault_thread() never fails
  migration: Export ram_load_postcopy()
  migration: Move channel setup out of postcopy_try_recover()
  migration: Add migration_incoming_transport_cleanup()
  migration: Allow migrate-recover to run multiple times
  migration: Add postcopy-preempt capability
  migration: Postcopy preemption preparation on channel creation
  migration: Postcopy preemption enablement
  migration: Postcopy recover with preempt enabled
  tests: Add postcopy preempt test

 migration/migration.c        | 184 +++++++++++++++-----
 migration/migration.h        |  64 ++++++-
 migration/multifd.c          |  19 +--
 migration/multifd.h          |   2 -
 migration/postcopy-ram.c     | 208 ++++++++++++++++++-----
 migration/postcopy-ram.h     |  14 ++
 migration/ram.c              | 320 +++++++++++++++++++++++++++++++----
 migration/ram.h              |   3 +
 migration/savevm.c           |  66 ++++++--
 migration/socket.c           |  22 ++-
 migration/socket.h           |   1 +
 migration/trace-events       |  19 ++-
 qapi/migration.json          |   8 +-
 tests/qtest/migration-test.c |  39 ++++-
 14 files changed, 803 insertions(+), 166 deletions(-)

-- 
2.32.0




* [PATCH 01/20] migration: Dump sub-cmd name in loadvm_process_command tp
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-16 15:42   ` Dr. David Alan Gilbert
  2022-02-16  6:27 ` [PATCH 02/20] migration: Finer grained tracepoints for POSTCOPY_LISTEN Peter Xu
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

It'll be easier to read the name rather than the index of the sub-command when
debugging.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 3 ++-
 migration/trace-events | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 1599b02fbc..7bb65e1d61 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2273,12 +2273,13 @@ static int loadvm_process_command(QEMUFile *f)
         return qemu_file_get_error(f);
     }
 
-    trace_loadvm_process_command(cmd, len);
     if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
         error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
         return -EINVAL;
     }
 
+    trace_loadvm_process_command(mig_cmd_args[cmd].name, len);
+
     if (mig_cmd_args[cmd].len != -1 && mig_cmd_args[cmd].len != len) {
         error_report("%s received with bad length - expecting %zu, got %d",
                      mig_cmd_args[cmd].name,
diff --git a/migration/trace-events b/migration/trace-events
index 48aa7b10ee..123cfe79d7 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -22,7 +22,7 @@ loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
-loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
+loadvm_process_command(const char *s, uint16_t len) "com=%s len=%d"
 loadvm_process_command_ping(uint32_t val) "0x%x"
 postcopy_ram_listen_thread_exit(void) ""
 postcopy_ram_listen_thread_start(void) ""
-- 
2.32.0




* [PATCH 02/20] migration: Finer grained tracepoints for POSTCOPY_LISTEN
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
  2022-02-16  6:27 ` [PATCH 01/20] migration: Dump sub-cmd name in loadvm_process_command tp Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-16 15:43   ` Dr. David Alan Gilbert
  2022-02-16  6:27 ` [PATCH 03/20] migration: Tracepoint change in postcopy-run bottom half Peter Xu
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Enabling postcopy listening involves a few steps; add a few tracepoints so
that basic measurements can be taken for each of them.
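
As a usage note (not part of the patch): like any other QEMU trace event, the
new tracepoint can be enabled at runtime with the -trace option, e.g.:

  $ qemu-system-x86_64 ... -trace 'loadvm_postcopy_handle_listen' ...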

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 9 ++++++++-
 migration/trace-events | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 7bb65e1d61..190cc5fc42 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1948,9 +1948,10 @@ static void *postcopy_ram_listen_thread(void *opaque)
 static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 {
     PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
-    trace_loadvm_postcopy_handle_listen();
     Error *local_err = NULL;
 
+    trace_loadvm_postcopy_handle_listen("enter");
+
     if (ps != POSTCOPY_INCOMING_ADVISE && ps != POSTCOPY_INCOMING_DISCARD) {
         error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
         return -1;
@@ -1965,6 +1966,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         }
     }
 
+    trace_loadvm_postcopy_handle_listen("after discard");
+
     /*
      * Sensitise RAM - can now generate requests for blocks that don't exist
      * However, at this point the CPU shouldn't be running, and the IO
@@ -1977,6 +1980,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         }
     }
 
+    trace_loadvm_postcopy_handle_listen("after uffd");
+
     if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, &local_err)) {
         error_report_err(local_err);
         return -1;
@@ -1991,6 +1996,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     qemu_sem_wait(&mis->listen_thread_sem);
     qemu_sem_destroy(&mis->listen_thread_sem);
 
+    trace_loadvm_postcopy_handle_listen("return");
+
     return 0;
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index 123cfe79d7..92596c00d8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -14,7 +14,7 @@ loadvm_handle_cmd_packaged_main(int ret) "%d"
 loadvm_handle_cmd_packaged_received(int ret) "%d"
 loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
-loadvm_postcopy_handle_listen(void) ""
+loadvm_postcopy_handle_listen(const char *str) "%s"
 loadvm_postcopy_handle_run(void) ""
 loadvm_postcopy_handle_run_cpu_sync(void) ""
 loadvm_postcopy_handle_run_vmstart(void) ""
-- 
2.32.0




* [PATCH 03/20] migration: Tracepoint change in postcopy-run bottom half
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
  2022-02-16  6:27 ` [PATCH 01/20] migration: Dump sub-cmd name in loadvm_process_command tp Peter Xu
  2022-02-16  6:27 ` [PATCH 02/20] migration: Finer grained tracepoints for POSTCOPY_LISTEN Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-16 19:00   ` Dr. David Alan Gilbert
  2022-02-16  6:27 ` [PATCH 04/20] migration: Introduce postcopy channels on dest node Peter Xu
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Remove the two old tracepoints, which even sit right next to each other:

    trace_loadvm_postcopy_handle_run_cpu_sync()
    trace_loadvm_postcopy_handle_run_vmstart()

Add trace_loadvm_postcopy_handle_run_bh() which traces at a finer granularity.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 12 +++++++++---
 migration/trace-events |  3 +--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 190cc5fc42..41e3238798 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2006,13 +2006,19 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
     Error *local_err = NULL;
     MigrationIncomingState *mis = opaque;
 
+    trace_loadvm_postcopy_handle_run_bh("enter");
+
     /* TODO we should move all of this lot into postcopy_ram.c or a shared code
      * in migration.c
      */
     cpu_synchronize_all_post_init();
 
+    trace_loadvm_postcopy_handle_run_bh("after cpu sync");
+
     qemu_announce_self(&mis->announce_timer, migrate_announce_params());
 
+    trace_loadvm_postcopy_handle_run_bh("after announce");
+
     /* Make sure all file formats flush their mutable metadata.
      * If we get an error here, just don't restart the VM yet. */
     bdrv_invalidate_cache_all(&local_err);
@@ -2022,9 +2028,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
         autostart = false;
     }
 
-    trace_loadvm_postcopy_handle_run_cpu_sync();
-
-    trace_loadvm_postcopy_handle_run_vmstart();
+    trace_loadvm_postcopy_handle_run_bh("after invalidate cache");
 
     dirty_bitmap_mig_before_vm_start();
 
@@ -2037,6 +2041,8 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
     }
 
     qemu_bh_delete(mis->bh);
+
+    trace_loadvm_postcopy_handle_run_bh("return");
 }
 
 /* After all discards we can start running and asking for pages */
diff --git a/migration/trace-events b/migration/trace-events
index 92596c00d8..1aec580e92 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -16,8 +16,7 @@ loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(const char *str) "%s"
 loadvm_postcopy_handle_run(void) ""
-loadvm_postcopy_handle_run_cpu_sync(void) ""
-loadvm_postcopy_handle_run_vmstart(void) ""
+loadvm_postcopy_handle_run_bh(const char *str) "%s"
 loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
-- 
2.32.0




* [PATCH 04/20] migration: Introduce postcopy channels on dest node
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (2 preceding siblings ...)
  2022-02-16  6:27 ` [PATCH 03/20] migration: Tracepoint change in postcopy-run bottom half Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-21 15:49   ` Dr. David Alan Gilbert
  2022-02-16  6:27 ` [PATCH 05/20] migration: Dump ramblock and offset too when non-same-page detected Peter Xu
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Postcopy handles huge pages in a special way: currently we can only have one
"channel" to transfer a page.

That's because when we install pages using UFFDIO_COPY, we need to have the
whole huge page ready; it also means we need a temporary huge page to hold the
content while we receive the whole page.

Currently all maintenance around this temp page is global: first we allocate a
temp huge page, then we maintain its status mostly within ram_load_postcopy().

To enable multiple channels for postcopy, the first thing we need to do is to
prepare N temp huge pages as caches, one for each channel.

Meanwhile we need to maintain the temp huge page status per channel too.

As an example, here are some local variables maintained in ram_load_postcopy()
that are responsible for tracking the temp huge page status:

  - all_zero:     this keeps whether this huge page contains all zeros
  - target_pages: this counts how many target pages have been copied
  - host_page:    this keeps the host ptr for the page to install

Move all these fields together with the temp huge page into a new structure
called PostcopyTmpPage.  Then each (future) postcopy channel gets one such
structure to keep its state.

For vanilla postcopy, obviously there's only one channel.  It contains both
precopy and postcopy pages.

This patch teaches the dest migration node about the possible number of
postcopy channels by introducing the "postcopy_channels" variable.  Its value
is calculated when postcopy is set up on the dest node (during the
POSTCOPY_LISTEN phase).

Vanilla postcopy will have channels=1, but when the postcopy-preempt
capability is enabled (in the future), we will boost it to 2, because even in
the middle of sending a precopy huge page we still want to preempt it and
start sending the postcopy-requested page right away (so we start to keep two
temp huge pages; more if we want to enable multifd).  In this patch there's a
TODO marked for that; so far the channel count is always set to 1.
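
A minimal sketch of what that future boost could look like (hypothetical: the
helper name migrate_postcopy_preempt() is an assumption here; the real
enablement only lands in a later patch of this series):

    /* Hypothetical future form of the TODO below: use two channels when the
     * preempt capability is on, otherwise keep the single vanilla channel. */
    mis->postcopy_channels = migrate_postcopy_preempt() ? 2 : 1;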

We need to send each "host huge page" on one channel only and we cannot split
it, because otherwise the data of the same huge page could end up on more than
one channel and we would need more complicated logic to manage that.  One temp
host huge page per channel will be enough for us for now.

Postcopy will still always use the index=0 huge page even after this patch.
However it prepares for the later patches where multiple channels can be used
(which needs src intervention, because only the src knows which channel should
be used).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h    | 36 +++++++++++++++++++++++-
 migration/postcopy-ram.c | 60 ++++++++++++++++++++++++++++++----------
 migration/ram.c          | 43 ++++++++++++++--------------
 migration/savevm.c       | 12 ++++++++
 4 files changed, 113 insertions(+), 38 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 8130b703eb..42c7395094 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -45,6 +45,24 @@ struct PostcopyBlocktimeContext;
  */
 #define CLEAR_BITMAP_SHIFT_MAX            31
 
+/* This is an abstraction of a "temp huge page" for postcopy's purpose */
+typedef struct {
+    /*
+     * This points to a temporary huge page as a buffer for UFFDIO_COPY.  It's
+     * mmap()ed and needs to be freed when cleanup.
+     */
+    void *tmp_huge_page;
+    /*
+     * This points to the host page we're going to install for this temp page.
+     * It tells us after we've received the whole page, where we should put it.
+     */
+    void *host_addr;
+    /* Number of small pages copied (in size of TARGET_PAGE_SIZE) */
+    unsigned int target_pages;
+    /* Whether this page contains all zeros */
+    bool all_zero;
+} PostcopyTmpPage;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -81,7 +99,22 @@ struct MigrationIncomingState {
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     /* RAMBlock of last request sent to source */
     RAMBlock *last_rb;
-    void     *postcopy_tmp_page;
+    /*
+     * Number of postcopy channels including the default precopy channel, so
+     * vanilla postcopy will only contain one channel which contain both
+     * precopy and postcopy streams.
+     *
+     * This is calculated when the src requests to enable postcopy but before
+     * it starts.  Its value can depend on e.g. whether postcopy preemption is
+     * enabled.
+     */
+    unsigned int postcopy_channels;
+    /*
+     * An array of temp host huge pages to be used, one for each postcopy
+     * channel.
+     */
+    PostcopyTmpPage *postcopy_tmp_pages;
+    /* This is shared for all postcopy channels */
     void     *postcopy_tmp_zero_page;
     /* PostCopyFD's for external userfaultfds & handlers of shared memory */
     GArray   *postcopy_remote_fds;
@@ -391,5 +424,6 @@ bool migration_rate_limit(void);
 void migration_cancel(const Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
+void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e662dd05cc..315f784965 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -525,9 +525,18 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis)
 
 static void postcopy_temp_pages_cleanup(MigrationIncomingState *mis)
 {
-    if (mis->postcopy_tmp_page) {
-        munmap(mis->postcopy_tmp_page, mis->largest_page_size);
-        mis->postcopy_tmp_page = NULL;
+    int i;
+
+    if (mis->postcopy_tmp_pages) {
+        for (i = 0; i < mis->postcopy_channels; i++) {
+            if (mis->postcopy_tmp_pages[i].tmp_huge_page) {
+                munmap(mis->postcopy_tmp_pages[i].tmp_huge_page,
+                       mis->largest_page_size);
+                mis->postcopy_tmp_pages[i].tmp_huge_page = NULL;
+            }
+        }
+        g_free(mis->postcopy_tmp_pages);
+        mis->postcopy_tmp_pages = NULL;
     }
 
     if (mis->postcopy_tmp_zero_page) {
@@ -1091,17 +1100,30 @@ retry:
 
 static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
 {
-    int err;
-
-    mis->postcopy_tmp_page = mmap(NULL, mis->largest_page_size,
-                                  PROT_READ | PROT_WRITE,
-                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-    if (mis->postcopy_tmp_page == MAP_FAILED) {
-        err = errno;
-        mis->postcopy_tmp_page = NULL;
-        error_report("%s: Failed to map postcopy_tmp_page %s",
-                     __func__, strerror(err));
-        return -err;
+    PostcopyTmpPage *tmp_page;
+    int err, i, channels;
+    void *temp_page;
+
+    /* TODO: will be boosted when enable postcopy preemption */
+    mis->postcopy_channels = 1;
+
+    channels = mis->postcopy_channels;
+    mis->postcopy_tmp_pages = g_malloc0_n(sizeof(PostcopyTmpPage), channels);
+
+    for (i = 0; i < channels; i++) {
+        tmp_page = &mis->postcopy_tmp_pages[i];
+        temp_page = mmap(NULL, mis->largest_page_size, PROT_READ | PROT_WRITE,
+                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+        if (temp_page == MAP_FAILED) {
+            err = errno;
+            error_report("%s: Failed to map postcopy_tmp_pages[%d]: %s",
+                         __func__, i, strerror(err));
+            /* Clean up will be done later */
+            return -err;
+        }
+        tmp_page->tmp_huge_page = temp_page;
+        /* Initialize default states for each tmp page */
+        postcopy_temp_page_reset(tmp_page);
     }
 
     /*
@@ -1351,6 +1373,16 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
 #endif
 
 /* ------------------------------------------------------------------------- */
+void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page)
+{
+    tmp_page->target_pages = 0;
+    tmp_page->host_addr = NULL;
+    /*
+     * This is set to true when reset, and cleared as long as we received any
+     * of the non-zero small page within this huge page.
+     */
+    tmp_page->all_zero = true;
+}
 
 void postcopy_fault_thread_notify(MigrationIncomingState *mis)
 {
diff --git a/migration/ram.c b/migration/ram.c
index 91ca743ac8..36b0a53afe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3640,11 +3640,8 @@ static int ram_load_postcopy(QEMUFile *f)
     bool place_needed = false;
     bool matches_target_page_size = false;
     MigrationIncomingState *mis = migration_incoming_get_current();
-    /* Temporary page that is later 'placed' */
-    void *postcopy_host_page = mis->postcopy_tmp_page;
-    void *host_page = NULL;
-    bool all_zero = true;
-    int target_pages = 0;
+    /* Currently we only use channel 0.  TODO: use all the channels */
+    PostcopyTmpPage *tmp_page = &mis->postcopy_tmp_pages[0];
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr;
@@ -3688,7 +3685,7 @@ static int ram_load_postcopy(QEMUFile *f)
                 ret = -EINVAL;
                 break;
             }
-            target_pages++;
+            tmp_page->target_pages++;
             matches_target_page_size = block->page_size == TARGET_PAGE_SIZE;
             /*
              * Postcopy requires that we place whole host pages atomically;
@@ -3700,15 +3697,16 @@ static int ram_load_postcopy(QEMUFile *f)
              * however the source ensures it always sends all the components
              * of a host page in one chunk.
              */
-            page_buffer = postcopy_host_page +
+            page_buffer = tmp_page->tmp_huge_page +
                           host_page_offset_from_ram_block_offset(block, addr);
             /* If all TP are zero then we can optimise the place */
-            if (target_pages == 1) {
-                host_page = host_page_from_ram_block_offset(block, addr);
-            } else if (host_page != host_page_from_ram_block_offset(block,
-                                                                    addr)) {
+            if (tmp_page->target_pages == 1) {
+                tmp_page->host_addr =
+                    host_page_from_ram_block_offset(block, addr);
+            } else if (tmp_page->host_addr !=
+                       host_page_from_ram_block_offset(block, addr)) {
                 /* not the 1st TP within the HP */
-                error_report("Non-same host page %p/%p", host_page,
+                error_report("Non-same host page %p/%p", tmp_page->host_addr,
                              host_page_from_ram_block_offset(block, addr));
                 ret = -EINVAL;
                 break;
@@ -3718,10 +3716,11 @@ static int ram_load_postcopy(QEMUFile *f)
              * If it's the last part of a host page then we place the host
              * page
              */
-            if (target_pages == (block->page_size / TARGET_PAGE_SIZE)) {
+            if (tmp_page->target_pages ==
+                (block->page_size / TARGET_PAGE_SIZE)) {
                 place_needed = true;
             }
-            place_source = postcopy_host_page;
+            place_source = tmp_page->tmp_huge_page;
         }
 
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
@@ -3735,12 +3734,12 @@ static int ram_load_postcopy(QEMUFile *f)
                 memset(page_buffer, ch, TARGET_PAGE_SIZE);
             }
             if (ch) {
-                all_zero = false;
+                tmp_page->all_zero = false;
             }
             break;
 
         case RAM_SAVE_FLAG_PAGE:
-            all_zero = false;
+            tmp_page->all_zero = false;
             if (!matches_target_page_size) {
                 /* For huge pages, we always use temporary buffer */
                 qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
@@ -3758,7 +3757,7 @@ static int ram_load_postcopy(QEMUFile *f)
             }
             break;
         case RAM_SAVE_FLAG_COMPRESS_PAGE:
-            all_zero = false;
+            tmp_page->all_zero = false;
             len = qemu_get_be32(f);
             if (len < 0 || len > compressBound(TARGET_PAGE_SIZE)) {
                 error_report("Invalid compressed data length: %d", len);
@@ -3790,16 +3789,14 @@ static int ram_load_postcopy(QEMUFile *f)
         }
 
         if (!ret && place_needed) {
-            if (all_zero) {
-                ret = postcopy_place_page_zero(mis, host_page, block);
+            if (tmp_page->all_zero) {
+                ret = postcopy_place_page_zero(mis, tmp_page->host_addr, block);
             } else {
-                ret = postcopy_place_page(mis, host_page, place_source,
+                ret = postcopy_place_page(mis, tmp_page->host_addr, place_source,
                                           block);
             }
             place_needed = false;
-            target_pages = 0;
-            /* Assume we have a zero page until we detect something different */
-            all_zero = true;
+            postcopy_temp_page_reset(tmp_page);
         }
     }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 41e3238798..0ccd7e5e3f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2579,6 +2579,18 @@ void qemu_loadvm_state_cleanup(void)
 /* Return true if we should continue the migration, or false. */
 static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 {
+    int i;
+
+    /*
+     * If network is interrupted, any temp page we received will be useless
+     * because we didn't mark them as "received" in receivedmap.  After a
+     * proper recovery later (which will sync src dirty bitmap with receivedmap
+     * on dest) these cached small pages will be resent again.
+     */
+    for (i = 0; i < mis->postcopy_channels; i++) {
+        postcopy_temp_page_reset(&mis->postcopy_tmp_pages[i]);
+    }
+
     trace_postcopy_pause_incoming();
 
     assert(migrate_postcopy_ram());
-- 
2.32.0




* [PATCH 05/20] migration: Dump ramblock and offset too when non-same-page detected
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (3 preceding siblings ...)
  2022-02-16  6:27 ` [PATCH 04/20] migration: Introduce postcopy channels on dest node Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-16  6:27 ` [PATCH 06/20] migration: Add postcopy_thread_create() Peter Xu
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

In ram_load_postcopy() we try to detect the non-same-page case and dump an
error, which is very helpful for debugging.  Add the ramblock and offset to
the error log too.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 36b0a53afe..87bcb704d4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3706,8 +3706,12 @@ static int ram_load_postcopy(QEMUFile *f)
             } else if (tmp_page->host_addr !=
                        host_page_from_ram_block_offset(block, addr)) {
                 /* not the 1st TP within the HP */
-                error_report("Non-same host page %p/%p", tmp_page->host_addr,
-                             host_page_from_ram_block_offset(block, addr));
+                error_report("Non-same host page detected.  Target host page %p, "
+                             "received host page %p "
+                             "(rb %s offset 0x"RAM_ADDR_FMT" target_pages %d)",
+                             tmp_page->host_addr,
+                             host_page_from_ram_block_offset(block, addr),
+                             block->idstr, addr, tmp_page->target_pages);
                 ret = -EINVAL;
                 break;
             }
-- 
2.32.0




* [PATCH 06/20] migration: Add postcopy_thread_create()
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (4 preceding siblings ...)
  2022-02-16  6:27 ` [PATCH 05/20] migration: Dump ramblock and offset too when non-same-page detected Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-21 16:00   ` Dr. David Alan Gilbert
  2022-02-16  6:27 ` [PATCH 07/20] migration: Move static var in ram_block_from_stream() into global Peter Xu
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Postcopy creates threads.  A common pattern is to init a semaphore and use it
to sync with the thread.  Namely, we have fault_thread_sem and
listen_thread_sem, and they're only used for this.

Make it shared infrastructure so it's easier to create yet another thread.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h    |  8 +++++---
 migration/postcopy-ram.c | 23 +++++++++++++++++------
 migration/postcopy-ram.h |  4 ++++
 migration/savevm.c       | 12 +++---------
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 42c7395094..8445e1d14a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -70,7 +70,11 @@ struct MigrationIncomingState {
     /* A hook to allow cleanup at the end of incoming migration */
     void *transport_data;
     void (*transport_cleanup)(void *data);
-
+    /*
+     * Used to sync thread creations.  Note that we can't create threads in
+     * parallel with this sem.
+     */
+    QemuSemaphore  thread_sync_sem;
     /*
      * Free at the start of the main state load, set as the main thread finishes
      * loading state.
@@ -83,13 +87,11 @@ struct MigrationIncomingState {
     size_t         largest_page_size;
     bool           have_fault_thread;
     QemuThread     fault_thread;
-    QemuSemaphore  fault_thread_sem;
     /* Set this when we want the fault thread to quit */
     bool           fault_thread_quit;
 
     bool           have_listen_thread;
     QemuThread     listen_thread;
-    QemuSemaphore  listen_thread_sem;
 
     /* For the kernel to send us notifications */
     int       userfault_fd;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 315f784965..d3ec22e6de 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -77,6 +77,20 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
                                             &pnd);
 }
 
+/*
+ * NOTE: this routine is not thread safe, we can't call it concurrently. But it
+ * should be good enough for migration's purposes.
+ */
+void postcopy_thread_create(MigrationIncomingState *mis,
+                            QemuThread *thread, const char *name,
+                            void *(*fn)(void *), int joinable)
+{
+    qemu_sem_init(&mis->thread_sync_sem, 0);
+    qemu_thread_create(thread, name, fn, mis, joinable);
+    qemu_sem_wait(&mis->thread_sync_sem);
+    qemu_sem_destroy(&mis->thread_sync_sem);
+}
+
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
  * across, and efficiently map new pages in, the techniques for doing this
  * are target OS specific.
@@ -901,7 +915,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
     trace_postcopy_ram_fault_thread_entry();
     rcu_register_thread();
     mis->last_rb = NULL; /* last RAMBlock we sent part of */
-    qemu_sem_post(&mis->fault_thread_sem);
+    qemu_sem_post(&mis->thread_sync_sem);
 
     struct pollfd *pfd;
     size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
@@ -1172,11 +1186,8 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
         return -1;
     }
 
-    qemu_sem_init(&mis->fault_thread_sem, 0);
-    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
-                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
-    qemu_sem_wait(&mis->fault_thread_sem);
-    qemu_sem_destroy(&mis->fault_thread_sem);
+    postcopy_thread_create(mis, &mis->fault_thread, "postcopy/fault",
+                           postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
     mis->have_fault_thread = true;
 
     /* Mark so that we get notified of accesses to unwritten areas */
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 6d2b3cf124..07684c0e1d 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -135,6 +135,10 @@ void postcopy_remove_notifier(NotifierWithReturn *n);
 /* Call the notifier list set by postcopy_add_start_notifier */
 int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
 
+void postcopy_thread_create(MigrationIncomingState *mis,
+                            QemuThread *thread, const char *name,
+                            void *(*fn)(void *), int joinable);
+
 struct PostCopyFD;
 
 /* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ccd7e5e3f..967ff80547 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1863,7 +1863,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                                    MIGRATION_STATUS_POSTCOPY_ACTIVE);
-    qemu_sem_post(&mis->listen_thread_sem);
+    qemu_sem_post(&mis->thread_sync_sem);
     trace_postcopy_ram_listen_thread_start();
 
     rcu_register_thread();
@@ -1988,14 +1988,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     }
 
     mis->have_listen_thread = true;
-    /* Start up the listening thread and wait for it to signal ready */
-    qemu_sem_init(&mis->listen_thread_sem, 0);
-    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
-                       postcopy_ram_listen_thread, NULL,
-                       QEMU_THREAD_DETACHED);
-    qemu_sem_wait(&mis->listen_thread_sem);
-    qemu_sem_destroy(&mis->listen_thread_sem);
-
+    postcopy_thread_create(mis, &mis->listen_thread, "postcopy/listen",
+                           postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
     trace_loadvm_postcopy_handle_listen("return");
 
     return 0;
-- 
2.32.0




* [PATCH 07/20] migration: Move static var in ram_block_from_stream() into global
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (5 preceding siblings ...)
  2022-02-16  6:27 ` [PATCH 06/20] migration: Add postcopy_thread_create() Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-16  6:27 ` [PATCH 08/20] migration: Add pss.postcopy_requested status Peter Xu
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

The static variable is very unfriendly to making ram_block_from_stream()
thread-safe.  Move it into MigrationIncomingState.

Pass the incoming state pointer to ram_block_from_stream() at both call sites.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h |  3 ++-
 migration/ram.c       | 13 +++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 8445e1d14a..d8b9850eae 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -66,7 +66,8 @@ typedef struct {
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
-
+    /* Previously received RAM's RAMBlock pointer */
+    RAMBlock *last_recv_block;
     /* A hook to allow cleanup at the end of incoming migration */
     void *transport_data;
     void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index 87bcb704d4..25a3ab5150 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3184,12 +3184,14 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
  *
  * Returns a pointer from within the RCU-protected ram_list.
  *
+ * @mis: the migration incoming state pointer
  * @f: QEMUFile where to read the data from
  * @flags: Page flags (mostly to see if it's a continuation of previous block)
  */
-static inline RAMBlock *ram_block_from_stream(QEMUFile *f, int flags)
+static inline RAMBlock *ram_block_from_stream(MigrationIncomingState *mis,
+                                              QEMUFile *f, int flags)
 {
-    static RAMBlock *block;
+    RAMBlock *block = mis->last_recv_block;
     char id[256];
     uint8_t len;
 
@@ -3216,6 +3218,8 @@ static inline RAMBlock *ram_block_from_stream(QEMUFile *f, int flags)
         return NULL;
     }
 
+    mis->last_recv_block = block;
+
     return block;
 }
 
@@ -3668,7 +3672,7 @@ static int ram_load_postcopy(QEMUFile *f)
         trace_ram_load_postcopy_loop((uint64_t)addr, flags);
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE)) {
-            block = ram_block_from_stream(f, flags);
+            block = ram_block_from_stream(mis, f, flags);
             if (!block) {
                 ret = -EINVAL;
                 break;
@@ -3880,6 +3884,7 @@ void colo_flush_ram_cache(void)
  */
 static int ram_load_precopy(QEMUFile *f)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
     /* ADVISE is earlier, it shows the source has the postcopy capability on */
     bool postcopy_advised = postcopy_is_advised();
@@ -3918,7 +3923,7 @@ static int ram_load_precopy(QEMUFile *f)
 
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-            RAMBlock *block = ram_block_from_stream(f, flags);
+            RAMBlock *block = ram_block_from_stream(mis, f, flags);
 
             host = host_from_ram_block_offset(block, addr);
             /*
-- 
2.32.0




* [PATCH 08/20] migration: Add pss.postcopy_requested status
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (6 preceding siblings ...)
  2022-02-16  6:27 ` [PATCH 07/20] migration: Move static var in ram_block_from_stream() into global Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-16  6:27 ` [PATCH 09/20] migration: Move migrate_allow_multifd and helpers into migration.c Peter Xu
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

This boolean flag shows whether the current page during migration was
explicitly requested by postcopy or not.  Then in ram_save_host_page() and
deeper in the stack we'll be able to tell the priority of this page.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 25a3ab5150..1ed70b17d7 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -413,6 +413,8 @@ struct PageSearchStatus {
     unsigned long page;
     /* Set once we wrap around */
     bool         complete_round;
+    /* Whether current page is explicitly requested by postcopy */
+    bool         postcopy_requested;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -1486,6 +1488,9 @@ retry:
  */
 static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
 {
+    /* This is not a postcopy requested page */
+    pss->postcopy_requested = false;
+
     pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
     if (pss->complete_round && pss->block == rs->last_seen_block &&
         pss->page >= rs->last_page) {
@@ -1980,6 +1985,7 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
          * really rare.
          */
         pss->complete_round = false;
+        pss->postcopy_requested = true;
     }
 
     return !!block;
-- 
2.32.0




* [PATCH 09/20] migration: Move migrate_allow_multifd and helpers into migration.c
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (7 preceding siblings ...)
  2022-02-16  6:27 ` [PATCH 08/20] migration: Add pss.postcopy_requested status Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-16  6:27 ` [PATCH 10/20] migration: Enlarge postcopy recovery to capture !-EIO too Peter Xu
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

This variable, along with its helpers, is used to detect whether multiple
channels are supported for migration.  In follow-up patches, there'll be
another capability that requires multiple channels.  Hence move it out of
multifd-specific code and make it public.  Meanwhile rename it from "multifd"
to "multi_channels" to reflect its real meaning.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 22 +++++++++++++++++-----
 migration/migration.h |  3 +++
 migration/multifd.c   | 19 ++++---------------
 migration/multifd.h   |  2 --
 4 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index bcc385b94b..6e4cc9cc87 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -180,6 +180,18 @@ static int migration_maybe_pause(MigrationState *s,
                                  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
 
+static bool migrate_allow_multi_channels = true;
+
+void migrate_protocol_allow_multi_channels(bool allow)
+{
+    migrate_allow_multi_channels = allow;
+}
+
+bool migrate_multi_channels_is_allowed(void)
+{
+    return migrate_allow_multi_channels;
+}
+
 static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 {
     uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
@@ -463,12 +475,12 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;
 
-    migrate_protocol_allow_multifd(false); /* reset it anyway */
+    migrate_protocol_allow_multi_channels(false); /* reset it anyway */
     qapi_event_send_migration(MIGRATION_STATUS_SETUP);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
-        migrate_protocol_allow_multifd(true);
+        migrate_protocol_allow_multi_channels(true);
         socket_start_incoming_migration(p ? p : uri, errp);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
@@ -1255,7 +1267,7 @@ static bool migrate_caps_check(bool *cap_list,
 
     /* incoming side only */
     if (runstate_check(RUN_STATE_INMIGRATE) &&
-        !migrate_multifd_is_allowed() &&
+        !migrate_multi_channels_is_allowed() &&
         cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
         error_setg(errp, "multifd is not supported by current protocol");
         return false;
@@ -2313,11 +2325,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         }
     }
 
-    migrate_protocol_allow_multifd(false);
+    migrate_protocol_allow_multi_channels(false);
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
-        migrate_protocol_allow_multifd(true);
+        migrate_protocol_allow_multi_channels(true);
         socket_start_outgoing_migration(s, p ? p : uri, &local_err);
 #ifdef CONFIG_RDMA
     } else if (strstart(uri, "rdma:", &p)) {
diff --git a/migration/migration.h b/migration/migration.h
index d8b9850eae..d677a750c9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -429,4 +429,7 @@ void migration_cancel(const Error *error);
 void populate_vfio_info(MigrationInfo *info);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
+bool migrate_multi_channels_is_allowed(void);
+void migrate_protocol_allow_multi_channels(bool allow);
+
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index 76b57a7177..180586dcde 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -517,7 +517,7 @@ void multifd_save_cleanup(void)
 {
     int i;
 
-    if (!migrate_use_multifd() || !migrate_multifd_is_allowed()) {
+    if (!migrate_use_multifd() || !migrate_multi_channels_is_allowed()) {
         return;
     }
     multifd_send_terminate_threads(NULL);
@@ -858,17 +858,6 @@ cleanup:
     multifd_new_send_channel_cleanup(p, sioc, local_err);
 }
 
-static bool migrate_allow_multifd = true;
-void migrate_protocol_allow_multifd(bool allow)
-{
-    migrate_allow_multifd = allow;
-}
-
-bool migrate_multifd_is_allowed(void)
-{
-    return migrate_allow_multifd;
-}
-
 int multifd_save_setup(Error **errp)
 {
     int thread_count;
@@ -879,7 +868,7 @@ int multifd_save_setup(Error **errp)
     if (!migrate_use_multifd()) {
         return 0;
     }
-    if (!migrate_multifd_is_allowed()) {
+    if (!migrate_multi_channels_is_allowed()) {
         error_setg(errp, "multifd is not supported by current protocol");
         return -1;
     }
@@ -980,7 +969,7 @@ int multifd_load_cleanup(Error **errp)
 {
     int i;
 
-    if (!migrate_use_multifd() || !migrate_multifd_is_allowed()) {
+    if (!migrate_use_multifd() || !migrate_multi_channels_is_allowed()) {
         return 0;
     }
     multifd_recv_terminate_threads(NULL);
@@ -1129,7 +1118,7 @@ int multifd_load_setup(Error **errp)
     if (!migrate_use_multifd()) {
         return 0;
     }
-    if (!migrate_multifd_is_allowed()) {
+    if (!migrate_multi_channels_is_allowed()) {
         error_setg(errp, "multifd is not supported by current protocol");
         return -1;
     }
diff --git a/migration/multifd.h b/migration/multifd.h
index 4dda900a0b..636e599395 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -13,8 +13,6 @@
 #ifndef QEMU_MIGRATION_MULTIFD_H
 #define QEMU_MIGRATION_MULTIFD_H
 
-bool migrate_multifd_is_allowed(void);
-void migrate_protocol_allow_multifd(bool allow);
 int multifd_save_setup(Error **errp);
 void multifd_save_cleanup(void);
 int multifd_load_setup(Error **errp);
-- 
2.32.0




* [PATCH 10/20] migration: Enlarge postcopy recovery to capture !-EIO too
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (8 preceding siblings ...)
  2022-02-16  6:27 ` [PATCH 09/20] migration: Move migrate_allow_multifd and helpers into migration.c Peter Xu
@ 2022-02-16  6:27 ` Peter Xu
  2022-02-21 16:15   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 11/20] migration: postcopy_pause_fault_thread() never fails Peter Xu
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

We used to have quite a few places making sure -EIO happened, as that was the
only way to trigger postcopy recovery.  That's based on the assumption that we
only return -EIO for channel issues.

It'll work in 99.99% of cases, but logically it doesn't cover some corner
cases.  One example: ram_block_from_stream() could fail with an interrupted
network, in which case -EINVAL will be returned instead of -EIO.

I remember Dave Gilbert pointed that out before, but somehow it was
overlooked.  Nor did I ever encounter anything outside the -EIO error myself.

However we'd better fix that up before it triggers a rare VM data loss during
live migration.

To cover as many of those cases as possible, remove the -EIO restriction on
triggering postcopy recovery, because even if it's not a channel failure, we
can't do anything better than halting QEMU anyway - the corpse of the process
may even be used by a skilled hand to dig out useful memory regions, or the
admin could simply kill the process later on.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    | 4 ++--
 migration/postcopy-ram.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6e4cc9cc87..67520d3105 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2877,7 +2877,7 @@ retry:
 out:
     res = qemu_file_get_error(rp);
     if (res) {
-        if (res == -EIO && migration_in_postcopy()) {
+        if (res && migration_in_postcopy()) {
             /*
              * Maybe there is something we can do: it looks like a
              * network down issue, and we pause for a recovery.
@@ -3478,7 +3478,7 @@ static MigThrError migration_detect_error(MigrationState *s)
         error_free(local_error);
     }
 
-    if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
+    if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
         /*
          * For postcopy, we allow the network to be down for a
          * while. After that, it can be continued by a
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d3ec22e6de..6be510fea4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1038,7 +1038,7 @@ retry:
                                         msg.arg.pagefault.address);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
-                if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
+                if (postcopy_pause_fault_thread(mis)) {
                     /* We got reconnected somehow, try to continue */
                     goto retry;
                 } else {
-- 
2.32.0




* [PATCH 11/20] migration: postcopy_pause_fault_thread() never fails
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (9 preceding siblings ...)
  2022-02-16  6:27 ` [PATCH 10/20] migration: Enlarge postcopy recovery to capture !-EIO too Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-21 16:16   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 12/20] migration: Export ram_load_postcopy() Peter Xu
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Per the title, remove the return code and simplify the callers, since the
error path can never be triggered.  No functional change intended.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/postcopy-ram.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 6be510fea4..738cc55fa6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -890,15 +890,11 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
                                       affected_cpu);
 }
 
-static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
+static void postcopy_pause_fault_thread(MigrationIncomingState *mis)
 {
     trace_postcopy_pause_fault_thread();
-
     qemu_sem_wait(&mis->postcopy_pause_sem_fault);
-
     trace_postcopy_pause_fault_thread_continued();
-
-    return true;
 }
 
 /*
@@ -958,13 +954,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
              * broken already using the event. We should hold until
              * the channel is rebuilt.
              */
-            if (postcopy_pause_fault_thread(mis)) {
-                /* Continue to read the userfaultfd */
-            } else {
-                error_report("%s: paused but don't allow to continue",
-                             __func__);
-                break;
-            }
+            postcopy_pause_fault_thread(mis);
         }
 
         if (pfd[1].revents) {
@@ -1038,15 +1028,8 @@ retry:
                                         msg.arg.pagefault.address);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
-                if (postcopy_pause_fault_thread(mis)) {
-                    /* We got reconnected somehow, try to continue */
-                    goto retry;
-                } else {
-                    /* This is a unavoidable fault */
-                    error_report("%s: postcopy_request_page() get %d",
-                                 __func__, ret);
-                    break;
-                }
+                postcopy_pause_fault_thread(mis);
+                goto retry;
             }
         }
 
-- 
2.32.0




* [PATCH 12/20] migration: Export ram_load_postcopy()
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (10 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 11/20] migration: postcopy_pause_fault_thread() never fails Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-21 16:17   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover() Peter Xu
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Will be reused in the postcopy fast-load thread.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 2 +-
 migration/ram.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 1ed70b17d7..f8bc3cd882 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3644,7 +3644,7 @@ int ram_postcopy_incoming_init(MigrationIncomingState *mis)
  *
  * @f: QEMUFile where to send the data
  */
-static int ram_load_postcopy(QEMUFile *f)
+int ram_load_postcopy(QEMUFile *f)
 {
     int flags = 0, ret = 0;
     bool place_needed = false;
diff --git a/migration/ram.h b/migration/ram.h
index 2c6dc3675d..ded0a3a086 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -61,6 +61,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms);
 /* For incoming postcopy discard */
 int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
+int ram_load_postcopy(QEMUFile *f);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover()
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (11 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 12/20] migration: Export ram_load_postcopy() Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-22 10:57   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 14/20] migration: Add migration_incoming_transport_cleanup() Peter Xu
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

We used to use postcopy_try_recover() instead of migration_incoming_setup() to
set up the incoming channels.  That was fine for the old world, but in the new
world there can be more than one channel that needs setup.  Better to move the
channel setup out of it so that postcopy_try_recover() only handles the final
step of switching into the recovery phase.

To do that, in migration_fd_process_incoming() move the postcopy_try_recover()
call to after migration_incoming_setup(), which sets up the channels.  In
migration_ioc_process_incoming(), postpone the recover routine until right
before we jump into migration_incoming_process().

A side benefit is that we no longer need to pass a QEMUFile* into
postcopy_try_recover().  Remove the parameter.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 67520d3105..b2e6446457 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -665,19 +665,20 @@ void migration_incoming_process(void)
 }
 
 /* Returns true if recovered from a paused migration, otherwise false */
-static bool postcopy_try_recover(QEMUFile *f)
+static bool postcopy_try_recover(void)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
 
     if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
         /* Resumed from a paused postcopy migration */
 
-        mis->from_src_file = f;
+        /* This should be set already in migration_incoming_setup() */
+        assert(mis->from_src_file);
         /* Postcopy has standalone thread to do vm load */
-        qemu_file_set_blocking(f, true);
+        qemu_file_set_blocking(mis->from_src_file, true);
 
         /* Re-configure the return path */
-        mis->to_src_file = qemu_file_get_return_path(f);
+        mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
 
         migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
                           MIGRATION_STATUS_POSTCOPY_RECOVER);
@@ -698,11 +699,10 @@ static bool postcopy_try_recover(QEMUFile *f)
 
 void migration_fd_process_incoming(QEMUFile *f, Error **errp)
 {
-    if (postcopy_try_recover(f)) {
+    if (!migration_incoming_setup(f, errp)) {
         return;
     }
-
-    if (!migration_incoming_setup(f, errp)) {
+    if (postcopy_try_recover()) {
         return;
     }
     migration_incoming_process();
@@ -718,11 +718,6 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
         /* The first connection (multifd may have multiple) */
         QEMUFile *f = qemu_fopen_channel_input(ioc);
 
-        /* If it's a recovery, we're done */
-        if (postcopy_try_recover(f)) {
-            return;
-        }
-
         if (!migration_incoming_setup(f, errp)) {
             return;
         }
@@ -743,6 +738,10 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
     }
 
     if (start_migration) {
+        /* If it's a recovery, we're done */
+        if (postcopy_try_recover()) {
+            return;
+        }
         migration_incoming_process();
     }
 }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 14/20] migration: Add migration_incoming_transport_cleanup()
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (12 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover() Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-21 16:56   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 15/20] migration: Allow migrate-recover to run multiple times Peter Xu
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Add a helper to clean up the transport listener.

While at it, also NULL-ify the cleanup hook and its data, so that it is safe
to call the helper multiple times.

Move the socket_address_list cleanup in there too, because that list mirrors
the listener channels and exists only for the purpose of query-migrate.  Hence
whenever the listener transport is cleaned up, the socket list should always
be cleaned up along with it.

No functional change intended.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 22 ++++++++++++++--------
 migration/migration.h |  1 +
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b2e6446457..6bb321cdd3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -279,6 +279,19 @@ MigrationIncomingState *migration_incoming_get_current(void)
     return current_incoming;
 }
 
+void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
+{
+    if (mis->socket_address_list) {
+        qapi_free_SocketAddressList(mis->socket_address_list);
+        mis->socket_address_list = NULL;
+    }
+
+    if (mis->transport_cleanup) {
+        mis->transport_cleanup(mis->transport_data);
+        mis->transport_data = mis->transport_cleanup = NULL;
+    }
+}
+
 void migration_incoming_state_destroy(void)
 {
     struct MigrationIncomingState *mis = migration_incoming_get_current();
@@ -299,10 +312,8 @@ void migration_incoming_state_destroy(void)
         g_array_free(mis->postcopy_remote_fds, TRUE);
         mis->postcopy_remote_fds = NULL;
     }
-    if (mis->transport_cleanup) {
-        mis->transport_cleanup(mis->transport_data);
-    }
 
+    migration_incoming_transport_cleanup(mis);
     qemu_event_reset(&mis->main_thread_load_event);
 
     if (mis->page_requested) {
@@ -310,11 +321,6 @@ void migration_incoming_state_destroy(void)
         mis->page_requested = NULL;
     }
 
-    if (mis->socket_address_list) {
-        qapi_free_SocketAddressList(mis->socket_address_list);
-        mis->socket_address_list = NULL;
-    }
-
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index d677a750c9..f17ccc657c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -166,6 +166,7 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
 /*
  * Functions to work with blocktime context
  */
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 15/20] migration: Allow migrate-recover to run multiple times
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (13 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 14/20] migration: Add migration_incoming_transport_cleanup() Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-21 17:03   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 16/20] migration: Add postcopy-preempt capability Peter Xu
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Previously migration didn't have an easy way to clean up the listening
transport, so migrate recovery was only allowed to execute once.  That was
enforced with a tricky flag, postcopy_recover_triggered.

Now that facility is in place.

Drop postcopy_recover_triggered and instead allow a new migrate-recover
command to release the previous listener transport.
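
For reference only (not part of this patch): with this change, a failed
recovery attempt can simply be retried by re-issuing migrate-recover over QMP
with a (possibly new) URI, e.g. (URI made up for illustration):

  {"execute": "migrate-recover",
   "arguments": {"uri": "tcp:0:5556"}}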

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 13 ++-----------
 migration/migration.h |  1 -
 migration/savevm.c    |  3 ---
 3 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6bb321cdd3..16086897aa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2159,11 +2159,8 @@ void qmp_migrate_recover(const char *uri, Error **errp)
         return;
     }
 
-    if (qatomic_cmpxchg(&mis->postcopy_recover_triggered,
-                       false, true) == true) {
-        error_setg(errp, "Migrate recovery is triggered already");
-        return;
-    }
+    /* If there's an existing transport, release it */
+    migration_incoming_transport_cleanup(mis);
 
     /*
      * Note that this call will never start a real migration; it will
@@ -2171,12 +2168,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
      * to continue using that newly established channel.
      */
     qemu_start_incoming_migration(uri, errp);
-
-    /* Safe to dereference with the assert above */
-    if (*errp) {
-        /* Reset the flag so user could still retry */
-        qatomic_set(&mis->postcopy_recover_triggered, false);
-    }
 }
 
 void qmp_migrate_pause(Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index f17ccc657c..a863032b71 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -139,7 +139,6 @@ struct MigrationIncomingState {
     struct PostcopyBlocktimeContext *blocktime_ctx;
 
     /* notify PAUSED postcopy incoming migrations to try to continue */
-    bool postcopy_recover_triggered;
     QemuSemaphore postcopy_pause_sem_dst;
     QemuSemaphore postcopy_pause_sem_fault;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 967ff80547..254aa78234 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2589,9 +2589,6 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 
     assert(migrate_postcopy_ram());
 
-    /* Clear the triggered bit to allow one recovery */
-    mis->postcopy_recover_triggered = false;
-
     /*
      * Unregister yank with either from/to src would work, since ioc behind it
      * is the same
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 16/20] migration: Add postcopy-preempt capability
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (14 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 15/20] migration: Allow migrate-recover to run multiple times Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-16  6:28 ` [PATCH 17/20] migration: Postcopy preemption preparation on channel creation Peter Xu
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Firstly, postcopy already preempts precopy, due to the fact that we call
unqueue_page() before looking into the dirty bits.

However that's not enough: e.g., with host huge pages enabled, while a precopy
huge page is being sent, a postcopy request has to wait until the whole huge
page currently in flight is finished.  That can introduce quite some delay;
the bigger the huge page, the larger the delay.

This patch adds a new capability to allow postcopy requests to preempt an
existing precopy huge page in the middle of sending it, so that postcopy
requests can be serviced even faster.

Meanwhile, to send requested pages even faster, bypass the precopy stream by
providing a standalone postcopy socket dedicated to them.

Since the new behavior is not compatible with the old one, it is not the
default; it's enabled only when the new capability is set on both src/dst
QEMUs.

This patch only adds the capability itself; the logic will be added in
follow-up patches.
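
For reference only (not part of this patch): once the series lands, the
capability would be enabled together with postcopy-ram on both source and
destination before starting the migration, e.g. via QMP:

  {"execute": "migrate-set-capabilities",
   "arguments": {"capabilities": [
     {"capability": "postcopy-ram", "state": true},
     {"capability": "postcopy-preempt", "state": true}]}}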

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 23 +++++++++++++++++++++++
 migration/migration.h |  1 +
 qapi/migration.json   |  8 +++++++-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 16086897aa..4c22bad304 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1235,6 +1235,11 @@ static bool migrate_caps_check(bool *cap_list,
             error_setg(errp, "Postcopy is not compatible with ignore-shared");
             return false;
         }
+
+        if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+            error_setg(errp, "Multifd is not supported in postcopy");
+            return false;
+        }
     }
 
     if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
@@ -1278,6 +1283,13 @@ static bool migrate_caps_check(bool *cap_list,
         return false;
     }
 
+    if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
+        if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+            error_setg(errp, "Postcopy preempt requires postcopy-ram");
+            return false;
+        }
+    }
+
     return true;
 }
 
@@ -2622,6 +2634,15 @@ bool migrate_background_snapshot(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT];
 }
 
+bool migrate_postcopy_preempt(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
@@ -4232,6 +4253,8 @@ static Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-compress", MIGRATION_CAPABILITY_COMPRESS),
     DEFINE_PROP_MIG_CAP("x-events", MIGRATION_CAPABILITY_EVENTS),
     DEFINE_PROP_MIG_CAP("x-postcopy-ram", MIGRATION_CAPABILITY_POSTCOPY_RAM),
+    DEFINE_PROP_MIG_CAP("x-postcopy-preempt",
+                        MIGRATION_CAPABILITY_POSTCOPY_PREEMPT),
     DEFINE_PROP_MIG_CAP("x-colo", MIGRATION_CAPABILITY_X_COLO),
     DEFINE_PROP_MIG_CAP("x-release-ram", MIGRATION_CAPABILITY_RELEASE_RAM),
     DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
diff --git a/migration/migration.h b/migration/migration.h
index a863032b71..af4bcb19c2 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -394,6 +394,7 @@ int migrate_decompress_threads(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_background_snapshot(void);
+bool migrate_postcopy_preempt(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index 5975a0e104..50878b5f3b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -463,6 +463,12 @@
 #                       procedure starts. The VM RAM is saved with running VM.
 #                       (since 6.0)
 #
+# @postcopy-preempt: If enabled, the migration process will allow postcopy
+#                    requests to preempt precopy stream, so postcopy requests
+#                    will be handled faster.  This is a performance feature and
+#                    should not affect the correctness of postcopy migration.
+#                    (since 7.0)
+#
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -476,7 +482,7 @@
            'block', 'return-path', 'pause-before-switchover', 'multifd',
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
-           'validate-uuid', 'background-snapshot'] }
+           'validate-uuid', 'background-snapshot', 'postcopy-preempt'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 17/20] migration: Postcopy preemption preparation on channel creation
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (15 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 16/20] migration: Add postcopy-preempt capability Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-21 18:39   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 18/20] migration: Postcopy preemption enablement Peter Xu
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Create a new socket for postcopy so that requested pages can be sent via this
dedicated channel, without getting blocked behind precopy pages.

A new thread is also created on the dest QEMU to receive data from this new
channel, based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the new thread do not start to
function yet; that'll be done in follow-up patches.

Clean up the new sockets on both src/dst QEMUs, and also look after the new
thread to make sure it'll be recycled properly.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    | 62 ++++++++++++++++++++++++----
 migration/migration.h    |  8 ++++
 migration/postcopy-ram.c | 88 ++++++++++++++++++++++++++++++++++++++--
 migration/postcopy-ram.h | 10 +++++
 migration/ram.c          | 25 ++++++++----
 migration/ram.h          |  4 +-
 migration/socket.c       | 22 +++++++++-
 migration/socket.h       |  1 +
 migration/trace-events   |  3 ++
 9 files changed, 203 insertions(+), 20 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4c22bad304..3d7f897b72 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
         mis->page_requested = NULL;
     }
 
+    if (mis->postcopy_qemufile_dst) {
+        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+        qemu_fclose(mis->postcopy_qemufile_dst);
+        mis->postcopy_qemufile_dst = NULL;
+    }
+
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
@@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
     migration_incoming_process();
 }
 
+static bool migration_needs_multiple_sockets(void)
+{
+    return migrate_use_multifd() || migrate_postcopy_preempt();
+}
+
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     Error *local_err = NULL;
     bool start_migration;
+    QEMUFile *f;
 
     if (!mis->from_src_file) {
         /* The first connection (multifd may have multiple) */
-        QEMUFile *f = qemu_fopen_channel_input(ioc);
+        f = qemu_fopen_channel_input(ioc);
 
         if (!migration_incoming_setup(f, errp)) {
             return;
@@ -730,13 +742,18 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 
         /*
          * Common migration only needs one channel, so we can start
-         * right now.  Multifd needs more than one channel, we wait.
+         * right now.  Some features need more than one channel, we wait.
          */
-        start_migration = !migrate_use_multifd();
+        start_migration = !migration_needs_multiple_sockets();
     } else {
         /* Multiple connections */
-        assert(migrate_use_multifd());
-        start_migration = multifd_recv_new_channel(ioc, &local_err);
+        assert(migration_needs_multiple_sockets());
+        if (migrate_use_multifd()) {
+            start_migration = multifd_recv_new_channel(ioc, &local_err);
+        } else if (migrate_postcopy_preempt()) {
+            f = qemu_fopen_channel_input(ioc);
+            start_migration = postcopy_preempt_new_channel(mis, f);
+        }
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -761,11 +778,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 bool migration_has_all_channels(void)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
-    bool all_channels;
 
-    all_channels = multifd_recv_all_channels_created();
+    if (!mis->from_src_file) {
+        return false;
+    }
+
+    if (migrate_use_multifd()) {
+        return multifd_recv_all_channels_created();
+    }
+
+    if (migrate_postcopy_preempt()) {
+        return mis->postcopy_qemufile_dst != NULL;
+    }
 
-    return all_channels && mis->from_src_file != NULL;
+    return true;
 }
 
 /*
@@ -1858,6 +1884,12 @@ static void migrate_fd_cleanup(MigrationState *s)
         qemu_fclose(tmp);
     }
 
+    if (s->postcopy_qemufile_src) {
+        migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+        qemu_fclose(s->postcopy_qemufile_src);
+        s->postcopy_qemufile_src = NULL;
+    }
+
     assert(!migration_is_active(s));
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -3233,6 +3265,11 @@ static void migration_completion(MigrationState *s)
         qemu_savevm_state_complete_postcopy(s->to_dst_file);
         qemu_mutex_unlock_iothread();
 
+        /* Shutdown the postcopy fast path thread */
+        if (migrate_postcopy_preempt()) {
+            postcopy_preempt_shutdown_file(s);
+        }
+
         trace_migration_completion_postcopy_end_after_complete();
     } else {
         goto fail;
@@ -4120,6 +4157,15 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         }
     }
 
+    /* This needs to be done before resuming a postcopy */
+    if (postcopy_preempt_setup(s, &local_err)) {
+        error_report_err(local_err);
+        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_FAILED);
+        migrate_fd_cleanup(s);
+        return;
+    }
+
     if (resume) {
         /* Wakeup the main migration thread to do the recovery */
         migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
diff --git a/migration/migration.h b/migration/migration.h
index af4bcb19c2..caa910d956 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -23,6 +23,7 @@
 #include "io/channel-buffer.h"
 #include "net/announce.h"
 #include "qom/object.h"
+#include "postcopy-ram.h"
 
 struct PostcopyBlocktimeContext;
 
@@ -112,6 +113,11 @@ struct MigrationIncomingState {
      * enabled.
      */
     unsigned int postcopy_channels;
+    /* QEMUFile for postcopy only; it'll be handled by a separate thread */
+    QEMUFile *postcopy_qemufile_dst;
+    /* Postcopy priority thread is used to receive postcopy requested pages */
+    QemuThread postcopy_prio_thread;
+    bool postcopy_prio_thread_created;
     /*
      * An array of temp host huge pages to be used, one for each postcopy
      * channel.
@@ -192,6 +198,8 @@ struct MigrationState {
     QEMUBH *cleanup_bh;
     /* Protected by qemu_file_lock */
     QEMUFile *to_dst_file;
+    /* Postcopy specific transfer channel */
+    QEMUFile *postcopy_qemufile_src;
     QIOChannelBuffer *bioc;
     /*
      * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 738cc55fa6..30eddaacd1 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -32,6 +32,9 @@
 #include "trace.h"
 #include "hw/boards.h"
 #include "exec/ramblock.h"
+#include "socket.h"
+#include "qemu-file-channel.h"
+#include "yank_functions.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -566,6 +569,11 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
 {
     trace_postcopy_ram_incoming_cleanup_entry();
 
+    if (mis->postcopy_prio_thread_created) {
+        qemu_thread_join(&mis->postcopy_prio_thread);
+        mis->postcopy_prio_thread_created = false;
+    }
+
     if (mis->have_fault_thread) {
         Error *local_err = NULL;
 
@@ -1101,8 +1109,13 @@ static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
     int err, i, channels;
     void *temp_page;
 
-    /* TODO: will be boosted when enable postcopy preemption */
-    mis->postcopy_channels = 1;
+    if (migrate_postcopy_preempt()) {
+        /* If preemption enabled, need extra channel for urgent requests */
+        mis->postcopy_channels = RAM_CHANNEL_MAX;
+    } else {
+        /* Both precopy/postcopy on the same channel */
+        mis->postcopy_channels = 1;
+    }
 
     channels = mis->postcopy_channels;
     mis->postcopy_tmp_pages = g_malloc0_n(sizeof(PostcopyTmpPage), channels);
@@ -1169,7 +1182,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
         return -1;
     }
 
-    postcopy_thread_create(mis, &mis->fault_thread, "postcopy/fault",
+    postcopy_thread_create(mis, &mis->fault_thread, "qemu/fault-default",
                            postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
     mis->have_fault_thread = true;
 
@@ -1184,6 +1197,16 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (migrate_postcopy_preempt()) {
+        /*
+         * This thread needs to be created after the temp pages because it'll fetch
+         * RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
+         */
+        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "qemu/fault-fast",
+                               postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
+        mis->postcopy_prio_thread_created = true;
+    }
+
     trace_postcopy_ram_enable_notify();
 
     return 0;
@@ -1513,3 +1536,62 @@ void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
         }
     }
 }
+
+bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
+{
+    mis->postcopy_qemufile_dst = file;
+
+    trace_postcopy_preempt_new_channel();
+
+    /* Start the migration immediately */
+    return true;
+}
+
+int postcopy_preempt_setup(MigrationState *s, Error **errp)
+{
+    QIOChannel *ioc;
+
+    if (!migrate_postcopy_preempt()) {
+        return 0;
+    }
+
+    if (!migrate_multi_channels_is_allowed()) {
+        error_setg(errp, "Postcopy preempt is not supported as current "
+                   "migration stream does not support multi-channels.");
+        return -1;
+    }
+
+    ioc = socket_send_channel_create_sync(errp);
+
+    if (ioc == NULL) {
+        return -1;
+    }
+
+    migration_ioc_register_yank(ioc);
+    s->postcopy_qemufile_src = qemu_fopen_channel_output(ioc);
+
+    trace_postcopy_preempt_new_channel();
+
+    return 0;
+}
+
+void *postcopy_preempt_thread(void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    int ret;
+
+    trace_postcopy_preempt_thread_entry();
+
+    rcu_register_thread();
+
+    qemu_sem_post(&mis->thread_sync_sem);
+
+    /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
+    ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
+
+    rcu_unregister_thread();
+
+    trace_postcopy_preempt_thread_exit();
+
+    return ret == 0 ? NULL : (void *)-1;
+}
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 07684c0e1d..34b1080cde 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -183,4 +183,14 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, uint64_t client_addr,
 int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                  uint64_t client_addr, uint64_t offset);
 
+/* Hard-code channels for now for postcopy preemption */
+enum PostcopyChannels {
+    RAM_CHANNEL_PRECOPY = 0,
+    RAM_CHANNEL_POSTCOPY = 1,
+    RAM_CHANNEL_MAX,
+};
+
+bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
+int postcopy_preempt_setup(MigrationState *s, Error **errp);
+
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index f8bc3cd882..36e3da6bb0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3643,15 +3643,15 @@ int ram_postcopy_incoming_init(MigrationIncomingState *mis)
  * rcu_read_lock is taken prior to this being called.
  *
  * @f: QEMUFile where to send the data
+ * @channel: the channel to use for loading
  */
-int ram_load_postcopy(QEMUFile *f)
+int ram_load_postcopy(QEMUFile *f, int channel)
 {
     int flags = 0, ret = 0;
     bool place_needed = false;
     bool matches_target_page_size = false;
     MigrationIncomingState *mis = migration_incoming_get_current();
-    /* Currently we only use channel 0.  TODO: use all the channels */
-    PostcopyTmpPage *tmp_page = &mis->postcopy_tmp_pages[0];
+    PostcopyTmpPage *tmp_page = &mis->postcopy_tmp_pages[channel];
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr;
@@ -3716,10 +3716,10 @@ int ram_load_postcopy(QEMUFile *f)
             } else if (tmp_page->host_addr !=
                        host_page_from_ram_block_offset(block, addr)) {
                 /* not the 1st TP within the HP */
-                error_report("Non-same host page detected.  Target host page %p, "
-                             "received host page %p "
+                error_report("Non-same host page detected on channel %d: "
+                             "Target host page %p, received host page %p "
                              "(rb %s offset 0x"RAM_ADDR_FMT" target_pages %d)",
-                             tmp_page->host_addr,
+                             channel, tmp_page->host_addr,
                              host_page_from_ram_block_offset(block, addr),
                              block->idstr, addr, tmp_page->target_pages);
                 ret = -EINVAL;
@@ -4106,7 +4106,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      */
     WITH_RCU_READ_LOCK_GUARD() {
         if (postcopy_running) {
-            ret = ram_load_postcopy(f);
+            /*
+             * Note!  Here RAM_CHANNEL_PRECOPY is the precopy channel of
+             * postcopy migration, we have another RAM_CHANNEL_POSTCOPY to
+             * service fast page faults.
+             */
+            ret = ram_load_postcopy(f, RAM_CHANNEL_PRECOPY);
         } else {
             ret = ram_load_precopy(f);
         }
@@ -4268,6 +4273,12 @@ static int ram_resume_prepare(MigrationState *s, void *opaque)
     return 0;
 }
 
+void postcopy_preempt_shutdown_file(MigrationState *s)
+{
+    qemu_put_be64(s->postcopy_qemufile_src, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(s->postcopy_qemufile_src);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/migration/ram.h b/migration/ram.h
index ded0a3a086..5d90945a6e 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -61,7 +61,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms);
 /* For incoming postcopy discard */
 int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
-int ram_load_postcopy(QEMUFile *f);
+int ram_load_postcopy(QEMUFile *f, int channel);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
@@ -73,6 +73,8 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
 int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
+void postcopy_preempt_shutdown_file(MigrationState *s);
+void *postcopy_preempt_thread(void *opaque);
 
 /* ram cache */
 int colo_init_ram_cache(void);
diff --git a/migration/socket.c b/migration/socket.c
index 05705a32d8..a7f345b353 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -26,7 +26,7 @@
 #include "io/channel-socket.h"
 #include "io/net-listener.h"
 #include "trace.h"
-
+#include "postcopy-ram.h"
 
 struct SocketOutgoingArgs {
     SocketAddress *saddr;
@@ -39,6 +39,24 @@ void socket_send_channel_create(QIOTaskFunc f, void *data)
                                      f, data, NULL, NULL);
 }
 
+QIOChannel *socket_send_channel_create_sync(Error **errp)
+{
+    QIOChannelSocket *sioc = qio_channel_socket_new();
+
+    if (!outgoing_args.saddr) {
+        object_unref(OBJECT(sioc));
+        error_setg(errp, "Initial sock address not set!");
+        return NULL;
+    }
+
+    if (qio_channel_socket_connect_sync(sioc, outgoing_args.saddr, errp) < 0) {
+        object_unref(OBJECT(sioc));
+        return NULL;
+    }
+
+    return QIO_CHANNEL(sioc);
+}
+
 int socket_send_channel_destroy(QIOChannel *send)
 {
     /* Remove channel */
@@ -158,6 +176,8 @@ socket_start_incoming_migration_internal(SocketAddress *saddr,
 
     if (migrate_use_multifd()) {
         num = migrate_multifd_channels();
+    } else if (migrate_postcopy_preempt()) {
+        num = RAM_CHANNEL_MAX;
     }
 
     if (qio_net_listener_open_sync(listener, saddr, num, errp) < 0) {
diff --git a/migration/socket.h b/migration/socket.h
index 891dbccceb..dc54df4e6c 100644
--- a/migration/socket.h
+++ b/migration/socket.h
@@ -21,6 +21,7 @@
 #include "io/task.h"
 
 void socket_send_channel_create(QIOTaskFunc f, void *data);
+QIOChannel *socket_send_channel_create_sync(Error **errp);
 int socket_send_channel_destroy(QIOChannel *send);
 
 void socket_start_incoming_migration(const char *str, Error **errp);
diff --git a/migration/trace-events b/migration/trace-events
index 1aec580e92..4474a76614 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -278,6 +278,9 @@ postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_off
 postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
 postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
 postcopy_page_req_del(void *addr, int count) "resolved page req %p total %d"
+postcopy_preempt_new_channel(void) ""
+postcopy_preempt_thread_entry(void) ""
+postcopy_preempt_thread_exit(void) ""
 
 get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 18/20] migration: Postcopy preemption enablement
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (16 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 17/20] migration: Postcopy preemption preparation on channel creation Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-22 10:52   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 19/20] migration: Postcopy recover with preempt enabled Peter Xu
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

This patch enables the postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from the precopy
    background migration stream, so they are isolated from very high page
    request delays.

(2) For hosts with huge pages enabled: when there are postcopy requests, they
    can now intercept a partial sending of huge host pages on the src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
the PRECOPY channel, which is the default channel that transfers background
pages; and (2) the POSTCOPY channel, which only transfers requested pages.

There's no strict rule on which channel to use; e.g., if a requested page is
already being transferred on the precopy channel, we keep using that same
precopy channel to transfer the page even though it was explicitly requested.
In 99% of the cases, though, we prioritize the channels and send a requested
page via the postcopy channel whenever possible.

On the source QEMU, when we find a postcopy request, we interrupt the PRECOPY
channel sending process and quickly switch to the POSTCOPY channel.  After we
have serviced all the high-priority postcopy pages, we switch back to the
PRECOPY channel so that we continue to send the interrupted huge page.  No new
thread is introduced on the src QEMU.

On the destination QEMU, one new thread is introduced to receive page data
from the postcopy specific socket (done in the preparation patch).

This patch has a side effect: previously, after sending a postcopy page we
would assume the guest will access the following pages and keep sending from
there.  Now that has changed: instead of continuing from a postcopy requested
page, we go back and continue sending the precopy huge page (which may have
been intercepted by a postcopy request and thus only partially sent before).

Whether that's a problem is debatable, because "assuming the guest will
continue to access the next page" may not really suit huge pages, especially
large ones (e.g. 1GB pages).  That locality hint is pretty much meaningless
when huge pages are used.
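
For reference only (not part of this patch): with the capability enabled on
both sides (see the earlier capability patch), a postcopy-preempt migration is
driven the same way as a normal postcopy one, e.g. over QMP on the source
(URI made up for illustration):

  {"execute": "migrate", "arguments": {"uri": "tcp:dst-host:4444"}}
  {"execute": "migrate-start-postcopy"}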

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |   2 +
 migration/migration.h  |   2 +-
 migration/ram.c        | 247 +++++++++++++++++++++++++++++++++++++++--
 migration/trace-events |   7 ++
 4 files changed, 249 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3d7f897b72..d20db04097 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3153,6 +3153,8 @@ static int postcopy_start(MigrationState *ms)
                               MIGRATION_STATUS_FAILED);
     }
 
+    trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
+
     return ret;
 
 fail_closefb:
diff --git a/migration/migration.h b/migration/migration.h
index caa910d956..b8aacfe3af 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -68,7 +68,7 @@ typedef struct {
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
     /* Previously received RAM's RAMBlock pointer */
-    RAMBlock *last_recv_block;
+    RAMBlock *last_recv_block[RAM_CHANNEL_MAX];
     /* A hook to allow cleanup at the end of incoming migration */
     void *transport_data;
     void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index 36e3da6bb0..ffbe9a9a50 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -294,6 +294,20 @@ struct RAMSrcPageRequest {
     QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
+typedef struct {
+    /*
+     * Cached ramblock/offset values if preempted.  They're only meaningful if
+     * preempted==true below.
+     */
+    RAMBlock *ram_block;
+    unsigned long ram_page;
+    /*
+     * Whether a postcopy preemption just happened.  Will be reset after
+     * precopy recovered to background migration.
+     */
+    bool preempted;
+} PostcopyPreemptState;
+
 /* State of RAM for migration */
 struct RAMState {
     /* QEMUFile used for this migration */
@@ -348,6 +362,14 @@ struct RAMState {
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
     QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+    /* Postcopy preemption informations */
+    PostcopyPreemptState postcopy_preempt_state;
+    /*
+     * Current channel we're using on src VM.  Only valid if postcopy-preempt
+     * is enabled.
+     */
+    int postcopy_channel;
 };
 typedef struct RAMState RAMState;
 
@@ -355,6 +377,11 @@ static RAMState *ram_state;
 
 static NotifierWithReturnList precopy_notifier_list;
 
+static void postcopy_preempt_reset(RAMState *rs)
+{
+    memset(&rs->postcopy_preempt_state, 0, sizeof(PostcopyPreemptState));
+}
+
 /* Whether postcopy has queued requests? */
 static bool postcopy_has_request(RAMState *rs)
 {
@@ -1946,6 +1973,55 @@ void ram_write_tracking_stop(void)
 }
 #endif /* defined(__linux__) */
 
+/*
+ * Check whether two addr/offset of the ramblock falls onto the same host huge
+ * page.  Returns true if so, false otherwise.
+ */
+static bool offset_on_same_huge_page(RAMBlock *rb, uint64_t addr1,
+                                     uint64_t addr2)
+{
+    size_t page_size = qemu_ram_pagesize(rb);
+
+    addr1 = ROUND_DOWN(addr1, page_size);
+    addr2 = ROUND_DOWN(addr2, page_size);
+
+    return addr1 == addr2;
+}
+
+/*
+ * Whether a previous preempted precopy huge page contains current requested
+ * page?  Returns true if so, false otherwise.
+ *
+ * This should really happen very rarely, because it means when we were sending
+ * during background migration for postcopy we're sending exactly the page that
+ * some vcpu got faulted on on dest node.  When it happens, we probably don't
+ * need to do much but drop the request, because we know right after we restore
+ * the precopy stream it'll be serviced.  It'll slightly affect the order of
+ * postcopy requests to be serviced (e.g. it'll be the same as we move current
+ * request to the end of the queue) but it shouldn't be a big deal.  The most
+ * imporant thing is we can _never_ try to send a partial-sent huge page on the
+ * POSTCOPY channel again, otherwise that huge page will got "split brain" on
+ * two channels (PRECOPY, POSTCOPY).
+ */
+static bool postcopy_preempted_contains(RAMState *rs, RAMBlock *block,
+                                        ram_addr_t offset)
+{
+    PostcopyPreemptState *state = &rs->postcopy_preempt_state;
+
+    /* No preemption at all? */
+    if (!state->preempted) {
+        return false;
+    }
+
+    /* Not even the same ramblock? */
+    if (state->ram_block != block) {
+        return false;
+    }
+
+    return offset_on_same_huge_page(block, offset,
+                                    state->ram_page << TARGET_PAGE_BITS);
+}
+
 /**
  * get_queued_page: unqueue a page from the postcopy requests
  *
@@ -1961,9 +2037,17 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
     RAMBlock  *block;
     ram_addr_t offset;
 
+again:
     block = unqueue_page(rs, &offset);
 
-    if (!block) {
+    if (block) {
+        /* See comment above postcopy_preempted_contains() */
+        if (postcopy_preempted_contains(rs, block, offset)) {
+            trace_postcopy_preempt_hit(block->idstr, offset);
+            /* This request is dropped */
+            goto again;
+        }
+    } else {
         /*
          * Poll write faults too if background snapshot is enabled; that's
          * when we have vcpus got blocked by the write protected pages.
@@ -2179,6 +2263,114 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     return ram_save_page(rs, pss);
 }
 
+static bool postcopy_needs_preempt(RAMState *rs, PageSearchStatus *pss)
+{
+    /* Not enabled eager preempt?  Then never do that. */
+    if (!migrate_postcopy_preempt()) {
+        return false;
+    }
+
+    /* If the ramblock we're sending is a small page?  Never bother. */
+    if (qemu_ram_pagesize(pss->block) == TARGET_PAGE_SIZE) {
+        return false;
+    }
+
+    /* Not in postcopy at all? */
+    if (!migration_in_postcopy()) {
+        return false;
+    }
+
+    /*
+     * If we're already handling a postcopy request, don't preempt as this page
+     * has got the same high priority.
+     */
+    if (pss->postcopy_requested) {
+        return false;
+    }
+
+    /* If there's postcopy requests, then check it up! */
+    return postcopy_has_request(rs);
+}
+
+/* Returns true if we preempted precopy, false otherwise */
+static void postcopy_do_preempt(RAMState *rs, PageSearchStatus *pss)
+{
+    PostcopyPreemptState *p_state = &rs->postcopy_preempt_state;
+
+    trace_postcopy_preempt_triggered(pss->block->idstr, pss->page);
+
+    /*
+     * Time to preempt precopy. Cache current PSS into preempt state, so that
+     * after handling the postcopy pages we can recover to it.  We need to do
+     * so because the dest VM will have partial of the precopy huge page kept
+     * over in its tmp huge page caches; better move on with it when we can.
+     */
+    p_state->ram_block = pss->block;
+    p_state->ram_page = pss->page;
+    p_state->preempted = true;
+}
+
+/* Whether we're preempted by a postcopy request during sending a huge page */
+static bool postcopy_preempt_triggered(RAMState *rs)
+{
+    return rs->postcopy_preempt_state.preempted;
+}
+
+static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss)
+{
+    PostcopyPreemptState *state = &rs->postcopy_preempt_state;
+
+    assert(state->preempted);
+
+    pss->block = state->ram_block;
+    pss->page = state->ram_page;
+    /* This is not a postcopy request but restoring previous precopy */
+    pss->postcopy_requested = false;
+
+    trace_postcopy_preempt_restored(pss->block->idstr, pss->page);
+
+    /* Reset preempt state, most importantly, set preempted==false */
+    postcopy_preempt_reset(rs);
+}
+
+static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
+{
+    int channel = pss->postcopy_requested ? RAM_CHANNEL_POSTCOPY : RAM_CHANNEL_PRECOPY;
+    MigrationState *s = migrate_get_current();
+    QEMUFile *next;
+
+    if (channel != rs->postcopy_channel) {
+        if (channel == RAM_CHANNEL_PRECOPY) {
+            next = s->to_dst_file;
+        } else {
+            next = s->postcopy_qemufile_src;
+        }
+        /* Update and cache the current channel */
+        rs->f = next;
+        rs->postcopy_channel = channel;
+
+        /*
+         * If channel switched, reset last_sent_block since the old sent block
+         * may not be on the same channel.
+         */
+        rs->last_sent_block = NULL;
+
+        trace_postcopy_preempt_switch_channel(channel);
+    }
+
+    trace_postcopy_preempt_send_host_page(pss->block->idstr, pss->page);
+}
+
+/* We need to make sure rs->f always points to the default channel elsewhere */
+static void postcopy_preempt_reset_channel(RAMState *rs)
+{
+    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
+        rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
+        rs->f = migrate_get_current()->to_dst_file;
+        trace_postcopy_preempt_reset_channel();
+    }
+}
+
 /**
  * ram_save_host_page: save a whole host page
  *
@@ -2210,7 +2402,16 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
         return 0;
     }
 
+    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
+        postcopy_preempt_choose_channel(rs, pss);
+    }
+
     do {
+        if (postcopy_needs_preempt(rs, pss)) {
+            postcopy_do_preempt(rs, pss);
+            break;
+        }
+
         /* Check the pages is dirty and if it is send it */
         if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
             tmppages = ram_save_target_page(rs, pss);
@@ -2234,6 +2435,19 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
     /* The offset we leave with is the min boundary of host page and block */
     pss->page = MIN(pss->page, hostpage_boundary);
 
+    /*
+     * When with postcopy preempt mode, flush the data as soon as possible for
+     * postcopy requests, because we've already sent a whole huge page, so the
+     * dst node should already have enough resource to atomically filling in
+     * the current missing page.
+     *
+     * More importantly, when using separate postcopy channel, we must do
+     * explicit flush or it won't flush until the buffer is full.
+     */
+    if (migrate_postcopy_preempt() && pss->postcopy_requested) {
+        qemu_fflush(rs->f);
+    }
+
     res = ram_save_release_protection(rs, pss, start_page);
     return (res < 0 ? res : pages);
 }
@@ -2275,8 +2489,17 @@ static int ram_find_and_save_block(RAMState *rs)
         found = get_queued_page(rs, &pss);
 
         if (!found) {
-            /* priority queue empty, so just search for something dirty */
-            found = find_dirty_block(rs, &pss, &again);
+            /*
+             * Recover previous precopy ramblock/offset if postcopy has
+             * preempted precopy.  Otherwise find the next dirty bit.
+             */
+            if (postcopy_preempt_triggered(rs)) {
+                postcopy_preempt_restore(rs, &pss);
+                found = true;
+            } else {
+                /* priority queue empty, so just search for something dirty */
+                found = find_dirty_block(rs, &pss, &again);
+            }
         }
 
         if (found) {
@@ -2404,6 +2627,8 @@ static void ram_state_reset(RAMState *rs)
     rs->last_page = 0;
     rs->last_version = ram_list.version;
     rs->xbzrle_enabled = false;
+    postcopy_preempt_reset(rs);
+    rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -3042,6 +3267,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     }
     qemu_mutex_unlock(&rs->bitmap_mutex);
 
+    postcopy_preempt_reset_channel(rs);
+
     /*
      * Must occur before EOS (or any QEMUFile operation)
      * because of RDMA protocol.
@@ -3111,6 +3338,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         ram_control_after_iterate(f, RAM_CONTROL_FINISH);
     }
 
+    postcopy_preempt_reset_channel(rs);
+
     if (ret >= 0) {
         multifd_send_sync_main(rs->f);
         qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
@@ -3193,11 +3422,13 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
  * @mis: the migration incoming state pointer
  * @f: QEMUFile where to read the data from
  * @flags: Page flags (mostly to see if it's a continuation of previous block)
+ * @channel: the channel we're using
  */
 static inline RAMBlock *ram_block_from_stream(MigrationIncomingState *mis,
-                                              QEMUFile *f, int flags)
+                                              QEMUFile *f, int flags,
+                                              int channel)
 {
-    RAMBlock *block = mis->last_recv_block;
+    RAMBlock *block = mis->last_recv_block[channel];
     char id[256];
     uint8_t len;
 
@@ -3224,7 +3455,7 @@ static inline RAMBlock *ram_block_from_stream(MigrationIncomingState *mis,
         return NULL;
     }
 
-    mis->last_recv_block = block;
+    mis->last_recv_block[channel] = block;
 
     return block;
 }
@@ -3678,7 +3909,7 @@ int ram_load_postcopy(QEMUFile *f, int channel)
         trace_ram_load_postcopy_loop((uint64_t)addr, flags);
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE)) {
-            block = ram_block_from_stream(mis, f, flags);
+            block = ram_block_from_stream(mis, f, flags, channel);
             if (!block) {
                 ret = -EINVAL;
                 break;
@@ -3929,7 +4160,7 @@ static int ram_load_precopy(QEMUFile *f)
 
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-            RAMBlock *block = ram_block_from_stream(mis, f, flags);
+            RAMBlock *block = ram_block_from_stream(mis, f, flags, RAM_CHANNEL_PRECOPY);
 
             host = host_from_ram_block_offset(block, addr);
             /*
diff --git a/migration/trace-events b/migration/trace-events
index 4474a76614..b1155d9da0 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -111,6 +111,12 @@ ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRI
 ram_write_tracking_ramblock_start(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
 ram_write_tracking_ramblock_stop(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
 unqueue_page(char *block, uint64_t offset, bool dirty) "ramblock '%s' offset 0x%"PRIx64" dirty %d"
+postcopy_preempt_triggered(char *str, unsigned long page) "during sending ramblock %s offset 0x%lx"
+postcopy_preempt_restored(char *str, unsigned long page) "ramblock %s offset 0x%lx"
+postcopy_preempt_hit(char *str, uint64_t offset) "ramblock %s offset 0x%"PRIx64
+postcopy_preempt_send_host_page(char *str, uint64_t offset) "ramblock %s offset 0x%"PRIx64
+postcopy_preempt_switch_channel(int channel) "%d"
+postcopy_preempt_reset_channel(void) ""
 
 # multifd.c
 multifd_new_send_channel_async(uint8_t id) "channel %u"
@@ -176,6 +182,7 @@ migration_thread_low_pending(uint64_t pending) "%" PRIu64
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
 process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
+postcopy_preempt_enabled(bool value) "%d"
 
 # channel.c
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 19/20] migration: Postcopy recover with preempt enabled
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (17 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 18/20] migration: Postcopy preemption enablement Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-22 11:32   ` Dr. David Alan Gilbert
  2022-02-16  6:28 ` [PATCH 20/20] tests: Add postcopy preempt test Peter Xu
  2022-02-16  9:28 ` [PATCH 00/20] migration: Postcopy Preemption Peter Xu
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

To allow postcopy recovery, the RAM fast-load (preempt-only) thread on the
dest QEMU needs similar fault-tolerance handling.  When ram_load_postcopy()
fails, instead of stopping, the thread halts on a semaphore, ready to be
kicked again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation on the
socket.  To keep it simple, the fast RAM load thread takes the mutex for its
whole procedure and only releases it while paused.  The fast-path socket is
then released safely by the main loading thread, with that mutex held, when a
network failure happens during postcopy.
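
For reference only (not part of this patch): recovery with preempt enabled is
driven the same way as without it; once the network is back, something like
the following would be issued over QMP (URIs made up for illustration):

  # On the destination: re-establish the listening transport
  {"execute": "migrate-recover", "arguments": {"uri": "tcp:0:5556"}}

  # On the source: resume the paused postcopy migration
  {"execute": "migrate",
   "arguments": {"uri": "tcp:dst-host:5556", "resume": true}}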

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    | 17 +++++++++++++++--
 migration/migration.h    |  3 +++
 migration/postcopy-ram.c | 24 ++++++++++++++++++++++--
 migration/savevm.c       | 17 +++++++++++++++++
 migration/trace-events   |  2 ++
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d20db04097..c68a281406 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
     current_incoming->postcopy_remote_fds =
         g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
     qemu_mutex_init(&current_incoming->rp_mutex);
+    qemu_mutex_init(&current_incoming->postcopy_prio_thread_mutex);
     qemu_event_init(&current_incoming->main_thread_load_event, false);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
     qemu_mutex_init(&current_incoming->page_request_mutex);
     current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
         /*
          * Here, we only wake up the main loading thread (while the
-         * fault thread will still be waiting), so that we can receive
+         * rest threads will still be waiting), so that we can receive
          * commands from source now, and answer it if needed. The
-         * fault thread will be woken up afterwards until we are sure
+         * rest threads will be woken up afterwards until we are sure
          * that source is ready to reply to page requests.
          */
         qemu_sem_post(&mis->postcopy_pause_sem_dst);
@@ -3466,6 +3468,17 @@ static MigThrError postcopy_pause(MigrationState *s)
         qemu_file_shutdown(file);
         qemu_fclose(file);
 
+        /*
+         * Do the same to postcopy fast path socket too if there is.  No
+         * locking needed because no racer as long as we do this before setting
+         * status to paused.
+         */
+        if (s->postcopy_qemufile_src) {
+            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+            qemu_fclose(s->postcopy_qemufile_src);
+            s->postcopy_qemufile_src = NULL;
+        }
+
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
 
diff --git a/migration/migration.h b/migration/migration.h
index b8aacfe3af..945088064a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,8 @@ struct MigrationIncomingState {
     /* Postcopy priority thread is used to receive postcopy requested pages */
     QemuThread postcopy_prio_thread;
     bool postcopy_prio_thread_created;
+    /* Used to sync with the prio thread */
+    QemuMutex postcopy_prio_thread_mutex;
     /*
      * An array of temp host huge pages to be used, one for each postcopy
      * channel.
@@ -147,6 +149,7 @@ struct MigrationIncomingState {
     /* notify PAUSED postcopy incoming migrations to try to continue */
     QemuSemaphore postcopy_pause_sem_dst;
     QemuSemaphore postcopy_pause_sem_fault;
+    QemuSemaphore postcopy_pause_sem_fast_load;
 
     /* List of listening socket addresses  */
     SocketAddressList *socket_address_list;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 30eddaacd1..b3d23804bc 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1575,6 +1575,15 @@ int postcopy_preempt_setup(MigrationState *s, Error **errp)
     return 0;
 }
 
+static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_fast_load();
+    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
+    qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
+    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+    trace_postcopy_pause_fast_load_continued();
+}
+
 void *postcopy_preempt_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -1587,11 +1596,22 @@ void *postcopy_preempt_thread(void *opaque)
     qemu_sem_post(&mis->thread_sync_sem);
 
     /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
-    ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
+    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+    while (1) {
+        ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
+        /* If error happened, go into recovery routine */
+        if (ret) {
+            postcopy_pause_ram_fast_load(mis);
+        } else {
+            /* We're done */
+            break;
+        }
+    }
+    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
 
     rcu_unregister_thread();
 
     trace_postcopy_preempt_thread_exit();
 
-    return ret == 0 ? NULL : (void *)-1;
+    return NULL;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index 254aa78234..2d32340d28 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2152,6 +2152,13 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
      */
     qemu_sem_post(&mis->postcopy_pause_sem_fault);
 
+    if (migrate_postcopy_preempt()) {
+        /* The channel should already be setup again; make sure of it */
+        assert(mis->postcopy_qemufile_dst);
+        /* Kick the fast ram load thread too */
+        qemu_sem_post(&mis->postcopy_pause_sem_fast_load);
+    }
+
     return 0;
 }
 
@@ -2607,6 +2614,16 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     mis->to_src_file = NULL;
     qemu_mutex_unlock(&mis->rp_mutex);
 
+    if (mis->postcopy_qemufile_dst) {
+        qemu_file_shutdown(mis->postcopy_qemufile_dst);
+        /* Take the mutex to make sure the fast ram load thread halted */
+        qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+        qemu_fclose(mis->postcopy_qemufile_dst);
+        mis->postcopy_qemufile_dst = NULL;
+        qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
+    }
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
diff --git a/migration/trace-events b/migration/trace-events
index b1155d9da0..dfe36a3340 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -270,6 +270,8 @@ mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu, i
 mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d"
 postcopy_pause_fault_thread(void) ""
 postcopy_pause_fault_thread_continued(void) ""
+postcopy_pause_fast_load(void) ""
+postcopy_pause_fast_load_continued(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d"
-- 
2.32.0




* [PATCH 20/20] tests: Add postcopy preempt test
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (18 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 19/20] migration: Postcopy recover with preempt enabled Peter Xu
@ 2022-02-16  6:28 ` Peter Xu
  2022-02-22 12:51   ` Dr. David Alan Gilbert
  2022-02-16  9:28 ` [PATCH 00/20] migration: Postcopy Preemption Peter Xu
  20 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-16  6:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, peterx,
	Juan Quintela

Two tests are added: a normal postcopy preempt test, and a recovery test with preempt enabled.
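
For reference, once QEMU is built the new cases can be run individually through
the qtest runner, e.g. (the binary path below is only an example for an x86_64
build; "make check-qtest" still runs the whole suite):

  QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test \
      -p /migration/postcopy/preempt/unix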

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 39 ++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7b42f6fd90..5053b40589 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -470,6 +470,7 @@ typedef struct {
      */
     bool hide_stderr;
     bool use_shmem;
+    bool postcopy_preempt;
     /* only launch the target process */
     bool only_target;
     /* Use dirty ring if true; dirty logging otherwise */
@@ -673,6 +674,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
     migrate_set_capability(to, "postcopy-ram", true);
     migrate_set_capability(to, "postcopy-blocktime", true);
 
+    if (args->postcopy_preempt) {
+        migrate_set_capability(from, "postcopy-preempt", true);
+        migrate_set_capability(to, "postcopy-preempt", true);
+    }
+
     /* We want to pick a speed slow enough that the test completes
      * quickly, but that it doesn't complete precopy even on a slow
      * machine, so also set the downtime.
@@ -719,13 +725,29 @@ static void test_postcopy(void)
     migrate_postcopy_complete(from, to);
 }
 
-static void test_postcopy_recovery(void)
+static void test_postcopy_preempt(void)
+{
+    MigrateStart *args = migrate_start_new();
+    QTestState *from, *to;
+
+    args->postcopy_preempt = true;
+
+    if (migrate_postcopy_prepare(&from, &to, args)) {
+        return;
+    }
+    migrate_postcopy_start(from, to);
+    migrate_postcopy_complete(from, to);
+}
+
+/* @preempt: whether to use postcopy-preempt */
+static void test_postcopy_recovery(bool preempt)
 {
     MigrateStart *args = migrate_start_new();
     QTestState *from, *to;
     g_autofree char *uri = NULL;
 
     args->hide_stderr = true;
+    args->postcopy_preempt = preempt;
 
     if (migrate_postcopy_prepare(&from, &to, args)) {
         return;
@@ -781,6 +803,16 @@ static void test_postcopy_recovery(void)
     migrate_postcopy_complete(from, to);
 }
 
+static void test_postcopy_recovery_normal(void)
+{
+    test_postcopy_recovery(false);
+}
+
+static void test_postcopy_recovery_preempt(void)
+{
+    test_postcopy_recovery(true);
+}
+
 static void test_baddest(void)
 {
     MigrateStart *args = migrate_start_new();
@@ -1458,7 +1490,10 @@ int main(int argc, char **argv)
     module_call_init(MODULE_INIT_QOM);
 
     qtest_add_func("/migration/postcopy/unix", test_postcopy);
-    qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+    qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery_normal);
+    qtest_add_func("/migration/postcopy/preempt/unix", test_postcopy_preempt);
+    qtest_add_func("/migration/postcopy/preempt/recovery",
+                   test_postcopy_recovery_preempt);
     qtest_add_func("/migration/bad_dest", test_baddest);
     qtest_add_func("/migration/precopy/unix", test_precopy_unix);
     qtest_add_func("/migration/precopy/tcp", test_precopy_tcp);
-- 
2.32.0




* Re: [PATCH 00/20] migration: Postcopy Preemption
  2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
                   ` (19 preceding siblings ...)
  2022-02-16  6:28 ` [PATCH 20/20] tests: Add postcopy preempt test Peter Xu
@ 2022-02-16  9:28 ` Peter Xu
  20 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-16  9:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Leonardo Bras Soares Passos, Dr . David Alan Gilbert, Juan Quintela

On Wed, Feb 16, 2022 at 02:27:49PM +0800, Peter Xu wrote:
> The new patch layout:
> 
> Patch 1-3: Three leftover patches from patchset "[PATCH v3 0/8] migration:
> Postcopy cleanup on ram disgard" that I picked up here too.
> 
>   https://lore.kernel.org/qemu-devel/20211224065000.97572-1-peterx@redhat.com/
> 
>   migration: Dump sub-cmd name in loadvm_process_command tp
>   migration: Finer grained tracepoints for POSTCOPY_LISTEN
>   migration: Tracepoint change in postcopy-run bottom half
> 
> Patch 4-9: Original postcopy preempt RFC preparation patches (with slight
> modifications).
> 
>   migration: Introduce postcopy channels on dest node
>   migration: Dump ramblock and offset too when non-same-page detected
>   migration: Add postcopy_thread_create()
>   migration: Move static var in ram_block_from_stream() into global
>   migration: Add pss.postcopy_requested status
>   migration: Move migrate_allow_multifd and helpers into migration.c
> 
> Patch 10-15: Some newly added patches when working on postcopy recovery
> support.  After these patches migrate-recover command will allow re-entrance,
> which is a very nice side effect.
> 
>   migration: Enlarge postcopy recovery to capture !-EIO too
>   migration: postcopy_pause_fault_thread() never fails
>   migration: Export ram_load_postcopy()
>   migration: Move channel setup out of postcopy_try_recover()
>   migration: Add migration_incoming_transport_cleanup()
>   migration: Allow migrate-recover to run multiple times

Patches 1-15 are IMHO useful in various aspects with or without the new
preemption, so they can be considered for review earlier.

Especially:

    migration: Enlarge postcopy recovery to capture !-EIO too
    migration: Add migration_incoming_transport_cleanup()
    migration: Allow migrate-recover to run multiple times

Thanks,

-- 
Peter Xu




* Re: [PATCH 01/20] migration: Dump sub-cmd name in loadvm_process_command tp
  2022-02-16  6:27 ` [PATCH 01/20] migration: Dump sub-cmd name in loadvm_process_command tp Peter Xu
@ 2022-02-16 15:42   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-16 15:42 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> It'll be easier to read the name rather than the index of the sub-cmd when debugging.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/savevm.c     | 3 ++-
>  migration/trace-events | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1599b02fbc..7bb65e1d61 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2273,12 +2273,13 @@ static int loadvm_process_command(QEMUFile *f)
>          return qemu_file_get_error(f);
>      }
>  
> -    trace_loadvm_process_command(cmd, len);
>      if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
>          error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
>          return -EINVAL;
>      }
>  
> +    trace_loadvm_process_command(mig_cmd_args[cmd].name, len);
> +
>      if (mig_cmd_args[cmd].len != -1 && mig_cmd_args[cmd].len != len) {
>          error_report("%s received with bad length - expecting %zu, got %d",
>                       mig_cmd_args[cmd].name,
> diff --git a/migration/trace-events b/migration/trace-events
> index 48aa7b10ee..123cfe79d7 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -22,7 +22,7 @@ loadvm_postcopy_handle_resume(void) ""
>  loadvm_postcopy_ram_handle_discard(void) ""
>  loadvm_postcopy_ram_handle_discard_end(void) ""
>  loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
> -loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
> +loadvm_process_command(const char *s, uint16_t len) "com=%s len=%d"
>  loadvm_process_command_ping(uint32_t val) "0x%x"
>  postcopy_ram_listen_thread_exit(void) ""
>  postcopy_ram_listen_thread_start(void) ""
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 02/20] migration: Finer grained tracepoints for POSTCOPY_LISTEN
  2022-02-16  6:27 ` [PATCH 02/20] migration: Finer grained tracepoints for POSTCOPY_LISTEN Peter Xu
@ 2022-02-16 15:43   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-16 15:43 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Enabling postcopy listening takes a few steps; add a few tracepoints so that
> basic measurements of each step are readily available.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/savevm.c     | 9 ++++++++-
>  migration/trace-events | 2 +-
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7bb65e1d61..190cc5fc42 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1948,9 +1948,10 @@ static void *postcopy_ram_listen_thread(void *opaque)
>  static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>  {
>      PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
> -    trace_loadvm_postcopy_handle_listen();
>      Error *local_err = NULL;
>  
> +    trace_loadvm_postcopy_handle_listen("enter");
> +
>      if (ps != POSTCOPY_INCOMING_ADVISE && ps != POSTCOPY_INCOMING_DISCARD) {
>          error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
>          return -1;
> @@ -1965,6 +1966,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>          }
>      }
>  
> +    trace_loadvm_postcopy_handle_listen("after discard");
> +
>      /*
>       * Sensitise RAM - can now generate requests for blocks that don't exist
>       * However, at this point the CPU shouldn't be running, and the IO
> @@ -1977,6 +1980,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>          }
>      }
>  
> +    trace_loadvm_postcopy_handle_listen("after uffd");
> +
>      if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, &local_err)) {
>          error_report_err(local_err);
>          return -1;
> @@ -1991,6 +1996,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>      qemu_sem_wait(&mis->listen_thread_sem);
>      qemu_sem_destroy(&mis->listen_thread_sem);
>  
> +    trace_loadvm_postcopy_handle_listen("return");
> +
>      return 0;
>  }
>  
> diff --git a/migration/trace-events b/migration/trace-events
> index 123cfe79d7..92596c00d8 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -14,7 +14,7 @@ loadvm_handle_cmd_packaged_main(int ret) "%d"
>  loadvm_handle_cmd_packaged_received(int ret) "%d"
>  loadvm_handle_recv_bitmap(char *s) "%s"
>  loadvm_postcopy_handle_advise(void) ""
> -loadvm_postcopy_handle_listen(void) ""
> +loadvm_postcopy_handle_listen(const char *str) "%s"
>  loadvm_postcopy_handle_run(void) ""
>  loadvm_postcopy_handle_run_cpu_sync(void) ""
>  loadvm_postcopy_handle_run_vmstart(void) ""
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 03/20] migration: Tracepoint change in postcopy-run bottom half
  2022-02-16  6:27 ` [PATCH 03/20] migration: Tracepoint change in postcopy-run bottom half Peter Xu
@ 2022-02-16 19:00   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-16 19:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Remove the two old tracepoints, which happen to sit right next to each other:
> 
>     trace_loadvm_postcopy_handle_run_cpu_sync()
>     trace_loadvm_postcopy_handle_run_vmstart()
> 
> Add trace_loadvm_postcopy_handle_run_bh() with finer-grained tracing.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/savevm.c     | 12 +++++++++---
>  migration/trace-events |  3 +--
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 190cc5fc42..41e3238798 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2006,13 +2006,19 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>      Error *local_err = NULL;
>      MigrationIncomingState *mis = opaque;
>  
> +    trace_loadvm_postcopy_handle_run_bh("enter");
> +
>      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
>       * in migration.c
>       */
>      cpu_synchronize_all_post_init();
>  
> +    trace_loadvm_postcopy_handle_run_bh("after cpu sync");
> +
>      qemu_announce_self(&mis->announce_timer, migrate_announce_params());
>  
> +    trace_loadvm_postcopy_handle_run_bh("after announce");
> +
>      /* Make sure all file formats flush their mutable metadata.
>       * If we get an error here, just don't restart the VM yet. */
>      bdrv_invalidate_cache_all(&local_err);
> @@ -2022,9 +2028,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>          autostart = false;
>      }
>  
> -    trace_loadvm_postcopy_handle_run_cpu_sync();
> -
> -    trace_loadvm_postcopy_handle_run_vmstart();
> +    trace_loadvm_postcopy_handle_run_bh("after invalidate cache");
>  
>      dirty_bitmap_mig_before_vm_start();
>  
> @@ -2037,6 +2041,8 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>      }
>  
>      qemu_bh_delete(mis->bh);
> +
> +    trace_loadvm_postcopy_handle_run_bh("return");
>  }
>  
>  /* After all discards we can start running and asking for pages */
> diff --git a/migration/trace-events b/migration/trace-events
> index 92596c00d8..1aec580e92 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -16,8 +16,7 @@ loadvm_handle_recv_bitmap(char *s) "%s"
>  loadvm_postcopy_handle_advise(void) ""
>  loadvm_postcopy_handle_listen(const char *str) "%s"
>  loadvm_postcopy_handle_run(void) ""
> -loadvm_postcopy_handle_run_cpu_sync(void) ""
> -loadvm_postcopy_handle_run_vmstart(void) ""
> +loadvm_postcopy_handle_run_bh(const char *str) "%s"
>  loadvm_postcopy_handle_resume(void) ""
>  loadvm_postcopy_ram_handle_discard(void) ""
>  loadvm_postcopy_ram_handle_discard_end(void) ""
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 04/20] migration: Introduce postcopy channels on dest node
  2022-02-16  6:27 ` [PATCH 04/20] migration: Introduce postcopy channels on dest node Peter Xu
@ 2022-02-21 15:49   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-21 15:49 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Postcopy handles huge pages in a special way, and currently we can only have
> one "channel" to transfer the page.
> 
> It's because when we install pages using UFFDIO_COPY, we need to have the whole
> huge page ready; it also means we need to have a temp huge page when trying to
> receive the whole content of the page.
> 
> Currently all maintenance around this tmp page is global: firstly we'll
> allocate a temp huge page, then we maintain its status mostly within
> ram_load_postcopy().
> 
> To enable multiple channels for postcopy, the first thing we need to do is to
> prepare N temp huge pages as caches, one for each channel.
> 
> Meanwhile we need to maintain the tmp huge page status per-channel too.
> 
> To give some examples, some local variables maintained in ram_load_postcopy()
> are listed; they are responsible for maintaining temp huge page status:
> 
>   - all_zero:     this keeps whether this huge page contains all zeros
>   - target_pages: this counts how many target pages have been copied
>   - host_page:    this keeps the host ptr for the page to install
> 
> Move all these fields to be together with the temp huge pages to form a new
> structure called PostcopyTmpPage.  Then for each (future) postcopy channel, we
> need one structure to keep the state around.
> 
> For vanilla postcopy, obviously there's only one channel.  It contains both
> precopy and postcopy pages.
> 
> This patch teaches the dest migration node to start realizing the possible
> number of postcopy channels by introducing the "postcopy_channels" variable.
> Its value is calculated when setting up postcopy on the dest node (during
> POSTCOPY_LISTEN phase).
> 
> Vanilla postcopy will have channels=1, but when postcopy-preempt capability is
> enabled (in the future), we will boost it to 2 because even during partial
> sending of a precopy huge page we still want to preempt it and start sending
> the postcopy requested page right away (so we start to keep two temp huge
> pages; more if we want to enable multifd).  In this patch there's a TODO marked
> for that; for now the channel count is always set to 1.
> 
> We need to send one "host huge page" on one channel only and we cannot split
> it, because otherwise the data of the same huge page could end up on more than
> one channel and we'd need more complicated logic to manage that.  One temp host
> huge page for each channel will be enough for us for now.
> 
> Postcopy will still always use the index=0 huge page even after this patch.
> However it prepares for the later patches where it can start to use multiple
> channels (which needs src intervention, because only src knows which channel we
> should use).
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.h    | 36 +++++++++++++++++++++++-
>  migration/postcopy-ram.c | 60 ++++++++++++++++++++++++++++++----------
>  migration/ram.c          | 43 ++++++++++++++--------------
>  migration/savevm.c       | 12 ++++++++
>  4 files changed, 113 insertions(+), 38 deletions(-)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index 8130b703eb..42c7395094 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -45,6 +45,24 @@ struct PostcopyBlocktimeContext;
>   */
>  #define CLEAR_BITMAP_SHIFT_MAX            31
>  
> +/* This is an abstraction of a "temp huge page" for postcopy's purpose */
> +typedef struct {
> +    /*
> +     * This points to a temporary huge page as a buffer for UFFDIO_COPY.  It's
> +     * mmap()ed and needs to be freed when cleanup.
> +     */
> +    void *tmp_huge_page;
> +    /*
> +     * This points to the host page we're going to install for this temp page.
> +     * It tells us after we've received the whole page, where we should put it.
> +     */
> +    void *host_addr;
> +    /* Number of small pages copied (in size of TARGET_PAGE_SIZE) */
> +    unsigned int target_pages;
> +    /* Whether this page contains all zeros */
> +    bool all_zero;
> +} PostcopyTmpPage;
> +
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *from_src_file;
> @@ -81,7 +99,22 @@ struct MigrationIncomingState {
>      QemuMutex rp_mutex;    /* We send replies from multiple threads */
>      /* RAMBlock of last request sent to source */
>      RAMBlock *last_rb;
> -    void     *postcopy_tmp_page;
> +    /*
> +     * Number of postcopy channels including the default precopy channel, so
> +     * vanilla postcopy will only contain one channel which contain both
> +     * precopy and postcopy streams.
> +     *
> +     * This is calculated when the src requests to enable postcopy but before
> +     * it starts.  Its value can depend on e.g. whether postcopy preemption is
> +     * enabled.
> +     */
> +    unsigned int postcopy_channels;
> +    /*
> +     * An array of temp host huge pages to be used, one for each postcopy
> +     * channel.
> +     */
> +    PostcopyTmpPage *postcopy_tmp_pages;
> +    /* This is shared for all postcopy channels */
>      void     *postcopy_tmp_zero_page;
>      /* PostCopyFD's for external userfaultfds & handlers of shared memory */
>      GArray   *postcopy_remote_fds;
> @@ -391,5 +424,6 @@ bool migration_rate_limit(void);
>  void migration_cancel(const Error *error);
>  
>  void populate_vfio_info(MigrationInfo *info);
> +void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
>  
>  #endif
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index e662dd05cc..315f784965 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -525,9 +525,18 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis)
>  
>  static void postcopy_temp_pages_cleanup(MigrationIncomingState *mis)
>  {
> -    if (mis->postcopy_tmp_page) {
> -        munmap(mis->postcopy_tmp_page, mis->largest_page_size);
> -        mis->postcopy_tmp_page = NULL;
> +    int i;
> +
> +    if (mis->postcopy_tmp_pages) {
> +        for (i = 0; i < mis->postcopy_channels; i++) {
> +            if (mis->postcopy_tmp_pages[i].tmp_huge_page) {
> +                munmap(mis->postcopy_tmp_pages[i].tmp_huge_page,
> +                       mis->largest_page_size);
> +                mis->postcopy_tmp_pages[i].tmp_huge_page = NULL;
> +            }
> +        }
> +        g_free(mis->postcopy_tmp_pages);
> +        mis->postcopy_tmp_pages = NULL;
>      }
>  
>      if (mis->postcopy_tmp_zero_page) {
> @@ -1091,17 +1100,30 @@ retry:
>  
>  static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
>  {
> -    int err;
> -
> -    mis->postcopy_tmp_page = mmap(NULL, mis->largest_page_size,
> -                                  PROT_READ | PROT_WRITE,
> -                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> -    if (mis->postcopy_tmp_page == MAP_FAILED) {
> -        err = errno;
> -        mis->postcopy_tmp_page = NULL;
> -        error_report("%s: Failed to map postcopy_tmp_page %s",
> -                     __func__, strerror(err));
> -        return -err;
> +    PostcopyTmpPage *tmp_page;
> +    int err, i, channels;
> +    void *temp_page;
> +
> +    /* TODO: will be boosted when enable postcopy preemption */
> +    mis->postcopy_channels = 1;
> +
> +    channels = mis->postcopy_channels;
> +    mis->postcopy_tmp_pages = g_malloc0_n(sizeof(PostcopyTmpPage), channels);
> +
> +    for (i = 0; i < channels; i++) {
> +        tmp_page = &mis->postcopy_tmp_pages[i];
> +        temp_page = mmap(NULL, mis->largest_page_size, PROT_READ | PROT_WRITE,
> +                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +        if (temp_page == MAP_FAILED) {
> +            err = errno;
> +            error_report("%s: Failed to map postcopy_tmp_pages[%d]: %s",
> +                         __func__, i, strerror(err));
> +            /* Clean up will be done later */
> +            return -err;
> +        }
> +        tmp_page->tmp_huge_page = temp_page;
> +        /* Initialize default states for each tmp page */
> +        postcopy_temp_page_reset(tmp_page);
>      }
>  
>      /*
> @@ -1351,6 +1373,16 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
>  #endif
>  
>  /* ------------------------------------------------------------------------- */
> +void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page)
> +{
> +    tmp_page->target_pages = 0;
> +    tmp_page->host_addr = NULL;
> +    /*
> +     * This is set to true when reset, and cleared as long as we received any
> +     * of the non-zero small page within this huge page.
> +     */
> +    tmp_page->all_zero = true;
> +}
>  
>  void postcopy_fault_thread_notify(MigrationIncomingState *mis)
>  {
> diff --git a/migration/ram.c b/migration/ram.c
> index 91ca743ac8..36b0a53afe 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3640,11 +3640,8 @@ static int ram_load_postcopy(QEMUFile *f)
>      bool place_needed = false;
>      bool matches_target_page_size = false;
>      MigrationIncomingState *mis = migration_incoming_get_current();
> -    /* Temporary page that is later 'placed' */
> -    void *postcopy_host_page = mis->postcopy_tmp_page;
> -    void *host_page = NULL;
> -    bool all_zero = true;
> -    int target_pages = 0;
> +    /* Currently we only use channel 0.  TODO: use all the channels */
> +    PostcopyTmpPage *tmp_page = &mis->postcopy_tmp_pages[0];
>  
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr;
> @@ -3688,7 +3685,7 @@ static int ram_load_postcopy(QEMUFile *f)
>                  ret = -EINVAL;
>                  break;
>              }
> -            target_pages++;
> +            tmp_page->target_pages++;
>              matches_target_page_size = block->page_size == TARGET_PAGE_SIZE;
>              /*
>               * Postcopy requires that we place whole host pages atomically;
> @@ -3700,15 +3697,16 @@ static int ram_load_postcopy(QEMUFile *f)
>               * however the source ensures it always sends all the components
>               * of a host page in one chunk.
>               */
> -            page_buffer = postcopy_host_page +
> +            page_buffer = tmp_page->tmp_huge_page +
>                            host_page_offset_from_ram_block_offset(block, addr);
>              /* If all TP are zero then we can optimise the place */
> -            if (target_pages == 1) {
> -                host_page = host_page_from_ram_block_offset(block, addr);
> -            } else if (host_page != host_page_from_ram_block_offset(block,
> -                                                                    addr)) {
> +            if (tmp_page->target_pages == 1) {
> +                tmp_page->host_addr =
> +                    host_page_from_ram_block_offset(block, addr);
> +            } else if (tmp_page->host_addr !=
> +                       host_page_from_ram_block_offset(block, addr)) {
>                  /* not the 1st TP within the HP */
> -                error_report("Non-same host page %p/%p", host_page,
> +                error_report("Non-same host page %p/%p", tmp_page->host_addr,
>                               host_page_from_ram_block_offset(block, addr));
>                  ret = -EINVAL;
>                  break;
> @@ -3718,10 +3716,11 @@ static int ram_load_postcopy(QEMUFile *f)
>               * If it's the last part of a host page then we place the host
>               * page
>               */
> -            if (target_pages == (block->page_size / TARGET_PAGE_SIZE)) {
> +            if (tmp_page->target_pages ==
> +                (block->page_size / TARGET_PAGE_SIZE)) {
>                  place_needed = true;
>              }
> -            place_source = postcopy_host_page;
> +            place_source = tmp_page->tmp_huge_page;
>          }
>  
>          switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> @@ -3735,12 +3734,12 @@ static int ram_load_postcopy(QEMUFile *f)
>                  memset(page_buffer, ch, TARGET_PAGE_SIZE);
>              }
>              if (ch) {
> -                all_zero = false;
> +                tmp_page->all_zero = false;
>              }
>              break;
>  
>          case RAM_SAVE_FLAG_PAGE:
> -            all_zero = false;
> +            tmp_page->all_zero = false;
>              if (!matches_target_page_size) {
>                  /* For huge pages, we always use temporary buffer */
>                  qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
> @@ -3758,7 +3757,7 @@ static int ram_load_postcopy(QEMUFile *f)
>              }
>              break;
>          case RAM_SAVE_FLAG_COMPRESS_PAGE:
> -            all_zero = false;
> +            tmp_page->all_zero = false;
>              len = qemu_get_be32(f);
>              if (len < 0 || len > compressBound(TARGET_PAGE_SIZE)) {
>                  error_report("Invalid compressed data length: %d", len);
> @@ -3790,16 +3789,14 @@ static int ram_load_postcopy(QEMUFile *f)
>          }
>  
>          if (!ret && place_needed) {
> -            if (all_zero) {
> -                ret = postcopy_place_page_zero(mis, host_page, block);
> +            if (tmp_page->all_zero) {
> +                ret = postcopy_place_page_zero(mis, tmp_page->host_addr, block);
>              } else {
> -                ret = postcopy_place_page(mis, host_page, place_source,
> +                ret = postcopy_place_page(mis, tmp_page->host_addr, place_source,
>                                            block);
>              }
>              place_needed = false;
> -            target_pages = 0;
> -            /* Assume we have a zero page until we detect something different */
> -            all_zero = true;
> +            postcopy_temp_page_reset(tmp_page);
>          }
>      }
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 41e3238798..0ccd7e5e3f 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2579,6 +2579,18 @@ void qemu_loadvm_state_cleanup(void)
>  /* Return true if we should continue the migration, or false. */
>  static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>  {
> +    int i;
> +
> +    /*
> +     * If network is interrupted, any temp page we received will be useless
> +     * because we didn't mark them as "received" in receivedmap.  After a
> +     * proper recovery later (which will sync src dirty bitmap with receivedmap
> +     * on dest) these cached small pages will be resent again.
> +     */
> +    for (i = 0; i < mis->postcopy_channels; i++) {
> +        postcopy_temp_page_reset(&mis->postcopy_tmp_pages[i]);
> +    }
> +
>      trace_postcopy_pause_incoming();
>  
>      assert(migrate_postcopy_ram());
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 06/20] migration: Add postcopy_thread_create()
  2022-02-16  6:27 ` [PATCH 06/20] migration: Add postcopy_thread_create() Peter Xu
@ 2022-02-21 16:00   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-21 16:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Postcopy creates threads.  A common pattern is to init a sem and use it to sync
> with the thread.  Namely, we have fault_thread_sem and listen_thread_sem and
> they're only used for this.
> 
> Make it a shared infrastructure so it's easier to create yet another thread.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.h    |  8 +++++---
>  migration/postcopy-ram.c | 23 +++++++++++++++++------
>  migration/postcopy-ram.h |  4 ++++
>  migration/savevm.c       | 12 +++---------
>  4 files changed, 29 insertions(+), 18 deletions(-)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index 42c7395094..8445e1d14a 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -70,7 +70,11 @@ struct MigrationIncomingState {
>      /* A hook to allow cleanup at the end of incoming migration */
>      void *transport_data;
>      void (*transport_cleanup)(void *data);
> -
> +    /*
> +     * Used to sync thread creations.  Note that we can't create threads in
> +     * parallel with this sem.
> +     */
> +    QemuSemaphore  thread_sync_sem;
>      /*
>       * Free at the start of the main state load, set as the main thread finishes
>       * loading state.
> @@ -83,13 +87,11 @@ struct MigrationIncomingState {
>      size_t         largest_page_size;
>      bool           have_fault_thread;
>      QemuThread     fault_thread;
> -    QemuSemaphore  fault_thread_sem;
>      /* Set this when we want the fault thread to quit */
>      bool           fault_thread_quit;
>  
>      bool           have_listen_thread;
>      QemuThread     listen_thread;
> -    QemuSemaphore  listen_thread_sem;
>  
>      /* For the kernel to send us notifications */
>      int       userfault_fd;
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 315f784965..d3ec22e6de 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -77,6 +77,20 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
>                                              &pnd);
>  }
>  
> +/*
> + * NOTE: this routine is not thread safe, we can't call it concurrently. But it
> + * should be good enough for migration's purposes.
> + */
> +void postcopy_thread_create(MigrationIncomingState *mis,
> +                            QemuThread *thread, const char *name,
> +                            void *(*fn)(void *), int joinable)
> +{
> +    qemu_sem_init(&mis->thread_sync_sem, 0);
> +    qemu_thread_create(thread, name, fn, mis, joinable);
> +    qemu_sem_wait(&mis->thread_sync_sem);
> +    qemu_sem_destroy(&mis->thread_sync_sem);
> +}
> +
>  /* Postcopy needs to detect accesses to pages that haven't yet been copied
>   * across, and efficiently map new pages in, the techniques for doing this
>   * are target OS specific.
> @@ -901,7 +915,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
>      trace_postcopy_ram_fault_thread_entry();
>      rcu_register_thread();
>      mis->last_rb = NULL; /* last RAMBlock we sent part of */
> -    qemu_sem_post(&mis->fault_thread_sem);
> +    qemu_sem_post(&mis->thread_sync_sem);
>  
>      struct pollfd *pfd;
>      size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
> @@ -1172,11 +1186,8 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> -    qemu_sem_init(&mis->fault_thread_sem, 0);
> -    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
> -                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
> -    qemu_sem_wait(&mis->fault_thread_sem);
> -    qemu_sem_destroy(&mis->fault_thread_sem);
> +    postcopy_thread_create(mis, &mis->fault_thread, "postcopy/fault",
> +                           postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
>      mis->have_fault_thread = true;
>  
>      /* Mark so that we get notified of accesses to unwritten areas */
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 6d2b3cf124..07684c0e1d 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -135,6 +135,10 @@ void postcopy_remove_notifier(NotifierWithReturn *n);
>  /* Call the notifier list set by postcopy_add_start_notifier */
>  int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
>  
> +void postcopy_thread_create(MigrationIncomingState *mis,
> +                            QemuThread *thread, const char *name,
> +                            void *(*fn)(void *), int joinable);
> +
>  struct PostCopyFD;
>  
>  /* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 0ccd7e5e3f..967ff80547 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1863,7 +1863,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>                                     MIGRATION_STATUS_POSTCOPY_ACTIVE);
> -    qemu_sem_post(&mis->listen_thread_sem);
> +    qemu_sem_post(&mis->thread_sync_sem);
>      trace_postcopy_ram_listen_thread_start();
>  
>      rcu_register_thread();
> @@ -1988,14 +1988,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>      }
>  
>      mis->have_listen_thread = true;
> -    /* Start up the listening thread and wait for it to signal ready */
> -    qemu_sem_init(&mis->listen_thread_sem, 0);
> -    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> -                       postcopy_ram_listen_thread, NULL,
> -                       QEMU_THREAD_DETACHED);
> -    qemu_sem_wait(&mis->listen_thread_sem);
> -    qemu_sem_destroy(&mis->listen_thread_sem);
> -
> +    postcopy_thread_create(mis, &mis->listen_thread, "postcopy/listen",
> +                           postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
>      trace_loadvm_postcopy_handle_listen("return");
>  
>      return 0;
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 10/20] migration: Enlarge postcopy recovery to capture !-EIO too
  2022-02-16  6:27 ` [PATCH 10/20] migration: Enlarge postcopy recovery to capture !-EIO too Peter Xu
@ 2022-02-21 16:15   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-21 16:15 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> We used to have quite a few places making sure -EIO happened and that's the
> only way to trigger postcopy recovery.  That's based on the assumption that
> we'll only return -EIO for channel issues.
> 
> It'll work in 99.99% of cases but logically that won't cover some corner cases.
> For example, ram_block_from_stream() could fail with an interrupted
> network, then -EINVAL will be returned instead of -EIO.
> 
> I remember Dave Gilbert pointed that out before, but somehow it was
> overlooked.  Nor have I encountered anything outside the -EIO error.
> 
> However we'd better touch that up before it triggers a rare VM data loss during
> live migration.
> 
> To cover as many of those cases as possible, remove the -EIO restriction on
> triggering the postcopy recovery, because even if it's not a channel failure,
> we can't do anything better than halting QEMU anyway - the corpse of the
> process may even be used by a good hand to dig out useful memory regions, or
> the admin could simply kill the process later on.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c    | 4 ++--
>  migration/postcopy-ram.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 6e4cc9cc87..67520d3105 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2877,7 +2877,7 @@ retry:
>  out:
>      res = qemu_file_get_error(rp);
>      if (res) {
> -        if (res == -EIO && migration_in_postcopy()) {
> +        if (res && migration_in_postcopy()) {
>              /*
>               * Maybe there is something we can do: it looks like a
>               * network down issue, and we pause for a recovery.
> @@ -3478,7 +3478,7 @@ static MigThrError migration_detect_error(MigrationState *s)
>          error_free(local_error);
>      }
>  
> -    if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> +    if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
>          /*
>           * For postcopy, we allow the network to be down for a
>           * while. After that, it can be continued by a
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index d3ec22e6de..6be510fea4 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1038,7 +1038,7 @@ retry:
>                                          msg.arg.pagefault.address);
>              if (ret) {
>                  /* May be network failure, try to wait for recovery */
> -                if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
> +                if (postcopy_pause_fault_thread(mis)) {
>                      /* We got reconnected somehow, try to continue */
>                      goto retry;
>                  } else {
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 11/20] migration: postcopy_pause_fault_thread() never fails
  2022-02-16  6:28 ` [PATCH 11/20] migration: postcopy_pause_fault_thread() never fails Peter Xu
@ 2022-02-21 16:16   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-21 16:16 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Per the title, remove the return code and simplify the callers as the errors
> will never be triggered.  No functional change intended.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/postcopy-ram.c | 25 ++++---------------------
>  1 file changed, 4 insertions(+), 21 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 6be510fea4..738cc55fa6 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -890,15 +890,11 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
>                                        affected_cpu);
>  }
>  
> -static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
> +static void postcopy_pause_fault_thread(MigrationIncomingState *mis)
>  {
>      trace_postcopy_pause_fault_thread();
> -
>      qemu_sem_wait(&mis->postcopy_pause_sem_fault);
> -
>      trace_postcopy_pause_fault_thread_continued();
> -
> -    return true;
>  }
>  
>  /*
> @@ -958,13 +954,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
>               * broken already using the event. We should hold until
>               * the channel is rebuilt.
>               */
> -            if (postcopy_pause_fault_thread(mis)) {
> -                /* Continue to read the userfaultfd */
> -            } else {
> -                error_report("%s: paused but don't allow to continue",
> -                             __func__);
> -                break;
> -            }
> +            postcopy_pause_fault_thread(mis);
>          }
>  
>          if (pfd[1].revents) {
> @@ -1038,15 +1028,8 @@ retry:
>                                          msg.arg.pagefault.address);
>              if (ret) {
>                  /* May be network failure, try to wait for recovery */
> -                if (postcopy_pause_fault_thread(mis)) {
> -                    /* We got reconnected somehow, try to continue */
> -                    goto retry;
> -                } else {
> -                    /* This is a unavoidable fault */
> -                    error_report("%s: postcopy_request_page() get %d",
> -                                 __func__, ret);
> -                    break;
> -                }
> +                postcopy_pause_fault_thread(mis);
> +                goto retry;
>              }
>          }
>  
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 12/20] migration: Export ram_load_postcopy()
  2022-02-16  6:28 ` [PATCH 12/20] migration: Export ram_load_postcopy() Peter Xu
@ 2022-02-21 16:17   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-21 16:17 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Will be reused in postcopy fast load thread.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 2 +-
>  migration/ram.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 1ed70b17d7..f8bc3cd882 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3644,7 +3644,7 @@ int ram_postcopy_incoming_init(MigrationIncomingState *mis)
>   *
>   * @f: QEMUFile where to send the data
>   */
> -static int ram_load_postcopy(QEMUFile *f)
> +int ram_load_postcopy(QEMUFile *f)
>  {
>      int flags = 0, ret = 0;
>      bool place_needed = false;
> diff --git a/migration/ram.h b/migration/ram.h
> index 2c6dc3675d..ded0a3a086 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -61,6 +61,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms);
>  /* For incoming postcopy discard */
>  int ram_discard_range(const char *block_name, uint64_t start, size_t length);
>  int ram_postcopy_incoming_init(MigrationIncomingState *mis);
> +int ram_load_postcopy(QEMUFile *f);
>  
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>  
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 14/20] migration: Add migration_incoming_transport_cleanup()
  2022-02-16  6:28 ` [PATCH 14/20] migration: Add migration_incoming_transport_cleanup() Peter Xu
@ 2022-02-21 16:56   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-21 16:56 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Add a helper to clean up the transport listener.
> 
> When doing it, we should also null-ify the cleanup hook and the data, so it's
> even safe to call it multiple times.
> 
> Move the socket_address_list cleanup altogether, because that's a mirror of the
> listener channels and only for the purpose of query-migrate.  Hence whenever
> someone wants to clean up the listener transport, the socket list should always
> be cleaned up too.
> 
> No functional change intended.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 22 ++++++++++++++--------
>  migration/migration.h |  1 +
>  2 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index b2e6446457..6bb321cdd3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -279,6 +279,19 @@ MigrationIncomingState *migration_incoming_get_current(void)
>      return current_incoming;
>  }
>  
> +void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
> +{
> +    if (mis->socket_address_list) {
> +        qapi_free_SocketAddressList(mis->socket_address_list);
> +        mis->socket_address_list = NULL;
> +    }
> +
> +    if (mis->transport_cleanup) {
> +        mis->transport_cleanup(mis->transport_data);
> +        mis->transport_data = mis->transport_cleanup = NULL;
> +    }
> +}
> +
>  void migration_incoming_state_destroy(void)
>  {
>      struct MigrationIncomingState *mis = migration_incoming_get_current();
> @@ -299,10 +312,8 @@ void migration_incoming_state_destroy(void)
>          g_array_free(mis->postcopy_remote_fds, TRUE);
>          mis->postcopy_remote_fds = NULL;
>      }
> -    if (mis->transport_cleanup) {
> -        mis->transport_cleanup(mis->transport_data);
> -    }
>  
> +    migration_incoming_transport_cleanup(mis);
>      qemu_event_reset(&mis->main_thread_load_event);
>  
>      if (mis->page_requested) {
> @@ -310,11 +321,6 @@ void migration_incoming_state_destroy(void)
>          mis->page_requested = NULL;
>      }
>  
> -    if (mis->socket_address_list) {
> -        qapi_free_SocketAddressList(mis->socket_address_list);
> -        mis->socket_address_list = NULL;
> -    }
> -
>      yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>  }
>  
> diff --git a/migration/migration.h b/migration/migration.h
> index d677a750c9..f17ccc657c 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -166,6 +166,7 @@ struct MigrationIncomingState {
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
>  void migration_incoming_state_destroy(void);
> +void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
>  /*
>   * Functions to work with blocktime context
>   */
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 15/20] migration: Allow migrate-recover to run multiple times
  2022-02-16  6:28 ` [PATCH 15/20] migration: Allow migrate-recover to run multiple times Peter Xu
@ 2022-02-21 17:03   ` Dr. David Alan Gilbert
  2022-02-22  2:51     ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-21 17:03 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Previously migration didn't have an easy way to clean up the listening
> transport, so migrate recovery was only allowed to execute once.  That was done
> with a trick flag in postcopy_recover_triggered.
> 
> Now the facility is already there.
> 
> Drop postcopy_recover_triggered and instead allow a new migrate-recover to
> release the previous listener transport.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

OK, was that the only reason you couldn't recover twice?


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 13 ++-----------
>  migration/migration.h |  1 -
>  migration/savevm.c    |  3 ---
>  3 files changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 6bb321cdd3..16086897aa 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2159,11 +2159,8 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>          return;
>      }
>  
> -    if (qatomic_cmpxchg(&mis->postcopy_recover_triggered,
> -                       false, true) == true) {
> -        error_setg(errp, "Migrate recovery is triggered already");
> -        return;
> -    }
> +    /* If there's an existing transport, release it */
> +    migration_incoming_transport_cleanup(mis);
>  
>      /*
>       * Note that this call will never start a real migration; it will
> @@ -2171,12 +2168,6 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>       * to continue using that newly established channel.
>       */
>      qemu_start_incoming_migration(uri, errp);
> -
> -    /* Safe to dereference with the assert above */
> -    if (*errp) {
> -        /* Reset the flag so user could still retry */
> -        qatomic_set(&mis->postcopy_recover_triggered, false);
> -    }
>  }
>  
>  void qmp_migrate_pause(Error **errp)
> diff --git a/migration/migration.h b/migration/migration.h
> index f17ccc657c..a863032b71 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -139,7 +139,6 @@ struct MigrationIncomingState {
>      struct PostcopyBlocktimeContext *blocktime_ctx;
>  
>      /* notify PAUSED postcopy incoming migrations to try to continue */
> -    bool postcopy_recover_triggered;
>      QemuSemaphore postcopy_pause_sem_dst;
>      QemuSemaphore postcopy_pause_sem_fault;
>  
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 967ff80547..254aa78234 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2589,9 +2589,6 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>  
>      assert(migrate_postcopy_ram());
>  
> -    /* Clear the triggered bit to allow one recovery */
> -    mis->postcopy_recover_triggered = false;
> -
>      /*
>       * Unregister yank with either from/to src would work, since ioc behind it
>       * is the same
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 17/20] migration: Postcopy preemption preparation on channel creation
  2022-02-16  6:28 ` [PATCH 17/20] migration: Postcopy preemption preparation on channel creation Peter Xu
@ 2022-02-21 18:39   ` Dr. David Alan Gilbert
  2022-02-22  8:34     ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-21 18:39 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Create a new socket for postcopy to be prepared to send postcopy requested
> pages via this specific channel, so as to not get blocked by precopy pages.
> 
> A new thread is also created on dest qemu to receive data from this new channel
> based on the ram_load_postcopy() routine.
> 
> The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
> function yet; that'll be done in follow-up patches.
> 
> Clean up the new sockets on both src/dst QEMUs, and meanwhile look after the
> new thread too to make sure it'll be recycled properly.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c    | 62 ++++++++++++++++++++++++----
>  migration/migration.h    |  8 ++++
>  migration/postcopy-ram.c | 88 ++++++++++++++++++++++++++++++++++++++--
>  migration/postcopy-ram.h | 10 +++++
>  migration/ram.c          | 25 ++++++++----
>  migration/ram.h          |  4 +-
>  migration/socket.c       | 22 +++++++++-
>  migration/socket.h       |  1 +
>  migration/trace-events   |  3 ++
>  9 files changed, 203 insertions(+), 20 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 4c22bad304..3d7f897b72 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
>          mis->page_requested = NULL;
>      }
>  
> +    if (mis->postcopy_qemufile_dst) {
> +        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
> +        qemu_fclose(mis->postcopy_qemufile_dst);
> +        mis->postcopy_qemufile_dst = NULL;
> +    }
> +
>      yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>  }
>  
> @@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
>      migration_incoming_process();
>  }
>  
> +static bool migration_needs_multiple_sockets(void)
> +{
> +    return migrate_use_multifd() || migrate_postcopy_preempt();
> +}
> +
>  void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      Error *local_err = NULL;
>      bool start_migration;
> +    QEMUFile *f;
>  
>      if (!mis->from_src_file) {
>          /* The first connection (multifd may have multiple) */
> -        QEMUFile *f = qemu_fopen_channel_input(ioc);
> +        f = qemu_fopen_channel_input(ioc);
>  
>          if (!migration_incoming_setup(f, errp)) {
>              return;
> @@ -730,13 +742,18 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>  
>          /*
>           * Common migration only needs one channel, so we can start
> -         * right now.  Multifd needs more than one channel, we wait.
> +         * right now.  Some features need more than one channel, we wait.
>           */
> -        start_migration = !migrate_use_multifd();
> +        start_migration = !migration_needs_multiple_sockets();
>      } else {
>          /* Multiple connections */
> -        assert(migrate_use_multifd());
> -        start_migration = multifd_recv_new_channel(ioc, &local_err);
> +        assert(migration_needs_multiple_sockets());
> +        if (migrate_use_multifd()) {
> +            start_migration = multifd_recv_new_channel(ioc, &local_err);
> +        } else if (migrate_postcopy_preempt()) {
> +            f = qemu_fopen_channel_input(ioc);
> +            start_migration = postcopy_preempt_new_channel(mis, f);
> +        }
>          if (local_err) {
>              error_propagate(errp, local_err);
>              return;
> @@ -761,11 +778,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>  bool migration_has_all_channels(void)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
> -    bool all_channels;
>  
> -    all_channels = multifd_recv_all_channels_created();
> +    if (!mis->from_src_file) {
> +        return false;
> +    }
> +
> +    if (migrate_use_multifd()) {
> +        return multifd_recv_all_channels_created();
> +    }
> +
> +    if (migrate_postcopy_preempt()) {
> +        return mis->postcopy_qemufile_dst != NULL;
> +    }
>  
> -    return all_channels && mis->from_src_file != NULL;
> +    return true;
>  }
>  
>  /*
> @@ -1858,6 +1884,12 @@ static void migrate_fd_cleanup(MigrationState *s)
>          qemu_fclose(tmp);
>      }
>  
> +    if (s->postcopy_qemufile_src) {
> +        migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> +        qemu_fclose(s->postcopy_qemufile_src);
> +        s->postcopy_qemufile_src = NULL;
> +    }
> +
>      assert(!migration_is_active(s));
>  
>      if (s->state == MIGRATION_STATUS_CANCELLING) {
> @@ -3233,6 +3265,11 @@ static void migration_completion(MigrationState *s)
>          qemu_savevm_state_complete_postcopy(s->to_dst_file);
>          qemu_mutex_unlock_iothread();
>  
> +        /* Shutdown the postcopy fast path thread */
> +        if (migrate_postcopy_preempt()) {
> +            postcopy_preempt_shutdown_file(s);
> +        }
> +
>          trace_migration_completion_postcopy_end_after_complete();
>      } else {
>          goto fail;
> @@ -4120,6 +4157,15 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>          }
>      }
>  
> +    /* This needs to be done before resuming a postcopy */
> +    if (postcopy_preempt_setup(s, &local_err)) {
> +        error_report_err(local_err);
> +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> +                          MIGRATION_STATUS_FAILED);
> +        migrate_fd_cleanup(s);
> +        return;
> +    }
> +
>      if (resume) {
>          /* Wakeup the main migration thread to do the recovery */
>          migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> diff --git a/migration/migration.h b/migration/migration.h
> index af4bcb19c2..caa910d956 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -23,6 +23,7 @@
>  #include "io/channel-buffer.h"
>  #include "net/announce.h"
>  #include "qom/object.h"
> +#include "postcopy-ram.h"
>  
>  struct PostcopyBlocktimeContext;
>  
> @@ -112,6 +113,11 @@ struct MigrationIncomingState {
>       * enabled.
>       */
>      unsigned int postcopy_channels;
> +    /* QEMUFile for postcopy only; it'll be handled by a separate thread */
> +    QEMUFile *postcopy_qemufile_dst;
> +    /* Postcopy priority thread is used to receive postcopy requested pages */
> +    QemuThread postcopy_prio_thread;
> +    bool postcopy_prio_thread_created;
>      /*
>       * An array of temp host huge pages to be used, one for each postcopy
>       * channel.
> @@ -192,6 +198,8 @@ struct MigrationState {
>      QEMUBH *cleanup_bh;
>      /* Protected by qemu_file_lock */
>      QEMUFile *to_dst_file;
> +    /* Postcopy specific transfer channel */
> +    QEMUFile *postcopy_qemufile_src;
>      QIOChannelBuffer *bioc;
>      /*
>       * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 738cc55fa6..30eddaacd1 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -32,6 +32,9 @@
>  #include "trace.h"
>  #include "hw/boards.h"
>  #include "exec/ramblock.h"
> +#include "socket.h"
> +#include "qemu-file-channel.h"
> +#include "yank_functions.h"
>  
>  /* Arbitrary limit on size of each discard command,
>   * keeps them around ~200 bytes
> @@ -566,6 +569,11 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>  {
>      trace_postcopy_ram_incoming_cleanup_entry();
>  
> +    if (mis->postcopy_prio_thread_created) {
> +        qemu_thread_join(&mis->postcopy_prio_thread);
> +        mis->postcopy_prio_thread_created = false;
> +    }
> +
>      if (mis->have_fault_thread) {
>          Error *local_err = NULL;
>  
> @@ -1101,8 +1109,13 @@ static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
>      int err, i, channels;
>      void *temp_page;
>  
> -    /* TODO: will be boosted when enable postcopy preemption */
> -    mis->postcopy_channels = 1;
> +    if (migrate_postcopy_preempt()) {
> +        /* If preemption enabled, need extra channel for urgent requests */
> +        mis->postcopy_channels = RAM_CHANNEL_MAX;
> +    } else {
> +        /* Both precopy/postcopy on the same channel */
> +        mis->postcopy_channels = 1;
> +    }
>  
>      channels = mis->postcopy_channels;
>      mis->postcopy_tmp_pages = g_malloc0_n(sizeof(PostcopyTmpPage), channels);
> @@ -1169,7 +1182,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> -    postcopy_thread_create(mis, &mis->fault_thread, "postcopy/fault",
> +    postcopy_thread_create(mis, &mis->fault_thread, "qemu/fault-default",

That name is still too long; I'd lose the 'qemu/'
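
For reference, the limit behind this: Linux caps thread names at 15 bytes plus
the terminating NUL (TASK_COMM_LEN), and glibc's pthread_setname_np() returns
ERANGE for anything longer.  A quick standalone check:

  /* Quick check of the 15-byte Linux thread name limit (TASK_COMM_LEN - 1) */
  #include <assert.h>
  #include <string.h>

  int main(void)
  {
      assert(strlen("qemu/fault-default") == 18);  /* too long, > 15 */
      assert(strlen("postcopy/fault") == 14);      /* fits */
      assert(strlen("fault-default") == 13);       /* fits */
      assert(strlen("fault-fast") == 10);          /* fits */
      return 0;
  }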

>                             postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
>      mis->have_fault_thread = true;
>  
> @@ -1184,6 +1197,16 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> +    if (migrate_postcopy_preempt()) {
> +        /*
> +         * This thread needs to be created after the temp pages because it'll fetch
> +         * RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
> +         */
> +        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "qemu/fault-fast",

and that one.

> +                               postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
> +        mis->postcopy_prio_thread_created = true;
> +    }
> +
>      trace_postcopy_ram_enable_notify();
>  
>      return 0;
> @@ -1513,3 +1536,62 @@ void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
>          }
>      }
>  }
> +
> +bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
> +{
> +    mis->postcopy_qemufile_dst = file;
> +
> +    trace_postcopy_preempt_new_channel();
> +
> +    /* Start the migration immediately */
> +    return true;
> +}
> +
> +int postcopy_preempt_setup(MigrationState *s, Error **errp)
> +{
> +    QIOChannel *ioc;
> +
> +    if (!migrate_postcopy_preempt()) {
> +        return 0;
> +    }
> +
> +    if (!migrate_multi_channels_is_allowed()) {
> +        error_setg(errp, "Postcopy preempt is not supported as current "
> +                   "migration stream does not support multi-channels.");
> +        return -1;
> +    }
> +
> +    ioc = socket_send_channel_create_sync(errp);

How do we get away with doing this sync here, but have to use async for
multifd?

Dave

> +    if (ioc == NULL) {
> +        return -1;
> +    }
> +
> +    migration_ioc_register_yank(ioc);
> +    s->postcopy_qemufile_src = qemu_fopen_channel_output(ioc);
> +
> +    trace_postcopy_preempt_new_channel();
> +
> +    return 0;
> +}
> +
> +void *postcopy_preempt_thread(void *opaque)
> +{
> +    MigrationIncomingState *mis = opaque;
> +    int ret;
> +
> +    trace_postcopy_preempt_thread_entry();
> +
> +    rcu_register_thread();
> +
> +    qemu_sem_post(&mis->thread_sync_sem);
> +
> +    /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
> +    ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
> +
> +    rcu_unregister_thread();
> +
> +    trace_postcopy_preempt_thread_exit();
> +
> +    return ret == 0 ? NULL : (void *)-1;
> +}
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 07684c0e1d..34b1080cde 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -183,4 +183,14 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd, uint64_t client_addr,
>  int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
>                                   uint64_t client_addr, uint64_t offset);
>  
> +/* Hard-code channels for now for postcopy preemption */
> +enum PostcopyChannels {
> +    RAM_CHANNEL_PRECOPY = 0,
> +    RAM_CHANNEL_POSTCOPY = 1,
> +    RAM_CHANNEL_MAX,
> +};
> +
> +bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
> +int postcopy_preempt_setup(MigrationState *s, Error **errp);
> +
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index f8bc3cd882..36e3da6bb0 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3643,15 +3643,15 @@ int ram_postcopy_incoming_init(MigrationIncomingState *mis)
>   * rcu_read_lock is taken prior to this being called.
>   *
>   * @f: QEMUFile where to send the data
> + * @channel: the channel to use for loading
>   */
> -int ram_load_postcopy(QEMUFile *f)
> +int ram_load_postcopy(QEMUFile *f, int channel)
>  {
>      int flags = 0, ret = 0;
>      bool place_needed = false;
>      bool matches_target_page_size = false;
>      MigrationIncomingState *mis = migration_incoming_get_current();
> -    /* Currently we only use channel 0.  TODO: use all the channels */
> -    PostcopyTmpPage *tmp_page = &mis->postcopy_tmp_pages[0];
> +    PostcopyTmpPage *tmp_page = &mis->postcopy_tmp_pages[channel];
>  
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr;
> @@ -3716,10 +3716,10 @@ int ram_load_postcopy(QEMUFile *f)
>              } else if (tmp_page->host_addr !=
>                         host_page_from_ram_block_offset(block, addr)) {
>                  /* not the 1st TP within the HP */
> -                error_report("Non-same host page detected.  Target host page %p, "
> -                             "received host page %p "
> +                error_report("Non-same host page detected on channel %d: "
> +                             "Target host page %p, received host page %p "
>                               "(rb %s offset 0x"RAM_ADDR_FMT" target_pages %d)",
> -                             tmp_page->host_addr,
> +                             channel, tmp_page->host_addr,
>                               host_page_from_ram_block_offset(block, addr),
>                               block->idstr, addr, tmp_page->target_pages);
>                  ret = -EINVAL;
> @@ -4106,7 +4106,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       */
>      WITH_RCU_READ_LOCK_GUARD() {
>          if (postcopy_running) {
> -            ret = ram_load_postcopy(f);
> +            /*
> +             * Note!  Here RAM_CHANNEL_PRECOPY is the precopy channel of
> +             * postcopy migration, we have another RAM_CHANNEL_POSTCOPY to
> +             * service fast page faults.
> +             */
> +            ret = ram_load_postcopy(f, RAM_CHANNEL_PRECOPY);
>          } else {
>              ret = ram_load_precopy(f);
>          }
> @@ -4268,6 +4273,12 @@ static int ram_resume_prepare(MigrationState *s, void *opaque)
>      return 0;
>  }
>  
> +void postcopy_preempt_shutdown_file(MigrationState *s)
> +{
> +    qemu_put_be64(s->postcopy_qemufile_src, RAM_SAVE_FLAG_EOS);
> +    qemu_fflush(s->postcopy_qemufile_src);
> +}
> +
>  static SaveVMHandlers savevm_ram_handlers = {
>      .save_setup = ram_save_setup,
>      .save_live_iterate = ram_save_iterate,
> diff --git a/migration/ram.h b/migration/ram.h
> index ded0a3a086..5d90945a6e 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -61,7 +61,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms);
>  /* For incoming postcopy discard */
>  int ram_discard_range(const char *block_name, uint64_t start, size_t length);
>  int ram_postcopy_incoming_init(MigrationIncomingState *mis);
> -int ram_load_postcopy(QEMUFile *f);
> +int ram_load_postcopy(QEMUFile *f, int channel);
>  
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>  
> @@ -73,6 +73,8 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
>                                    const char *block_name);
>  int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
>  bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start);
> +void postcopy_preempt_shutdown_file(MigrationState *s);
> +void *postcopy_preempt_thread(void *opaque);
>  
>  /* ram cache */
>  int colo_init_ram_cache(void);
> diff --git a/migration/socket.c b/migration/socket.c
> index 05705a32d8..a7f345b353 100644
> --- a/migration/socket.c
> +++ b/migration/socket.c
> @@ -26,7 +26,7 @@
>  #include "io/channel-socket.h"
>  #include "io/net-listener.h"
>  #include "trace.h"
> -
> +#include "postcopy-ram.h"
>  
>  struct SocketOutgoingArgs {
>      SocketAddress *saddr;
> @@ -39,6 +39,24 @@ void socket_send_channel_create(QIOTaskFunc f, void *data)
>                                       f, data, NULL, NULL);
>  }
>  
> +QIOChannel *socket_send_channel_create_sync(Error **errp)
> +{
> +    QIOChannelSocket *sioc = qio_channel_socket_new();
> +
> +    if (!outgoing_args.saddr) {
> +        object_unref(OBJECT(sioc));
> +        error_setg(errp, "Initial sock address not set!");
> +        return NULL;
> +    }
> +
> +    if (qio_channel_socket_connect_sync(sioc, outgoing_args.saddr, errp) < 0) {
> +        object_unref(OBJECT(sioc));
> +        return NULL;
> +    }
> +
> +    return QIO_CHANNEL(sioc);
> +}
> +
>  int socket_send_channel_destroy(QIOChannel *send)
>  {
>      /* Remove channel */
> @@ -158,6 +176,8 @@ socket_start_incoming_migration_internal(SocketAddress *saddr,
>  
>      if (migrate_use_multifd()) {
>          num = migrate_multifd_channels();
> +    } else if (migrate_postcopy_preempt()) {
> +        num = RAM_CHANNEL_MAX;
>      }
>  
>      if (qio_net_listener_open_sync(listener, saddr, num, errp) < 0) {
> diff --git a/migration/socket.h b/migration/socket.h
> index 891dbccceb..dc54df4e6c 100644
> --- a/migration/socket.h
> +++ b/migration/socket.h
> @@ -21,6 +21,7 @@
>  #include "io/task.h"
>  
>  void socket_send_channel_create(QIOTaskFunc f, void *data);
> +QIOChannel *socket_send_channel_create_sync(Error **errp);
>  int socket_send_channel_destroy(QIOChannel *send);
>  
>  void socket_start_incoming_migration(const char *str, Error **errp);
> diff --git a/migration/trace-events b/migration/trace-events
> index 1aec580e92..4474a76614 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -278,6 +278,9 @@ postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_off
>  postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
>  postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
>  postcopy_page_req_del(void *addr, int count) "resolved page req %p total %d"
> +postcopy_preempt_new_channel(void) ""
> +postcopy_preempt_thread_entry(void) ""
> +postcopy_preempt_thread_exit(void) ""
>  
>  get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
>  
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 15/20] migration: Allow migrate-recover to run multiple times
  2022-02-21 17:03   ` Dr. David Alan Gilbert
@ 2022-02-22  2:51     ` Peter Xu
  0 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-22  2:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Mon, Feb 21, 2022 at 05:03:24PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Previously migration didn't have an easy way to clean up the listening
> > transport, so migrate recovery was only allowed to execute once.  That was
> > done with a trick flag, postcopy_recover_triggered.
> > 
> > Now the cleanup facility is there.
> > 
> > Drop postcopy_recover_triggered and instead allow a new migrate-recover to
> > release the previous listener transport.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> OK, was that the only reason you couldn't recover twice?

We could recover twice, but we couldn't specify the listening port twice,
because AFAIU previously we didn't have a good way to clean up the existing
listener.

IOW we could always run a pause->recover->pause->recover sequence even before
this patch[set], but we could never run recover->recover back to back, because
the 2nd one would fail, complaining that a recovery port had already been set
up.
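
For reference, a minimal sketch of roughly what that cleanup helper amounts to,
assuming it simply collects the two blocks that the earlier patch removed from
migration_incoming_state_destroy() when introducing
migration_incoming_transport_cleanup() (the real helper may differ in details):

  void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
  {
      /* Release the listening transport, if any */
      if (mis->transport_cleanup) {
          mis->transport_cleanup(mis->transport_data);
          mis->transport_data = NULL;
          /* Clear the hook so a second call becomes a no-op */
          mis->transport_cleanup = NULL;
      }

      if (mis->socket_address_list) {
          qapi_free_SocketAddressList(mis->socket_address_list);
          mis->socket_address_list = NULL;
      }
  }

If the hooks get cleared like this, calling it again from qmp_migrate_recover()
before setting up a new listener stays safe, which is what this patch relies
on.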

> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu




* Re: [PATCH 17/20] migration: Postcopy preemption preparation on channel creation
  2022-02-21 18:39   ` Dr. David Alan Gilbert
@ 2022-02-22  8:34     ` Peter Xu
  2022-02-22 10:19       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-22  8:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Mon, Feb 21, 2022 at 06:39:36PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Create a new socket for postcopy to be prepared to send postcopy requested
> > pages via this specific channel, so as to not get blocked by precopy pages.
> > 
> > A new thread is also created on dest qemu to receive data from this new channel
> > based on the ram_load_postcopy() routine.
> > 
> > The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
> > function yet; that'll be done in follow-up patches.
> > 
> > Clean up the new sockets on both src/dst QEMUs, and meanwhile look after the
> > new thread too to make sure it'll be recycled properly.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/migration.c    | 62 ++++++++++++++++++++++++----
> >  migration/migration.h    |  8 ++++
> >  migration/postcopy-ram.c | 88 ++++++++++++++++++++++++++++++++++++++--
> >  migration/postcopy-ram.h | 10 +++++
> >  migration/ram.c          | 25 ++++++++----
> >  migration/ram.h          |  4 +-
> >  migration/socket.c       | 22 +++++++++-
> >  migration/socket.h       |  1 +
> >  migration/trace-events   |  3 ++
> >  9 files changed, 203 insertions(+), 20 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 4c22bad304..3d7f897b72 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
> >          mis->page_requested = NULL;
> >      }
> >  
> > +    if (mis->postcopy_qemufile_dst) {
> > +        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
> > +        qemu_fclose(mis->postcopy_qemufile_dst);
> > +        mis->postcopy_qemufile_dst = NULL;
> > +    }
> > +
> >      yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> >  }
> >  
> > @@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
> >      migration_incoming_process();
> >  }
> >  
> > +static bool migration_needs_multiple_sockets(void)
> > +{
> > +    return migrate_use_multifd() || migrate_postcopy_preempt();
> > +}
> > +
> >  void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> >  {
> >      MigrationIncomingState *mis = migration_incoming_get_current();
> >      Error *local_err = NULL;
> >      bool start_migration;
> > +    QEMUFile *f;
> >  
> >      if (!mis->from_src_file) {
> >          /* The first connection (multifd may have multiple) */
> > -        QEMUFile *f = qemu_fopen_channel_input(ioc);
> > +        f = qemu_fopen_channel_input(ioc);
> >  
> >          if (!migration_incoming_setup(f, errp)) {
> >              return;
> > @@ -730,13 +742,18 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> >  
> >          /*
> >           * Common migration only needs one channel, so we can start
> > -         * right now.  Multifd needs more than one channel, we wait.
> > +         * right now.  Some features need more than one channel, we wait.
> >           */
> > -        start_migration = !migrate_use_multifd();
> > +        start_migration = !migration_needs_multiple_sockets();
> >      } else {
> >          /* Multiple connections */
> > -        assert(migrate_use_multifd());
> > -        start_migration = multifd_recv_new_channel(ioc, &local_err);
> > +        assert(migration_needs_multiple_sockets());
> > +        if (migrate_use_multifd()) {
> > +            start_migration = multifd_recv_new_channel(ioc, &local_err);
> > +        } else if (migrate_postcopy_preempt()) {
> > +            f = qemu_fopen_channel_input(ioc);
> > +            start_migration = postcopy_preempt_new_channel(mis, f);
> > +        }
> >          if (local_err) {
> >              error_propagate(errp, local_err);
> >              return;
> > @@ -761,11 +778,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> >  bool migration_has_all_channels(void)
> >  {
> >      MigrationIncomingState *mis = migration_incoming_get_current();
> > -    bool all_channels;
> >  
> > -    all_channels = multifd_recv_all_channels_created();
> > +    if (!mis->from_src_file) {
> > +        return false;
> > +    }
> > +
> > +    if (migrate_use_multifd()) {
> > +        return multifd_recv_all_channels_created();
> > +    }
> > +
> > +    if (migrate_postcopy_preempt()) {
> > +        return mis->postcopy_qemufile_dst != NULL;
> > +    }
> >  
> > -    return all_channels && mis->from_src_file != NULL;
> > +    return true;
> >  }
> >  
> >  /*
> > @@ -1858,6 +1884,12 @@ static void migrate_fd_cleanup(MigrationState *s)
> >          qemu_fclose(tmp);
> >      }
> >  
> > +    if (s->postcopy_qemufile_src) {
> > +        migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> > +        qemu_fclose(s->postcopy_qemufile_src);
> > +        s->postcopy_qemufile_src = NULL;
> > +    }
> > +
> >      assert(!migration_is_active(s));
> >  
> >      if (s->state == MIGRATION_STATUS_CANCELLING) {
> > @@ -3233,6 +3265,11 @@ static void migration_completion(MigrationState *s)
> >          qemu_savevm_state_complete_postcopy(s->to_dst_file);
> >          qemu_mutex_unlock_iothread();
> >  
> > +        /* Shutdown the postcopy fast path thread */
> > +        if (migrate_postcopy_preempt()) {
> > +            postcopy_preempt_shutdown_file(s);
> > +        }
> > +
> >          trace_migration_completion_postcopy_end_after_complete();
> >      } else {
> >          goto fail;
> > @@ -4120,6 +4157,15 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> >          }
> >      }
> >  
> > +    /* This needs to be done before resuming a postcopy */
> > +    if (postcopy_preempt_setup(s, &local_err)) {
> > +        error_report_err(local_err);
> > +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > +                          MIGRATION_STATUS_FAILED);
> > +        migrate_fd_cleanup(s);
> > +        return;
> > +    }
> > +
> >      if (resume) {
> >          /* Wakeup the main migration thread to do the recovery */
> >          migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> > diff --git a/migration/migration.h b/migration/migration.h
> > index af4bcb19c2..caa910d956 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -23,6 +23,7 @@
> >  #include "io/channel-buffer.h"
> >  #include "net/announce.h"
> >  #include "qom/object.h"
> > +#include "postcopy-ram.h"
> >  
> >  struct PostcopyBlocktimeContext;
> >  
> > @@ -112,6 +113,11 @@ struct MigrationIncomingState {
> >       * enabled.
> >       */
> >      unsigned int postcopy_channels;
> > +    /* QEMUFile for postcopy only; it'll be handled by a separate thread */
> > +    QEMUFile *postcopy_qemufile_dst;
> > +    /* Postcopy priority thread is used to receive postcopy requested pages */
> > +    QemuThread postcopy_prio_thread;
> > +    bool postcopy_prio_thread_created;
> >      /*
> >       * An array of temp host huge pages to be used, one for each postcopy
> >       * channel.
> > @@ -192,6 +198,8 @@ struct MigrationState {
> >      QEMUBH *cleanup_bh;
> >      /* Protected by qemu_file_lock */
> >      QEMUFile *to_dst_file;
> > +    /* Postcopy specific transfer channel */
> > +    QEMUFile *postcopy_qemufile_src;
> >      QIOChannelBuffer *bioc;
> >      /*
> >       * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 738cc55fa6..30eddaacd1 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -32,6 +32,9 @@
> >  #include "trace.h"
> >  #include "hw/boards.h"
> >  #include "exec/ramblock.h"
> > +#include "socket.h"
> > +#include "qemu-file-channel.h"
> > +#include "yank_functions.h"
> >  
> >  /* Arbitrary limit on size of each discard command,
> >   * keeps them around ~200 bytes
> > @@ -566,6 +569,11 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> >  {
> >      trace_postcopy_ram_incoming_cleanup_entry();
> >  
> > +    if (mis->postcopy_prio_thread_created) {
> > +        qemu_thread_join(&mis->postcopy_prio_thread);
> > +        mis->postcopy_prio_thread_created = false;
> > +    }
> > +
> >      if (mis->have_fault_thread) {
> >          Error *local_err = NULL;
> >  
> > @@ -1101,8 +1109,13 @@ static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
> >      int err, i, channels;
> >      void *temp_page;
> >  
> > -    /* TODO: will be boosted when enable postcopy preemption */
> > -    mis->postcopy_channels = 1;
> > +    if (migrate_postcopy_preempt()) {
> > +        /* If preemption enabled, need extra channel for urgent requests */
> > +        mis->postcopy_channels = RAM_CHANNEL_MAX;
> > +    } else {
> > +        /* Both precopy/postcopy on the same channel */
> > +        mis->postcopy_channels = 1;
> > +    }
> >  
> >      channels = mis->postcopy_channels;
> >      mis->postcopy_tmp_pages = g_malloc0_n(sizeof(PostcopyTmpPage), channels);
> > @@ -1169,7 +1182,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
> >          return -1;
> >      }
> >  
> > -    postcopy_thread_create(mis, &mis->fault_thread, "postcopy/fault",
> > +    postcopy_thread_create(mis, &mis->fault_thread, "qemu/fault-default",
> 
> That name is still too long; I'd lose the 'qemu/'

Sure.

> 
> >                             postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
> >      mis->have_fault_thread = true;
> >  
> > @@ -1184,6 +1197,16 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
> >          return -1;
> >      }
> >  
> > +    if (migrate_postcopy_preempt()) {
> > +        /*
> > +         * This thread needs to be created after the temp pages because it'll fetch
> > +         * RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
> > +         */
> > +        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "qemu/fault-fast",
> 
> and that one.

OK.

> 
> > +                               postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
> > +        mis->postcopy_prio_thread_created = true;
> > +    }
> > +
> >      trace_postcopy_ram_enable_notify();
> >  
> >      return 0;
> > @@ -1513,3 +1536,62 @@ void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
> >          }
> >      }
> >  }
> > +
> > +bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
> > +{
> > +    mis->postcopy_qemufile_dst = file;
> > +
> > +    trace_postcopy_preempt_new_channel();
> > +
> > +    /* Start the migration immediately */
> > +    return true;
> > +}
> > +
> > +int postcopy_preempt_setup(MigrationState *s, Error **errp)
> > +{
> > +    QIOChannel *ioc;
> > +
> > +    if (!migrate_postcopy_preempt()) {
> > +        return 0;
> > +    }
> > +
> > +    if (!migrate_multi_channels_is_allowed()) {
> > +        error_setg(errp, "Postcopy preempt is not supported as current "
> > +                   "migration stream does not support multi-channels.");
> > +        return -1;
> > +    }
> > +
> > +    ioc = socket_send_channel_create_sync(errp);
> 
> How do we get away with doing this sync here, but have to use async for
> multifd?

Ah right..  I think both will work?  But async (as multifd does) is
definitely more elegant, because the synced version can block the main thread
for an unpredictable amount of time.

I'll add a new patch to support async channel creation (assuming that's
better than squashing the change).
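
Roughly what I have in mind (just a sketch: it mirrors what multifd does with
socket_send_channel_create(), the callback name is made up, and the error
handling is only illustrative.  The real patch will also need to make sure the
channel is fully established before it's first used):

  static void postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
  {
      MigrationState *s = opaque;
      QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
      Error *local_err = NULL;

      if (qio_task_propagate_error(task, &local_err)) {
          /* The async connect failed; record the error instead of blocking */
          object_unref(OBJECT(ioc));
          migrate_set_error(s, local_err);
          error_free(local_err);
          return;
      }

      migration_ioc_register_yank(ioc);
      s->postcopy_qemufile_src = qemu_fopen_channel_output(ioc);
      trace_postcopy_preempt_new_channel();
  }

  int postcopy_preempt_setup(MigrationState *s, Error **errp)
  {
      if (!migrate_postcopy_preempt()) {
          return 0;
      }

      if (!migrate_multi_channels_is_allowed()) {
          error_setg(errp, "Postcopy preempt is not supported as current "
                     "migration stream does not support multi-channels.");
          return -1;
      }

      /* Kick off the connect without blocking the main thread */
      socket_send_channel_create(postcopy_preempt_send_channel_new, s);

      return 0;
  }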

-- 
Peter Xu




* Re: [PATCH 17/20] migration: Postcopy preemption preparation on channel creation
  2022-02-22  8:34     ` Peter Xu
@ 2022-02-22 10:19       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-22 10:19 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> On Mon, Feb 21, 2022 at 06:39:36PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Create a new socket for postcopy to be prepared to send postcopy requested
> > > pages via this specific channel, so as to not get blocked by precopy pages.
> > > 
> > > A new thread is also created on dest qemu to receive data from this new channel
> > > based on the ram_load_postcopy() routine.
> > > 
> > > The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
> > > function yet; that'll be done in follow-up patches.
> > > 
> > > Clean up the new sockets on both src/dst QEMUs, and meanwhile look after the
> > > new thread too to make sure it'll be recycled properly.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  migration/migration.c    | 62 ++++++++++++++++++++++++----
> > >  migration/migration.h    |  8 ++++
> > >  migration/postcopy-ram.c | 88 ++++++++++++++++++++++++++++++++++++++--
> > >  migration/postcopy-ram.h | 10 +++++
> > >  migration/ram.c          | 25 ++++++++----
> > >  migration/ram.h          |  4 +-
> > >  migration/socket.c       | 22 +++++++++-
> > >  migration/socket.h       |  1 +
> > >  migration/trace-events   |  3 ++
> > >  9 files changed, 203 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 4c22bad304..3d7f897b72 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
> > >          mis->page_requested = NULL;
> > >      }
> > >  
> > > +    if (mis->postcopy_qemufile_dst) {
> > > +        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
> > > +        qemu_fclose(mis->postcopy_qemufile_dst);
> > > +        mis->postcopy_qemufile_dst = NULL;
> > > +    }
> > > +
> > >      yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> > >  }
> > >  
> > > @@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error **errp)
> > >      migration_incoming_process();
> > >  }
> > >  
> > > +static bool migration_needs_multiple_sockets(void)
> > > +{
> > > +    return migrate_use_multifd() || migrate_postcopy_preempt();
> > > +}
> > > +
> > >  void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> > >  {
> > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > >      Error *local_err = NULL;
> > >      bool start_migration;
> > > +    QEMUFile *f;
> > >  
> > >      if (!mis->from_src_file) {
> > >          /* The first connection (multifd may have multiple) */
> > > -        QEMUFile *f = qemu_fopen_channel_input(ioc);
> > > +        f = qemu_fopen_channel_input(ioc);
> > >  
> > >          if (!migration_incoming_setup(f, errp)) {
> > >              return;
> > > @@ -730,13 +742,18 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> > >  
> > >          /*
> > >           * Common migration only needs one channel, so we can start
> > > -         * right now.  Multifd needs more than one channel, we wait.
> > > +         * right now.  Some features need more than one channel, we wait.
> > >           */
> > > -        start_migration = !migrate_use_multifd();
> > > +        start_migration = !migration_needs_multiple_sockets();
> > >      } else {
> > >          /* Multiple connections */
> > > -        assert(migrate_use_multifd());
> > > -        start_migration = multifd_recv_new_channel(ioc, &local_err);
> > > +        assert(migration_needs_multiple_sockets());
> > > +        if (migrate_use_multifd()) {
> > > +            start_migration = multifd_recv_new_channel(ioc, &local_err);
> > > +        } else if (migrate_postcopy_preempt()) {
> > > +            f = qemu_fopen_channel_input(ioc);
> > > +            start_migration = postcopy_preempt_new_channel(mis, f);
> > > +        }
> > >          if (local_err) {
> > >              error_propagate(errp, local_err);
> > >              return;
> > > @@ -761,11 +778,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
> > >  bool migration_has_all_channels(void)
> > >  {
> > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > > -    bool all_channels;
> > >  
> > > -    all_channels = multifd_recv_all_channels_created();
> > > +    if (!mis->from_src_file) {
> > > +        return false;
> > > +    }
> > > +
> > > +    if (migrate_use_multifd()) {
> > > +        return multifd_recv_all_channels_created();
> > > +    }
> > > +
> > > +    if (migrate_postcopy_preempt()) {
> > > +        return mis->postcopy_qemufile_dst != NULL;
> > > +    }
> > >  
> > > -    return all_channels && mis->from_src_file != NULL;
> > > +    return true;
> > >  }
> > >  
> > >  /*
> > > @@ -1858,6 +1884,12 @@ static void migrate_fd_cleanup(MigrationState *s)
> > >          qemu_fclose(tmp);
> > >      }
> > >  
> > > +    if (s->postcopy_qemufile_src) {
> > > +        migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> > > +        qemu_fclose(s->postcopy_qemufile_src);
> > > +        s->postcopy_qemufile_src = NULL;
> > > +    }
> > > +
> > >      assert(!migration_is_active(s));
> > >  
> > >      if (s->state == MIGRATION_STATUS_CANCELLING) {
> > > @@ -3233,6 +3265,11 @@ static void migration_completion(MigrationState *s)
> > >          qemu_savevm_state_complete_postcopy(s->to_dst_file);
> > >          qemu_mutex_unlock_iothread();
> > >  
> > > +        /* Shutdown the postcopy fast path thread */
> > > +        if (migrate_postcopy_preempt()) {
> > > +            postcopy_preempt_shutdown_file(s);
> > > +        }
> > > +
> > >          trace_migration_completion_postcopy_end_after_complete();
> > >      } else {
> > >          goto fail;
> > > @@ -4120,6 +4157,15 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> > >          }
> > >      }
> > >  
> > > +    /* This needs to be done before resuming a postcopy */
> > > +    if (postcopy_preempt_setup(s, &local_err)) {
> > > +        error_report_err(local_err);
> > > +        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > > +                          MIGRATION_STATUS_FAILED);
> > > +        migrate_fd_cleanup(s);
> > > +        return;
> > > +    }
> > > +
> > >      if (resume) {
> > >          /* Wakeup the main migration thread to do the recovery */
> > >          migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
> > > diff --git a/migration/migration.h b/migration/migration.h
> > > index af4bcb19c2..caa910d956 100644
> > > --- a/migration/migration.h
> > > +++ b/migration/migration.h
> > > @@ -23,6 +23,7 @@
> > >  #include "io/channel-buffer.h"
> > >  #include "net/announce.h"
> > >  #include "qom/object.h"
> > > +#include "postcopy-ram.h"
> > >  
> > >  struct PostcopyBlocktimeContext;
> > >  
> > > @@ -112,6 +113,11 @@ struct MigrationIncomingState {
> > >       * enabled.
> > >       */
> > >      unsigned int postcopy_channels;
> > > +    /* QEMUFile for postcopy only; it'll be handled by a separate thread */
> > > +    QEMUFile *postcopy_qemufile_dst;
> > > +    /* Postcopy priority thread is used to receive postcopy requested pages */
> > > +    QemuThread postcopy_prio_thread;
> > > +    bool postcopy_prio_thread_created;
> > >      /*
> > >       * An array of temp host huge pages to be used, one for each postcopy
> > >       * channel.
> > > @@ -192,6 +198,8 @@ struct MigrationState {
> > >      QEMUBH *cleanup_bh;
> > >      /* Protected by qemu_file_lock */
> > >      QEMUFile *to_dst_file;
> > > +    /* Postcopy specific transfer channel */
> > > +    QEMUFile *postcopy_qemufile_src;
> > >      QIOChannelBuffer *bioc;
> > >      /*
> > >       * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index 738cc55fa6..30eddaacd1 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -32,6 +32,9 @@
> > >  #include "trace.h"
> > >  #include "hw/boards.h"
> > >  #include "exec/ramblock.h"
> > > +#include "socket.h"
> > > +#include "qemu-file-channel.h"
> > > +#include "yank_functions.h"
> > >  
> > >  /* Arbitrary limit on size of each discard command,
> > >   * keeps them around ~200 bytes
> > > @@ -566,6 +569,11 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> > >  {
> > >      trace_postcopy_ram_incoming_cleanup_entry();
> > >  
> > > +    if (mis->postcopy_prio_thread_created) {
> > > +        qemu_thread_join(&mis->postcopy_prio_thread);
> > > +        mis->postcopy_prio_thread_created = false;
> > > +    }
> > > +
> > >      if (mis->have_fault_thread) {
> > >          Error *local_err = NULL;
> > >  
> > > @@ -1101,8 +1109,13 @@ static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
> > >      int err, i, channels;
> > >      void *temp_page;
> > >  
> > > -    /* TODO: will be boosted when enable postcopy preemption */
> > > -    mis->postcopy_channels = 1;
> > > +    if (migrate_postcopy_preempt()) {
> > > +        /* If preemption enabled, need extra channel for urgent requests */
> > > +        mis->postcopy_channels = RAM_CHANNEL_MAX;
> > > +    } else {
> > > +        /* Both precopy/postcopy on the same channel */
> > > +        mis->postcopy_channels = 1;
> > > +    }
> > >  
> > >      channels = mis->postcopy_channels;
> > >      mis->postcopy_tmp_pages = g_malloc0_n(sizeof(PostcopyTmpPage), channels);
> > > @@ -1169,7 +1182,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
> > >          return -1;
> > >      }
> > >  
> > > -    postcopy_thread_create(mis, &mis->fault_thread, "postcopy/fault",
> > > +    postcopy_thread_create(mis, &mis->fault_thread, "qemu/fault-default",
> > 
> > That name is still too long; I'd lose the 'qemu/'
> 
> Sure.
> 
> > 
> > >                             postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
> > >      mis->have_fault_thread = true;
> > >  
> > > @@ -1184,6 +1197,16 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
> > >          return -1;
> > >      }
> > >  
> > > +    if (migrate_postcopy_preempt()) {
> > > +        /*
> > > +         * This thread needs to be created after the temp pages because it'll fetch
> > > +         * RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
> > > +         */
> > > +        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "qemu/fault-fast",
> > 
> > and that one.
> 
> OK.
> 
> > 
> > > +                               postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
> > > +        mis->postcopy_prio_thread_created = true;
> > > +    }
> > > +
> > >      trace_postcopy_ram_enable_notify();
> > >  
> > >      return 0;
> > > @@ -1513,3 +1536,62 @@ void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
> > >          }
> > >      }
> > >  }
> > > +
> > > +bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
> > > +{
> > > +    mis->postcopy_qemufile_dst = file;
> > > +
> > > +    trace_postcopy_preempt_new_channel();
> > > +
> > > +    /* Start the migration immediately */
> > > +    return true;
> > > +}
> > > +
> > > +int postcopy_preempt_setup(MigrationState *s, Error **errp)
> > > +{
> > > +    QIOChannel *ioc;
> > > +
> > > +    if (!migrate_postcopy_preempt()) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    if (!migrate_multi_channels_is_allowed()) {
> > > +        error_setg(errp, "Postcopy preempt is not supported as current "
> > > +                   "migration stream does not support multi-channels.");
> > > +        return -1;
> > > +    }
> > > +
> > > +    ioc = socket_send_channel_create_sync(errp);
> > 
> > How do we get away with doing this sync here, but have to use async for
> > multifd?
> 
> Ah right..  I think both will work?  But async (as multifd does) is
> definitely more elegant, because the synced version can block the main thread
> for an unpredictable amount of time.

Right, that was my worry.

Dave

> I'll add a new patch to support async channel creation (assuming that's
> better than squashing the change).
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 18/20] migration: Postcopy preemption enablement
  2022-02-16  6:28 ` [PATCH 18/20] migration: Postcopy preemption enablement Peter Xu
@ 2022-02-22 10:52   ` Dr. David Alan Gilbert
  2022-02-23  7:01     ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-22 10:52 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> This patch enables postcopy-preempt feature.
> 
> It contains two major changes to the migration logic:
> 
> (1) Postcopy requests are now sent via a different socket from precopy
>     background migration stream, so as to be isolated from very high page
>     request delays.
> 
> (2) For huge page enabled hosts: when there are postcopy requests, they can now
>     intercept a partial sending of huge host pages on src QEMU.
> 
> After this patch, we'll live migrate a VM with two channels for postcopy: (1)
> PRECOPY channel, which is the default channel that transfers background pages;
> and (2) POSTCOPY channel, which only transfers requested pages.
> 
> There's no strict rule on which channel to use, e.g., if a requested page is
> already being transferred on the precopy channel, then we will keep using the
> same precopy channel to transfer the page even if it's explicitly requested.
> In 99% of the cases we'll prioritize the channels so we send requested pages
> via the postcopy channel as long as possible.
> 
> On the source QEMU, when we found a postcopy request, we'll interrupt the
> PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
> After we serviced all the high priority postcopy pages, we'll switch back to
> PRECOPY channel so that we'll continue to send the interrupted huge page again.
> There's no new thread introduced on src QEMU.
> 
> On the destination QEMU, one new thread is introduced to receive page data from
> the postcopy specific socket (done in the preparation patch).
> 
> This patch has a side effect: previously, after sending a postcopy page, we'd
> assume the guest will access the follow-up pages and keep sending from there.
> Now that's changed.  Instead of going on from a postcopy requested page, we'll
> go back and continue sending the precopy huge page (which can be intercepted
> by a postcopy request, so the huge page may have been partially sent before).
> 
> Whether that's a problem is debatable, because "assuming the guest will
> continue to access the next page" may not really suit when huge pages are
> used, especially if the huge page is large (e.g. 1GB pages).  So that locality
> hint is mostly meaningless if huge pages are used.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>


This does get a bit complicated, which worries me; the code here is already
quite complex.
(If you repost, there are a few 'channel' variables that could probably
be 'unsigned' rather than int.)

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c  |   2 +
>  migration/migration.h  |   2 +-
>  migration/ram.c        | 247 +++++++++++++++++++++++++++++++++++++++--
>  migration/trace-events |   7 ++
>  4 files changed, 249 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 3d7f897b72..d20db04097 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3153,6 +3153,8 @@ static int postcopy_start(MigrationState *ms)
>                                MIGRATION_STATUS_FAILED);
>      }
>  
> +    trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
> +
>      return ret;
>  
>  fail_closefb:
> diff --git a/migration/migration.h b/migration/migration.h
> index caa910d956..b8aacfe3af 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -68,7 +68,7 @@ typedef struct {
>  struct MigrationIncomingState {
>      QEMUFile *from_src_file;
>      /* Previously received RAM's RAMBlock pointer */
> -    RAMBlock *last_recv_block;
> +    RAMBlock *last_recv_block[RAM_CHANNEL_MAX];
>      /* A hook to allow cleanup at the end of incoming migration */
>      void *transport_data;
>      void (*transport_cleanup)(void *data);
> diff --git a/migration/ram.c b/migration/ram.c
> index 36e3da6bb0..ffbe9a9a50 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -294,6 +294,20 @@ struct RAMSrcPageRequest {
>      QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
>  };
>  
> +typedef struct {
> +    /*
> +     * Cached ramblock/offset values if preempted.  They're only meaningful if
> +     * preempted==true below.
> +     */
> +    RAMBlock *ram_block;
> +    unsigned long ram_page;
> +    /*
> +     * Whether a postcopy preemption just happened.  Will be reset after
> +     * precopy recovered to background migration.
> +     */
> +    bool preempted;
> +} PostcopyPreemptState;
> +
>  /* State of RAM for migration */
>  struct RAMState {
>      /* QEMUFile used for this migration */
> @@ -348,6 +362,14 @@ struct RAMState {
>      /* Queue of outstanding page requests from the destination */
>      QemuMutex src_page_req_mutex;
>      QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
> +
> +    /* Postcopy preemption information */
> +    PostcopyPreemptState postcopy_preempt_state;
> +    /*
> +     * Current channel we're using on src VM.  Only valid if postcopy-preempt
> +     * is enabled.
> +     */
> +    int postcopy_channel;
>  };
>  typedef struct RAMState RAMState;
>  
> @@ -355,6 +377,11 @@ static RAMState *ram_state;
>  
>  static NotifierWithReturnList precopy_notifier_list;
>  
> +static void postcopy_preempt_reset(RAMState *rs)
> +{
> +    memset(&rs->postcopy_preempt_state, 0, sizeof(PostcopyPreemptState));
> +}
> +
>  /* Whether postcopy has queued requests? */
>  static bool postcopy_has_request(RAMState *rs)
>  {
> @@ -1946,6 +1973,55 @@ void ram_write_tracking_stop(void)
>  }
>  #endif /* defined(__linux__) */
>  
> +/*
> + * Check whether two addr/offset of the ramblock falls onto the same host huge
> + * page.  Returns true if so, false otherwise.
> + */
> +static bool offset_on_same_huge_page(RAMBlock *rb, uint64_t addr1,
> +                                     uint64_t addr2)
> +{
> +    size_t page_size = qemu_ram_pagesize(rb);
> +
> +    addr1 = ROUND_DOWN(addr1, page_size);
> +    addr2 = ROUND_DOWN(addr2, page_size);
> +
> +    return addr1 == addr2;
> +}
> +
> +/*
> + * Whether a previous preempted precopy huge page contains current requested
> + * page?  Returns true if so, false otherwise.
> + *
> + * This should really happen very rarely, because it means when we were sending
> + * during background migration for postcopy we're sending exactly the page that
> + * some vcpu got faulted on on dest node.  When it happens, we probably don't
> + * need to do much but drop the request, because we know right after we restore
> + * the precopy stream it'll be serviced.  It'll slightly affect the order of
> + * postcopy requests to be serviced (e.g. it'll be the same as we move current
> + * request to the end of the queue) but it shouldn't be a big deal.  The most
> + * imporant thing is we can _never_ try to send a partial-sent huge page on the
> + * POSTCOPY channel again, otherwise that huge page will got "split brain" on
> + * two channels (PRECOPY, POSTCOPY).
> + */
> +static bool postcopy_preempted_contains(RAMState *rs, RAMBlock *block,
> +                                        ram_addr_t offset)
> +{
> +    PostcopyPreemptState *state = &rs->postcopy_preempt_state;
> +
> +    /* No preemption at all? */
> +    if (!state->preempted) {
> +        return false;
> +    }
> +
> +    /* Not even the same ramblock? */
> +    if (state->ram_block != block) {
> +        return false;
> +    }
> +
> +    return offset_on_same_huge_page(block, offset,
> +                                    state->ram_page << TARGET_PAGE_BITS);
> +}
> +
>  /**
>   * get_queued_page: unqueue a page from the postcopy requests
>   *
> @@ -1961,9 +2037,17 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>      RAMBlock  *block;
>      ram_addr_t offset;
>  
> +again:
>      block = unqueue_page(rs, &offset);
>  
> -    if (!block) {
> +    if (block) {
> +        /* See comment above postcopy_preempted_contains() */
> +        if (postcopy_preempted_contains(rs, block, offset)) {
> +            trace_postcopy_preempt_hit(block->idstr, offset);
> +            /* This request is dropped */
> +            goto again;
> +        }
> +    } else {
>          /*
>           * Poll write faults too if background snapshot is enabled; that's
>           * when we have vcpus got blocked by the write protected pages.
> @@ -2179,6 +2263,114 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
>      return ram_save_page(rs, pss);
>  }
>  
> +static bool postcopy_needs_preempt(RAMState *rs, PageSearchStatus *pss)
> +{
> +    /* Not enabled eager preempt?  Then never do that. */
> +    if (!migrate_postcopy_preempt()) {
> +        return false;
> +    }
> +
> +    /* If the ramblock we're sending is a small page?  Never bother. */
> +    if (qemu_ram_pagesize(pss->block) == TARGET_PAGE_SIZE) {
> +        return false;
> +    }
> +
> +    /* Not in postcopy at all? */
> +    if (!migration_in_postcopy()) {
> +        return false;
> +    }
> +
> +    /*
> +     * If we're already handling a postcopy request, don't preempt as this page
> +     * has got the same high priority.
> +     */
> +    if (pss->postcopy_requested) {
> +        return false;
> +    }
> +
> +    /* If there's postcopy requests, then check it up! */
> +    return postcopy_has_request(rs);
> +}
> +
> +/* Preempt precopy: cache the current page so it can be resumed later */
> +static void postcopy_do_preempt(RAMState *rs, PageSearchStatus *pss)
> +{
> +    PostcopyPreemptState *p_state = &rs->postcopy_preempt_state;
> +
> +    trace_postcopy_preempt_triggered(pss->block->idstr, pss->page);
> +
> +    /*
> +     * Time to preempt precopy. Cache current PSS into preempt state, so that
> +     * after handling the postcopy pages we can recover to it.  We need to do
> +     * so because the dest VM will have partial of the precopy huge page kept
> +     * over in its tmp huge page caches; better move on with it when we can.
> +     */
> +    p_state->ram_block = pss->block;
> +    p_state->ram_page = pss->page;
> +    p_state->preempted = true;
> +}
> +
> +/* Whether we're preempted by a postcopy request during sending a huge page */
> +static bool postcopy_preempt_triggered(RAMState *rs)
> +{
> +    return rs->postcopy_preempt_state.preempted;
> +}
> +
> +static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss)
> +{
> +    PostcopyPreemptState *state = &rs->postcopy_preempt_state;
> +
> +    assert(state->preempted);
> +
> +    pss->block = state->ram_block;
> +    pss->page = state->ram_page;
> +    /* This is not a postcopy request but restoring previous precopy */
> +    pss->postcopy_requested = false;
> +
> +    trace_postcopy_preempt_restored(pss->block->idstr, pss->page);
> +
> +    /* Reset preempt state, most importantly, set preempted==false */
> +    postcopy_preempt_reset(rs);
> +}
> +
> +static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
> +{
> +    int channel = pss->postcopy_requested ? RAM_CHANNEL_POSTCOPY : RAM_CHANNEL_PRECOPY;
> +    MigrationState *s = migrate_get_current();
> +    QEMUFile *next;
> +
> +    if (channel != rs->postcopy_channel) {
> +        if (channel == RAM_CHANNEL_PRECOPY) {
> +            next = s->to_dst_file;
> +        } else {
> +            next = s->postcopy_qemufile_src;
> +        }
> +        /* Update and cache the current channel */
> +        rs->f = next;
> +        rs->postcopy_channel = channel;
> +
> +        /*
> +         * If channel switched, reset last_sent_block since the old sent block
> +         * may not be on the same channel.
> +         */
> +        rs->last_sent_block = NULL;
> +
> +        trace_postcopy_preempt_switch_channel(channel);
> +    }
> +
> +    trace_postcopy_preempt_send_host_page(pss->block->idstr, pss->page);
> +}
> +
> +/* We need to make sure rs->f always points to the default channel elsewhere */
> +static void postcopy_preempt_reset_channel(RAMState *rs)
> +{
> +    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
> +        rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
> +        rs->f = migrate_get_current()->to_dst_file;
> +        trace_postcopy_preempt_reset_channel();
> +    }
> +}
> +
>  /**
>   * ram_save_host_page: save a whole host page
>   *
> @@ -2210,7 +2402,16 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>          return 0;
>      }
>  
> +    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
> +        postcopy_preempt_choose_channel(rs, pss);
> +    }
> +
>      do {
> +        if (postcopy_needs_preempt(rs, pss)) {
> +            postcopy_do_preempt(rs, pss);
> +            break;
> +        }
> +
>          /* Check the pages is dirty and if it is send it */
>          if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
>              tmppages = ram_save_target_page(rs, pss);
> @@ -2234,6 +2435,19 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
>      /* The offset we leave with is the min boundary of host page and block */
>      pss->page = MIN(pss->page, hostpage_boundary);
>  
> +    /*
> +     * When in postcopy preempt mode, flush the data as soon as possible for
> +     * postcopy requests: we've already sent a whole huge page, so the dst
> +     * node should have everything it needs to atomically fill in the
> +     * currently missing page.
> +     *
> +     * More importantly, when using a separate postcopy channel, we must do
> +     * an explicit flush or the data won't go out until the buffer is full.
> +     */
> +    if (migrate_postcopy_preempt() && pss->postcopy_requested) {
> +        qemu_fflush(rs->f);
> +    }
> +
>      res = ram_save_release_protection(rs, pss, start_page);
>      return (res < 0 ? res : pages);
>  }
> @@ -2275,8 +2489,17 @@ static int ram_find_and_save_block(RAMState *rs)
>          found = get_queued_page(rs, &pss);
>  
>          if (!found) {
> -            /* priority queue empty, so just search for something dirty */
> -            found = find_dirty_block(rs, &pss, &again);
> +            /*
> +             * Recover previous precopy ramblock/offset if postcopy has
> +             * preempted precopy.  Otherwise find the next dirty bit.
> +             */
> +            if (postcopy_preempt_triggered(rs)) {
> +                postcopy_preempt_restore(rs, &pss);
> +                found = true;
> +            } else {
> +                /* priority queue empty, so just search for something dirty */
> +                found = find_dirty_block(rs, &pss, &again);
> +            }
>          }
>  
>          if (found) {
> @@ -2404,6 +2627,8 @@ static void ram_state_reset(RAMState *rs)
>      rs->last_page = 0;
>      rs->last_version = ram_list.version;
>      rs->xbzrle_enabled = false;
> +    postcopy_preempt_reset(rs);
> +    rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
>  }
>  
>  #define MAX_WAIT 50 /* ms, half buffered_file limit */
> @@ -3042,6 +3267,8 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>      }
>      qemu_mutex_unlock(&rs->bitmap_mutex);
>  
> +    postcopy_preempt_reset_channel(rs);
> +
>      /*
>       * Must occur before EOS (or any QEMUFile operation)
>       * because of RDMA protocol.
> @@ -3111,6 +3338,8 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          ram_control_after_iterate(f, RAM_CONTROL_FINISH);
>      }
>  
> +    postcopy_preempt_reset_channel(rs);
> +
>      if (ret >= 0) {
>          multifd_send_sync_main(rs->f);
>          qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> @@ -3193,11 +3422,13 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
>   * @mis: the migration incoming state pointer
>   * @f: QEMUFile where to read the data from
>   * @flags: Page flags (mostly to see if it's a continuation of previous block)
> + * @channel: the channel we're using
>   */
>  static inline RAMBlock *ram_block_from_stream(MigrationIncomingState *mis,
> -                                              QEMUFile *f, int flags)
> +                                              QEMUFile *f, int flags,
> +                                              int channel)
>  {
> -    RAMBlock *block = mis->last_recv_block;
> +    RAMBlock *block = mis->last_recv_block[channel];
>      char id[256];
>      uint8_t len;
>  
> @@ -3224,7 +3455,7 @@ static inline RAMBlock *ram_block_from_stream(MigrationIncomingState *mis,
>          return NULL;
>      }
>  
> -    mis->last_recv_block = block;
> +    mis->last_recv_block[channel] = block;
>  
>      return block;
>  }
> @@ -3678,7 +3909,7 @@ int ram_load_postcopy(QEMUFile *f, int channel)
>          trace_ram_load_postcopy_loop((uint64_t)addr, flags);
>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>                       RAM_SAVE_FLAG_COMPRESS_PAGE)) {
> -            block = ram_block_from_stream(mis, f, flags);
> +            block = ram_block_from_stream(mis, f, flags, channel);
>              if (!block) {
>                  ret = -EINVAL;
>                  break;
> @@ -3929,7 +4160,7 @@ static int ram_load_precopy(QEMUFile *f)
>  
>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
> -            RAMBlock *block = ram_block_from_stream(mis, f, flags);
> +            RAMBlock *block = ram_block_from_stream(mis, f, flags, RAM_CHANNEL_PRECOPY);
>  
>              host = host_from_ram_block_offset(block, addr);
>              /*
> diff --git a/migration/trace-events b/migration/trace-events
> index 4474a76614..b1155d9da0 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -111,6 +111,12 @@ ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRI
>  ram_write_tracking_ramblock_start(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
>  ram_write_tracking_ramblock_stop(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu"
>  unqueue_page(char *block, uint64_t offset, bool dirty) "ramblock '%s' offset 0x%"PRIx64" dirty %d"
> +postcopy_preempt_triggered(char *str, unsigned long page) "during sending ramblock %s offset 0x%lx"
> +postcopy_preempt_restored(char *str, unsigned long page) "ramblock %s offset 0x%lx"
> +postcopy_preempt_hit(char *str, uint64_t offset) "ramblock %s offset 0x%"PRIx64
> +postcopy_preempt_send_host_page(char *str, uint64_t offset) "ramblock %s offset 0x%"PRIx64
> +postcopy_preempt_switch_channel(int channel) "%d"
> +postcopy_preempt_reset_channel(void) ""
>  
>  # multifd.c
>  multifd_new_send_channel_async(uint8_t id) "channel %u"
> @@ -176,6 +182,7 @@ migration_thread_low_pending(uint64_t pending) "%" PRIu64
>  migrate_transferred(uint64_t tranferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
>  process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
> +postcopy_preempt_enabled(bool value) "%d"
>  
>  # channel.c
>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover()
  2022-02-16  6:28 ` [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover() Peter Xu
@ 2022-02-22 10:57   ` Dr. David Alan Gilbert
  2022-02-23  6:40     ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-22 10:57 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> We used to use postcopy_try_recover() to replace migration_incoming_setup() to
> setup incoming channels.  That's fine for the old world, but in the new world
> there can be more than one channels that need setup.  Better move the channel
> setup out of it so that postcopy_try_recover() only handles the last phase of
> switching to the recovery phase.
> 
> To do that in migration_fd_process_incoming(), move the postcopy_try_recover()
> call to be after migration_incoming_setup(), which will setup the channels.
> While in migration_ioc_process_incoming(), postpone the recover() routine right
> before we'll jump into migration_incoming_process().
> 
> A side benefit is we don't need to pass in QEMUFile* to postcopy_try_recover()
> anymore.  Remove it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

OK, but note one question below:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 23 +++++++++++------------
>  1 file changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 67520d3105..b2e6446457 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -665,19 +665,20 @@ void migration_incoming_process(void)
>  }
>  
>  /* Returns true if recovered from a paused migration, otherwise false */
> -static bool postcopy_try_recover(QEMUFile *f)
> +static bool postcopy_try_recover(void)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
>  
>      if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
>          /* Resumed from a paused postcopy migration */
>  
> -        mis->from_src_file = f;
> +        /* This should be set already in migration_incoming_setup() */
> +        assert(mis->from_src_file);
>          /* Postcopy has standalone thread to do vm load */
> -        qemu_file_set_blocking(f, true);
> +        qemu_file_set_blocking(mis->from_src_file, true);

Does that set_blocking happen on the 2nd channel somewhere?

Dave

>  
>          /* Re-configure the return path */
> -        mis->to_src_file = qemu_file_get_return_path(f);
> +        mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
>  
>          migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
>                            MIGRATION_STATUS_POSTCOPY_RECOVER);
> @@ -698,11 +699,10 @@ static bool postcopy_try_recover(QEMUFile *f)
>  
>  void migration_fd_process_incoming(QEMUFile *f, Error **errp)
>  {
> -    if (postcopy_try_recover(f)) {
> +    if (!migration_incoming_setup(f, errp)) {
>          return;
>      }
> -
> -    if (!migration_incoming_setup(f, errp)) {
> +    if (postcopy_try_recover()) {
>          return;
>      }
>      migration_incoming_process();
> @@ -718,11 +718,6 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>          /* The first connection (multifd may have multiple) */
>          QEMUFile *f = qemu_fopen_channel_input(ioc);
>  
> -        /* If it's a recovery, we're done */
> -        if (postcopy_try_recover(f)) {
> -            return;
> -        }
> -
>          if (!migration_incoming_setup(f, errp)) {
>              return;
>          }
> @@ -743,6 +738,10 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>      }
>  
>      if (start_migration) {
> +        /* If it's a recovery, we're done */
> +        if (postcopy_try_recover()) {
> +            return;
> +        }
>          migration_incoming_process();
>      }
>  }
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 19/20] migration: Postcopy recover with preempt enabled
  2022-02-16  6:28 ` [PATCH 19/20] migration: Postcopy recover with preempt enabled Peter Xu
@ 2022-02-22 11:32   ` Dr. David Alan Gilbert
  2022-02-23  7:45     ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-22 11:32 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
> needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
> instead of stopping the thread it halts with a semaphore, preparing to be
> kicked again when recovery is detected.
> 
> A mutex is introduced to make sure there's no concurrent operation upon the
> socket.  To make it simple, the fast ram load thread will take the mutex during
> its whole procedure, and only release it if it's paused.  The fast-path socket
> will be properly released by the main loading thread safely when there's
> network failures during postcopy with that mutex held.

I *think* this is mostly OK; but I worry I don't understand all the
cases; e.g.
  a) If the postcopy channel errors first
  b) If the main channel errors first

Can you add some docs to walk through those and explain the locking ?

Dave

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c    | 17 +++++++++++++++--
>  migration/migration.h    |  3 +++
>  migration/postcopy-ram.c | 24 ++++++++++++++++++++++--
>  migration/savevm.c       | 17 +++++++++++++++++
>  migration/trace-events   |  2 ++
>  5 files changed, 59 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index d20db04097..c68a281406 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -215,9 +215,11 @@ void migration_object_init(void)
>      current_incoming->postcopy_remote_fds =
>          g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
>      qemu_mutex_init(&current_incoming->rp_mutex);
> +    qemu_mutex_init(&current_incoming->postcopy_prio_thread_mutex);
>      qemu_event_init(&current_incoming->main_thread_load_event, false);
>      qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
>      qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
> +    qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
>      qemu_mutex_init(&current_incoming->page_request_mutex);
>      current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>  
> @@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
>  
>          /*
>           * Here, we only wake up the main loading thread (while the
> -         * fault thread will still be waiting), so that we can receive
> > +         * remaining threads will still be waiting), so that we can receive
>           * commands from source now, and answer it if needed. The
> -         * fault thread will be woken up afterwards until we are sure
> > +         * remaining threads will be woken up afterwards until we are sure
>           * that source is ready to reply to page requests.
>           */
>          qemu_sem_post(&mis->postcopy_pause_sem_dst);
> @@ -3466,6 +3468,17 @@ static MigThrError postcopy_pause(MigrationState *s)
>          qemu_file_shutdown(file);
>          qemu_fclose(file);
>  
> +        /*
> +         * Do the same to postcopy fast path socket too if there is.  No
> +         * locking needed because no racer as long as we do this before setting
> +         * status to paused.
> +         */
> +        if (s->postcopy_qemufile_src) {
> +            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);

Shouldn't this do a qemu_file_shutdown on here first?

> +            qemu_fclose(s->postcopy_qemufile_src);
> +            s->postcopy_qemufile_src = NULL;
> +        }
> +
>          migrate_set_state(&s->state, s->state,
>                            MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
> diff --git a/migration/migration.h b/migration/migration.h
> index b8aacfe3af..945088064a 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -118,6 +118,8 @@ struct MigrationIncomingState {
>      /* Postcopy priority thread is used to receive postcopy requested pages */
>      QemuThread postcopy_prio_thread;
>      bool postcopy_prio_thread_created;
> +    /* Used to sync with the prio thread */
> +    QemuMutex postcopy_prio_thread_mutex;
>      /*
>       * An array of temp host huge pages to be used, one for each postcopy
>       * channel.
> @@ -147,6 +149,7 @@ struct MigrationIncomingState {
>      /* notify PAUSED postcopy incoming migrations to try to continue */
>      QemuSemaphore postcopy_pause_sem_dst;
>      QemuSemaphore postcopy_pause_sem_fault;
> +    QemuSemaphore postcopy_pause_sem_fast_load;
>  
>      /* List of listening socket addresses  */
>      SocketAddressList *socket_address_list;
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 30eddaacd1..b3d23804bc 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1575,6 +1575,15 @@ int postcopy_preempt_setup(MigrationState *s, Error **errp)
>      return 0;
>  }
>  
> +static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
> +{
> +    trace_postcopy_pause_fast_load();
> +    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
> +    qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
> +    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
> +    trace_postcopy_pause_fast_load_continued();
> +}
> +
>  void *postcopy_preempt_thread(void *opaque)
>  {
>      MigrationIncomingState *mis = opaque;
> @@ -1587,11 +1596,22 @@ void *postcopy_preempt_thread(void *opaque)
>      qemu_sem_post(&mis->thread_sync_sem);
>  
>      /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
> -    ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
> +    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
> +    while (1) {
> +        ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
> +        /* If error happened, go into recovery routine */
> +        if (ret) {
> +            postcopy_pause_ram_fast_load(mis);
> +        } else {
> +            /* We're done */
> +            break;
> +        }
> +    }
> +    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
>  
>      rcu_unregister_thread();
>  
>      trace_postcopy_preempt_thread_exit();
>  
> -    return ret == 0 ? NULL : (void *)-1;
> +    return NULL;
>  }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 254aa78234..2d32340d28 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2152,6 +2152,13 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>       */
>      qemu_sem_post(&mis->postcopy_pause_sem_fault);
>  
> +    if (migrate_postcopy_preempt()) {
> +        /* The channel should already be setup again; make sure of it */
> +        assert(mis->postcopy_qemufile_dst);
> +        /* Kick the fast ram load thread too */
> +        qemu_sem_post(&mis->postcopy_pause_sem_fast_load);
> +    }
> +
>      return 0;
>  }
>  
> @@ -2607,6 +2614,16 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>      mis->to_src_file = NULL;
>      qemu_mutex_unlock(&mis->rp_mutex);
>  
> +    if (mis->postcopy_qemufile_dst) {
> +        qemu_file_shutdown(mis->postcopy_qemufile_dst);
> +        /* Take the mutex to make sure the fast ram load thread halted */
> +        qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
> +        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
> +        qemu_fclose(mis->postcopy_qemufile_dst);
> +        mis->postcopy_qemufile_dst = NULL;
> +        qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
> +    }
> +
>      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>                        MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
> diff --git a/migration/trace-events b/migration/trace-events
> index b1155d9da0..dfe36a3340 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -270,6 +270,8 @@ mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu, i
>  mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d"
>  postcopy_pause_fault_thread(void) ""
>  postcopy_pause_fault_thread_continued(void) ""
> +postcopy_pause_fast_load(void) ""
> +postcopy_pause_fast_load_continued(void) ""
>  postcopy_ram_fault_thread_entry(void) ""
>  postcopy_ram_fault_thread_exit(void) ""
>  postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d"
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 20/20] tests: Add postcopy preempt test
  2022-02-16  6:28 ` [PATCH 20/20] tests: Add postcopy preempt test Peter Xu
@ 2022-02-22 12:51   ` Dr. David Alan Gilbert
  2022-02-23  7:50     ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-22 12:51 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Two tests are added: a normal postcopy preempt test, and a recovery test.

Yes, this is difficult; without hugepages the tests are limited; did you
see if this test actually causes pages to go down the fast path?



Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/qtest/migration-test.c | 39 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7b42f6fd90..5053b40589 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -470,6 +470,7 @@ typedef struct {
>       */
>      bool hide_stderr;
>      bool use_shmem;
> +    bool postcopy_preempt;
>      /* only launch the target process */
>      bool only_target;
>      /* Use dirty ring if true; dirty logging otherwise */
> @@ -673,6 +674,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
>      migrate_set_capability(to, "postcopy-ram", true);
>      migrate_set_capability(to, "postcopy-blocktime", true);
>  
> +    if (args->postcopy_preempt) {
> +        migrate_set_capability(from, "postcopy-preempt", true);
> +        migrate_set_capability(to, "postcopy-preempt", true);
> +    }
> +
>      /* We want to pick a speed slow enough that the test completes
>       * quickly, but that it doesn't complete precopy even on a slow
>       * machine, so also set the downtime.
> @@ -719,13 +725,29 @@ static void test_postcopy(void)
>      migrate_postcopy_complete(from, to);
>  }
>  
> -static void test_postcopy_recovery(void)
> +static void test_postcopy_preempt(void)
> +{
> +    MigrateStart *args = migrate_start_new();
> +    QTestState *from, *to;
> +
> +    args->postcopy_preempt = true;
> +
> +    if (migrate_postcopy_prepare(&from, &to, args)) {
> +        return;
> +    }
> +    migrate_postcopy_start(from, to);
> +    migrate_postcopy_complete(from, to);
> +}
> +
> +/* @preempt: whether to use postcopy-preempt */
> +static void test_postcopy_recovery(bool preempt)
>  {
>      MigrateStart *args = migrate_start_new();
>      QTestState *from, *to;
>      g_autofree char *uri = NULL;
>  
>      args->hide_stderr = true;
> +    args->postcopy_preempt = preempt;
>  
>      if (migrate_postcopy_prepare(&from, &to, args)) {
>          return;
> @@ -781,6 +803,16 @@ static void test_postcopy_recovery(void)
>      migrate_postcopy_complete(from, to);
>  }
>  
> +static void test_postcopy_recovery_normal(void)
> +{
> +    test_postcopy_recovery(false);
> +}
> +
> +static void test_postcopy_recovery_preempt(void)
> +{
> +    test_postcopy_recovery(true);
> +}
> +
>  static void test_baddest(void)
>  {
>      MigrateStart *args = migrate_start_new();
> @@ -1458,7 +1490,10 @@ int main(int argc, char **argv)
>      module_call_init(MODULE_INIT_QOM);
>  
>      qtest_add_func("/migration/postcopy/unix", test_postcopy);
> -    qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
> +    qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery_normal);
> +    qtest_add_func("/migration/postcopy/preempt/unix", test_postcopy_preempt);
> +    qtest_add_func("/migration/postcopy/preempt/recovery",
> +                   test_postcopy_recovery_preempt);
>      qtest_add_func("/migration/bad_dest", test_baddest);
>      qtest_add_func("/migration/precopy/unix", test_precopy_unix);
>      qtest_add_func("/migration/precopy/tcp", test_precopy_tcp);
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover()
  2022-02-22 10:57   ` Dr. David Alan Gilbert
@ 2022-02-23  6:40     ` Peter Xu
  2022-02-23  9:47       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-23  6:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Tue, Feb 22, 2022 at 10:57:34AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > We used to use postcopy_try_recover() to replace migration_incoming_setup() to
> > setup incoming channels.  That's fine for the old world, but in the new world
> > there can be more than one channels that need setup.  Better move the channel
> > setup out of it so that postcopy_try_recover() only handles the last phase of
> > switching to the recovery phase.
> > 
> > To do that in migration_fd_process_incoming(), move the postcopy_try_recover()
> > call to be after migration_incoming_setup(), which will setup the channels.
> > While in migration_ioc_process_incoming(), postpone the recover() routine right
> > before we'll jump into migration_incoming_process().
> > 
> > A side benefit is we don't need to pass in QEMUFile* to postcopy_try_recover()
> > anymore.  Remove it.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> OK, but note one question below:
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks.

> 
> > ---
> >  migration/migration.c | 23 +++++++++++------------
> >  1 file changed, 11 insertions(+), 12 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 67520d3105..b2e6446457 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -665,19 +665,20 @@ void migration_incoming_process(void)
> >  }
> >  
> >  /* Returns true if recovered from a paused migration, otherwise false */
> > -static bool postcopy_try_recover(QEMUFile *f)
> > +static bool postcopy_try_recover(void)
> >  {
> >      MigrationIncomingState *mis = migration_incoming_get_current();
> >  
> >      if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> >          /* Resumed from a paused postcopy migration */
> >  
> > -        mis->from_src_file = f;
> > +        /* This should be set already in migration_incoming_setup() */
> > +        assert(mis->from_src_file);
> >          /* Postcopy has standalone thread to do vm load */
> > -        qemu_file_set_blocking(f, true);
> > +        qemu_file_set_blocking(mis->from_src_file, true);
> 
> Does that set_blocking happen on the 2nd channel somewhere?

Nope.  I think the rationale is that by default all channels are blocking.

What happens is: the migration code only sets the main channel to
non-blocking on incoming, in migration_incoming_setup().  Hence for
postcopy recovery we need to tweak it back to blocking here.

The 2nd new channel is not handled by migration_incoming_setup(), but by
postcopy_preempt_new_channel(), so it keeps the original blocking state,
which should be blocking.

If we want to make that clear, we can proactively set blocking too in
postcopy_preempt_new_channel() on the 2nd channel.  It's just that it
should be optional as long as blocking is the default for any new fd of a
socket.
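
E.g. something like below (just a sketch to show the idea, not the exact
code in the patch):

    void postcopy_preempt_new_channel(MigrationIncomingState *mis,
                                      QEMUFile *file)
    {
        /*
         * The preempt channel is read by its own thread, so it wants to
         * stay blocking.  New sockets are blocking by default already;
         * setting it explicitly just makes the rule obvious.
         */
        qemu_file_set_blocking(file, true);
        mis->postcopy_qemufile_dst = file;
        /* ... the rest of the channel registration ... */
    }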

Thanks,

-- 
Peter Xu




* Re: [PATCH 18/20] migration: Postcopy preemption enablement
  2022-02-22 10:52   ` Dr. David Alan Gilbert
@ 2022-02-23  7:01     ` Peter Xu
  2022-02-23  9:56       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-23  7:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Tue, Feb 22, 2022 at 10:52:23AM +0000, Dr. David Alan Gilbert wrote:
> This does get a bit complicated, which worries me a bit; the code here
> is already quite complicated.

Right, it's the way I chose in this patchset for solving this problem.  Not
sure whether there's a better and easier way.

For example, we could have used a new thread to send requested pages, and
synchronized it with the main thread.  But that'll need another kind of
complexity, and I can't quickly tell whether that'll be better.

For this single patch, more than half of the complexity comes from the
ability to interrupt sending one huge page half-way.  It's a bit of a pity
that that part will ultimately become a no-op with doublemap.

However I kept those only because we don't know when doublemap will be
ready, not to mention landed.  Meanwhile we can't assume all kernels will
have doublemap even in the future.

> (If you repost, there are a few 'channel' variables that could probably
> be 'unsigned' rather than int)

That I can do for sure.

> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks,

-- 
Peter Xu




* Re: [PATCH 19/20] migration: Postcopy recover with preempt enabled
  2022-02-22 11:32   ` Dr. David Alan Gilbert
@ 2022-02-23  7:45     ` Peter Xu
  2022-02-23  9:52       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-23  7:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Tue, Feb 22, 2022 at 11:32:10AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
> > needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
> > instead of stopping the thread it halts with a semaphore, preparing to be
> > kicked again when recovery is detected.
> > 
> > A mutex is introduced to make sure there's no concurrent operation upon the
> > socket.  To make it simple, the fast ram load thread will take the mutex during
> > its whole procedure, and only release it if it's paused.  The fast-path socket
> > will be properly released by the main loading thread safely when there's
> > network failures during postcopy with that mutex held.
> 
> I *think* this is mostly OK; but I worry I don't understand all the
> cases; e.g.
>   a) If the postcopy channel errors first
>   b) If the main channel errors first

Ah right, I don't think I handled all the cases.  Sorry.

We always check the main channel, but if it's the postcopy channel that
fails, we may not fall into paused mode as expected.

I'll fix that up.

> 
> Can you add some docs to walk through those and explain the locking ?

Sure.

The sem is mentioned in the last sentence of paragraph 1; it's purely a way
to yield the fast ram load thread: when something goes wrong the thread can
sleep on that semaphore, and when we recover we post to the semaphore to
kick it back up.  We use it like that in many places, e.g.
postcopy_pause_sem_dst to yield the main load thread.

The 2nd paragraph above explains why we need the mutex; it plays basically
the same role as rp_mutex protecting to_src_file, so that we won't
accidentally close() the qemufile while some other thread is using it.  So
the fast ram load thread needs to hold that new mutex for almost its whole
lifecycle (because it's loading from that qemufile), and only drops the
mutex when it prepares to sleep.  Then the main load thread can recycle the
postcopy channel with qemu_fclose() safely.
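
If it helps, the in-code comment could read something like below (draft
wording only, to sit next to the mutex field):

    /*
     * postcopy_prio_thread_mutex protects postcopy_qemufile_dst: the fast
     * ram load thread holds it for its whole lifetime and only drops it
     * while paused, so the main load thread can take it to safely shutdown
     * and qemu_fclose() the preempt channel in postcopy_pause_incoming().
     */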

[...]

> > @@ -3466,6 +3468,17 @@ static MigThrError postcopy_pause(MigrationState *s)
> >          qemu_file_shutdown(file);
> >          qemu_fclose(file);
> >  
> > +        /*
> > +         * Do the same to postcopy fast path socket too if there is.  No
> > +         * locking needed because no racer as long as we do this before setting
> > +         * status to paused.
> > +         */
> > +        if (s->postcopy_qemufile_src) {
> > +            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> 
> Shouldn't this do a qemu_file_shutdown on here first?

Yes I probably should.

With all of the above, I plan to squash the below changes into this patch:

---8<---
diff --git a/migration/migration.c b/migration/migration.c
index c68a281406..69778cab23 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3475,6 +3475,7 @@ static MigThrError postcopy_pause(MigrationState *s)
          */
         if (s->postcopy_qemufile_src) {
             migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+            qemu_file_shutdown(s->postcopy_qemufile_src);
             qemu_fclose(s->postcopy_qemufile_src);
             s->postcopy_qemufile_src = NULL;
         }
@@ -3534,8 +3535,13 @@ static MigThrError migration_detect_error(MigrationState *s)
         return MIG_THR_ERR_FATAL;
     }

-    /* Try to detect any file errors */
-    ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
+    /*
+     * Try to detect any file errors.  Note that postcopy_qemufile_src will
+     * be NULL when postcopy preempt is not enabled.
+     */
+    ret = qemu_file_get_error_obj_any(s->to_dst_file,
+                                      s->postcopy_qemufile_src,
+                                      &local_error);
     if (!ret) {
         /* Everything is fine */
         assert(!local_error);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1479cddad9..397652f0ba 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -139,6 +139,33 @@ int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
     return f->last_error;
 }

+/*
+ * Get last error for either stream f1 or f2 with optional Error*.
+ * The error returned (non-zero) can be either from f1 or f2.
+ *
+ * If any of the qemufile* is NULL, then skip the check on that file.
+ *
> + * When there is no error on either qemufile, zero is returned.
+ */
+int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp)
+{
+    int ret = 0;
+
+    if (f1) {
+        ret = qemu_file_get_error_obj(f1, errp);
+        /* If there's already error detected, return */
+        if (ret) {
+            return ret;
+        }
+    }
+
+    if (f2) {
+        ret = qemu_file_get_error_obj(f2, errp);
+    }
+
+    return ret;
+}
+
 /*
  * Set the last error for stream f with optional Error*
  */
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 3f36d4dc8c..2564e5e1c7 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -156,6 +156,7 @@ void qemu_file_update_transfer(QEMUFile *f, int64_t len);
 void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
+int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
 void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
diff --git a/migration/savevm.c b/migration/savevm.c
index 2d32340d28..24b69a1008 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2651,8 +2651,8 @@ retry:
     while (true) {
         section_type = qemu_get_byte(f);

-        if (qemu_file_get_error(f)) {
-            ret = qemu_file_get_error(f);
+        ret = qemu_file_get_error_obj_any(f, mis->postcopy_qemufile_dst, NULL);
+        if (ret) {
             break;
         }
---8<---

Does it look sane?  Let me know if there are still things missing.

Thanks!

-- 
Peter Xu




* Re: [PATCH 20/20] tests: Add postcopy preempt test
  2022-02-22 12:51   ` Dr. David Alan Gilbert
@ 2022-02-23  7:50     ` Peter Xu
  2022-03-01  5:34       ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-23  7:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Tue, Feb 22, 2022 at 12:51:59PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Two tests are added: a normal postcopy preempt test, and a recovery test.
> 
> Yes, this is difficult; without hugepages the tests are limited; did you
> see if this test actually causes pages to go down the fast path?

I didn't check the test case explicitly, but in the tests I ran myself I
did observe it going through the fast path, otherwise I couldn't have got
such a huge speed up.

Meanwhile my own test only uses 2M huge pages, and I can frequently observe
huge page sends being interrupted.

But yeah, let me try to capture something in this test too, at least to make
sure the standalone socket is being used.  Covering huge pages might be
doable but obviously requires host privileges, so I'll leave that for later.

> 
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks,

-- 
Peter Xu




* Re: [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover()
  2022-02-23  6:40     ` Peter Xu
@ 2022-02-23  9:47       ` Dr. David Alan Gilbert
  2022-02-23 12:55         ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-23  9:47 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Feb 22, 2022 at 10:57:34AM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > We used to use postcopy_try_recover() to replace migration_incoming_setup() to
> > > setup incoming channels.  That's fine for the old world, but in the new world
> > > there can be more than one channels that need setup.  Better move the channel
> > > setup out of it so that postcopy_try_recover() only handles the last phase of
> > > switching to the recovery phase.
> > > 
> > > To do that in migration_fd_process_incoming(), move the postcopy_try_recover()
> > > call to be after migration_incoming_setup(), which will setup the channels.
> > > While in migration_ioc_process_incoming(), postpone the recover() routine right
> > > before we'll jump into migration_incoming_process().
> > > 
> > > A side benefit is we don't need to pass in QEMUFile* to postcopy_try_recover()
> > > anymore.  Remove it.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > OK, but note one question below:
> > 
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Thanks.
> 
> > 
> > > ---
> > >  migration/migration.c | 23 +++++++++++------------
> > >  1 file changed, 11 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 67520d3105..b2e6446457 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -665,19 +665,20 @@ void migration_incoming_process(void)
> > >  }
> > >  
> > >  /* Returns true if recovered from a paused migration, otherwise false */
> > > -static bool postcopy_try_recover(QEMUFile *f)
> > > +static bool postcopy_try_recover(void)
> > >  {
> > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > >  
> > >      if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > >          /* Resumed from a paused postcopy migration */
> > >  
> > > -        mis->from_src_file = f;
> > > +        /* This should be set already in migration_incoming_setup() */
> > > +        assert(mis->from_src_file);
> > >          /* Postcopy has standalone thread to do vm load */
> > > -        qemu_file_set_blocking(f, true);
> > > +        qemu_file_set_blocking(mis->from_src_file, true);
> > 
> > Does that set_blocking happen on the 2nd channel somewhere?
> 
> Nop.  I think the rational is that by default all channels are blocking.
> 
> Then what happened is: migration code only sets the main channel to
> non-blocking on incoming, that's in migration_incoming_setup().  Hence for
> postcopy recovery we need to tweak it to blocking here.

OK, yes, so the rule seems to be if it's done in its own thread, we
make it blocking.

> The 2nd new channel is not operated by migration_incoming_setup(), but by
> postcopy_preempt_new_channel(), so it keeps the original blocking state,
> which should be blocking.
> 
> > If we want to make that clear, we can proactively set blocking too in
> postcopy_preempt_new_channel() on the 2nd channel.  It's just that it
> should be optional as long as blocking is the default for any new fd of a
> socket.

OK, I notice that 9e4d2b9 made it explicit on the outgoing side.

Dave

> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 19/20] migration: Postcopy recover with preempt enabled
  2022-02-23  7:45     ` Peter Xu
@ 2022-02-23  9:52       ` Dr. David Alan Gilbert
  2022-02-23 13:14         ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-23  9:52 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Feb 22, 2022 at 11:32:10AM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
> > > needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
> > > instead of stopping the thread it halts with a semaphore, preparing to be
> > > kicked again when recovery is detected.
> > > 
> > > A mutex is introduced to make sure there's no concurrent operation upon the
> > > socket.  To make it simple, the fast ram load thread will take the mutex during
> > > its whole procedure, and only release it if it's paused.  The fast-path socket
> > > will be properly released by the main loading thread safely when there's
> > > network failures during postcopy with that mutex held.
> > 
> > I *think* this is mostly OK; but I worry I don't understand all the
> > cases; e.g.
> >   a) If the postcopy channel errors first
> >   b) If the main channel errors first
> 
> Ah right, I don't think I handled all the cases.  Sorry.
> 
> We always check the main channel, but if the postcopy channel got faulted,
> we may not fall into paused mode as expected.
> 
> I'll fix that up.

Thanks.

> > 
> > Can you add some docs to walk through those and explain the locking ?
> 
> Sure.
> 
> The sem is mentioned in the last sentence of paragraph 1, where it's purely
> used for a way to yield the fast ram load thread so that when something
> wrong happens it can sleep on that semaphore.  Then when we recover we'll
> post to the semaphore to kick it up.  We used it like that in many places,
> e.g. postcopy_pause_sem_dst to yield the main load thread.
> 
> The 2nd paragraph above was for explaining why we need the mutex; it's
> basically the same as rp_mutex protecting to_src_file, so that we won't
> accidentally close() the qemufile during some other thread using it.  So
> the fast ram load thread needs to take that new mutex for mostly the whole
> lifecycle of itself (because it's loading from that qemufile), meanwhile
> only drop the mutex when it prepares to sleep.  Then the main load thread
> can recycle the postcopy channel using qemu_fclose() safely.

Yes, that feels like it needs to go in the code somewhere.

> [...]
> 
> > > @@ -3466,6 +3468,17 @@ static MigThrError postcopy_pause(MigrationState *s)
> > >          qemu_file_shutdown(file);
> > >          qemu_fclose(file);
> > >  
> > > +        /*
> > > +         * Do the same to postcopy fast path socket too if there is.  No
> > > +         * locking needed because no racer as long as we do this before setting
> > > +         * status to paused.
> > > +         */
> > > +        if (s->postcopy_qemufile_src) {
> > > +            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> > 
> > Shouldn't this do a qemu_file_shutdown on here first?
> 
> Yes I probably should.
> 
> With all above, I plan to squash below changes into this patch:
> 
> ---8<---
> diff --git a/migration/migration.c b/migration/migration.c
> index c68a281406..69778cab23 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3475,6 +3475,7 @@ static MigThrError postcopy_pause(MigrationState *s)
>           */
>          if (s->postcopy_qemufile_src) {
>              migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> +            qemu_file_shutdown(s->postcopy_qemufile_src);
>              qemu_fclose(s->postcopy_qemufile_src);
>              s->postcopy_qemufile_src = NULL;
>          }
> @@ -3534,8 +3535,13 @@ static MigThrError migration_detect_error(MigrationState *s)
>          return MIG_THR_ERR_FATAL;
>      }
> 
> -    /* Try to detect any file errors */
> -    ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
> +    /*
> +     * Try to detect any file errors.  Note that postcopy_qemufile_src will
> +     * be NULL when postcopy preempt is not enabled.
> +     */
> +    ret = qemu_file_get_error_obj_any(s->to_dst_file,
> +                                      s->postcopy_qemufile_src,
> +                                      &local_error);
>      if (!ret) {
>          /* Everything is fine */
>          assert(!local_error);
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 1479cddad9..397652f0ba 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -139,6 +139,33 @@ int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
>      return f->last_error;
>  }
> 
> +/*
> + * Get last error for either stream f1 or f2 with optional Error*.
> + * The error returned (non-zero) can be either from f1 or f2.
> + *
> + * If any of the qemufile* is NULL, then skip the check on that file.
> + *
> + * When there is no error on both qemufile, zero is returned.
> + */
> +int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp)
> +{
> +    int ret = 0;
> +
> +    if (f1) {
> +        ret = qemu_file_get_error_obj(f1, errp);
> +        /* If there's already error detected, return */
> +        if (ret) {
> +            return ret;
> +        }
> +    }
> +
> +    if (f2) {
> +        ret = qemu_file_get_error_obj(f2, errp);
> +    }
> +
> +    return ret;
> +}
> +
>  /*
>   * Set the last error for stream f with optional Error*
>   */
> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index 3f36d4dc8c..2564e5e1c7 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -156,6 +156,7 @@ void qemu_file_update_transfer(QEMUFile *f, int64_t len);
>  void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
>  int64_t qemu_file_get_rate_limit(QEMUFile *f);
>  int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
> +int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
>  void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
>  void qemu_file_set_error(QEMUFile *f, int ret);
>  int qemu_file_shutdown(QEMUFile *f);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 2d32340d28..24b69a1008 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2651,8 +2651,8 @@ retry:
>      while (true) {
>          section_type = qemu_get_byte(f);
> 
> -        if (qemu_file_get_error(f)) {
> -            ret = qemu_file_get_error(f);
> +        ret = qemu_file_get_error_obj_any(f, mis->postcopy_qemufile_dst, NULL);
> +        if (ret) {
>              break;
>          }
> ---8<---
> 
> Does it look sane?  Let me know if there's still things missing.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> 
> Thanks!
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 18/20] migration: Postcopy preemption enablement
  2022-02-23  7:01     ` Peter Xu
@ 2022-02-23  9:56       ` Dr. David Alan Gilbert
  2022-02-23 13:05         ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-23  9:56 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Feb 22, 2022 at 10:52:23AM +0000, Dr. David Alan Gilbert wrote:
> > This does get a bit complicated, which worries me a bit; the code here
> > is already quite complicated.
> 
> Right, it's the way I chose in this patchset on solving this problem.  Not
> sure whether there's any better and easier way.
> 
> For example, we could have used a new thread to send requested pages, and
> synchronize it with the main thread.  But that'll need other kind of
> complexity, and I can't quickly tell whether that'll be better.
> 
> For this single patch, more than half of the complexity comes from the
> ability to interrupt sending one huge page half-way.  It's a bit of a pity
> that, that part will be noop ultimately when with doublemap.

How does that huge-page interruption interact with recovery?
i.e. do we know the start of that hugepage arrived?

> 
> However I kept those only because we don't know when doublemap will be
> ready, not to say, landing.  Meanwhile we can't assume all kernels will
> have doublemap even in the future.

Yeh, if doublemap was already here you could make it a condition of
allowing you to set the option.

Dave

> > (If you repost, there are a few 'channel' variables that could probably
> > be 'unsigned' rather than int)
> 
> That I can do for sure.
> 
> > 
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover()
  2022-02-23  9:47       ` Dr. David Alan Gilbert
@ 2022-02-23 12:55         ` Peter Xu
  0 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-23 12:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Wed, Feb 23, 2022 at 09:47:03AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Feb 22, 2022 at 10:57:34AM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > We used to use postcopy_try_recover() to replace migration_incoming_setup() to
> > > > setup incoming channels.  That's fine for the old world, but in the new world
> > > > there can be more than one channels that need setup.  Better move the channel
> > > > setup out of it so that postcopy_try_recover() only handles the last phase of
> > > > switching to the recovery phase.
> > > > 
> > > > To do that in migration_fd_process_incoming(), move the postcopy_try_recover()
> > > > call to be after migration_incoming_setup(), which will setup the channels.
> > > > While in migration_ioc_process_incoming(), postpone the recover() routine right
> > > > before we'll jump into migration_incoming_process().
> > > > 
> > > > A side benefit is we don't need to pass in QEMUFile* to postcopy_try_recover()
> > > > anymore.  Remove it.
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > 
> > > OK, but note one question below:
> > > 
> > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > 
> > Thanks.
> > 
> > > 
> > > > ---
> > > >  migration/migration.c | 23 +++++++++++------------
> > > >  1 file changed, 11 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index 67520d3105..b2e6446457 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -665,19 +665,20 @@ void migration_incoming_process(void)
> > > >  }
> > > >  
> > > >  /* Returns true if recovered from a paused migration, otherwise false */
> > > > -static bool postcopy_try_recover(QEMUFile *f)
> > > > +static bool postcopy_try_recover(void)
> > > >  {
> > > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > > >  
> > > >      if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > >          /* Resumed from a paused postcopy migration */
> > > >  
> > > > -        mis->from_src_file = f;
> > > > +        /* This should be set already in migration_incoming_setup() */
> > > > +        assert(mis->from_src_file);
> > > >          /* Postcopy has standalone thread to do vm load */
> > > > -        qemu_file_set_blocking(f, true);
> > > > +        qemu_file_set_blocking(mis->from_src_file, true);
> > > 
> > > Does that set_blocking happen on the 2nd channel somewhere?
> > 
> > Nop.  I think the rational is that by default all channels are blocking.
> > 
> > Then what happened is: migration code only sets the main channel to
> > non-blocking on incoming, that's in migration_incoming_setup().  Hence for
> > postcopy recovery we need to tweak it to blocking here.
> 
> OK, yes, so the rule seems to be if it's done in it's own thread, we
> make it blocking.
> 
> > The 2nd new channel is not operated by migration_incoming_setup(), but by
> > postcopy_preempt_new_channel(), so it keeps the original blocking state,
> > which should be blocking.
> > 
> > > If we want to make that clear, we can proactively set blocking too in
> > postcopy_preempt_new_channel() on the 2nd channel.  It's just that it
> > should be optional as long as blocking is the default for any new fd of a
> > socket.
> 
> OK, I notice that in 9e4d2b9 made it explicit on the outgoing side.

Indeed, then let me do the same!

-- 
Peter Xu




* Re: [PATCH 18/20] migration: Postcopy preemption enablement
  2022-02-23  9:56       ` Dr. David Alan Gilbert
@ 2022-02-23 13:05         ` Peter Xu
  0 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-02-23 13:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Wed, Feb 23, 2022 at 09:56:08AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Feb 22, 2022 at 10:52:23AM +0000, Dr. David Alan Gilbert wrote:
> > > This does get a bit complicated, which worries me a bit; the code here
> > > is already quite complicated.
> > 
> > Right, it's the way I chose in this patchset for solving this problem.  Not
> > sure whether there's any better or easier way.
> > 
> > For example, we could have used a new thread to send requested pages, and
> > synchronized it with the main thread.  But that would need a different kind
> > of complexity, and I can't quickly tell whether it would be better.
> > 
> > For this single patch, more than half of the complexity comes from the
> > ability to interrupt sending one huge page half-way.  It's a bit of a pity
> > that that part will ultimately become a no-op with doublemap.
> 
> How does that huge-page interruption interact with recovery?
> i.e. do we know whether the start of that hugepage arrived?

That's a great question.  I should have mentioned that, but I forgot.

When postcopy is interrupted while sending a huge page, the dest QEMU will
not be able to do the UFFDIO_COPY of that huge page (because it lacks the
data!), which also means the received bitmap of that huge page will stay
completely clear.

So when recovery happens, the dest QEMU will tell the source about this fact
("Hey, this huge page has never been transferred", even though a few small
pages of it have actually been transferred already!).  Then the whole huge
page will be resent.

When postcopy preempt joins the equation, what we need to do is reset the
temp huge pages (in postcopy_pause_incoming()):

    /*
     * If network is interrupted, any temp page we received will be useless
     * because we didn't mark them as "received" in receivedmap.  After a
     * proper recovery later (which will sync src dirty bitmap with receivedmap
     * on dest) these cached small pages will be resent again.
     */
    for (i = 0; i < mis->postcopy_channels; i++) {
        postcopy_temp_page_reset(&mis->postcopy_tmp_pages[i]);
    }

This chunk of code lives in "migration: Introduce postcopy channels on dest
node" rather than in the recovery patch, which I think is the major reason
it's easily overlooked.  However, it needs to be there so as not to break
existing postcopy.

So that used to be hidden, because we didn't manage the temp huge pages
explicitly (they used to be local vars, so they got reset automatically),
but now we need to do that by hand.
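
Just for completeness, resetting one temp page is conceptually only about
clearing its tracking state; a rough sketch (the struct and field names here
are assumptions, not necessarily what the patch uses) would be:

    void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page)
    {
        /* Forget how many small pages of this huge page we had cached */
        tmp_page->target_pages = 0;
        /* No host huge page is being assembled anymore */
        tmp_page->host_addr = NULL;
        /* Assume all-zero again until a non-zero small page arrives */
        tmp_page->all_zero = true;
    }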

> 
> > 
> > However I kept those only because we don't know when doublemap will be
> > ready, let alone landed.  Meanwhile we can't assume all kernels will
> > have doublemap even in the future.
> 
> Yeh, if doublemap was already here you could make it a condition of
> allowing you to set the option.

Right.  We'll 100% skip the huge page interruption, just like when the
ramblock is using PAGE_SIZE small pages.

-- 
Peter Xu




* Re: [PATCH 19/20] migration: Postcopy recover with preempt enabled
  2022-02-23  9:52       ` Dr. David Alan Gilbert
@ 2022-02-23 13:14         ` Peter Xu
  2022-02-23 18:53           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-02-23 13:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Wed, Feb 23, 2022 at 09:52:08AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Feb 22, 2022 at 11:32:10AM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
> > > > needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
> > > > instead of stopping the thread it halts with a semaphore, preparing to be
> > > > kicked again when recovery is detected.
> > > > 
> > > > A mutex is introduced to make sure there's no concurrent operation upon the
> > > > socket.  To make it simple, the fast ram load thread will take the mutex during
> > > > its whole procedure, and only release it if it's paused.  The fast-path socket
> > > > will be properly released by the main loading thread when there are
> > > > network failures during postcopy, with that mutex held.
> > > 
> > > I *think* this is mostly OK; but I worry I don't understand all the
> > > cases; e.g.
> > >   a) If the postcopy channel errors first
> > >   b) If the main channel errors first
> > 
> > Ah right, I don't think I handled all the cases.  Sorry.
> > 
> > We always check the main channel, but if the postcopy channel got faulted,
> > we may not fall into paused mode as expected.
> > 
> > I'll fix that up.
> 
> Thanks.
> 
> > > 
> > > Can you add some docs to walk through those and explain the locking ?
> > 
> > Sure.
> > 
> > The sem is mentioned in the last sentence of paragraph 1, where it's purely
> > used as a way to yield the fast ram load thread, so that when something
> > goes wrong it can sleep on that semaphore.  Then when we recover we'll
> > post to the semaphore to wake it up.  We use it like that in many places,
> > e.g. postcopy_pause_sem_dst to yield the main load thread.
> > 
> > The 2nd paragraph above was explaining why we need the mutex; it's
> > basically the same as rp_mutex protecting to_src_file, so that we won't
> > accidentally close() the qemufile while some other thread is using it.  So
> > the fast ram load thread needs to hold that new mutex for mostly its whole
> > lifetime (because it's loading from that qemufile), and only drops the
> > mutex when it prepares to sleep.  Then the main load thread can safely
> > recycle the postcopy channel using qemu_fclose().
> 
> Yes, that feels like it needs to go in the code somewhere.

Sure, I'll further squash the comment update below into the same patch.  I
reworded some places, but mostly it should be saying the same thing:

---8<---
diff --git a/migration/migration.h b/migration/migration.h
index 945088064a..91f845e9e4 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,7 +118,17 @@ struct MigrationIncomingState {
     /* Postcopy priority thread is used to receive postcopy requested pages */
     QemuThread postcopy_prio_thread;
     bool postcopy_prio_thread_created;
-    /* Used to sync with the prio thread */
+    /*
+     * Used to sync between the ram load main thread and the fast ram load
+     * thread.  It protects postcopy_qemufile_dst, which is the postcopy
+     * fast channel.
+     *
+     * The ram fast load thread will take it mostly for the whole lifecycle
+     * because it needs to continuously read data from the channel, and
+     * it'll only release this mutex if postcopy is interrupted, so that
+     * the ram load main thread will take this mutex over and properly
+     * release the broken channel.
+     */
     QemuMutex postcopy_prio_thread_mutex;
     /*
      * An array of temp host huge pages to be used, one for each postcopy
@@ -149,6 +159,12 @@ struct MigrationIncomingState {
     /* notify PAUSED postcopy incoming migrations to try to continue */
     QemuSemaphore postcopy_pause_sem_dst;
     QemuSemaphore postcopy_pause_sem_fault;
+    /*
+     * This semaphore is used to allow the ram fast load thread (only when
+     * postcopy preempt is enabled) to fall asleep when a network
+     * interruption is detected.  When the recovery is done, the main load
+     * thread will kick the fast ram load thread using this semaphore.
+     */
     QemuSemaphore postcopy_pause_sem_fast_load;
 
     /* List of listening socket addresses  */
---8<---
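
To make the above concrete, the pause/resume pattern of the fast ram load
thread roughly looks like the sketch below (not the exact patch code; the
helper and thread function names and the ram_load_postcopy() calling
convention are assumptions):

    /* Runs in the fast ram load (preempt) thread when its channel breaks */
    static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
    {
        /*
         * Drop the mutex so the main load thread can take it over and
         * safely qemu_fclose() the broken preempt channel.
         */
        qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
        /* Sleep until the main load thread kicks us after recovery */
        qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
        /* Recovered: retake the mutex and go back to loading pages */
        qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
    }

    /* Thread body (simplified): hold the mutex except while paused */
    static void *postcopy_prio_load_thread(void *opaque)
    {
        MigrationIncomingState *mis = opaque;

        qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
        /* Non-zero return is treated as an error: pause, then retry */
        while (ram_load_postcopy(mis->postcopy_qemufile_dst)) {
            postcopy_pause_ram_fast_load(mis);
        }
        qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
        return NULL;
    }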

> 
> > [...]
> > 
> > > > @@ -3466,6 +3468,17 @@ static MigThrError postcopy_pause(MigrationState *s)
> > > >          qemu_file_shutdown(file);
> > > >          qemu_fclose(file);
> > > >  
> > > > +        /*
> > > > +         * Do the same to postcopy fast path socket too if there is.  No
> > > > +         * locking needed because no racer as long as we do this before setting
> > > > +         * status to paused.
> > > > +         */
> > > > +        if (s->postcopy_qemufile_src) {
> > > > +            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> > > 
> > > Shouldn't this do a qemu_file_shutdown on here first?
> > 
> > Yes I probably should.
> > 
> > With all above, I plan to squash below changes into this patch:
> > 
> > ---8<---
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c68a281406..69778cab23 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -3475,6 +3475,7 @@ static MigThrError postcopy_pause(MigrationState *s)
> >           */
> >          if (s->postcopy_qemufile_src) {
> >              migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> > +            qemu_file_shutdown(s->postcopy_qemufile_src);
> >              qemu_fclose(s->postcopy_qemufile_src);
> >              s->postcopy_qemufile_src = NULL;
> >          }
> > @@ -3534,8 +3535,13 @@ static MigThrError migration_detect_error(MigrationState *s)
> >          return MIG_THR_ERR_FATAL;
> >      }
> > 
> > -    /* Try to detect any file errors */
> > -    ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
> > +    /*
> > +     * Try to detect any file errors.  Note that postcopy_qemufile_src will
> > +     * be NULL when postcopy preempt is not enabled.
> > +     */
> > +    ret = qemu_file_get_error_obj_any(s->to_dst_file,
> > +                                      s->postcopy_qemufile_src,
> > +                                      &local_error);
> >      if (!ret) {
> >          /* Everything is fine */
> >          assert(!local_error);
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index 1479cddad9..397652f0ba 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -139,6 +139,33 @@ int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
> >      return f->last_error;
> >  }
> > 
> > +/*
> > + * Get last error for either stream f1 or f2 with optional Error*.
> > + * The error returned (non-zero) can be either from f1 or f2.
> > + *
> > + * If any of the qemufile* is NULL, then skip the check on that file.
> > + *
> > + * When there is no error on either qemufile, zero is returned.
> > + */
> > +int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp)
> > +{
> > +    int ret = 0;
> > +
> > +    if (f1) {
> > +        ret = qemu_file_get_error_obj(f1, errp);
> > +        /* If there's already error detected, return */
> > +        if (ret) {
> > +            return ret;
> > +        }
> > +    }
> > +
> > +    if (f2) {
> > +        ret = qemu_file_get_error_obj(f2, errp);
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> >  /*
> >   * Set the last error for stream f with optional Error*
> >   */
> > diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> > index 3f36d4dc8c..2564e5e1c7 100644
> > --- a/migration/qemu-file.h
> > +++ b/migration/qemu-file.h
> > @@ -156,6 +156,7 @@ void qemu_file_update_transfer(QEMUFile *f, int64_t len);
> >  void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
> >  int64_t qemu_file_get_rate_limit(QEMUFile *f);
> >  int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
> > +int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
> >  void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
> >  void qemu_file_set_error(QEMUFile *f, int ret);
> >  int qemu_file_shutdown(QEMUFile *f);
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 2d32340d28..24b69a1008 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2651,8 +2651,8 @@ retry:
> >      while (true) {
> >          section_type = qemu_get_byte(f);
> > 
> > -        if (qemu_file_get_error(f)) {
> > -            ret = qemu_file_get_error(f);
> > +        ret = qemu_file_get_error_obj_any(f, mis->postcopy_qemufile_dst, NULL);
> > +        if (ret) {
> >              break;
> >          }
> > ---8<---
> > 
> > Does it look sane?  Let me know if there are still things missing.
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu




* Re: [PATCH 19/20] migration: Postcopy recover with preempt enabled
  2022-02-23 13:14         ` Peter Xu
@ 2022-02-23 18:53           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-02-23 18:53 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Feb 23, 2022 at 09:52:08AM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Feb 22, 2022 at 11:32:10AM +0000, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > > > > To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
> > > > > needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
> > > > > instead of stopping the thread it halts with a semaphore, preparing to be
> > > > > kicked again when recovery is detected.
> > > > > 
> > > > > A mutex is introduced to make sure there's no concurrent operation upon the
> > > > > socket.  To make it simple, the fast ram load thread will take the mutex during
> > > > > its whole procedure, and only release it if it's paused.  The fast-path socket
> > > > > will be properly released by the main loading thread when there are
> > > > > network failures during postcopy, with that mutex held.
> > > > 
> > > > I *think* this is mostly OK; but I worry I don't understand all the
> > > > cases; e.g.
> > > >   a) If the postcopy channel errors first
> > > >   b) If the main channel errors first
> > > 
> > > Ah right, I don't think I handled all the cases.  Sorry.
> > > 
> > > We always check the main channel, but if the postcopy channel got faulted,
> > > we may not fall into paused mode as expected.
> > > 
> > > I'll fix that up.
> > 
> > Thanks.
> > 
> > > > 
> > > > Can you add some docs to walk through those and explain the locking ?
> > > 
> > > Sure.
> > > 
> > > The sem is mentioned in the last sentence of paragraph 1, where it's purely
> > > used as a way to yield the fast ram load thread, so that when something
> > > goes wrong it can sleep on that semaphore.  Then when we recover we'll
> > > post to the semaphore to wake it up.  We use it like that in many places,
> > > e.g. postcopy_pause_sem_dst to yield the main load thread.
> > > 
> > > The 2nd paragraph above was explaining why we need the mutex; it's
> > > basically the same as rp_mutex protecting to_src_file, so that we won't
> > > accidentally close() the qemufile while some other thread is using it.  So
> > > the fast ram load thread needs to hold that new mutex for mostly its whole
> > > lifetime (because it's loading from that qemufile), and only drops the
> > > mutex when it prepares to sleep.  Then the main load thread can safely
> > > recycle the postcopy channel using qemu_fclose().
> > 
> > Yes, that feels like it needs to go in the code somewhere.
> 
> Sure, I'll further squash the comment update below into the same patch.  I
> reworded some places, but mostly it should be saying the same thing:
> 
> ---8<---
> diff --git a/migration/migration.h b/migration/migration.h
> index 945088064a..91f845e9e4 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -118,7 +118,17 @@ struct MigrationIncomingState {
>      /* Postcopy priority thread is used to receive postcopy requested pages */
>      QemuThread postcopy_prio_thread;
>      bool postcopy_prio_thread_created;
> -    /* Used to sync with the prio thread */
> +    /*
> +     * Used to sync between the ram load main thread and the fast ram load
> +     * thread.  It protects postcopy_qemufile_dst, which is the postcopy
> +     * fast channel.
> +     *
> +     * The ram fast load thread will take it mostly for the whole lifecycle
> +     * because it needs to continuously read data from the channel, and
> +     * it'll only release this mutex if postcopy is interrupted, so that
> +     * the ram load main thread will take this mutex over and properly
> +     * release the broken channel.
> +     */
>      QemuMutex postcopy_prio_thread_mutex;
>      /*
>       * An array of temp host huge pages to be used, one for each postcopy
> @@ -149,6 +159,12 @@ struct MigrationIncomingState {
>      /* notify PAUSED postcopy incoming migrations to try to continue */
>      QemuSemaphore postcopy_pause_sem_dst;
>      QemuSemaphore postcopy_pause_sem_fault;
> +    /*
> +     * This semaphore is used to allow the ram fast load thread (only when
> +     * postcopy preempt is enabled) to fall asleep when a network
> +     * interruption is detected.  When the recovery is done, the main load
> +     * thread will kick the fast ram load thread using this semaphore.
> +     */
>      QemuSemaphore postcopy_pause_sem_fast_load;

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>  
>      /* List of listening socket addresses  */
> ---8<---
> 
> > 
> > > [...]
> > > 
> > > > > @@ -3466,6 +3468,17 @@ static MigThrError postcopy_pause(MigrationState *s)
> > > > >          qemu_file_shutdown(file);
> > > > >          qemu_fclose(file);
> > > > >  
> > > > > +        /*
> > > > > +         * Do the same to postcopy fast path socket too if there is.  No
> > > > > +         * locking needed because no racer as long as we do this before setting
> > > > > +         * status to paused.
> > > > > +         */
> > > > > +        if (s->postcopy_qemufile_src) {
> > > > > +            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> > > > 
> > > > Shouldn't this do a qemu_file_shutdown on here first?
> > > 
> > > Yes I probably should.
> > > 
> > > With all above, I plan to squash below changes into this patch:
> > > 
> > > ---8<---
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index c68a281406..69778cab23 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -3475,6 +3475,7 @@ static MigThrError postcopy_pause(MigrationState *s)
> > >           */
> > >          if (s->postcopy_qemufile_src) {
> > >              migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
> > > +            qemu_file_shutdown(s->postcopy_qemufile_src);
> > >              qemu_fclose(s->postcopy_qemufile_src);
> > >              s->postcopy_qemufile_src = NULL;
> > >          }
> > > @@ -3534,8 +3535,13 @@ static MigThrError migration_detect_error(MigrationState *s)
> > >          return MIG_THR_ERR_FATAL;
> > >      }
> > > 
> > > -    /* Try to detect any file errors */
> > > -    ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
> > > +    /*
> > > +     * Try to detect any file errors.  Note that postcopy_qemufile_src will
> > > +     * be NULL when postcopy preempt is not enabled.
> > > +     */
> > > +    ret = qemu_file_get_error_obj_any(s->to_dst_file,
> > > +                                      s->postcopy_qemufile_src,
> > > +                                      &local_error);
> > >      if (!ret) {
> > >          /* Everything is fine */
> > >          assert(!local_error);
> > > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > > index 1479cddad9..397652f0ba 100644
> > > --- a/migration/qemu-file.c
> > > +++ b/migration/qemu-file.c
> > > @@ -139,6 +139,33 @@ int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
> > >      return f->last_error;
> > >  }
> > > 
> > > +/*
> > > + * Get last error for either stream f1 or f2 with optional Error*.
> > > + * The error returned (non-zero) can be either from f1 or f2.
> > > + *
> > > + * If any of the qemufile* is NULL, then skip the check on that file.
> > > + *
> > > + * When there is no error on either qemufile, zero is returned.
> > > + */
> > > +int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp)
> > > +{
> > > +    int ret = 0;
> > > +
> > > +    if (f1) {
> > > +        ret = qemu_file_get_error_obj(f1, errp);
> > > +        /* If there's already error detected, return */
> > > +        if (ret) {
> > > +            return ret;
> > > +        }
> > > +    }
> > > +
> > > +    if (f2) {
> > > +        ret = qemu_file_get_error_obj(f2, errp);
> > > +    }
> > > +
> > > +    return ret;
> > > +}
> > > +
> > >  /*
> > >   * Set the last error for stream f with optional Error*
> > >   */
> > > diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> > > index 3f36d4dc8c..2564e5e1c7 100644
> > > --- a/migration/qemu-file.h
> > > +++ b/migration/qemu-file.h
> > > @@ -156,6 +156,7 @@ void qemu_file_update_transfer(QEMUFile *f, int64_t len);
> > >  void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
> > >  int64_t qemu_file_get_rate_limit(QEMUFile *f);
> > >  int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
> > > +int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
> > >  void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
> > >  void qemu_file_set_error(QEMUFile *f, int ret);
> > >  int qemu_file_shutdown(QEMUFile *f);
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index 2d32340d28..24b69a1008 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -2651,8 +2651,8 @@ retry:
> > >      while (true) {
> > >          section_type = qemu_get_byte(f);
> > > 
> > > -        if (qemu_file_get_error(f)) {
> > > -            ret = qemu_file_get_error(f);
> > > +        ret = qemu_file_get_error_obj_any(f, mis->postcopy_qemufile_dst, NULL);
> > > +        if (ret) {
> > >              break;
> > >          }
> > > ---8<---
> > > 
> > > Does it look sane?  Let me know if there are still things missing.
> > 
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Thanks!
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 20/20] tests: Add postcopy preempt test
  2022-02-23  7:50     ` Peter Xu
@ 2022-03-01  5:34       ` Peter Xu
  2022-03-01 17:00         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Xu @ 2022-03-01  5:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Wed, Feb 23, 2022 at 03:50:24PM +0800, Peter Xu wrote:
> On Tue, Feb 22, 2022 at 12:51:59PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Two tests are added: a normal postcopy preempt test, and a recovery test.
> > 
> > Yes, this is difficult; without hugepages the tests are limited; did you
> > see if this test actually causes pages to go down the fast path?
> 
> I didn't observe the test case explicitly, but I did observe in my own test
> runs that it goes down the fast path, otherwise I couldn't have gotten a huge
> speedup.
> 
> Meanwhile my own test is only using 2M huge pages, and I can observe frequent
> interruptions of huge page sending.
> 
> But yeah let me try to capture something in this test too, at least to make
> sure the standalone socket is being used.  Covering huge pages might be
> doable but obviously requires host privileges, so I'll leave that for later.

When I tried to observe the test case today, I found that the new preempt
tests are all running with the new channels; however, funnily enough, I found
the original vanilla test is using it too!

I looked into it: that's because the MigrateStart* pointer is freed in
test_migrate_start() but the test references it after that... so it's a
use-after-free bug in the test code.  I need to squash this:

---8<---
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 5053b40589..09a9ce4401 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -664,6 +664,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
                                     MigrateStart *args)
 {
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+    /* NOTE: args will be freed in test_migrate_start(), cache it */
+    bool postcopy_preempt = args->postcopy_preempt;
     QTestState *from, *to;
 
     if (test_migrate_start(&from, &to, uri, args)) {
@@ -674,7 +676,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
     migrate_set_capability(to, "postcopy-ram", true);
     migrate_set_capability(to, "postcopy-blocktime", true);
 
-    if (args->postcopy_preempt) {
+    if (postcopy_preempt) {
         migrate_set_capability(from, "postcopy-preempt", true);
         migrate_set_capability(to, "postcopy-preempt", true);
     }
---8<---

That's tricky, and we could have done something better.  E.g., we could pass
the MigrateStart** into test_migrate_start() so it can clear the caller's
pointer when freeing; then it's not a silent use-after-free but a crash,
which is better in this case.
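
A rough sketch of that alternative (not the actual follow-up patch; the exact
prototype is an assumption) could be:

    /* Take MigrateStart ** so the callee can clear the caller's pointer */
    static int test_migrate_start(QTestState **from, QTestState **to,
                                  const char *uri, MigrateStart **pargs)
    {
        MigrateStart *args = *pargs;

        /* ... use args to configure and spawn both QEMU instances ... */

        /* Free the arguments and clear the caller's pointer */
        g_free(args);
        *pargs = NULL;
        return 0;
    }

With that, any later dereference of args in the caller crashes right away
instead of silently reading freed memory.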

I feel lucky I tried..

-- 
Peter Xu




* Re: [PATCH 20/20] tests: Add postcopy preempt test
  2022-03-01  5:34       ` Peter Xu
@ 2022-03-01 17:00         ` Dr. David Alan Gilbert
  2022-03-02  6:41           ` Peter Xu
  0 siblings, 1 reply; 54+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-01 17:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Feb 23, 2022 at 03:50:24PM +0800, Peter Xu wrote:
> > On Tue, Feb 22, 2022 at 12:51:59PM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > Two tests are added: a normal postcopy preempt test, and a recovery test.
> > > 
> > > Yes, this is difficult; without hugepages the tests are limited; did you
> > > see if this test actually causes pages to go down the fast path?
> > 
> > I didn't observe the test case explicitly, but I did observe in my own test
> > runs that it goes down the fast path, otherwise I couldn't have gotten a huge
> > speedup.
> > 
> > Meanwhile my own test is only using 2M huge pages, and I can observe frequent
> > interruptions of huge page sending.
> > 
> > But yeah let me try to capture something in this test too, at least to make
> > sure the standalone socket is being used.  Covering huge pages might be
> > doable but obviously requires host privileges, so I'll leave that for later.
> 
> When I tried to observe the test case today, I found that the new preempt
> tests are all running with the new channels; however, funnily enough, I found
> the original vanilla test is using it too!
> 
> I looked into it: that's because the MigrateStart* pointer is freed in
> test_migrate_start() but the test references it after that... so it's a
> use-after-free bug in the test code.  I need to squash this:
> 
> ---8<---
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 5053b40589..09a9ce4401 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -664,6 +664,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
>                                      MigrateStart *args)
>  {
>      g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
> +    /* NOTE: args will be freed in test_migrate_start(), cache it */
> +    bool postcopy_preempt = args->postcopy_preempt;
>      QTestState *from, *to;
>  
>      if (test_migrate_start(&from, &to, uri, args)) {
> @@ -674,7 +676,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
>      migrate_set_capability(to, "postcopy-ram", true);
>      migrate_set_capability(to, "postcopy-blocktime", true);
>  
> -    if (args->postcopy_preempt) {
> +    if (postcopy_preempt) {
>          migrate_set_capability(from, "postcopy-preempt", true);
>          migrate_set_capability(to, "postcopy-preempt", true);
>      }
> ---8<---

Ah OK, yes I guess that's needed.

> That's tricky, and we could have done something better.  E.g., we could pass
> the MigrateStart** into test_migrate_start() so it can clear the caller's
> pointer when freeing; then it's not a silent use-after-free but a crash,
> which is better in this case.
> 
> I feel lucky I tried..

It could at least do with a comment on test_migrate_start?

Dave

> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 20/20] tests: Add postcopy preempt test
  2022-03-01 17:00         ` Dr. David Alan Gilbert
@ 2022-03-02  6:41           ` Peter Xu
  0 siblings, 0 replies; 54+ messages in thread
From: Peter Xu @ 2022-03-02  6:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Leonardo Bras Soares Passos, qemu-devel, Juan Quintela

On Tue, Mar 01, 2022 at 05:00:15PM +0000, Dr. David Alan Gilbert wrote:
> > That's tricky, and we could have done something better.  E.g., we could pass
> > the MigrateStart** into test_migrate_start() so it can clear the caller's
> > pointer when freeing; then it's not a silent use-after-free but a crash,
> > which is better in this case.
> > 
> > I feel lucky I tried..
> 
> It could at least do with a comment on test_migrate_start?

I've added one more patch there at the end of v2 for this:

[PATCH v2 25/25] tests: Pass in MigrateStart** into test_migrate_start()

Although it's at the end of the series, it can still be cleanly applied to
the current master branch, too.

Thanks,

-- 
Peter Xu




end of thread

Thread overview: 54+ messages
2022-02-16  6:27 [PATCH 00/20] migration: Postcopy Preemption Peter Xu
2022-02-16  6:27 ` [PATCH 01/20] migration: Dump sub-cmd name in loadvm_process_command tp Peter Xu
2022-02-16 15:42   ` Dr. David Alan Gilbert
2022-02-16  6:27 ` [PATCH 02/20] migration: Finer grained tracepoints for POSTCOPY_LISTEN Peter Xu
2022-02-16 15:43   ` Dr. David Alan Gilbert
2022-02-16  6:27 ` [PATCH 03/20] migration: Tracepoint change in postcopy-run bottom half Peter Xu
2022-02-16 19:00   ` Dr. David Alan Gilbert
2022-02-16  6:27 ` [PATCH 04/20] migration: Introduce postcopy channels on dest node Peter Xu
2022-02-21 15:49   ` Dr. David Alan Gilbert
2022-02-16  6:27 ` [PATCH 05/20] migration: Dump ramblock and offset too when non-same-page detected Peter Xu
2022-02-16  6:27 ` [PATCH 06/20] migration: Add postcopy_thread_create() Peter Xu
2022-02-21 16:00   ` Dr. David Alan Gilbert
2022-02-16  6:27 ` [PATCH 07/20] migration: Move static var in ram_block_from_stream() into global Peter Xu
2022-02-16  6:27 ` [PATCH 08/20] migration: Add pss.postcopy_requested status Peter Xu
2022-02-16  6:27 ` [PATCH 09/20] migration: Move migrate_allow_multifd and helpers into migration.c Peter Xu
2022-02-16  6:27 ` [PATCH 10/20] migration: Enlarge postcopy recovery to capture !-EIO too Peter Xu
2022-02-21 16:15   ` Dr. David Alan Gilbert
2022-02-16  6:28 ` [PATCH 11/20] migration: postcopy_pause_fault_thread() never fails Peter Xu
2022-02-21 16:16   ` Dr. David Alan Gilbert
2022-02-16  6:28 ` [PATCH 12/20] migration: Export ram_load_postcopy() Peter Xu
2022-02-21 16:17   ` Dr. David Alan Gilbert
2022-02-16  6:28 ` [PATCH 13/20] migration: Move channel setup out of postcopy_try_recover() Peter Xu
2022-02-22 10:57   ` Dr. David Alan Gilbert
2022-02-23  6:40     ` Peter Xu
2022-02-23  9:47       ` Dr. David Alan Gilbert
2022-02-23 12:55         ` Peter Xu
2022-02-16  6:28 ` [PATCH 14/20] migration: Add migration_incoming_transport_cleanup() Peter Xu
2022-02-21 16:56   ` Dr. David Alan Gilbert
2022-02-16  6:28 ` [PATCH 15/20] migration: Allow migrate-recover to run multiple times Peter Xu
2022-02-21 17:03   ` Dr. David Alan Gilbert
2022-02-22  2:51     ` Peter Xu
2022-02-16  6:28 ` [PATCH 16/20] migration: Add postcopy-preempt capability Peter Xu
2022-02-16  6:28 ` [PATCH 17/20] migration: Postcopy preemption preparation on channel creation Peter Xu
2022-02-21 18:39   ` Dr. David Alan Gilbert
2022-02-22  8:34     ` Peter Xu
2022-02-22 10:19       ` Dr. David Alan Gilbert
2022-02-16  6:28 ` [PATCH 18/20] migration: Postcopy preemption enablement Peter Xu
2022-02-22 10:52   ` Dr. David Alan Gilbert
2022-02-23  7:01     ` Peter Xu
2022-02-23  9:56       ` Dr. David Alan Gilbert
2022-02-23 13:05         ` Peter Xu
2022-02-16  6:28 ` [PATCH 19/20] migration: Postcopy recover with preempt enabled Peter Xu
2022-02-22 11:32   ` Dr. David Alan Gilbert
2022-02-23  7:45     ` Peter Xu
2022-02-23  9:52       ` Dr. David Alan Gilbert
2022-02-23 13:14         ` Peter Xu
2022-02-23 18:53           ` Dr. David Alan Gilbert
2022-02-16  6:28 ` [PATCH 20/20] tests: Add postcopy preempt test Peter Xu
2022-02-22 12:51   ` Dr. David Alan Gilbert
2022-02-23  7:50     ` Peter Xu
2022-03-01  5:34       ` Peter Xu
2022-03-01 17:00         ` Dr. David Alan Gilbert
2022-03-02  6:41           ` Peter Xu
2022-02-16  9:28 ` [PATCH 00/20] migration: Postcopy Preemption Peter Xu
