qemu-devel.nongnu.org archive mirror
* [PATCH 0/5] migration/postcopy: Sync faulted addresses after network recovered
@ 2020-09-03 15:26 Peter Xu
  2020-09-03 15:26 ` [PATCH 1/5] migration: Rework migrate_send_rp_req_pages() function Peter Xu
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Peter Xu @ 2020-09-03 15:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: Xiaohui Li, Dr . David Alan Gilbert, peterx, Juan Quintela

We've seen intermittent guest hangs on the destination VM after a postcopy
recovery.  The hang resolves itself after a few minutes.

The problem is: after a postcopy recovery, the prioritized postcopy queue on
the source VM is lost.  So all the threads that faulted before the recovery
stay blocked until the pages happen to be copied over by the background
precopy migration stream.

The solution is to also refresh this information after postcopy recovery.  To
achieve this, we maintain a list of faulted addresses on the destination node,
so that we can resend the list when necessary.  This work is done in patches
1-4.

With that, the last thing we need to do is to send this extra information to
the source VM after recovery.  Luckily, this synchronization can be "emulated"
by sending a bunch of page requests (even though these pages may have been
sent previously!) to the source VM, just as if we had hit fresh page faults.
Even the first version of the postcopy code handles duplicated pages well, so
this fix does not need a new capability bit and will work smoothly when
migrating from old QEMUs to new ones.
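As a rough sketch of why resending is safe: the destination only asks for
pages it has not yet received, and placing a page twice on the destination is
a no-op.  The helpers below are made-up stand-ins for
migrate_send_rp_req_pages() and postcopy_place_page(); the real code lives in
migration/migration.c and migration/postcopy-ram.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_PAGES 16

static bool received[N_PAGES];   /* per-ramblock receivedmap, simplified */
static int requests_sent;

/* Only ask the source for a page that has not arrived yet. */
static void request_page(size_t page)
{
    if (received[page]) {
        return;              /* already here: skip the wire message */
    }
    requests_sent++;         /* would send MIG_RP_MSG_REQ_PAGES here */
}

/* Placing a page twice is harmless, which is why resending duplicated
 * page requests after a recovery needs no new capability bit. */
static void place_page(size_t page)
{
    received[page] = true;
}
```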

Please review, thanks.

Peter Xu (5):
  migration: Rework migrate_send_rp_req_pages() function
  migration: Introduce migrate_send_rp_message_req_pages()
  migration: Pass incoming state into qemu_ufd_copy_ioctl()
  migration: Maintain postcopy faulted addresses
  migration: Sync requested pages after postcopy recovery

 migration/migration.c    | 71 +++++++++++++++++++++++++++++++++++-----
 migration/migration.h    | 23 +++++++++++--
 migration/postcopy-ram.c | 46 +++++++++++---------------
 migration/savevm.c       | 56 +++++++++++++++++++++++++++++++
 migration/trace-events   |  3 ++
 5 files changed, 163 insertions(+), 36 deletions(-)

-- 
2.26.2





* [PATCH 1/5] migration: Rework migrate_send_rp_req_pages() function
  2020-09-03 15:26 [PATCH 0/5] migration/postcopy: Sync faulted addresses after network recovered Peter Xu
@ 2020-09-03 15:26 ` Peter Xu
  2020-09-08  9:18   ` Dr. David Alan Gilbert
  2020-09-03 15:26 ` [PATCH 2/5] migration: Introduce migrate_send_rp_message_req_pages() Peter Xu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2020-09-03 15:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: Xiaohui Li, Dr . David Alan Gilbert, peterx, Juan Quintela

We duplicated the logic of maintaining the last_rb variable in both callers of
this function.  Pass the *rb pointer into the function so that we can avoid
duplicating that logic.  With the rb pointer at hand, it is also easy to drop
the original 2nd and 4th parameters, because both of them (the name of the
ramblock when needed, and the page size) can be fetched from the ramblock
pointer.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    | 26 ++++++++++++++++++--------
 migration/migration.h    |  4 ++--
 migration/postcopy-ram.c | 24 ++----------------------
 3 files changed, 22 insertions(+), 32 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 58a5452471..6761e3f233 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -311,25 +311,35 @@ error:
     return ret;
 }
 
-/* Request a range of pages from the source VM at the given
- * start address.
- *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
- *           as the last request (a name must have been given previously)
+/* Request one page from the source VM at the given start address.
+ *   rb: the RAMBlock to request the page in
  *   Start: Address offset within the RB
  *   Len: Length in bytes required - must be a multiple of pagesize
  */
-int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
-                              ram_addr_t start, size_t len)
+int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
+                              ram_addr_t start)
 {
     uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
     size_t msglen = 12; /* start + len */
+    size_t len = qemu_ram_pagesize(rb);
     enum mig_rp_message_type msg_type;
+    const char *rbname;
+    int rbname_len;
 
     *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
     *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
 
-    if (rbname) {
-        int rbname_len = strlen(rbname);
+    /*
+     * We maintain the last ramblock that we requested a page for.  Note that
+     * don't need locking because this function will only be called within the
+     * postcopy ram fault thread.
+     */
+    if (rb != mis->last_rb) {
+        mis->last_rb = rb;
+
+        rbname = qemu_ram_get_idstr(rb);
+        rbname_len = strlen(rbname);
+
         assert(rbname_len < 256);
 
         bufc[msglen++] = rbname_len;
diff --git a/migration/migration.h b/migration/migration.h
index ae497bd45a..ca8dc4c773 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -328,8 +328,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
-int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
-                              ram_addr_t start, size_t len);
+int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
+                              ram_addr_t start);
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
                                  char *block_name);
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 1bb22f2b6c..11a70441a6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -684,14 +684,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                         qemu_ram_get_idstr(rb), rb_offset);
         return postcopy_wake_shared(pcfd, client_addr, rb);
     }
-    if (rb != mis->last_rb) {
-        mis->last_rb = rb;
-        migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                  aligned_rbo, pagesize);
-    } else {
-        /* Save some space */
-        migrate_send_rp_req_pages(mis, NULL, aligned_rbo, pagesize);
-    }
+    migrate_send_rp_req_pages(mis, rb, aligned_rbo);
     return 0;
 }
 
@@ -986,20 +979,7 @@ retry:
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
-            if (rb != mis->last_rb) {
-                mis->last_rb = rb;
-                ret = migrate_send_rp_req_pages(mis,
-                                                qemu_ram_get_idstr(rb),
-                                                rb_offset,
-                                                qemu_ram_pagesize(rb));
-            } else {
-                /* Save some space */
-                ret = migrate_send_rp_req_pages(mis,
-                                                NULL,
-                                                rb_offset,
-                                                qemu_ram_pagesize(rb));
-            }
-
+            ret = migrate_send_rp_req_pages(mis, rb, rb_offset);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
                 if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
-- 
2.26.2




* [PATCH 2/5] migration: Introduce migrate_send_rp_message_req_pages()
  2020-09-03 15:26 [PATCH 0/5] migration/postcopy: Sync faulted addresses after network recovered Peter Xu
  2020-09-03 15:26 ` [PATCH 1/5] migration: Rework migrate_send_rp_req_pages() function Peter Xu
@ 2020-09-03 15:26 ` Peter Xu
  2020-09-08  9:57   ` Dr. David Alan Gilbert
  2020-09-03 15:26 ` [PATCH 3/5] migration: Pass incoming state into qemu_ufd_copy_ioctl() Peter Xu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2020-09-03 15:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: Xiaohui Li, Dr . David Alan Gilbert, peterx, Juan Quintela

This is another wrapper layer for sending a page request to the source VM.
It sends the message unconditionally, while migrate_send_rp_req_pages() will
grow extra logic in the following patches.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 10 ++++++++--
 migration/migration.h |  2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6761e3f233..6b43ffddbd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -316,8 +316,8 @@ error:
  *   Start: Address offset within the RB
  *   Len: Length in bytes required - must be a multiple of pagesize
  */
-int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
-                              ram_addr_t start)
+int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
+                                      RAMBlock *rb, ram_addr_t start)
 {
     uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
     size_t msglen = 12; /* start + len */
@@ -353,6 +353,12 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
     return migrate_send_rp_message(mis, msg_type, msglen, bufc);
 }
 
+int migrate_send_rp_req_pages(MigrationIncomingState *mis,
+                              RAMBlock *rb, ram_addr_t start)
+{
+    return migrate_send_rp_message_req_pages(mis, rb, start);
+}
+
 static bool migration_colo_enabled;
 bool migration_incoming_colo_enabled(void)
 {
diff --git a/migration/migration.h b/migration/migration.h
index ca8dc4c773..f552725305 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -330,6 +330,8 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
 int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
                               ram_addr_t start);
+int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
+                                      RAMBlock *rb, ram_addr_t start);
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
                                  char *block_name);
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
-- 
2.26.2




* [PATCH 3/5] migration: Pass incoming state into qemu_ufd_copy_ioctl()
  2020-09-03 15:26 [PATCH 0/5] migration/postcopy: Sync faulted addresses after network recovered Peter Xu
  2020-09-03 15:26 ` [PATCH 1/5] migration: Rework migrate_send_rp_req_pages() function Peter Xu
  2020-09-03 15:26 ` [PATCH 2/5] migration: Introduce migrate_send_rp_message_req_pages() Peter Xu
@ 2020-09-03 15:26 ` Peter Xu
  2020-09-08  9:30   ` Dr. David Alan Gilbert
  2020-09-03 15:26 ` [PATCH 4/5] migration: Maintain postcopy faulted addresses Peter Xu
  2020-09-03 15:26 ` [PATCH 5/5] migration: Sync requested pages after postcopy recovery Peter Xu
  4 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2020-09-03 15:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: Xiaohui Li, Dr . David Alan Gilbert, peterx, Juan Quintela

It will be used in follow-up patches to access more fields of the incoming
state.  Meanwhile, fetch the userfaultfd inside the function.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/postcopy-ram.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 11a70441a6..d333c3fd0e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1128,10 +1128,12 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
     return 0;
 }
 
-static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
+static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
                                void *from_addr, uint64_t pagesize, RAMBlock *rb)
 {
+    int userfault_fd = mis->userfault_fd;
     int ret;
+
     if (from_addr) {
         struct uffdio_copy copy_struct;
         copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
@@ -1185,7 +1187,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
      * which would be slightly cheaper, but we'd have to be careful
      * of the order of updating our page state.
      */
-    if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize, rb)) {
+    if (qemu_ufd_copy_ioctl(mis, host, from, pagesize, rb)) {
         int e = errno;
         error_report("%s: %s copy host: %p from: %p (size: %zd)",
                      __func__, strerror(e), host, from, pagesize);
@@ -1212,7 +1214,7 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
      * but it's not available for everything (e.g. hugetlbpages)
      */
     if (qemu_ram_is_uf_zeroable(rb)) {
-        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, pagesize, rb)) {
+        if (qemu_ufd_copy_ioctl(mis, host, NULL, pagesize, rb)) {
             int e = errno;
             error_report("%s: %s zero host: %p",
                          __func__, strerror(e), host);
-- 
2.26.2




* [PATCH 4/5] migration: Maintain postcopy faulted addresses
  2020-09-03 15:26 [PATCH 0/5] migration/postcopy: Sync faulted addresses after network recovered Peter Xu
                   ` (2 preceding siblings ...)
  2020-09-03 15:26 ` [PATCH 3/5] migration: Pass incoming state into qemu_ufd_copy_ioctl() Peter Xu
@ 2020-09-03 15:26 ` Peter Xu
  2020-09-08 11:00   ` Dr. David Alan Gilbert
  2020-09-03 15:26 ` [PATCH 5/5] migration: Sync requested pages after postcopy recovery Peter Xu
  4 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2020-09-03 15:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: Xiaohui Li, Dr . David Alan Gilbert, peterx, Juan Quintela

Maintain a list of faulted addresses on the destination host that we are still
waiting on.  This is implemented using a GTree rather than a real list to make
sure that even when plenty of vCPUs/threads are faulting, the lookup will
still be fast, at O(log N) (because we will do a lookup after placing each
page).  It brings a slight overhead, but ideally that should not be a big
problem, simply because in most cases the requested page list will be short.

We actually did similar things for postcopy blocktime measurements.  This
patch did not reuse that work, because:

  (1) blocktime measurement covers vCPU threads only, but here we need to
      record all faulted addresses, including the main thread and external
      threads (e.g., DPDK via vhost-user);

  (2) blocktime measurement requires UFFD_FEATURE_THREAD_ID, and here we do
      not want to add that extra dependency on the kernel version, since it
      is not necessary: we do not need to know which thread faulted on which
      page, and we do not care about multiple threads faulting on the same
      page; we only care about which addresses have faulted and are waiting
      for a page copy from the source;

  (3) blocktime measurement is not enabled by default, while we need this
      information by default, especially for postcopy recovery.

Another thing to mention is that this patch introduces a new mutex to
serialize updates to the receivedmap and the page_requested tree; however,
that serialization does not cover other procedures like UFFDIO_COPY.
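As a dependency-free sketch of that bookkeeping (the real code uses GLib's
GTree keyed by host address; the helper names below are made up), the same
O(log N) lookup, dedup-on-insert, and removal-on-placement can be modeled
with a sorted array:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Overflow-safe three-way compare, same trick as the patch's
 * page_request_addr_cmp(): (a > b) - (a < b) never overflows,
 * unlike returning a - b. */
static int addr_cmp(const void *ap, const void *bp)
{
    uint64_t a = *(const uint64_t *)ap, b = *(const uint64_t *)bp;

    return (a > b) - (a < b);
}

/* Stand-in for the page_requested tree: a sorted array of faulted
 * host addresses, with O(log N) lookup via bsearch(). */
#define MAX_REQ 64
static uint64_t reqs[MAX_REQ];
static size_t n_reqs;

static int req_lookup(uint64_t addr)
{
    return bsearch(&addr, reqs, n_reqs, sizeof(addr), addr_cmp) != NULL;
}

/* Called when a page fault is recorded; duplicates are deduped. */
static void req_add(uint64_t addr)
{
    size_t i;

    if (req_lookup(addr)) {
        return;              /* same page faulted again: nothing to add */
    }
    i = n_reqs++;
    while (i > 0 && reqs[i - 1] > addr) {   /* keep the array sorted */
        reqs[i] = reqs[i - 1];
        i--;
    }
    reqs[i] = addr;
}

/* Called after a page is placed: drop the pending entry, if any. */
static void req_del(uint64_t addr)
{
    uint64_t *p = bsearch(&addr, reqs, n_reqs, sizeof(addr), addr_cmp);

    if (p) {
        memmove(p, p + 1, (reqs + --n_reqs - p) * sizeof(*p));
    }
}
```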

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    | 41 +++++++++++++++++++++++++++++++++++++++-
 migration/migration.h    | 19 ++++++++++++++++++-
 migration/postcopy-ram.c | 18 +++++++++++++++---
 migration/trace-events   |  2 ++
 4 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6b43ffddbd..e943d96c1b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -143,6 +143,13 @@ static int migration_maybe_pause(MigrationState *s,
                                  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
 
+static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
+{
+    uint64_t a = (uint64_t) ap, b = (uint64_t) bp;
+
+    return (a > b) - (a < b);
+}
+
 void migration_object_init(void)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -165,6 +172,8 @@ void migration_object_init(void)
     qemu_event_init(&current_incoming->main_thread_load_event, false);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+    qemu_mutex_init(&current_incoming->page_request_mutex);
+    current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
     if (!migration_object_check(current_migration, &err)) {
         error_report_err(err);
@@ -238,6 +247,11 @@ void migration_incoming_state_destroy(void)
         mis->postcopy_remote_fds = NULL;
     }
 
+    if (mis->page_requested) {
+        g_tree_destroy(mis->page_requested);
+        mis->page_requested = NULL;
+    }
+
     qemu_event_reset(&mis->main_thread_load_event);
 
     if (mis->socket_address_list) {
@@ -354,8 +368,33 @@ int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
 }
 
 int migrate_send_rp_req_pages(MigrationIncomingState *mis,
-                              RAMBlock *rb, ram_addr_t start)
+                              RAMBlock *rb, ram_addr_t start, uint64_t haddr)
 {
+    uint64_t aligned = haddr & (-qemu_target_page_size());
+    bool received;
+
+    qemu_mutex_lock(&mis->page_request_mutex);
+    received = ramblock_recv_bitmap_test_byte_offset(rb, start);
+    if (!received && !g_tree_lookup(mis->page_requested, (gpointer) aligned)) {
+        /*
+         * The page has not been received, and it's not yet in the page request
+         * list.  Queue it.  Set the value of element to 1, so that things like
+         * g_tree_lookup() will return TRUE (1) when found.
+         */
+        g_tree_insert(mis->page_requested, (gpointer) aligned, (gpointer) 1);
+        mis->page_requested_count++;
+        trace_postcopy_page_req_add(aligned, mis->page_requested_count);
+    }
+    qemu_mutex_unlock(&mis->page_request_mutex);
+
+    /*
+     * If the page is there, skip sending the message.  We don't even need the
+     * lock because as long as the page arrived, it'll be there forever.
+     */
+    if (received) {
+        return 0;
+    }
+
     return migrate_send_rp_message_req_pages(mis, rb, start);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index f552725305..81311dc154 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -103,6 +103,23 @@ struct MigrationIncomingState {
 
     /* List of listening socket addresses  */
     SocketAddressList *socket_address_list;
+
+    /* A tree of pages that we requested to the source VM */
+    GTree *page_requested;
+    /* For debugging purpose only, but would be nice to keep */
+    int page_requested_count;
+    /*
+     * The mutex helps to maintain the requested pages that we sent to the
+     * source, IOW, to guarantee coherence between the page_requested tree
+     * and the per-ramblock receivedmap.  Note! This does not guarantee consistency
+     * of the real page copy procedures (using UFFDIO_[ZERO]COPY).  E.g., even
+     * if one bit in receivedmap is cleared, UFFDIO_COPY could have happened
+     * for that page already.  This is intended so that the mutex won't
+     * be serialized or blocked by slow operations like UFFDIO_* ioctls.  However
+     * this should be enough to make sure the page_requested tree always
+     * contains valid information.
+     */
+    QemuMutex page_request_mutex;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
@@ -329,7 +346,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
 void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
 int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
-                              ram_addr_t start);
+                              ram_addr_t start, uint64_t haddr);
 int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
                                       RAMBlock *rb, ram_addr_t start);
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d333c3fd0e..a30627e838 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -684,7 +684,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                         qemu_ram_get_idstr(rb), rb_offset);
         return postcopy_wake_shared(pcfd, client_addr, rb);
     }
-    migrate_send_rp_req_pages(mis, rb, aligned_rbo);
+    migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr);
     return 0;
 }
 
@@ -979,7 +979,8 @@ retry:
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
-            ret = migrate_send_rp_req_pages(mis, rb, rb_offset);
+            ret = migrate_send_rp_req_pages(mis, rb, rb_offset,
+                                            msg.arg.pagefault.address);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
                 if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
@@ -1149,10 +1150,21 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
         ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
     }
     if (!ret) {
+        qemu_mutex_lock(&mis->page_request_mutex);
         ramblock_recv_bitmap_set_range(rb, host_addr,
                                        pagesize / qemu_target_page_size());
+        /*
+         * If this page resolves a page fault for a previous recorded faulted
+         * address, take a special note to maintain the requested page list.
+         */
+        if (g_tree_lookup(mis->page_requested, (gconstpointer)host_addr)) {
+            g_tree_remove(mis->page_requested, (gconstpointer)host_addr);
+            mis->page_requested_count--;
+            trace_postcopy_page_req_del((uint64_t)host_addr,
+                                        mis->page_requested_count);
+        }
+        qemu_mutex_unlock(&mis->page_request_mutex);
         mark_postcopy_blocktime_end((uintptr_t)host_addr);
-
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 4ab0a503d2..b89ce02cb0 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -157,6 +157,7 @@ postcopy_pause_return_path(void) ""
 postcopy_pause_return_path_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_start_set_run(void) ""
+postcopy_page_req_add(uint64_t addr, int count) "new page req 0x%"PRIx64" total %d"
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
 source_return_path_thread_entry(void) ""
@@ -267,6 +268,7 @@ postcopy_ram_incoming_cleanup_blocktime(uint64_t total) "total blocktime %" PRIu
 postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset 0x%"PRIx64
 postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
 postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
+postcopy_page_req_del(uint64_t addr, int count) "resolved page req 0x%"PRIx64" total %d"
 
 get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
 
-- 
2.26.2




* [PATCH 5/5] migration: Sync requested pages after postcopy recovery
  2020-09-03 15:26 [PATCH 0/5] migration/postcopy: Sync faulted addresses after network recovered Peter Xu
                   ` (3 preceding siblings ...)
  2020-09-03 15:26 ` [PATCH 4/5] migration: Maintain postcopy faulted addresses Peter Xu
@ 2020-09-03 15:26 ` Peter Xu
  2020-09-08 11:03   ` Dr. David Alan Gilbert
  4 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2020-09-03 15:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: Xiaohui Li, Dr . David Alan Gilbert, peterx, Juan Quintela

Synchronize the requested pages right after a postcopy recovery happens.
This re-populates the prioritized page queue on the source so that the
faulted threads can be served faster.
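The resume-time walk boils down to: iterate over every still-pending faulted
address and re-send its page request, best-effort.  A minimal sketch, with
send_req() as a made-up stand-in for migrate_send_rp_message_req_pages():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REQ 8
static uint64_t pending[MAX_REQ];   /* still-unresolved faulted addresses */
static size_t n_pending;
static size_t resent;

/* Stand-in for migrate_send_rp_message_req_pages(): always sends. */
static int send_req(uint64_t addr)
{
    (void)addr;
    resent++;
    return 0;
}

/* After the channel is re-established, re-send every pending page
 * request, because the source lost its postcopy priority queue.
 * Errors are reported but do not abort the walk (best-effort). */
static void sync_pending_reqs(void)
{
    size_t i;

    for (i = 0; i < n_pending; i++) {
        send_req(pending[i]);
    }
}
```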

Reported-by: Xiaohui Li <xiaohli@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 56 ++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  1 +
 2 files changed, 57 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 304d98ff78..f998dd230d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2011,6 +2011,48 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
     return LOADVM_QUIT;
 }
 
+/* Must be called with page_request_mutex held */
+static gboolean postcopy_sync_page_req(gpointer key, gpointer value,
+                                       gpointer data)
+{
+    MigrationIncomingState *mis = data;
+    void *host_addr = (void *) key;
+    ram_addr_t rb_offset;
+    RAMBlock *rb;
+    int ret;
+
+    rb = qemu_ram_block_from_host(host_addr, true, &rb_offset);
+    if (!rb) {
+        /*
+         * This should _never_ happen.  However be nice for a migrating VM to
+         * not crash/assert.  Post an error (note: intended to not use *_once
+         * because we do want to see all the illegal addresses; and this can
+         * never be triggered by the guest so we're safe) and move on next.
+         */
+        error_report("%s: illegal host addr %p", __func__, host_addr);
+        /* Try the next entry */
+        return FALSE;
+    }
+
+    ret = migrate_send_rp_message_req_pages(mis, rb, rb_offset);
+    if (ret) {
+        /* Refer to above comment - just try our best to continue */
+        error_report("%s: send rp message failed for addr %p",
+                     __func__, host_addr);
+    }
+
+    trace_postcopy_page_req_sync((uint64_t)host_addr);
+
+    return FALSE;
+}
+
+static void migrate_send_rp_req_pages_pending(MigrationIncomingState *mis)
+{
+    qemu_mutex_lock(&mis->page_request_mutex);
+    g_tree_foreach(mis->page_requested, postcopy_sync_page_req, mis);
+    qemu_mutex_unlock(&mis->page_request_mutex);
+}
+
 static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
 {
     if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
@@ -2033,6 +2075,20 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
     /* Tell source that "we are ready" */
     migrate_send_rp_resume_ack(mis, MIGRATION_RESUME_ACK_VALUE);
 
+    /*
+     * After a postcopy recovery, the source should have lost the postcopy
+     * queue, or potentially the requested pages could have been lost during
+     * the network down phase.  Let's re-sync with the source VM by re-sending
+     * all the pending pages that we eagerly need, so these threads won't get
+     * blocked too long due to the recovery.
+     *
+     * Without this procedure, the faulted destination VM threads (waiting for
+     * page requests right before the postcopy is interrupted) can keep hanging
+     * until the pages are sent by the source during the background copying of
+     * pages, or another thread faulted on the same address accidentally.
+     */
+    migrate_send_rp_req_pages_pending(mis);
+
     return 0;
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index b89ce02cb0..54a6dd2761 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -49,6 +49,7 @@ vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
 vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
+postcopy_page_req_sync(uint64_t host_addr) "sync page req 0x%"PRIx64
 
 # vmstate.c
 vmstate_load_field_error(const char *field, int ret) "field \"%s\" load failed, ret = %d"
-- 
2.26.2




* Re: [PATCH 1/5] migration: Rework migrate_send_rp_req_pages() function
  2020-09-03 15:26 ` [PATCH 1/5] migration: Rework migrate_send_rp_req_pages() function Peter Xu
@ 2020-09-08  9:18   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-08  9:18 UTC (permalink / raw)
  To: Peter Xu; +Cc: Xiaohui Li, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> We duplicated the logic of maintaining the last_rb variable in both callers of
> this function.  Pass the *rb pointer into the function so that we can avoid
> duplicating that logic.  With the rb pointer at hand, it is also easy to drop
> the original 2nd and 4th parameters, because both of them (the name of the
> ramblock when needed, and the page size) can be fetched from the ramblock
> pointer.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c    | 26 ++++++++++++++++++--------
>  migration/migration.h    |  4 ++--
>  migration/postcopy-ram.c | 24 ++----------------------
>  3 files changed, 22 insertions(+), 32 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 58a5452471..6761e3f233 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -311,25 +311,35 @@ error:
>      return ret;
>  }
>  
> -/* Request a range of pages from the source VM at the given
> - * start address.
> - *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
> - *           as the last request (a name must have been given previously)
> +/* Request one page from the source VM at the given start address.
> + *   rb: the RAMBlock to request the page in
>   *   Start: Address offset within the RB
>   *   Len: Length in bytes required - must be a multiple of pagesize
>   */
> -int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> -                              ram_addr_t start, size_t len)
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
> +                              ram_addr_t start)
>  {
>      uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
>      size_t msglen = 12; /* start + len */
> +    size_t len = qemu_ram_pagesize(rb);
>      enum mig_rp_message_type msg_type;
> +    const char *rbname;
> +    int rbname_len;
>  
>      *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
>      *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
>  
> -    if (rbname) {
> -        int rbname_len = strlen(rbname);
> +    /*
> +     * We maintain the last ramblock that we requested for page.  Note that we
> +     * don't need locking because this function will only be called within the
> +     * postcopy ram fault thread.
> +     */
> +    if (rb != mis->last_rb) {
> +        mis->last_rb = rb;
> +
> +        rbname = qemu_ram_get_idstr(rb);
> +        rbname_len = strlen(rbname);
> +
>          assert(rbname_len < 256);
>  
>          bufc[msglen++] = rbname_len;
> diff --git a/migration/migration.h b/migration/migration.h
> index ae497bd45a..ca8dc4c773 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -328,8 +328,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
>                            uint32_t value);
>  void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
> -int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
> -                              ram_addr_t start, size_t len);
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
> +                              ram_addr_t start);
>  void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>                                   char *block_name);
>  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 1bb22f2b6c..11a70441a6 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -684,14 +684,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
>                                          qemu_ram_get_idstr(rb), rb_offset);
>          return postcopy_wake_shared(pcfd, client_addr, rb);
>      }
> -    if (rb != mis->last_rb) {
> -        mis->last_rb = rb;
> -        migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
> -                                  aligned_rbo, pagesize);
> -    } else {
> -        /* Save some space */
> -        migrate_send_rp_req_pages(mis, NULL, aligned_rbo, pagesize);
> -    }
> +    migrate_send_rp_req_pages(mis, rb, aligned_rbo);
>      return 0;
>  }
>  
> @@ -986,20 +979,7 @@ retry:
>               * Send the request to the source - we want to request one
>               * of our host page sizes (which is >= TPS)
>               */
> -            if (rb != mis->last_rb) {
> -                mis->last_rb = rb;
> -                ret = migrate_send_rp_req_pages(mis,
> -                                                qemu_ram_get_idstr(rb),
> -                                                rb_offset,
> -                                                qemu_ram_pagesize(rb));
> -            } else {
> -                /* Save some space */
> -                ret = migrate_send_rp_req_pages(mis,
> -                                                NULL,
> -                                                rb_offset,
> -                                                qemu_ram_pagesize(rb));
> -            }
> -
> +            ret = migrate_send_rp_req_pages(mis, rb, rb_offset);
>              if (ret) {
>                  /* May be network failure, try to wait for recovery */
>                  if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
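[Archive editor's note: the request message built in migrate_send_rp_req_pages() above is a fixed 12-byte header — start as a big-endian 64-bit value, length as a big-endian 32-bit value — optionally followed by a length-prefixed RAMBlock name when the block differs from the last request. A minimal, self-contained sketch of that wire format follows; the helper names are invented for illustration (QEMU itself uses cpu_to_be64()/cpu_to_be32()):]

```c
/*
 * Illustrative sketch of the page-request wire format discussed above:
 * start (8 bytes, big endian), len (4 bytes, big endian), then an
 * optional length-prefixed RAMBlock name.  Helper names are invented
 * for this example; they are not QEMU's actual helpers.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

/*
 * Pack a page request into buf; name may be NULL when it matches the
 * previously requested RAMBlock ("save some space").  Returns the
 * total message length.
 */
static size_t pack_page_req(uint8_t *buf, uint64_t start, uint32_t len,
                            const char *name)
{
    size_t msglen = 12;               /* start (8) + len (4) */

    put_be64(buf, start);
    put_be32(buf + 8, len);

    if (name) {
        size_t name_len = strlen(name);

        assert(name_len < 256);       /* one length byte on the wire */
        buf[msglen++] = (uint8_t)name_len;
        memcpy(buf + msglen, name, name_len);
        msglen += name_len;
    }
    return msglen;
}
```

[With a 4k page at offset 0x1000 in a block named "pc.ram" this yields a 19-byte message; a repeat request for the same block is just the bare 12-byte header.]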




* Re: [PATCH 3/5] migration: Pass incoming state into qemu_ufd_copy_ioctl()
  2020-09-03 15:26 ` [PATCH 3/5] migration: Pass incoming state into qemu_ufd_copy_ioctl() Peter Xu
@ 2020-09-08  9:30   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-08  9:30 UTC (permalink / raw)
  To: Peter Xu; +Cc: Xiaohui Li, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> It'll be used in follow-up patches to access more of its fields.  Meanwhile,
> fetch the userfaultfd inside the function.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/postcopy-ram.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 11a70441a6..d333c3fd0e 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1128,10 +1128,12 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>      return 0;
>  }
>  
> -static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> +static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
>                                 void *from_addr, uint64_t pagesize, RAMBlock *rb)
>  {
> +    int userfault_fd = mis->userfault_fd;
>      int ret;
> +
>      if (from_addr) {
>          struct uffdio_copy copy_struct;
>          copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
> @@ -1185,7 +1187,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>       * which would be slightly cheaper, but we'd have to be careful
>       * of the order of updating our page state.
>       */
> -    if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize, rb)) {
> +    if (qemu_ufd_copy_ioctl(mis, host, from, pagesize, rb)) {
>          int e = errno;
>          error_report("%s: %s copy host: %p from: %p (size: %zd)",
>                       __func__, strerror(e), host, from, pagesize);
> @@ -1212,7 +1214,7 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
>       * but it's not available for everything (e.g. hugetlbpages)
>       */
>      if (qemu_ram_is_uf_zeroable(rb)) {
> -        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, pagesize, rb)) {
> +        if (qemu_ufd_copy_ioctl(mis, host, NULL, pagesize, rb)) {
>              int e = errno;
>              error_report("%s: %s zero host: %p",
>                           __func__, strerror(e), host);
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/5] migration: Introduce migrate_send_rp_message_req_pages()
  2020-09-03 15:26 ` [PATCH 2/5] migration: Introduce migrate_send_rp_message_req_pages() Peter Xu
@ 2020-09-08  9:57   ` Dr. David Alan Gilbert
  2020-09-08 20:20     ` Peter Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-08  9:57 UTC (permalink / raw)
  To: Peter Xu; +Cc: Xiaohui Li, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> This is another layer wrapper for sending a page request to the source VM,

Ah, it's not obvious why this is needed until 4/5 :-)


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/migration.c | 10 ++++++++--
>  migration/migration.h |  2 ++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 6761e3f233..6b43ffddbd 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -316,8 +316,8 @@ error:
>   *   Start: Address offset within the RB
>   *   Len: Length in bytes required - must be a multiple of pagesize
>   */
> -int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
> -                              ram_addr_t start)
> +int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
> +                                      RAMBlock *rb, ram_addr_t start)
>  {
>      uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
>      size_t msglen = 12; /* start + len */
> @@ -353,6 +353,12 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
>      return migrate_send_rp_message(mis, msg_type, msglen, bufc);
>  }
>  
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis,
> +                              RAMBlock *rb, ram_addr_t start)
> +{
> +    return migrate_send_rp_message_req_pages(mis, rb, start);
> +}
> +
>  static bool migration_colo_enabled;
>  bool migration_incoming_colo_enabled(void)
>  {
> diff --git a/migration/migration.h b/migration/migration.h
> index ca8dc4c773..f552725305 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -330,6 +330,8 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
>  int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
>                                ram_addr_t start);
> +int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
> +                                      RAMBlock *rb, ram_addr_t start);
>  void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
>                                   char *block_name);
>  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 4/5] migration: Maintain postcopy faulted addresses
  2020-09-03 15:26 ` [PATCH 4/5] migration: Maintain postcopy faulted addresses Peter Xu
@ 2020-09-08 11:00   ` Dr. David Alan Gilbert
  2020-09-08 19:42     ` Peter Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-08 11:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: Xiaohui Li, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Maintain a list of faulted addresses on the destination host that we are
> still waiting on.  This is implemented using a GTree rather than a plain list
> to make sure that even when plenty of vCPUs/threads are faulting, the lookup
> will still be fast at O(log(N)) (because we'll do a lookup after placing each
> page).  It should bring a slight overhead, but that shouldn't be a big
> problem since in most cases the requested page list will be short.
> 
> Actually we did similar things for postcopy blocktime measurements.  This patch
> didn't use that simply because:
> 
>   (1) blocktime measurement covers vCPU threads only, but here we need to
>       record all faulted addresses, including the main thread and external
>       threads (like DPDK via vhost-user).
> 
>   (2) blocktime measurement requires UFFD_FEATURE_THREAD_ID, but here we
>       don't want to add that extra dependency on the kernel version since it
>       isn't necessary.  E.g., we don't need to know which thread faulted on
>       which page, and we don't care about multiple threads faulting on the
>       same page.  We only care about which addresses have faulted and are
>       waiting for a page copy from the source.
> 
>   (3) blocktime measurement is not enabled by default, but we need this
>       behavior by default, especially for postcopy recovery.
> 
> Another thing to mention is that this patch introduces a new mutex to
> serialize the receivedmap and the page_requested tree; however, that
> serialization does not cover other procedures like UFFDIO_COPY.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

a couple of minor things:

> ---
>  migration/migration.c    | 41 +++++++++++++++++++++++++++++++++++++++-
>  migration/migration.h    | 19 ++++++++++++++++++-
>  migration/postcopy-ram.c | 18 +++++++++++++++---
>  migration/trace-events   |  2 ++
>  4 files changed, 75 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 6b43ffddbd..e943d96c1b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -143,6 +143,13 @@ static int migration_maybe_pause(MigrationState *s,
>                                   int new_state);
>  static void migrate_fd_cancel(MigrationState *s);
>  
> +static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
> +{
> +    uint64_t a = (uint64_t) ap, b = (uint64_t) bp;
> +
> +    return (a > b) - (a < b);
> +}
> +
>  void migration_object_init(void)
>  {
>      MachineState *ms = MACHINE(qdev_get_machine());
> @@ -165,6 +172,8 @@ void migration_object_init(void)
>      qemu_event_init(&current_incoming->main_thread_load_event, false);
>      qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
>      qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
> +    qemu_mutex_init(&current_incoming->page_request_mutex);
> +    current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>  
>      if (!migration_object_check(current_migration, &err)) {
>          error_report_err(err);
> @@ -238,6 +247,11 @@ void migration_incoming_state_destroy(void)
>          mis->postcopy_remote_fds = NULL;
>      }
>  
> +    if (mis->page_requested) {
> +        g_tree_destroy(mis->page_requested);
> +        mis->page_requested = NULL;
> +    }
> +

I think you want a:
       qemu_mutex_destroy(&current_incoming->page_request_mutex);

>      qemu_event_reset(&mis->main_thread_load_event);
>  
>      if (mis->socket_address_list) {
> @@ -354,8 +368,33 @@ int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
>  }
>  
>  int migrate_send_rp_req_pages(MigrationIncomingState *mis,
> -                              RAMBlock *rb, ram_addr_t start)
> +                              RAMBlock *rb, ram_addr_t start, uint64_t haddr)
>  {
> +    uint64_t aligned = haddr & (-qemu_target_page_size());
> +    bool received;
> +
> +    qemu_mutex_lock(&mis->page_request_mutex);

You could use WITH_QEMU_LOCK_GUARD


Dave

> +    received = ramblock_recv_bitmap_test_byte_offset(rb, start);
> +    if (!received && !g_tree_lookup(mis->page_requested, (gpointer) aligned)) {
> +        /*
> +         * The page has not been received, and it's not yet in the page request
> +         * list.  Queue it.  Set the value of element to 1, so that things like
> +         * g_tree_lookup() will return TRUE (1) when found.
> +         */
> +        g_tree_insert(mis->page_requested, (gpointer) aligned, (gpointer) 1);
> +        mis->page_requested_count++;
> +        trace_postcopy_page_req_add(aligned, mis->page_requested_count);
> +    }
> +    qemu_mutex_unlock(&mis->page_request_mutex);
> +
> +    /*
> +     * If the page is there, skip sending the message.  We don't even need the
> +     * lock because as long as the page arrived, it'll be there forever.
> +     */
> +    if (received) {
> +        return 0;
> +    }
> +
>      return migrate_send_rp_message_req_pages(mis, rb, start);
>  }
>  
> diff --git a/migration/migration.h b/migration/migration.h
> index f552725305..81311dc154 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -103,6 +103,23 @@ struct MigrationIncomingState {
>  
>      /* List of listening socket addresses  */
>      SocketAddressList *socket_address_list;
> +
> +    /* A tree of pages that we requested to the source VM */
> +    GTree *page_requested;
> +    /* For debugging purpose only, but would be nice to keep */
> +    int page_requested_count;
> +    /*
> +     * The mutex helps to maintain the requested pages that we sent to the
> +     * source, IOW, to guarantee coherent between the page_requests tree and
> +     * the per-ramblock receivedmap.  Note! This does not guarantee consistency
> +     * of the real page copy procedures (using UFFDIO_[ZERO]COPY).  E.g., even
> +     * if one bit in receivedmap is cleared, UFFDIO_COPY could have happened
> +     * for that page already.  This is intended so that the mutex won't
> +     * serialize and blocked by slow operations like UFFDIO_* ioctls.  However
> +     * this should be enough to make sure the page_requested tree always
> +     * contains valid information.
> +     */
> +    QemuMutex page_request_mutex;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> @@ -329,7 +346,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
>  void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
>  int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
> -                              ram_addr_t start);
> +                              ram_addr_t start, uint64_t haddr);
>  int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
>                                        RAMBlock *rb, ram_addr_t start);
>  void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index d333c3fd0e..a30627e838 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -684,7 +684,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
>                                          qemu_ram_get_idstr(rb), rb_offset);
>          return postcopy_wake_shared(pcfd, client_addr, rb);
>      }
> -    migrate_send_rp_req_pages(mis, rb, aligned_rbo);
> +    migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr);
>      return 0;
>  }
>  
> @@ -979,7 +979,8 @@ retry:
>               * Send the request to the source - we want to request one
>               * of our host page sizes (which is >= TPS)
>               */
> -            ret = migrate_send_rp_req_pages(mis, rb, rb_offset);
> +            ret = migrate_send_rp_req_pages(mis, rb, rb_offset,
> +                                            msg.arg.pagefault.address);
>              if (ret) {
>                  /* May be network failure, try to wait for recovery */
>                  if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
> @@ -1149,10 +1150,21 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
>          ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>      }
>      if (!ret) {
> +        qemu_mutex_lock(&mis->page_request_mutex);
>          ramblock_recv_bitmap_set_range(rb, host_addr,
>                                         pagesize / qemu_target_page_size());
> +        /*
> +         * If this page resolves a page fault for a previous recorded faulted
> +         * address, take a special note to maintain the requested page list.
> +         */
> +        if (g_tree_lookup(mis->page_requested, (gconstpointer)host_addr)) {
> +            g_tree_remove(mis->page_requested, (gconstpointer)host_addr);
> +            mis->page_requested_count--;
> +            trace_postcopy_page_req_del((uint64_t)host_addr,
> +                                        mis->page_requested_count);
> +        }
> +        qemu_mutex_unlock(&mis->page_request_mutex);
>          mark_postcopy_blocktime_end((uintptr_t)host_addr);
> -
>      }
>      return ret;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 4ab0a503d2..b89ce02cb0 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -157,6 +157,7 @@ postcopy_pause_return_path(void) ""
>  postcopy_pause_return_path_continued(void) ""
>  postcopy_pause_continued(void) ""
>  postcopy_start_set_run(void) ""
> +postcopy_page_req_add(uint64_t addr, int count) "new page req 0x%lx total %d"
>  source_return_path_thread_bad_end(void) ""
>  source_return_path_thread_end(void) ""
>  source_return_path_thread_entry(void) ""
> @@ -267,6 +268,7 @@ postcopy_ram_incoming_cleanup_blocktime(uint64_t total) "total blocktime %" PRIu
>  postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset 0x%"PRIx64
>  postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
>  postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
> +postcopy_page_req_del(uint64_t addr, int count) "resolved page req 0x%lx total %d"
>  
>  get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
>  
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
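[Archive editor's note: the two small pieces of arithmetic discussed in this patch — the pointer-keyed tree comparator and the target-page alignment of the faulting address — can be sketched standalone as below. This is a simplified illustration using uintptr_t instead of GLib's gconstpointer-typed callback signature, not a drop-in for the actual patch:]

```c
/*
 * Sketch of the address bookkeeping from this patch.  Host addresses
 * are stored directly as pointer-sized tree keys, so the comparator
 * casts them back to integers.  Plain subtraction is avoided because
 * (a - b) could overflow the comparator's signed return value.
 */
#include <stdint.h>

/* Mirrors page_request_addr_cmp(): returns -1, 0, or 1. */
static int page_request_addr_cmp(const void *ap, const void *bp)
{
    uintptr_t a = (uintptr_t)ap, b = (uintptr_t)bp;

    return (a > b) - (a < b);
}

/*
 * Mirrors "haddr & (-qemu_target_page_size())": round a faulting host
 * address down to the target page boundary.  page_size must be a
 * power of two for the mask trick to work.
 */
static uint64_t page_align_down(uint64_t haddr, uint64_t page_size)
{
    return haddr & -page_size;
}
```

[The `(a > b) - (a < b)` idiom yields exactly the negative/zero/positive contract that a GLib GCompareFunc expects, without any overflow risk.]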




* Re: [PATCH 5/5] migration: Sync requested pages after postcopy recovery
  2020-09-03 15:26 ` [PATCH 5/5] migration: Sync requested pages after postcopy recovery Peter Xu
@ 2020-09-08 11:03   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-08 11:03 UTC (permalink / raw)
  To: Peter Xu; +Cc: Xiaohui Li, qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> We synchronize the requested pages right after a postcopy recovery happens.
> This helps to rebuild the prioritized page queue on the source so that the
> faulted threads can be served faster.
> 
> Reported-by: Xiaohui Li <xiaohli@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/savevm.c     | 56 ++++++++++++++++++++++++++++++++++++++++++
>  migration/trace-events |  1 +
>  2 files changed, 57 insertions(+)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 304d98ff78..f998dd230d 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2011,6 +2011,48 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>      return LOADVM_QUIT;
>  }
>  
> +/* We must be with page_request_mutex held */
> +static gboolean postcopy_sync_page_req(gpointer key, gpointer value,
> +                                       gpointer data)
> +{
> +    MigrationIncomingState *mis = data;
> +    void *host_addr = (void *) key;
> +    ram_addr_t rb_offset;
> +    RAMBlock *rb;
> +    int ret;
> +
> +    rb = qemu_ram_block_from_host(host_addr, true, &rb_offset);
> +    if (!rb) {
> +        /*
> +         * This should _never_ happen.  However be nice for a migrating VM to
> +         * not crash/assert.  Post an error (note: intended to not use *_once
> +         * because we do want to see all the illegal addresses; and this can
> +         * never be triggered by the guest so we're safe) and move on next.
> +         */
> +        error_report("%s: illegal host addr %p", __func__, host_addr);
> +        /* Try the next entry */
> +        return FALSE;
> +    }
> +
> +    ret = migrate_send_rp_message_req_pages(mis, rb, rb_offset);
> +    if (ret) {
> +        /* Refer to above comment - just try our best to continue */
> +        error_report("%s: send rp message failed for addr %p",
> +                     __func__, host_addr);
> +    }
> +
> +    trace_postcopy_page_req_sync((uint64_t)host_addr);
> +
> +    return FALSE;
> +}

OK,

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

but...

> +static void migrate_send_rp_req_pages_pending(MigrationIncomingState *mis)
> +{
> +    qemu_mutex_lock(&mis->page_request_mutex);
> +    g_tree_foreach(mis->page_requested, postcopy_sync_page_req, mis);
> +    qemu_mutex_unlock(&mis->page_request_mutex);

could have used the lock macro there.

Dave

> +}
> +
>  static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>  {
>      if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> @@ -2033,6 +2075,20 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>      /* Tell source that "we are ready" */
>      migrate_send_rp_resume_ack(mis, MIGRATION_RESUME_ACK_VALUE);
>  
> +    /*
> +     * After a postcopy recovery, the source should have lost the postcopy
> +     * queue, or potentially the requested pages could have been lost during
> +     * the network down phase.  Let's re-sync with the source VM by re-sending
> +     * all the pending pages that we eagerly need, so these threads won't get
> +     * blocked too long due to the recovery.
> +     *
> +     * Without this procedure, the faulted destination VM threads (waiting for
> +     * page requests right before the postcopy is interrupted) can keep hanging
> +     * until the pages are sent by the source during the background copying of
> +     * pages, or another thread faulted on the same address accidentally.
> +     */
> +    migrate_send_rp_req_pages_pending(mis);
> +
>      return 0;
>  }
>  
> diff --git a/migration/trace-events b/migration/trace-events
> index b89ce02cb0..54a6dd2761 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -49,6 +49,7 @@ vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
>  vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
>  postcopy_pause_incoming(void) ""
>  postcopy_pause_incoming_continued(void) ""
> +postcopy_page_req_sync(uint64_t host_addr) "sync page req 0x%"PRIx64
>  
>  # vmstate.c
>  vmstate_load_field_error(const char *field, int ret) "field \"%s\" load failed, ret = %d"
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
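[Archive editor's note: on the WITH_QEMU_LOCK_GUARD suggestion raised in this thread — the core idea is a macro that scopes lock/unlock to a block. Below is a much-simplified pthread-based illustration of the pattern; QEMU's real macro is built on cleanup attributes and also unlocks on early return, which this toy for-loop version does not:]

```c
/*
 * Toy illustration of a scoped lock-guard macro.  The for-loop runs
 * its body exactly once: the mutex is taken when entering the loop
 * and released when the body finishes.  Unlike QEMU's real
 * WITH_QEMU_LOCK_GUARD, this version does NOT unlock on break,
 * goto, or early return out of the body.
 */
#include <pthread.h>

#define WITH_LOCK_GUARD(m)                                      \
    for (int guard_done_ = (pthread_mutex_lock(m), 0);          \
         !guard_done_;                                          \
         guard_done_ = (pthread_mutex_unlock(m), 1))

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Increment a value inside the guarded critical section. */
static int guarded_increment(int value)
{
    WITH_LOCK_GUARD(&demo_mutex) {
        value++;
    }
    return value;
}

/* Returns 1 when the mutex is free again after the guard exits. */
static int mutex_is_free(void)
{
    if (pthread_mutex_trylock(&demo_mutex) == 0) {
        pthread_mutex_unlock(&demo_mutex);
        return 1;
    }
    return 0;
}
```

[The attraction over explicit lock/unlock pairs, as in the patch hunks quoted above, is that the unlock cannot be forgotten on the paths the guard covers.]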




* Re: [PATCH 4/5] migration: Maintain postcopy faulted addresses
  2020-09-08 11:00   ` Dr. David Alan Gilbert
@ 2020-09-08 19:42     ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2020-09-08 19:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Xiaohui Li, qemu-devel, Juan Quintela

On Tue, Sep 08, 2020 at 12:00:39PM +0100, Dr. David Alan Gilbert wrote:
> > @@ -238,6 +247,11 @@ void migration_incoming_state_destroy(void)
> >          mis->postcopy_remote_fds = NULL;
> >      }
> >  
> > +    if (mis->page_requested) {
> > +        g_tree_destroy(mis->page_requested);
> > +        mis->page_requested = NULL;
> > +    }
> > +
> 
> I think you want a:
>        qemu_mutex_destroy(&current_incoming->page_request_mutex);

I explicitly didn't do that because I saw that we've got quite a few things
that were not destroyed here, just in case I introduce some bug on multi-free
of the mutex.  However... after a closer look, I don't see a reason to not free
them at all...  Namely:

    - postcopy_pause_sem_dst
    - postcopy_pause_sem_fault
    - rp_mutex
    - main_thread_load_event (instead of _reset it in this function, we might
      want to use _destroy)

I'll prepare another standalone patch for that.

> 
> >      qemu_event_reset(&mis->main_thread_load_event);
> >  
> >      if (mis->socket_address_list) {
> > @@ -354,8 +368,33 @@ int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
> >  }
> >  
> >  int migrate_send_rp_req_pages(MigrationIncomingState *mis,
> > -                              RAMBlock *rb, ram_addr_t start)
> > +                              RAMBlock *rb, ram_addr_t start, uint64_t haddr)
> >  {
> > +    uint64_t aligned = haddr & (-qemu_target_page_size());
> > +    bool received;
> > +
> > +    qemu_mutex_lock(&mis->page_request_mutex);
> 
> You could use WITH_QEMU_LOCK_GUARD

Sure.

Thanks,

-- 
Peter Xu




* Re: [PATCH 2/5] migration: Introduce migrate_send_rp_message_req_pages()
  2020-09-08  9:57   ` Dr. David Alan Gilbert
@ 2020-09-08 20:20     ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2020-09-08 20:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Xiaohui Li, qemu-devel, Juan Quintela

On Tue, Sep 08, 2020 at 10:57:57AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > This is another layer wrapper for sending a page request to the source VM,
> 
> Ah, it's not obvious why this is needed until 4/5 :-)

Yeah. :) I'll try to move this to be before that one in the next version.

> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu




end of thread, other threads:[~2020-09-08 20:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-03 15:26 [PATCH 0/5] migration/postcopy: Sync faulted addresses after network recovered Peter Xu
2020-09-03 15:26 ` [PATCH 1/5] migration: Rework migrate_send_rp_req_pages() function Peter Xu
2020-09-08  9:18   ` Dr. David Alan Gilbert
2020-09-03 15:26 ` [PATCH 2/5] migration: Introduce migrate_send_rp_message_req_pages() Peter Xu
2020-09-08  9:57   ` Dr. David Alan Gilbert
2020-09-08 20:20     ` Peter Xu
2020-09-03 15:26 ` [PATCH 3/5] migration: Pass incoming state into qemu_ufd_copy_ioctl() Peter Xu
2020-09-08  9:30   ` Dr. David Alan Gilbert
2020-09-03 15:26 ` [PATCH 4/5] migration: Maintain postcopy faulted addresses Peter Xu
2020-09-08 11:00   ` Dr. David Alan Gilbert
2020-09-08 19:42     ` Peter Xu
2020-09-03 15:26 ` [PATCH 5/5] migration: Sync requested pages after postcopy recovery Peter Xu
2020-09-08 11:03   ` Dr. David Alan Gilbert
