All of lore.kernel.org
 help / color / mirror / Atom feed
* [PULL 0/9] migration queue
@ 2019-09-25 15:01 Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 1/9] migration: fix vmdesc leak on vmstate_save() error Dr. David Alan Gilbert (git)
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The following changes since commit 240ab11fb72049d6373cbbec8d788f8e411a00bc:

  Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20190924' into staging (2019-09-24 15:36:31 +0100)

are available in the Git repository at:

  git://github.com/dagrh/qemu.git tags/pull-migration-20190925a

for you to fetch changes up to 3748fef9b95a9bc1602f3c4ed2a329d8ef47e63c:

  migration/postcopy: Recognise the recovery states as 'in_postcopy' (2019-09-25 15:51:19 +0100)

----------------------------------------------------------------
Migration pull 2019-09-25

  me: test fixes from (should stop hangs in postcopy tests).
  me: An RDMA cleanup hang fix
  Wei: Tidy ups around postcopy
  Marc-Andre: mem leak fix

----------------------------------------------------------------
Dr. David Alan Gilbert (5):
      migration/rdma: Don't moan about disconnects at the end
      migration/rdma.c: Swap synchronize_rcu for call_rcu
      tests/migration: Fail on unexpected migration states
      tests/migration/postcopy: trim migration bandwidth
      migration/postcopy: Recognise the recovery states as 'in_postcopy'

Marc-André Lureau (1):
      migration: fix vmdesc leak on vmstate_save() error

Wei Yang (3):
      migration/postcopy: not necessary to do discard when canonicalizing bitmap
      migration/postcopy: unsentmap is not necessary for postcopy
      migration: remove sent parameter in get_queued_page_not_dirty

 include/exec/ram_addr.h |  6 ----
 migration/migration.c   |  9 ++++-
 migration/qjson.h       |  2 ++
 migration/ram.c         | 94 ++++++++-----------------------------------------
 migration/rdma.c        | 51 ++++++++++++++++++---------
 migration/savevm.c      |  3 +-
 migration/trace-events  |  2 +-
 tests/migration-test.c  | 25 +++++++++----
 8 files changed, 80 insertions(+), 112 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PULL 1/9] migration: fix vmdesc leak on vmstate_save() error
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 2/9] migration/postcopy: not necessary to do discard when canonicalizing bitmap Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20190912122514.22504-2-marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/qjson.h  | 2 ++
 migration/savevm.c | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/qjson.h b/migration/qjson.h
index 41664f2d71..1786bb5864 100644
--- a/migration/qjson.h
+++ b/migration/qjson.h
@@ -24,4 +24,6 @@ void json_start_object(QJSON *json, const char *name);
 const char *qjson_get_str(QJSON *json);
 void qjson_finish(QJSON *json);
 
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(QJSON, qjson_destroy)
+
 #endif /* QEMU_QJSON_H */
diff --git a/migration/savevm.c b/migration/savevm.c
index ee06f91d42..bb9462a54d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1314,7 +1314,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                     bool in_postcopy,
                                                     bool inactivate_disks)
 {
-    QJSON *vmdesc;
+    g_autoptr(QJSON) vmdesc = NULL;
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
@@ -1375,7 +1375,6 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         qemu_put_be32(f, vmdesc_len);
         qemu_put_buffer(f, (uint8_t *)qjson_get_str(vmdesc), vmdesc_len);
     }
-    qjson_destroy(vmdesc);
 
     return 0;
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 2/9] migration/postcopy: not necessary to do discard when canonicalizing bitmap
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 1/9] migration: fix vmdesc leak on vmstate_save() error Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 3/9] migration/postcopy: unsentmap is not necessary for postcopy Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: Wei Yang <richardw.yang@linux.intel.com>

All pages, either partially sent or partially dirty, will be discarded in
postcopy_send_discard_bm_ram(), since we update the unsentmap to be
unsentmap = unsentmap | dirty in ram_postcopy_send_discard_bitmap().

This is not necessary to do discard when canonicalizing bitmap. And by
doing so, we separate the page discard into two individual steps:

  * canonicalize bitmap
  * discard page

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190819061843.28642-2-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 01df326767..57d1a4627e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2928,7 +2928,7 @@ static int postcopy_each_ram_send_discard(MigrationState *ms)
 }
 
 /**
- * postcopy_chunk_hostpages_pass: canocalize bitmap in hostpages
+ * postcopy_chunk_hostpages_pass: canonicalize bitmap in hostpages
  *
  * Helper for postcopy_chunk_hostpages; it's called twice to
  * canonicalize the two bitmaps, that are similar, but one is
@@ -2991,18 +2991,6 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
                                                              host_ratio);
             run_start = QEMU_ALIGN_UP(run_start, host_ratio);
 
-            /* Tell the destination to discard this page */
-            if (unsent_pass || !test_bit(fixup_start_addr, unsentmap)) {
-                /* For the unsent_pass we:
-                 *     discard partially sent pages
-                 * For the !unsent_pass (dirty) we:
-                 *     discard partially dirty pages that were sent
-                 *     (any partially sent pages were already discarded
-                 *     by the previous unsent_pass)
-                 */
-                postcopy_discard_send_range(ms, fixup_start_addr, host_ratio);
-            }
-
             /* Clean up the bitmap */
             for (page = fixup_start_addr;
                  page < fixup_start_addr + host_ratio; page++) {
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 3/9] migration/postcopy: unsentmap is not necessary for postcopy
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 1/9] migration: fix vmdesc leak on vmstate_save() error Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 2/9] migration/postcopy: not necessary to do discard when canonicalizing bitmap Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 4/9] migration: remove sent parameter in get_queued_page_not_dirty Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: Wei Yang <richardw.yang@linux.intel.com>

Commit f3f491fcd6dd594ba695 ('Postcopy: Maintain unsentmap') introduced
unsentmap to track not yet sent pages.

This is not necessary since:

    * unsentmap is a sub-set of bmap before postcopy start
    * unsentmap is the summation of bmap and unsentmap after canonicalizing

This patch just removes it.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190819061843.28642-3-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/exec/ram_addr.h |  6 ----
 migration/ram.c         | 80 ++++++++---------------------------------
 2 files changed, 14 insertions(+), 72 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index a327a80cfe..e96e621de5 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -44,12 +44,6 @@ struct RAMBlock {
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
-    /* bitmap of pages that haven't been sent even once
-     * only maintained and used in postcopy at the moment
-     * where it's used to send the dirtymap at the start
-     * of the postcopy phase
-     */
-    unsigned long *unsentmap;
     /* bitmap of already received pages in postcopy */
     unsigned long *receivedmap;
 
diff --git a/migration/ram.c b/migration/ram.c
index 57d1a4627e..a8b1aa2597 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2348,7 +2348,7 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
             dirty = test_bit(page, block->bmap);
             if (!dirty) {
                 trace_get_queued_page_not_dirty(block->idstr, (uint64_t)offset,
-                       page, test_bit(page, block->unsentmap));
+                       page, test_bit(page, block->bmap));
             } else {
                 trace_get_queued_page(block->idstr, (uint64_t)offset, page);
             }
@@ -2619,10 +2619,6 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
         }
 
         pages += tmppages;
-        if (pss->block->unsentmap) {
-            clear_bit(pss->page, pss->block->unsentmap);
-        }
-
         pss->page++;
     } while ((pss->page & (pagesize_bits - 1)) &&
              offset_in_ramblock(pss->block, pss->page << TARGET_PAGE_BITS));
@@ -2776,8 +2772,6 @@ static void ram_save_cleanup(void *opaque)
         block->clear_bmap = NULL;
         g_free(block->bmap);
         block->bmap = NULL;
-        g_free(block->unsentmap);
-        block->unsentmap = NULL;
     }
 
     xbzrle_cleanup();
@@ -2857,8 +2851,6 @@ void ram_postcopy_migrated_memory_release(MigrationState *ms)
  * Returns zero on success
  *
  * Callback from postcopy_each_ram_send_discard for each RAMBlock
- * Note: At this point the 'unsentmap' is the processed bitmap combined
- *       with the dirtymap; so a '1' means it's either dirty or unsent.
  *
  * @ms: current migration state
  * @block: RAMBlock to discard
@@ -2867,17 +2859,17 @@ static int postcopy_send_discard_bm_ram(MigrationState *ms, RAMBlock *block)
 {
     unsigned long end = block->used_length >> TARGET_PAGE_BITS;
     unsigned long current;
-    unsigned long *unsentmap = block->unsentmap;
+    unsigned long *bitmap = block->bmap;
 
     for (current = 0; current < end; ) {
-        unsigned long one = find_next_bit(unsentmap, end, current);
+        unsigned long one = find_next_bit(bitmap, end, current);
         unsigned long zero, discard_length;
 
         if (one >= end) {
             break;
         }
 
-        zero = find_next_zero_bit(unsentmap, end, one + 1);
+        zero = find_next_zero_bit(bitmap, end, one + 1);
 
         if (zero >= end) {
             discard_length = end - one;
@@ -2938,16 +2930,12 @@ static int postcopy_each_ram_send_discard(MigrationState *ms)
  * clean, not a mix.  This function canonicalizes the bitmaps.
  *
  * @ms: current migration state
- * @unsent_pass: if true we need to canonicalize partially unsent host pages
- *               otherwise we need to canonicalize partially dirty host pages
  * @block: block that contains the page we want to canonicalize
  */
-static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
-                                          RAMBlock *block)
+static void postcopy_chunk_hostpages_pass(MigrationState *ms, RAMBlock *block)
 {
     RAMState *rs = ram_state;
     unsigned long *bitmap = block->bmap;
-    unsigned long *unsentmap = block->unsentmap;
     unsigned int host_ratio = block->page_size / TARGET_PAGE_SIZE;
     unsigned long pages = block->used_length >> TARGET_PAGE_BITS;
     unsigned long run_start;
@@ -2957,13 +2945,8 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
         return;
     }
 
-    if (unsent_pass) {
-        /* Find a sent page */
-        run_start = find_next_zero_bit(unsentmap, pages, 0);
-    } else {
-        /* Find a dirty page */
-        run_start = find_next_bit(bitmap, pages, 0);
-    }
+    /* Find a dirty page */
+    run_start = find_next_bit(bitmap, pages, 0);
 
     while (run_start < pages) {
 
@@ -2973,11 +2956,7 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
          */
         if (QEMU_IS_ALIGNED(run_start, host_ratio)) {
             /* Find the end of this run */
-            if (unsent_pass) {
-                run_start = find_next_bit(unsentmap, pages, run_start + 1);
-            } else {
-                run_start = find_next_zero_bit(bitmap, pages, run_start + 1);
-            }
+            run_start = find_next_zero_bit(bitmap, pages, run_start + 1);
             /*
              * If the end isn't at the start of a host page, then the
              * run doesn't finish at the end of a host page
@@ -2994,9 +2973,6 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
             /* Clean up the bitmap */
             for (page = fixup_start_addr;
                  page < fixup_start_addr + host_ratio; page++) {
-                /* All pages in this host page are now not sent */
-                set_bit(page, unsentmap);
-
                 /*
                  * Remark them as dirty, updating the count for any pages
                  * that weren't previously dirty.
@@ -3005,13 +2981,8 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
             }
         }
 
-        if (unsent_pass) {
-            /* Find the next sent page for the next iteration */
-            run_start = find_next_zero_bit(unsentmap, pages, run_start);
-        } else {
-            /* Find the next dirty page for the next iteration */
-            run_start = find_next_bit(bitmap, pages, run_start);
-        }
+        /* Find the next dirty page for the next iteration */
+        run_start = find_next_bit(bitmap, pages, run_start);
     }
 }
 
@@ -3033,13 +3004,10 @@ static int postcopy_chunk_hostpages(MigrationState *ms, RAMBlock *block)
 {
     postcopy_discard_send_init(ms, block->idstr);
 
-    /* First pass: Discard all partially sent host pages */
-    postcopy_chunk_hostpages_pass(ms, true, block);
     /*
-     * Second pass: Ensure that all partially dirty host pages are made
-     * fully dirty.
+     * Ensure that all partially dirty host pages are made fully dirty.
      */
-    postcopy_chunk_hostpages_pass(ms, false, block);
+    postcopy_chunk_hostpages_pass(ms, block);
 
     postcopy_discard_send_finish(ms);
     return 0;
@@ -3077,19 +3045,6 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
     rs->last_page = 0;
 
     RAMBLOCK_FOREACH_NOT_IGNORED(block) {
-        unsigned long pages = block->used_length >> TARGET_PAGE_BITS;
-        unsigned long *bitmap = block->bmap;
-        unsigned long *unsentmap = block->unsentmap;
-
-        if (!unsentmap) {
-            /* We don't have a safe way to resize the sentmap, so
-             * if the bitmap was resized it will be NULL at this
-             * point.
-             */
-            error_report("migration ram resized during precopy phase");
-            rcu_read_unlock();
-            return -EINVAL;
-        }
         /* Deal with TPS != HPS and huge pages */
         ret = postcopy_chunk_hostpages(ms, block);
         if (ret) {
@@ -3097,12 +3052,9 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
             return ret;
         }
 
-        /*
-         * Update the unsentmap to be unsentmap = unsentmap | dirty
-         */
-        bitmap_or(unsentmap, unsentmap, bitmap, pages);
 #ifdef DEBUG_POSTCOPY
-        ram_debug_dump_bitmap(unsentmap, true, pages);
+        ram_debug_dump_bitmap(block->bmap, true,
+                              block->used_length >> TARGET_PAGE_BITS);
 #endif
     }
     trace_ram_postcopy_send_discard_bitmap();
@@ -3270,10 +3222,6 @@ static void ram_list_init_bitmaps(void)
             bitmap_set(block->bmap, 0, pages);
             block->clear_bmap_shift = shift;
             block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
-            if (migrate_postcopy_ram()) {
-                block->unsentmap = bitmap_new(pages);
-                bitmap_set(block->unsentmap, 0, pages);
-            }
         }
     }
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 4/9] migration: remove sent parameter in get_queued_page_not_dirty
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2019-09-25 15:01 ` [PULL 3/9] migration/postcopy: unsentmap is not necessary for postcopy Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 5/9] migration/rdma: Don't moan about disconnects at the end Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: Wei Yang <richardw.yang@linux.intel.com>

This is a cleanup for previous removal of unsentmap.

The sent parameter is not necessary now.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190819061843.28642-4-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c        | 2 +-
 migration/trace-events | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index a8b1aa2597..22423f08cd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2348,7 +2348,7 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
             dirty = test_bit(page, block->bmap);
             if (!dirty) {
                 trace_get_queued_page_not_dirty(block->idstr, (uint64_t)offset,
-                       page, test_bit(page, block->bmap));
+                                                page);
             } else {
                 trace_get_queued_page(block->idstr, (uint64_t)offset, page);
             }
diff --git a/migration/trace-events b/migration/trace-events
index 00ffcd5930..858d415d56 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -76,7 +76,7 @@ qemu_file_fclose(void) ""
 
 # ram.c
 get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
-get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs, int sent) "%s/0x%" PRIx64 " page_abs=0x%lx (sent=%d)"
+get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
 migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 5/9] migration/rdma: Don't moan about disconnects at the end
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2019-09-25 15:01 ` [PULL 4/9] migration: remove sent parameter in get_queued_page_not_dirty Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 6/9] migration/rdma.c: Swap synchronize_rcu for call_rcu Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

If we've already finished the migration or something has
already gone wrong, don't moan about the migration stream disconnecting.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190913163507.1403-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 78e6b72bac..0fcf02f48e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3253,10 +3253,14 @@ static void rdma_cm_poll_handler(void *opaque)
 
     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
-        error_report("receive cm event, cm event is %d", cm_event->event);
-        rdma->error_state = -EPIPE;
-        if (rdma->return_path) {
-            rdma->return_path->error_state = -EPIPE;
+        if (!rdma->error_state &&
+            migration_incoming_get_current()->state !=
+              MIGRATION_STATUS_COMPLETED) {
+            error_report("receive cm event, cm event is %d", cm_event->event);
+            rdma->error_state = -EPIPE;
+            if (rdma->return_path) {
+                rdma->return_path->error_state = -EPIPE;
+            }
         }
 
         if (mis->migration_incoming_co) {
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 6/9] migration/rdma.c: Swap synchronize_rcu for call_rcu
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2019-09-25 15:01 ` [PULL 5/9] migration/rdma: Don't moan about disconnects at the end Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 7/9] tests/migration: Fail on unexpected migration states Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

This fixes a deadlock that can occur on the migration source after
a failed RDMA migration;  as the source tries to cleanup it
clears a pair of pointers and uses synchronize_rcu to wait; this
is happening on the main thread.  With the CPUs running
a CPU thread can be an rcu reader and attempt to grab the main lock
(kvm_handle_io->address_space_write->flatview_write->flatview_write_continue->
prepare_mmio_access->qemu_mutex_lock_iothread_impl)

Replace the synchronize_rcu with a call_rcu to postpone the freeing.

Fixes: 74637e6f08fceda98806 ("migration: implement bi-directional RDMA QIOChannel")

( https://bugzilla.redhat.com/show_bug.cgi?id=1746787 )

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190913163507.1403-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 0fcf02f48e..4c74e88a37 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3017,11 +3017,35 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
     }
 }
 
+struct rdma_close_rcu {
+    struct rcu_head rcu;
+    RDMAContext *rdmain;
+    RDMAContext *rdmaout;
+};
+
+/* callback from qio_channel_rdma_close via call_rcu */
+static void qio_channel_rdma_close_rcu(struct rdma_close_rcu *rcu)
+{
+    if (rcu->rdmain) {
+        qemu_rdma_cleanup(rcu->rdmain);
+    }
+
+    if (rcu->rdmaout) {
+        qemu_rdma_cleanup(rcu->rdmaout);
+    }
+
+    g_free(rcu->rdmain);
+    g_free(rcu->rdmaout);
+    g_free(rcu);
+}
+
 static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     RDMAContext *rdmain, *rdmaout;
+    struct rdma_close_rcu *rcu = g_new(struct rdma_close_rcu, 1);
+
     trace_qemu_rdma_close();
 
     rdmain = rioc->rdmain;
@@ -3034,18 +3058,9 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
         atomic_rcu_set(&rioc->rdmaout, NULL);
     }
 
-    synchronize_rcu();
-
-    if (rdmain) {
-        qemu_rdma_cleanup(rdmain);
-    }
-
-    if (rdmaout) {
-        qemu_rdma_cleanup(rdmaout);
-    }
-
-    g_free(rdmain);
-    g_free(rdmaout);
+    rcu->rdmain = rdmain;
+    rcu->rdmaout = rdmaout;
+    call_rcu(rcu, qio_channel_rdma_close_rcu, rcu);
 
     return 0;
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 7/9] tests/migration: Fail on unexpected migration states
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2019-09-25 15:01 ` [PULL 6/9] migration/rdma.c: Swap synchronize_rcu for call_rcu Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 8/9] tests/migration/postcopy: trim migration bandwidth Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We've got various places where we wait for a migration to enter
a given state; but if we enter an unexpected state we tend to fail
in odd ways; add a mechanism for explicitly testing for any state
which we shouldn't be in.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190923131022.15498-2-dgilbert@redhat.com>
Reviewed-by: Cleber Rosa <crosa@redhat.com>
Tested-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tests/migration-test.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 258aa064d4..9c62ee5331 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -255,15 +255,19 @@ static void read_blocktime(QTestState *who)
 }
 
 static void wait_for_migration_status(QTestState *who,
-                                      const char *goal)
+                                      const char *goal,
+                                      const char **ungoals)
 {
     while (true) {
         bool completed;
         char *status;
+        const char **ungoal;
 
         status = migrate_query_status(who);
         completed = strcmp(status, goal) == 0;
-        g_assert_cmpstr(status, !=,  "failed");
+        for (ungoal = ungoals; *ungoal; ungoal++) {
+            g_assert_cmpstr(status, !=,  *ungoal);
+        }
         g_free(status);
         if (completed) {
             return;
@@ -274,7 +278,8 @@ static void wait_for_migration_status(QTestState *who,
 
 static void wait_for_migration_complete(QTestState *who)
 {
-    wait_for_migration_status(who, "completed");
+    wait_for_migration_status(who, "completed",
+                              (const char * []) { "failed", NULL });
 }
 
 static void wait_for_migration_pass(QTestState *who)
@@ -809,7 +814,9 @@ static void test_postcopy_recovery(void)
      * Wait until postcopy is really started; we can only run the
      * migrate-pause command during a postcopy
      */
-    wait_for_migration_status(from, "postcopy-active");
+    wait_for_migration_status(from, "postcopy-active",
+                              (const char * []) { "failed",
+                                                  "completed", NULL });
 
     /*
      * Manually stop the postcopy migration. This emulates a network
@@ -822,7 +829,9 @@ static void test_postcopy_recovery(void)
      * migrate-recover command can only succeed if destination machine
      * is in the paused state
      */
-    wait_for_migration_status(to, "postcopy-paused");
+    wait_for_migration_status(to, "postcopy-paused",
+                              (const char * []) { "failed", "active",
+                                                  "completed", NULL });
 
     /*
      * Create a new socket to emulate a new channel that is different
@@ -836,7 +845,9 @@ static void test_postcopy_recovery(void)
      * Try to rebuild the migration channel using the resume flag and
      * the newly created channel
      */
-    wait_for_migration_status(from, "postcopy-paused");
+    wait_for_migration_status(from, "postcopy-paused",
+                              (const char * []) { "failed", "active",
+                                                  "completed", NULL });
     migrate(from, uri, "{'resume': true}");
     g_free(uri);
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 8/9] tests/migration/postcopy: trim migration bandwidth
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2019-09-25 15:01 ` [PULL 7/9] tests/migration: Fail on unexpected migration states Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-25 15:01 ` [PULL 9/9] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
  2019-09-26 15:13 ` [PULL 0/9] migration queue Peter Maydell
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On slow hosts with tcg we were sometimes finding that the migration
would complete during precopy and never get into the postcopy test.
Trim back the bandwidth a bit to make that much less likely.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190923131022.15498-3-dgilbert@redhat.com>
Reviewed-by: Cleber Rosa <crosa@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Cleber Rosa <crosa@redhat.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tests/migration-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 9c62ee5331..221a33d083 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -753,7 +753,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
      * quickly, but that it doesn't complete precopy even on a slow
      * machine, so also set the downtime.
      */
-    migrate_set_parameter_int(from, "max-bandwidth", 100000000);
+    migrate_set_parameter_int(from, "max-bandwidth", 30000000);
     migrate_set_parameter_int(from, "downtime-limit", 1);
 
     /* Wait for the first serial output from the source */
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 9/9] migration/postcopy: Recognise the recovery states as 'in_postcopy'
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2019-09-25 15:01 ` [PULL 8/9] tests/migration/postcopy: trim migration bandwidth Dr. David Alan Gilbert (git)
@ 2019-09-25 15:01 ` Dr. David Alan Gilbert (git)
  2019-09-26 15:13 ` [PULL 0/9] migration queue Peter Maydell
  9 siblings, 0 replies; 11+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-09-25 15:01 UTC (permalink / raw)
  To: qemu-devel, quintela, peterx, marcandre.lureau, richardw.yang,
	alex.benee

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Various parts of the migration code do different things when they're
in postcopy mode; prior to this patch this has been 'postcopy-active'.
This patch extends 'in_postcopy' to include 'postcopy-paused' and
'postcopy-recover'.

In particular, when you set the max-postcopy-bandwidth parameter, this
only affects the current migration fd if we're 'in_postcopy';
this leads to a race in the postcopy recovery test where it increases
the speed from 4k/sec to unlimited, but that increase can get ignored
if the change is made between the point at which the reconnection
happens and it transitions back to active.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190923174942.12182-1-dgilbert@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 01863a95f5..5f7e4d15e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
 {
     MigrationState *s = migrate_get_current();
 
-    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    switch (s->state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
 }
 
 bool migration_in_postcopy_after_devices(MigrationState *s)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PULL 0/9] migration queue
  2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2019-09-25 15:01 ` [PULL 9/9] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
@ 2019-09-26 15:13 ` Peter Maydell
  9 siblings, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2019-09-26 15:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: Juan Quintela, QEMU Developers, Peter Xu, alex.benee, Wei Yang,
	Marc-André Lureau

On Wed, 25 Sep 2019 at 16:06, Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:
>
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The following changes since commit 240ab11fb72049d6373cbbec8d788f8e411a00bc:
>
>   Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20190924' into staging (2019-09-24 15:36:31 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/dagrh/qemu.git tags/pull-migration-20190925a
>
> for you to fetch changes up to 3748fef9b95a9bc1602f3c4ed2a329d8ef47e63c:
>
>   migration/postcopy: Recognise the recovery states as 'in_postcopy' (2019-09-25 15:51:19 +0100)
>
> ----------------------------------------------------------------
> Migration pull 2019-09-25
>
>   me: test fixes from (should stop hangs in postcopy tests).
>   me: An RDMA cleanup hang fix
>   Wei: Tidy ups around postcopy
>   Marc-Andre: mem leak fix
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-09-26 15:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-25 15:01 [PULL 0/9] migration queue Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 1/9] migration: fix vmdesc leak on vmstate_save() error Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 2/9] migration/postcopy: not necessary to do discard when canonicalizing bitmap Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 3/9] migration/postcopy: unsentmap is not necessary for postcopy Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 4/9] migration: remove sent parameter in get_queued_page_not_dirty Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 5/9] migration/rdma: Don't moan about disconnects at the end Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 6/9] migration/rdma.c: Swap synchronize_rcu for call_rcu Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 7/9] tests/migration: Fail on unexpected migration states Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 8/9] tests/migration/postcopy: trim migration bandwidth Dr. David Alan Gilbert (git)
2019-09-25 15:01 ` [PULL 9/9] migration/postcopy: Recognise the recovery states as 'in_postcopy' Dr. David Alan Gilbert (git)
2019-09-26 15:13 ` [PULL 0/9] migration queue Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.