All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
@ 2018-07-05  3:17 Peter Xu
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer Peter Xu
                   ` (10 more replies)
  0 siblings, 11 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

Based-on: <20180627132246.5576-1-peterx@redhat.com>

Based on the series to unbreak postcopy:
  Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
  Message-Id: <20180627132246.5576-1-peterx@redhat.com>

This series introduce a new postcopy recovery test.  The new test
actually helped me to identify two bugs there so fix them as well
before 3.0 release.

Patch 1: a trivial cleanup for existing postcopy ram load, which I
         found a bit confusing during debugging the problem.

Patch 2-3: two bug fixes that address different issues.  Please see
           the commit log for more information.

Patch 4-9: add the postcopy recovery unit test.

Please review.  Thanks,

Peter Xu (9):
  migration: simplify check to use qemu file buffer
  migration: loosen recovery check when load vm
  migration: fix incorrect bitmap size calculation
  tests: introduce migrate_postcopy_* helpers
  tests: allow migrate() to take extra flags
  tests: introduce migrate_query*() helpers
  tests: introduce wait_for_migration_status()
  tests: add postcopy recovery test
  tests: hide stderr for postcopy recovery test

 migration/ram.c        |  21 +++--
 migration/savevm.c     |  16 ++--
 tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
 3 files changed, 176 insertions(+), 59 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05  9:01   ` Dr. David Alan Gilbert
  2018-07-05 12:59   ` Juan Quintela
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm Peter Xu
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

Firstly, renaming the old matching_page_sizes variable to
matching_target_page_size, which suites more to what it did (it only
checks against target page size rather than multiple page sizes).
Meanwhile, simplify the check logic a bit, and enhance the comments.
Should have no functional change.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 23cea47090..fbeb23f750 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3580,7 +3580,7 @@ static int ram_load_postcopy(QEMUFile *f)
 {
     int flags = 0, ret = 0;
     bool place_needed = false;
-    bool matching_page_sizes = false;
+    bool matching_target_page_size = false;
     MigrationIncomingState *mis = migration_incoming_get_current();
     /* Temporary page that is later 'placed' */
     void *postcopy_host_page = postcopy_get_tmp_page(mis);
@@ -3620,7 +3620,7 @@ static int ram_load_postcopy(QEMUFile *f)
                 ret = -EINVAL;
                 break;
             }
-            matching_page_sizes = block->page_size == TARGET_PAGE_SIZE;
+            matching_target_page_size = block->page_size == TARGET_PAGE_SIZE;
             /*
              * Postcopy requires that we place whole host pages atomically;
              * these may be huge pages for RAMBlocks that are backed by
@@ -3668,12 +3668,17 @@ static int ram_load_postcopy(QEMUFile *f)
 
         case RAM_SAVE_FLAG_PAGE:
             all_zero = false;
-            if (!place_needed || !matching_page_sizes) {
+            if (!matching_target_page_size) {
+                /* For huge pages, we always use temporary buffer */
                 qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
             } else {
-                /* Avoids the qemu_file copy during postcopy, which is
-                 * going to do a copy later; can only do it when we
-                 * do this read in one go (matching page sizes)
+                /*
+                 * For small pages that matches target page size, we
+                 * avoid the qemu_file copy.  Instead we directly use
+                 * the buffer of QEMUFile to place the page.  Note: we
+                 * cannot do any QEMUFile operation before using that
+                 * buffer to make sure the buffer is valid when
+                 * placing the page.
                  */
                 qemu_get_buffer_in_place(f, (uint8_t **)&place_source,
                                          TARGET_PAGE_SIZE);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05  9:15   ` Dr. David Alan Gilbert
  2018-07-05 13:01   ` Juan Quintela
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 3/9] migration: fix incorrect bitmap size calculation Peter Xu
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

We were checking against -EIO, assuming that it will cover all IO
failures.  But actually it is not.  One example is that in
qemu_loadvm_section_start_full() we can have tons of places that will
return -EINVAL even if the error is caused by IO failures on the
network.

Let's loosen the recovery check logic here to cover all the error cases
happened by removing the explicit check against -EIO.  After all we
won't lose anything here if any other failure happened.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 851d74e8b6..efcc795071 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2276,18 +2276,14 @@ out:
         qemu_file_set_error(f, ret);
 
         /*
-         * Detect whether it is:
-         *
-         * 1. postcopy running (after receiving all device data, which
-         *    must be in POSTCOPY_INCOMING_RUNNING state.  Note that
-         *    POSTCOPY_INCOMING_LISTENING is still not enough, it's
-         *    still receiving device states).
-         * 2. network failure (-EIO)
-         *
-         * If so, we try to wait for a recovery.
+         * If we are during an active postcopy, then we pause instead
+         * of bail out to at least keep the VM's dirty data.  Note
+         * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
+         * during which we're still receiving device states and we
+         * still haven't yet started the VM on destination.
          */
         if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
-            ret == -EIO && postcopy_pause_incoming(mis)) {
+            postcopy_pause_incoming(mis)) {
             /* Reset f to point to the newly created channel */
             f = mis->from_src_file;
             goto retry;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 3/9] migration: fix incorrect bitmap size calculation
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer Peter Xu
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05  9:38   ` Dr. David Alan Gilbert
  2018-07-05 13:01   ` Juan Quintela
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers Peter Xu
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

The calculation on size of received bitmap is incorrect for postcopy
recovery.  Here we wanted to let the size to cover all the valid bits in
the bitmap, we should use DIV_ROUND_UP() instead of a division.

For example, a RAMBlock with size=4K (which contains only one single 4K
page) will have nbits=1, then nbits/8=0, then the real bitmap won't be
sent to source at all.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index fbeb23f750..203c691ded 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -235,7 +235,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
     bitmap_to_le(le_bitmap, block->receivedmap, nbits);
 
     /* Size of the bitmap, in bytes */
-    size = nbits / 8;
+    size = DIV_ROUND_UP(nbits, 8);
 
     /*
      * size is always aligned to 8 bytes for 64bit machines, but it
@@ -3944,7 +3944,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
     int ret = -EINVAL;
     QEMUFile *file = s->rp_state.from_dst_file;
     unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
-    uint64_t local_size = nbits / 8;
+    uint64_t local_size = DIV_ROUND_UP(nbits, 8);
     uint64_t size, end_mark;
 
     trace_ram_dirty_bitmap_reload_begin(block->idstr);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
                   ` (2 preceding siblings ...)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 3/9] migration: fix incorrect bitmap size calculation Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05  9:31   ` Balamuruhan S
                     ` (2 more replies)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags Peter Xu
                   ` (6 subsequent siblings)
  10 siblings, 3 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

Separate the old postcopy UNIX socket test into three steps, provide a
helper for each step.  With these helpers, we can do more compliated
tests like postcopy recovery, while keep the codes shared.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/migration-test.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 3a85446f95..2155869b96 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -351,13 +351,19 @@ static void migrate(QTestState *who, const char *uri)
     qobject_unref(rsp);
 }
 
-static void migrate_start_postcopy(QTestState *who)
+static void migrate_postcopy_start(QTestState *from, QTestState *to)
 {
     QDict *rsp;
 
-    rsp = wait_command(who, "{ 'execute': 'migrate-start-postcopy' }");
+    rsp = wait_command(from, "{ 'execute': 'migrate-start-postcopy' }");
     g_assert(qdict_haskey(rsp, "return"));
     qobject_unref(rsp);
+
+    if (!got_stop) {
+        qtest_qmp_eventwait(from, "STOP");
+    }
+
+    qtest_qmp_eventwait(to, "RESUME");
 }
 
 static void test_migrate_start(QTestState **from, QTestState **to,
@@ -505,7 +511,8 @@ static void test_deprecated(void)
     qtest_quit(from);
 }
 
-static void test_postcopy(void)
+static void migrate_postcopy_prepare(QTestState **from_ptr,
+                                     QTestState **to_ptr)
 {
     char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
     QTestState *from, *to;
@@ -527,28 +534,37 @@ static void test_postcopy(void)
     wait_for_serial("src_serial");
 
     migrate(from, uri);
+    g_free(uri);
 
     wait_for_migration_pass(from);
 
-    migrate_start_postcopy(from);
-
-    if (!got_stop) {
-        qtest_qmp_eventwait(from, "STOP");
-    }
+    *from_ptr = from;
+    *to_ptr = to;
+}
 
-    qtest_qmp_eventwait(to, "RESUME");
+static void migrate_postcopy_complete(QTestState *from, QTestState *to)
+{
+    wait_for_migration_complete(from);
 
+    /* Make sure we get at least one "B" on destination */
     wait_for_serial("dest_serial");
-    wait_for_migration_complete(from);
 
     if (uffd_feature_thread_id) {
         read_blocktime(to);
     }
-    g_free(uri);
 
     test_migrate_end(from, to, true);
 }
 
+static void test_postcopy(void)
+{
+    QTestState *from, *to;
+
+    migrate_postcopy_prepare(&from, &to);
+    migrate_postcopy_start(from, to);
+    migrate_postcopy_complete(from, to);
+}
+
 static void test_baddest(void)
 {
     QTestState *from, *to;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
                   ` (3 preceding siblings ...)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05 10:18   ` Dr. David Alan Gilbert
                     ` (2 more replies)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers Peter Xu
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

For example, we can pass in '"resume": true' to resume a migration.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/migration-test.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 2155869b96..af82a04789 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -337,14 +337,14 @@ static void migrate_set_capability(QTestState *who, const char *capability,
     qobject_unref(rsp);
 }
 
-static void migrate(QTestState *who, const char *uri)
+static void migrate(QTestState *who, const char *uri, const char *extra)
 {
     QDict *rsp;
     gchar *cmd;
 
     cmd = g_strdup_printf("{ 'execute': 'migrate',"
-                          "'arguments': { 'uri': '%s' } }",
-                          uri);
+                          "  'arguments': { 'uri': '%s' %s } }",
+                          uri, extra ? extra : "");
     rsp = qtest_qmp(who, cmd);
     g_free(cmd);
     g_assert(qdict_haskey(rsp, "return"));
@@ -533,7 +533,7 @@ static void migrate_postcopy_prepare(QTestState **from_ptr,
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
 
-    migrate(from, uri);
+    migrate(from, uri, NULL);
     g_free(uri);
 
     wait_for_migration_pass(from);
@@ -573,7 +573,7 @@ static void test_baddest(void)
     bool failed;
 
     test_migrate_start(&from, &to, "tcp:0:0", true);
-    migrate(from, "tcp:0:0");
+    migrate(from, "tcp:0:0", NULL);
     do {
         rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
         rsp_return = qdict_get_qdict(rsp, "return");
@@ -615,7 +615,7 @@ static void test_precopy_unix(void)
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
 
-    migrate(from, uri);
+    migrate(from, uri, NULL);
 
     wait_for_migration_pass(from);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
                   ` (4 preceding siblings ...)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05 10:23   ` Dr. David Alan Gilbert
                     ` (2 more replies)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status() Peter Xu
                   ` (4 subsequent siblings)
  10 siblings, 3 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

Introduce helpers to query migration states and use it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/migration-test.c | 64 ++++++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index af82a04789..1d85ccbef1 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -168,6 +168,37 @@ static QDict *wait_command(QTestState *who, const char *command)
     return response;
 }
 
+/*
+ * Note: caller is responsible to free the returned object via
+ * qobject_unref() after use
+ */
+static QDict *migrate_query(QTestState *who)
+{
+    QDict *rsp, *rsp_return;
+
+    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
+    rsp_return = qdict_get_qdict(rsp, "return");
+    g_assert(rsp_return);
+    qobject_ref(rsp_return);
+    qobject_unref(rsp);
+
+    return rsp_return;
+}
+
+/*
+ * Note: caller is responsible to free the returned object via
+ * g_free() after use
+ */
+static gchar *migrate_query_status(QTestState *who)
+{
+    QDict *rsp_return = migrate_query(who);
+    gchar *status = g_strdup(qdict_get_str(rsp_return, "status"));
+
+    g_assert(status);
+    qobject_unref(rsp_return);
+
+    return status;
+}
 
 /*
  * It's tricky to use qemu's migration event capability with qtest,
@@ -176,11 +207,10 @@ static QDict *wait_command(QTestState *who, const char *command)
 
 static uint64_t get_migration_pass(QTestState *who)
 {
-    QDict *rsp, *rsp_return, *rsp_ram;
+    QDict *rsp_return, *rsp_ram;
     uint64_t result;
 
-    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
-    rsp_return = qdict_get_qdict(rsp, "return");
+    rsp_return = migrate_query(who);
     if (!qdict_haskey(rsp_return, "ram")) {
         /* Still in setup */
         result = 0;
@@ -188,33 +218,29 @@ static uint64_t get_migration_pass(QTestState *who)
         rsp_ram = qdict_get_qdict(rsp_return, "ram");
         result = qdict_get_try_int(rsp_ram, "dirty-sync-count", 0);
     }
-    qobject_unref(rsp);
+    qobject_unref(rsp_return);
     return result;
 }
 
 static void read_blocktime(QTestState *who)
 {
-    QDict *rsp, *rsp_return;
+    QDict *rsp_return;
 
-    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
-    rsp_return = qdict_get_qdict(rsp, "return");
+    rsp_return = migrate_query(who);
     g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
-    qobject_unref(rsp);
+    qobject_unref(rsp_return);
 }
 
 static void wait_for_migration_complete(QTestState *who)
 {
     while (true) {
-        QDict *rsp, *rsp_return;
         bool completed;
-        const char *status;
+        char *status;
 
-        rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
-        rsp_return = qdict_get_qdict(rsp, "return");
-        status = qdict_get_str(rsp_return, "status");
+        status = migrate_query_status(who);
         completed = strcmp(status, "completed") == 0;
         g_assert_cmpstr(status, !=,  "failed");
-        qobject_unref(rsp);
+        g_free(status);
         if (completed) {
             return;
         }
@@ -569,20 +595,16 @@ static void test_baddest(void)
 {
     QTestState *from, *to;
     QDict *rsp, *rsp_return;
-    const char *status;
+    char *status;
     bool failed;
 
     test_migrate_start(&from, &to, "tcp:0:0", true);
     migrate(from, "tcp:0:0", NULL);
     do {
-        rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
-        rsp_return = qdict_get_qdict(rsp, "return");
-
-        status = qdict_get_str(rsp_return, "status");
-
+        status = migrate_query_status(from);
         g_assert(!strcmp(status, "setup") || !(strcmp(status, "failed")));
         failed = !strcmp(status, "failed");
-        qobject_unref(rsp);
+        g_free(status);
     } while (!failed);
 
     /* Is the machine currently running? */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status()
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
                   ` (5 preceding siblings ...)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05 10:27   ` Dr. David Alan Gilbert
                     ` (2 more replies)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 8/9] tests: add postcopy recovery test Peter Xu
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

It's generalized from wait_for_migration_complete() to allow us to wait
for any migration status besides failure.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/migration-test.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 1d85ccbef1..761bf62ffe 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -231,14 +231,15 @@ static void read_blocktime(QTestState *who)
     qobject_unref(rsp_return);
 }
 
-static void wait_for_migration_complete(QTestState *who)
+static void wait_for_migration_status(QTestState *who,
+                                      const char *goal)
 {
     while (true) {
         bool completed;
         char *status;
 
         status = migrate_query_status(who);
-        completed = strcmp(status, "completed") == 0;
+        completed = strcmp(status, goal) == 0;
         g_assert_cmpstr(status, !=,  "failed");
         g_free(status);
         if (completed) {
@@ -248,6 +249,11 @@ static void wait_for_migration_complete(QTestState *who)
     }
 }
 
+static void wait_for_migration_complete(QTestState *who)
+{
+    wait_for_migration_status(who, "completed");
+}
+
 static void wait_for_migration_pass(QTestState *who)
 {
     uint64_t initial_pass = get_migration_pass(who);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 8/9] tests: add postcopy recovery test
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
                   ` (6 preceding siblings ...)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status() Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05 10:30   ` Dr. David Alan Gilbert
  2018-07-05 13:08   ` Juan Quintela
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 9/9] tests: hide stderr for " Peter Xu
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

Test the postcopy recovery procedure by emulating a network failure
using migrate-pause command.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/migration-test.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 761bf62ffe..182fc88e93 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -352,6 +352,29 @@ static void migrate_set_parameter(QTestState *who, const char *parameter,
     migrate_check_parameter(who, parameter, value);
 }
 
+static void migrate_pause(QTestState *who)
+{
+    QDict *rsp;
+
+    rsp = wait_command(who, "{ 'execute': 'migrate-pause' }");
+    g_assert(qdict_haskey(rsp, "return"));
+    qobject_unref(rsp);
+}
+
+static void migrate_recover(QTestState *who, const char *uri)
+{
+    QDict *rsp;
+    gchar *cmd = g_strdup_printf(
+        "{ 'execute': 'migrate-recover', "
+        "  'id': 'recover-cmd', "
+        "  'arguments': { 'uri': '%s' } }", uri);
+
+    rsp = wait_command(who, cmd);
+    g_assert(qdict_haskey(rsp, "return"));
+    g_free(cmd);
+    qobject_unref(rsp);
+}
+
 static void migrate_set_capability(QTestState *who, const char *capability,
                                    const char *value)
 {
@@ -597,6 +620,53 @@ static void test_postcopy(void)
     migrate_postcopy_complete(from, to);
 }
 
+static void test_postcopy_recovery(void)
+{
+    QTestState *from, *to;
+    char *uri;
+
+    migrate_postcopy_prepare(&from, &to);
+
+    /* Turn postcopy speed down, 4K/s is slow enough on any machines */
+    migrate_set_parameter(from, "max-postcopy-bandwidth", "4096");
+
+    /* Now we start the postcopy */
+    migrate_postcopy_start(from, to);
+
+    /*
+     * Wait until postcopy is really started; we can only run the
+     * migrate-pause command during a postcopy
+     */
+    wait_for_migration_status(from, "postcopy-active");
+
+    /*
+     * Manually stop the postcopy migration. This emulates a network
+     * failure with the migration socket
+     */
+    migrate_pause(from);
+
+    /*
+     * Create a new socket to emulate a new channel that is different
+     * from the broken migration channel; tell the destination to
+     * listen to the new port
+     */
+    uri = g_strdup_printf("unix:%s/migsocket-recover", tmpfs);
+    migrate_recover(to, uri);
+
+    /*
+     * Try to rebuild the migration channel using the resume flag and
+     * the newly created channel
+     */
+    wait_for_migration_status(from, "postcopy-paused");
+    migrate(from, uri, ", 'resume': true");
+    g_free(uri);
+
+    /* Restore the postcopy bandwidth to unlimited */
+    migrate_set_parameter(from, "max-postcopy-bandwidth", "0");
+
+    migrate_postcopy_complete(from, to);
+}
+
 static void test_baddest(void)
 {
     QTestState *from, *to;
@@ -683,6 +753,7 @@ int main(int argc, char **argv)
     module_call_init(MODULE_INIT_QOM);
 
     qtest_add_func("/migration/postcopy/unix", test_postcopy);
+    qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
     qtest_add_func("/migration/deprecated", test_deprecated);
     qtest_add_func("/migration/bad_dest", test_baddest);
     qtest_add_func("/migration/precopy/unix", test_precopy_unix);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Qemu-devel] [PATCH for-3.0 9/9] tests: hide stderr for postcopy recovery test
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
                   ` (7 preceding siblings ...)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 8/9] tests: add postcopy recovery test Peter Xu
@ 2018-07-05  3:17 ` Peter Xu
  2018-07-05 10:36   ` Dr. David Alan Gilbert
  2018-07-05 13:09   ` Juan Quintela
  2018-07-06  9:17 ` [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Dr. David Alan Gilbert
  2018-07-10  1:56 ` Balamuruhan S
  10 siblings, 2 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx

We dumped something when network failure happens.  We should avoid those
messages to be dumped when running the tests:

  $ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
  /x86_64/migration/postcopy/recovery: qemu-system-x86_64: check_section_footer: Read section footer failed: -5
  qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
  qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
  OK

After the patch:

  $ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
  /x86_64/migration/postcopy/recovery: OK

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/migration-test.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 182fc88e93..96e69dab99 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -567,12 +567,13 @@ static void test_deprecated(void)
 }
 
 static void migrate_postcopy_prepare(QTestState **from_ptr,
-                                     QTestState **to_ptr)
+                                     QTestState **to_ptr,
+                                     bool hide_error)
 {
     char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
     QTestState *from, *to;
 
-    test_migrate_start(&from, &to, uri, false);
+    test_migrate_start(&from, &to, uri, hide_error);
 
     migrate_set_capability(from, "postcopy-ram", "true");
     migrate_set_capability(to, "postcopy-ram", "true");
@@ -615,7 +616,7 @@ static void test_postcopy(void)
 {
     QTestState *from, *to;
 
-    migrate_postcopy_prepare(&from, &to);
+    migrate_postcopy_prepare(&from, &to, false);
     migrate_postcopy_start(from, to);
     migrate_postcopy_complete(from, to);
 }
@@ -625,7 +626,7 @@ static void test_postcopy_recovery(void)
     QTestState *from, *to;
     char *uri;
 
-    migrate_postcopy_prepare(&from, &to);
+    migrate_postcopy_prepare(&from, &to, true);
 
     /* Turn postcopy speed down, 4K/s is slow enough on any machines */
     migrate_set_parameter(from, "max-postcopy-bandwidth", "4096");
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer Peter Xu
@ 2018-07-05  9:01   ` Dr. David Alan Gilbert
  2018-07-05  9:11     ` Peter Xu
  2018-07-05 12:59   ` Juan Quintela
  1 sibling, 1 reply; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05  9:01 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Firstly, renaming the old matching_page_sizes variable to
> matching_target_page_size, which suites more to what it did (it only
> checks against target page size rather than multiple page sizes).
> Meanwhile, simplify the check logic a bit, and enhance the comments.
> Should have no functional change.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/ram.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 23cea47090..fbeb23f750 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3580,7 +3580,7 @@ static int ram_load_postcopy(QEMUFile *f)
>  {
>      int flags = 0, ret = 0;
>      bool place_needed = false;
> -    bool matching_page_sizes = false;
> +    bool matching_target_page_size = false;
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      /* Temporary page that is later 'placed' */
>      void *postcopy_host_page = postcopy_get_tmp_page(mis);
> @@ -3620,7 +3620,7 @@ static int ram_load_postcopy(QEMUFile *f)
>                  ret = -EINVAL;
>                  break;
>              }
> -            matching_page_sizes = block->page_size == TARGET_PAGE_SIZE;
> +            matching_target_page_size = block->page_size == TARGET_PAGE_SIZE;

That's OK, although 'matches_target_page_size' would be better wording.
('matching_target_page_size' implies we know something about the sources
target page size)

>              /*
>               * Postcopy requires that we place whole host pages atomically;
>               * these may be huge pages for RAMBlocks that are backed by
> @@ -3668,12 +3668,17 @@ static int ram_load_postcopy(QEMUFile *f)
>  
>          case RAM_SAVE_FLAG_PAGE:
>              all_zero = false;
> -            if (!place_needed || !matching_page_sizes) {
> +            if (!matching_target_page_size) {
> +                /* For huge pages, we always use temporary buffer */
>                  qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
>              } else {
> -                /* Avoids the qemu_file copy during postcopy, which is
> -                 * going to do a copy later; can only do it when we
> -                 * do this read in one go (matching page sizes)
> +                /*
> +                 * For small pages that matches target page size, we
> +                 * avoid the qemu_file copy.  Instead we directly use
> +                 * the buffer of QEMUFile to place the page.  Note: we
> +                 * cannot do any QEMUFile operation before using that
> +                 * buffer to make sure the buffer is valid when
> +                 * placing the page.
>                   */
>                  qemu_get_buffer_in_place(f, (uint8_t **)&place_source,
>                                           TARGET_PAGE_SIZE);

But, yes I think that's right; since this is in ram_load_postcopy
the only time we have !place_needed is in the not-last subpage of a 
hugepage (or host page > target_page), so that's already covered
by the !matching_page_sizes check.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer
  2018-07-05  9:01   ` Dr. David Alan Gilbert
@ 2018-07-05  9:11     ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  9:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, Juan Quintela

On Thu, Jul 05, 2018 at 10:01:09AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Firstly, renaming the old matching_page_sizes variable to
> > matching_target_page_size, which suites more to what it did (it only
> > checks against target page size rather than multiple page sizes).
> > Meanwhile, simplify the check logic a bit, and enhance the comments.
> > Should have no functional change.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/ram.c | 17 +++++++++++------
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 23cea47090..fbeb23f750 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3580,7 +3580,7 @@ static int ram_load_postcopy(QEMUFile *f)
> >  {
> >      int flags = 0, ret = 0;
> >      bool place_needed = false;
> > -    bool matching_page_sizes = false;
> > +    bool matching_target_page_size = false;
> >      MigrationIncomingState *mis = migration_incoming_get_current();
> >      /* Temporary page that is later 'placed' */
> >      void *postcopy_host_page = postcopy_get_tmp_page(mis);
> > @@ -3620,7 +3620,7 @@ static int ram_load_postcopy(QEMUFile *f)
> >                  ret = -EINVAL;
> >                  break;
> >              }
> > -            matching_page_sizes = block->page_size == TARGET_PAGE_SIZE;
> > +            matching_target_page_size = block->page_size == TARGET_PAGE_SIZE;
> 
> That's OK, although 'matches_target_page_size' would be better wording.
> ('matching_target_page_size' implies we know something about the sources
> target page size)

Sure thing.

> 
> >              /*
> >               * Postcopy requires that we place whole host pages atomically;
> >               * these may be huge pages for RAMBlocks that are backed by
> > @@ -3668,12 +3668,17 @@ static int ram_load_postcopy(QEMUFile *f)
> >  
> >          case RAM_SAVE_FLAG_PAGE:
> >              all_zero = false;
> > -            if (!place_needed || !matching_page_sizes) {
> > +            if (!matching_target_page_size) {
> > +                /* For huge pages, we always use temporary buffer */
> >                  qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
> >              } else {
> > -                /* Avoids the qemu_file copy during postcopy, which is
> > -                 * going to do a copy later; can only do it when we
> > -                 * do this read in one go (matching page sizes)
> > +                /*
> > +                 * For small pages that matches target page size, we
> > +                 * avoid the qemu_file copy.  Instead we directly use
> > +                 * the buffer of QEMUFile to place the page.  Note: we
> > +                 * cannot do any QEMUFile operation before using that
> > +                 * buffer to make sure the buffer is valid when
> > +                 * placing the page.
> >                   */
> >                  qemu_get_buffer_in_place(f, (uint8_t **)&place_source,
> >                                           TARGET_PAGE_SIZE);
> 
> But, yes I think that's right; since this is in ram_load_postcopy
> the only time we have !place_needed is in the not-last subpage of a 
> hugepage (or host page > target_page), so that's already covered
> by the !matching_page_sizes check.
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm Peter Xu
@ 2018-07-05  9:15   ` Dr. David Alan Gilbert
  2018-07-05  9:31     ` Peter Xu
  2018-07-05 13:01   ` Juan Quintela
  1 sibling, 1 reply; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05  9:15 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> We were checking against -EIO, assuming that it will cover all IO
> failures.  But actually it is not.  One example is that in
> qemu_loadvm_section_start_full() we can have tons of places that will
> return -EINVAL even if the error is caused by IO failures on the
> network.

I suspect we should fix those; but OK; I think the only cases
in there that could hit that are the get_counted_string and the
check_section_footer; they could both be fixed to return an errno
like the other cases.

> Let's loosen the recovery check logic here to cover all the error cases
> happened by removing the explicit check against -EIO.  After all we
> won't lose anything here if any other failure happened.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/savevm.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 851d74e8b6..efcc795071 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2276,18 +2276,14 @@ out:
>          qemu_file_set_error(f, ret);
>  
>          /*
> -         * Detect whether it is:
> -         *
> -         * 1. postcopy running (after receiving all device data, which
> -         *    must be in POSTCOPY_INCOMING_RUNNING state.  Note that
> -         *    POSTCOPY_INCOMING_LISTENING is still not enough, it's
> -         *    still receiving device states).
> -         * 2. network failure (-EIO)
> -         *
> -         * If so, we try to wait for a recovery.
> +         * If we are during an active postcopy, then we pause instead
> +         * of bail out to at least keep the VM's dirty data.  Note
> +         * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
> +         * during which we're still receiving device states and we
> +         * still haven't yet started the VM on destination.
>           */
>          if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> -            ret == -EIO && postcopy_pause_incoming(mis)) {
> +            postcopy_pause_incoming(mis)) {

Hmm, OK; It does make me a bit nervous we might trigger this
too often, but I see the other cases where we're missing stuff
we should trigger it.  We might find we need to go the other way
and blacklist some errors.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>              /* Reset f to point to the newly created channel */
>              f = mis->from_src_file;
>              goto retry;
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm
  2018-07-05  9:15   ` Dr. David Alan Gilbert
@ 2018-07-05  9:31     ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-05  9:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, Juan Quintela

On Thu, Jul 05, 2018 at 10:15:47AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > We were checking against -EIO, assuming that it will cover all IO
> > failures.  But actually it is not.  One example is that in
> > qemu_loadvm_section_start_full() we can have tons of places that will
> > return -EINVAL even if the error is caused by IO failures on the
> > network.
> 
> I suspect we should fix those; but OK; I think the only cases
> in there that could hit that are the get_counted_string and the
> check_section_footer; they could both be fixed to return an errno
> like the other cases.
> 
> > Let's loosen the recovery check logic here to cover all the error cases
> > happened by removing the explicit check against -EIO.  After all we
> > won't lose anything here if any other failure happened.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/savevm.c | 16 ++++++----------
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> > 
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 851d74e8b6..efcc795071 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2276,18 +2276,14 @@ out:
> >          qemu_file_set_error(f, ret);
> >  
> >          /*
> > -         * Detect whether it is:
> > -         *
> > -         * 1. postcopy running (after receiving all device data, which
> > -         *    must be in POSTCOPY_INCOMING_RUNNING state.  Note that
> > -         *    POSTCOPY_INCOMING_LISTENING is still not enough, it's
> > -         *    still receiving device states).
> > -         * 2. network failure (-EIO)
> > -         *
> > -         * If so, we try to wait for a recovery.
> > +         * If we are during an active postcopy, then we pause instead
> > +         * of bail out to at least keep the VM's dirty data.  Note
> > +         * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
> > +         * during which we're still receiving device states and we
> > +         * still haven't yet started the VM on destination.
> >           */
> >          if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> > -            ret == -EIO && postcopy_pause_incoming(mis)) {
> > +            postcopy_pause_incoming(mis)) {
> 
> Hmm, OK; It does make me a bit nervous we might trigger this
> too often, but I see the other cases where we're missing stuff
> we should trigger it.  We might find we need to go the other way
> and blacklist some errors.

AFAIU the difference will be only that we pause instead of quit QEMU.
IMHO the whitelist/blacklist will be more meaningful when there is a
third choice for us (besides "quit" and "go into pause state").

> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers Peter Xu
@ 2018-07-05  9:31   ` Balamuruhan S
  2018-07-06  2:19     ` Peter Xu
  2018-07-05  9:59   ` Dr. David Alan Gilbert
  2018-07-05 13:03   ` Juan Quintela
  2 siblings, 1 reply; 46+ messages in thread
From: Balamuruhan S @ 2018-07-05  9:31 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

On Thu, Jul 05, 2018 at 11:17:50AM +0800, Peter Xu wrote:
> Separate the old postcopy UNIX socket test into three steps, provide a
> helper for each step.  With these helpers, we can do more compliated
> tests like postcopy recovery, while keep the codes shared.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/migration-test.c | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 3a85446f95..2155869b96 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -351,13 +351,19 @@ static void migrate(QTestState *who, const char *uri)
>      qobject_unref(rsp);
>  }
> 
> -static void migrate_start_postcopy(QTestState *who)
> +static void migrate_postcopy_start(QTestState *from, QTestState *to)
>  {
>      QDict *rsp;
> 
> -    rsp = wait_command(who, "{ 'execute': 'migrate-start-postcopy' }");
> +    rsp = wait_command(from, "{ 'execute': 'migrate-start-postcopy' }");
>      g_assert(qdict_haskey(rsp, "return"));
>      qobject_unref(rsp);
> +
> +    if (!got_stop) {
> +        qtest_qmp_eventwait(from, "STOP");
> +    }
> +
> +    qtest_qmp_eventwait(to, "RESUME");
>  }
> 
>  static void test_migrate_start(QTestState **from, QTestState **to,
> @@ -505,7 +511,8 @@ static void test_deprecated(void)
>      qtest_quit(from);
>  }
> 
> -static void test_postcopy(void)
> +static void migrate_postcopy_prepare(QTestState **from_ptr,
> +                                     QTestState **to_ptr)

if we have uri as one of the configurable argument, it would be still
better, so that in future we can call migrate_postcopy_prepare() with
any different uri.

-- Bala

>  {
>      char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>      QTestState *from, *to;
> @@ -527,28 +534,37 @@ static void test_postcopy(void)
>      wait_for_serial("src_serial");
> 
>      migrate(from, uri);
> +    g_free(uri);
> 
>      wait_for_migration_pass(from);
> 
> -    migrate_start_postcopy(from);
> -
> -    if (!got_stop) {
> -        qtest_qmp_eventwait(from, "STOP");
> -    }
> +    *from_ptr = from;
> +    *to_ptr = to;
> +}
> 
> -    qtest_qmp_eventwait(to, "RESUME");
> +static void migrate_postcopy_complete(QTestState *from, QTestState *to)
> +{
> +    wait_for_migration_complete(from);
> 
> +    /* Make sure we get at least one "B" on destination */
>      wait_for_serial("dest_serial");
> -    wait_for_migration_complete(from);
> 
>      if (uffd_feature_thread_id) {
>          read_blocktime(to);
>      }
> -    g_free(uri);
> 
>      test_migrate_end(from, to, true);
>  }
> 
> +static void test_postcopy(void)
> +{
> +    QTestState *from, *to;
> +
> +    migrate_postcopy_prepare(&from, &to);
> +    migrate_postcopy_start(from, to);
> +    migrate_postcopy_complete(from, to);
> +}
> +




>  static void test_baddest(void)
>  {
>      QTestState *from, *to;
> -- 
> 2.17.1
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 3/9] migration: fix incorrect bitmap size calculation
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 3/9] migration: fix incorrect bitmap size calculation Peter Xu
@ 2018-07-05  9:38   ` Dr. David Alan Gilbert
  2018-07-05 13:01   ` Juan Quintela
  1 sibling, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05  9:38 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> The calculation on size of received bitmap is incorrect for postcopy
> recovery.  Here we wanted to let the size to cover all the valid bits in
> the bitmap, we should use DIV_ROUND_UP() instead of a division.
> 
> For example, a RAMBlock with size=4K (which contains only one single 4K
> page) will have nbits=1, then nbits/8=0, then the real bitmap won't be
> sent to source at all.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/ram.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index fbeb23f750..203c691ded 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -235,7 +235,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
>      bitmap_to_le(le_bitmap, block->receivedmap, nbits);
>  
>      /* Size of the bitmap, in bytes */
> -    size = nbits / 8;
> +    size = DIV_ROUND_UP(nbits, 8);
>  
>      /*
>       * size is always aligned to 8 bytes for 64bit machines, but it
> @@ -3944,7 +3944,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
>      int ret = -EINVAL;
>      QEMUFile *file = s->rp_state.from_dst_file;
>      unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
> -    uint64_t local_size = nbits / 8;
> +    uint64_t local_size = DIV_ROUND_UP(nbits, 8);
>      uint64_t size, end_mark;
>  
>      trace_ram_dirty_bitmap_reload_begin(block->idstr);
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers Peter Xu
  2018-07-05  9:31   ` Balamuruhan S
@ 2018-07-05  9:59   ` Dr. David Alan Gilbert
  2018-07-05 13:03   ` Juan Quintela
  2 siblings, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05  9:59 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Separate the old postcopy UNIX socket test into three steps, provide a
> helper for each step.  With these helpers, we can do more compliated
> tests like postcopy recovery, while keep the codes shared.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/migration-test.c | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 3a85446f95..2155869b96 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -351,13 +351,19 @@ static void migrate(QTestState *who, const char *uri)
>      qobject_unref(rsp);
>  }
>  
> -static void migrate_start_postcopy(QTestState *who)
> +static void migrate_postcopy_start(QTestState *from, QTestState *to)
>  {
>      QDict *rsp;
>  
> -    rsp = wait_command(who, "{ 'execute': 'migrate-start-postcopy' }");
> +    rsp = wait_command(from, "{ 'execute': 'migrate-start-postcopy' }");
>      g_assert(qdict_haskey(rsp, "return"));
>      qobject_unref(rsp);
> +
> +    if (!got_stop) {
> +        qtest_qmp_eventwait(from, "STOP");
> +    }
> +
> +    qtest_qmp_eventwait(to, "RESUME");
>  }
>  
>  static void test_migrate_start(QTestState **from, QTestState **to,
> @@ -505,7 +511,8 @@ static void test_deprecated(void)
>      qtest_quit(from);
>  }
>  
> -static void test_postcopy(void)
> +static void migrate_postcopy_prepare(QTestState **from_ptr,
> +                                     QTestState **to_ptr)
>  {
>      char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>      QTestState *from, *to;
> @@ -527,28 +534,37 @@ static void test_postcopy(void)
>      wait_for_serial("src_serial");
>  
>      migrate(from, uri);
> +    g_free(uri);
>  
>      wait_for_migration_pass(from);
>  
> -    migrate_start_postcopy(from);
> -
> -    if (!got_stop) {
> -        qtest_qmp_eventwait(from, "STOP");
> -    }
> +    *from_ptr = from;
> +    *to_ptr = to;
> +}
>  
> -    qtest_qmp_eventwait(to, "RESUME");
> +static void migrate_postcopy_complete(QTestState *from, QTestState *to)
> +{
> +    wait_for_migration_complete(from);
>  
> +    /* Make sure we get at least one "B" on destination */
>      wait_for_serial("dest_serial");
> -    wait_for_migration_complete(from);

That change is OK, but it shouldn't be necessary with postcopy; we
should be seeing B's before the migration completes.

It's OK however, so:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>      if (uffd_feature_thread_id) {
>          read_blocktime(to);
>      }
> -    g_free(uri);
>  
>      test_migrate_end(from, to, true);
>  }
>  
> +static void test_postcopy(void)
> +{
> +    QTestState *from, *to;
> +
> +    migrate_postcopy_prepare(&from, &to);
> +    migrate_postcopy_start(from, to);
> +    migrate_postcopy_complete(from, to);
> +}
> +
>  static void test_baddest(void)
>  {
>      QTestState *from, *to;
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags Peter Xu
@ 2018-07-05 10:18   ` Dr. David Alan Gilbert
  2018-07-05 13:05   ` Juan Quintela
  2018-07-06 10:36   ` Balamuruhan S
  2 siblings, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05 10:18 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> For example, we can pass in '"resume": true' to resume a migration.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

OK, it's a little odd, I wondered whether it would be better just to
pass the whole arguments string in, but that makes it more work for hte
claler to do the formatting, so:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  tests/migration-test.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 2155869b96..af82a04789 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -337,14 +337,14 @@ static void migrate_set_capability(QTestState *who, const char *capability,
>      qobject_unref(rsp);
>  }
>  
> -static void migrate(QTestState *who, const char *uri)
> +static void migrate(QTestState *who, const char *uri, const char *extra)
>  {
>      QDict *rsp;
>      gchar *cmd;
>  
>      cmd = g_strdup_printf("{ 'execute': 'migrate',"
> -                          "'arguments': { 'uri': '%s' } }",
> -                          uri);
> +                          "  'arguments': { 'uri': '%s' %s } }",
> +                          uri, extra ? extra : "");
>      rsp = qtest_qmp(who, cmd);
>      g_free(cmd);
>      g_assert(qdict_haskey(rsp, "return"));
> @@ -533,7 +533,7 @@ static void migrate_postcopy_prepare(QTestState **from_ptr,
>      /* Wait for the first serial output from the source */
>      wait_for_serial("src_serial");
>  
> -    migrate(from, uri);
> +    migrate(from, uri, NULL);
>      g_free(uri);
>  
>      wait_for_migration_pass(from);
> @@ -573,7 +573,7 @@ static void test_baddest(void)
>      bool failed;
>  
>      test_migrate_start(&from, &to, "tcp:0:0", true);
> -    migrate(from, "tcp:0:0");
> +    migrate(from, "tcp:0:0", NULL);
>      do {
>          rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
>          rsp_return = qdict_get_qdict(rsp, "return");
> @@ -615,7 +615,7 @@ static void test_precopy_unix(void)
>      /* Wait for the first serial output from the source */
>      wait_for_serial("src_serial");
>  
> -    migrate(from, uri);
> +    migrate(from, uri, NULL);
>  
>      wait_for_migration_pass(from);
>  
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers Peter Xu
@ 2018-07-05 10:23   ` Dr. David Alan Gilbert
  2018-07-05 13:07     ` Juan Quintela
  2018-07-05 10:59   ` Balamuruhan S
  2018-07-05 13:06   ` Juan Quintela
  2 siblings, 1 reply; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05 10:23 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Introduce helpers to query migration states and use it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/migration-test.c | 64 ++++++++++++++++++++++++++++--------------
>  1 file changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index af82a04789..1d85ccbef1 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -168,6 +168,37 @@ static QDict *wait_command(QTestState *who, const char *command)
>      return response;
>  }
>  
> +/*
> + * Note: caller is responsible to free the returned object via
> + * qobject_unref() after use
> + */
> +static QDict *migrate_query(QTestState *who)
> +{
> +    QDict *rsp, *rsp_return;
> +
> +    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");

having a 'migrate_query' function that issues 'query-migrate' is a bit
odd; I think I'd have called it query_migrate just to match; however
that's just a nit-pick:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> +    rsp_return = qdict_get_qdict(rsp, "return");
> +    g_assert(rsp_return);
> +    qobject_ref(rsp_return);
> +    qobject_unref(rsp);
> +
> +    return rsp_return;
> +}
> +
> +/*
> + * Note: caller is responsible to free the returned object via
> + * g_free() after use
> + */
> +static gchar *migrate_query_status(QTestState *who)
> +{
> +    QDict *rsp_return = migrate_query(who);
> +    gchar *status = g_strdup(qdict_get_str(rsp_return, "status"));
> +
> +    g_assert(status);
> +    qobject_unref(rsp_return);
> +
> +    return status;
> +}
>  
>  /*
>   * It's tricky to use qemu's migration event capability with qtest,
> @@ -176,11 +207,10 @@ static QDict *wait_command(QTestState *who, const char *command)
>  
>  static uint64_t get_migration_pass(QTestState *who)
>  {
> -    QDict *rsp, *rsp_return, *rsp_ram;
> +    QDict *rsp_return, *rsp_ram;
>      uint64_t result;
>  
> -    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
> -    rsp_return = qdict_get_qdict(rsp, "return");
> +    rsp_return = migrate_query(who);
>      if (!qdict_haskey(rsp_return, "ram")) {
>          /* Still in setup */
>          result = 0;
> @@ -188,33 +218,29 @@ static uint64_t get_migration_pass(QTestState *who)
>          rsp_ram = qdict_get_qdict(rsp_return, "ram");
>          result = qdict_get_try_int(rsp_ram, "dirty-sync-count", 0);
>      }
> -    qobject_unref(rsp);
> +    qobject_unref(rsp_return);
>      return result;
>  }
>  
>  static void read_blocktime(QTestState *who)
>  {
> -    QDict *rsp, *rsp_return;
> +    QDict *rsp_return;
>  
> -    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
> -    rsp_return = qdict_get_qdict(rsp, "return");
> +    rsp_return = migrate_query(who);
>      g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
> -    qobject_unref(rsp);
> +    qobject_unref(rsp_return);
>  }
>  
>  static void wait_for_migration_complete(QTestState *who)
>  {
>      while (true) {
> -        QDict *rsp, *rsp_return;
>          bool completed;
> -        const char *status;
> +        char *status;
>  
> -        rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
> -        rsp_return = qdict_get_qdict(rsp, "return");
> -        status = qdict_get_str(rsp_return, "status");
> +        status = migrate_query_status(who);
>          completed = strcmp(status, "completed") == 0;
>          g_assert_cmpstr(status, !=,  "failed");
> -        qobject_unref(rsp);
> +        g_free(status);
>          if (completed) {
>              return;
>          }
> @@ -569,20 +595,16 @@ static void test_baddest(void)
>  {
>      QTestState *from, *to;
>      QDict *rsp, *rsp_return;
> -    const char *status;
> +    char *status;
>      bool failed;
>  
>      test_migrate_start(&from, &to, "tcp:0:0", true);
>      migrate(from, "tcp:0:0", NULL);
>      do {
> -        rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
> -        rsp_return = qdict_get_qdict(rsp, "return");
> -
> -        status = qdict_get_str(rsp_return, "status");
> -
> +        status = migrate_query_status(from);
>          g_assert(!strcmp(status, "setup") || !(strcmp(status, "failed")));
>          failed = !strcmp(status, "failed");
> -        qobject_unref(rsp);
> +        g_free(status);
>      } while (!failed);
>  
>      /* Is the machine currently running? */
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status()
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status() Peter Xu
@ 2018-07-05 10:27   ` Dr. David Alan Gilbert
  2018-07-05 13:07   ` Juan Quintela
  2018-07-06 10:41   ` Balamuruhan S
  2 siblings, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05 10:27 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> It's generalized from wait_for_migration_complete() to allow us to wait
> for any migration status besides failure.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  tests/migration-test.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 1d85ccbef1..761bf62ffe 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -231,14 +231,15 @@ static void read_blocktime(QTestState *who)
>      qobject_unref(rsp_return);
>  }
>  
> -static void wait_for_migration_complete(QTestState *who)
> +static void wait_for_migration_status(QTestState *who,
> +                                      const char *goal)
>  {
>      while (true) {
>          bool completed;
>          char *status;
>  
>          status = migrate_query_status(who);
> -        completed = strcmp(status, "completed") == 0;
> +        completed = strcmp(status, goal) == 0;
>          g_assert_cmpstr(status, !=,  "failed");
>          g_free(status);
>          if (completed) {
> @@ -248,6 +249,11 @@ static void wait_for_migration_complete(QTestState *who)
>      }
>  }
>  
> +static void wait_for_migration_complete(QTestState *who)
> +{
> +    wait_for_migration_status(who, "completed");
> +}
> +
>  static void wait_for_migration_pass(QTestState *who)
>  {
>      uint64_t initial_pass = get_migration_pass(who);
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 8/9] tests: add postcopy recovery test
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 8/9] tests: add postcopy recovery test Peter Xu
@ 2018-07-05 10:30   ` Dr. David Alan Gilbert
  2018-07-05 13:08   ` Juan Quintela
  1 sibling, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05 10:30 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Test the postcopy recovery procedure by emulating a network failure
> using migrate-pause command.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

OK, that's nice.


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  tests/migration-test.c | 71 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 761bf62ffe..182fc88e93 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -352,6 +352,29 @@ static void migrate_set_parameter(QTestState *who, const char *parameter,
>      migrate_check_parameter(who, parameter, value);
>  }
>  
> +static void migrate_pause(QTestState *who)
> +{
> +    QDict *rsp;
> +
> +    rsp = wait_command(who, "{ 'execute': 'migrate-pause' }");
> +    g_assert(qdict_haskey(rsp, "return"));
> +    qobject_unref(rsp);
> +}
> +
> +static void migrate_recover(QTestState *who, const char *uri)
> +{
> +    QDict *rsp;
> +    gchar *cmd = g_strdup_printf(
> +        "{ 'execute': 'migrate-recover', "
> +        "  'id': 'recover-cmd', "
> +        "  'arguments': { 'uri': '%s' } }", uri);
> +
> +    rsp = wait_command(who, cmd);
> +    g_assert(qdict_haskey(rsp, "return"));
> +    g_free(cmd);
> +    qobject_unref(rsp);
> +}
> +
>  static void migrate_set_capability(QTestState *who, const char *capability,
>                                     const char *value)
>  {
> @@ -597,6 +620,53 @@ static void test_postcopy(void)
>      migrate_postcopy_complete(from, to);
>  }
>  
> +static void test_postcopy_recovery(void)
> +{
> +    QTestState *from, *to;
> +    char *uri;
> +
> +    migrate_postcopy_prepare(&from, &to);
> +
> +    /* Turn postcopy speed down, 4K/s is slow enough on any machines */
> +    migrate_set_parameter(from, "max-postcopy-bandwidth", "4096");
> +
> +    /* Now we start the postcopy */
> +    migrate_postcopy_start(from, to);
> +
> +    /*
> +     * Wait until postcopy is really started; we can only run the
> +     * migrate-pause command during a postcopy
> +     */
> +    wait_for_migration_status(from, "postcopy-active");
> +
> +    /*
> +     * Manually stop the postcopy migration. This emulates a network
> +     * failure with the migration socket
> +     */
> +    migrate_pause(from);
> +
> +    /*
> +     * Create a new socket to emulate a new channel that is different
> +     * from the broken migration channel; tell the destination to
> +     * listen to the new port
> +     */
> +    uri = g_strdup_printf("unix:%s/migsocket-recover", tmpfs);
> +    migrate_recover(to, uri);
> +
> +    /*
> +     * Try to rebuild the migration channel using the resume flag and
> +     * the newly created channel
> +     */
> +    wait_for_migration_status(from, "postcopy-paused");
> +    migrate(from, uri, ", 'resume': true");
> +    g_free(uri);
> +
> +    /* Restore the postcopy bandwidth to unlimited */
> +    migrate_set_parameter(from, "max-postcopy-bandwidth", "0");
> +
> +    migrate_postcopy_complete(from, to);
> +}
> +
>  static void test_baddest(void)
>  {
>      QTestState *from, *to;
> @@ -683,6 +753,7 @@ int main(int argc, char **argv)
>      module_call_init(MODULE_INIT_QOM);
>  
>      qtest_add_func("/migration/postcopy/unix", test_postcopy);
> +    qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
>      qtest_add_func("/migration/deprecated", test_deprecated);
>      qtest_add_func("/migration/bad_dest", test_baddest);
>      qtest_add_func("/migration/precopy/unix", test_precopy_unix);
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 9/9] tests: hide stderr for postcopy recovery test
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 9/9] tests: hide stderr for " Peter Xu
@ 2018-07-05 10:36   ` Dr. David Alan Gilbert
  2018-07-05 13:09   ` Juan Quintela
  1 sibling, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-05 10:36 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> We dumped something when network failure happens.  We should avoid those
> messages to be dumped when running the tests:
> 
>   $ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
>   /x86_64/migration/postcopy/recovery: qemu-system-x86_64: check_section_footer: Read section footer failed: -5
>   qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
>   qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
>   OK



Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> After the patch:
> 
>   $ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
>   /x86_64/migration/postcopy/recovery: OK
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/migration-test.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 182fc88e93..96e69dab99 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -567,12 +567,13 @@ static void test_deprecated(void)
>  }
>  
>  static void migrate_postcopy_prepare(QTestState **from_ptr,
> -                                     QTestState **to_ptr)
> +                                     QTestState **to_ptr,
> +                                     bool hide_error)
>  {
>      char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>      QTestState *from, *to;
>  
> -    test_migrate_start(&from, &to, uri, false);
> +    test_migrate_start(&from, &to, uri, hide_error);
>  
>      migrate_set_capability(from, "postcopy-ram", "true");
>      migrate_set_capability(to, "postcopy-ram", "true");
> @@ -615,7 +616,7 @@ static void test_postcopy(void)
>  {
>      QTestState *from, *to;
>  
> -    migrate_postcopy_prepare(&from, &to);
> +    migrate_postcopy_prepare(&from, &to, false);
>      migrate_postcopy_start(from, to);
>      migrate_postcopy_complete(from, to);
>  }
> @@ -625,7 +626,7 @@ static void test_postcopy_recovery(void)
>      QTestState *from, *to;
>      char *uri;
>  
> -    migrate_postcopy_prepare(&from, &to);
> +    migrate_postcopy_prepare(&from, &to, true);
>  
>      /* Turn postcopy speed down, 4K/s is slow enough on any machines */
>      migrate_set_parameter(from, "max-postcopy-bandwidth", "4096");
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers Peter Xu
  2018-07-05 10:23   ` Dr. David Alan Gilbert
@ 2018-07-05 10:59   ` Balamuruhan S
  2018-07-05 13:06   ` Juan Quintela
  2 siblings, 0 replies; 46+ messages in thread
From: Balamuruhan S @ 2018-07-05 10:59 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

On Thu, Jul 05, 2018 at 11:17:52AM +0800, Peter Xu wrote:
> Introduce helpers to query migration states and use it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Looks good to me.

Reviewed-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> ---
>  tests/migration-test.c | 64 ++++++++++++++++++++++++++++--------------
>  1 file changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index af82a04789..1d85ccbef1 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -168,6 +168,37 @@ static QDict *wait_command(QTestState *who, const char *command)
>      return response;
>  }
> 
> +/*
> + * Note: caller is responsible to free the returned object via
> + * qobject_unref() after use
> + */
> +static QDict *migrate_query(QTestState *who)
> +{
> +    QDict *rsp, *rsp_return;
> +
> +    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
> +    rsp_return = qdict_get_qdict(rsp, "return");
> +    g_assert(rsp_return);
> +    qobject_ref(rsp_return);
> +    qobject_unref(rsp);
> +
> +    return rsp_return;
> +}
> +
> +/*
> + * Note: caller is responsible to free the returned object via
> + * g_free() after use
> + */
> +static gchar *migrate_query_status(QTestState *who)
> +{
> +    QDict *rsp_return = migrate_query(who);
> +    gchar *status = g_strdup(qdict_get_str(rsp_return, "status"));
> +
> +    g_assert(status);
> +    qobject_unref(rsp_return);
> +
> +    return status;
> +}
> 
>  /*
>   * It's tricky to use qemu's migration event capability with qtest,
> @@ -176,11 +207,10 @@ static QDict *wait_command(QTestState *who, const char *command)
> 
>  static uint64_t get_migration_pass(QTestState *who)
>  {
> -    QDict *rsp, *rsp_return, *rsp_ram;
> +    QDict *rsp_return, *rsp_ram;
>      uint64_t result;
> 
> -    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
> -    rsp_return = qdict_get_qdict(rsp, "return");
> +    rsp_return = migrate_query(who);
>      if (!qdict_haskey(rsp_return, "ram")) {
>          /* Still in setup */
>          result = 0;
> @@ -188,33 +218,29 @@ static uint64_t get_migration_pass(QTestState *who)
>          rsp_ram = qdict_get_qdict(rsp_return, "ram");
>          result = qdict_get_try_int(rsp_ram, "dirty-sync-count", 0);
>      }
> -    qobject_unref(rsp);
> +    qobject_unref(rsp_return);
>      return result;
>  }
> 
>  static void read_blocktime(QTestState *who)
>  {
> -    QDict *rsp, *rsp_return;
> +    QDict *rsp_return;
> 
> -    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
> -    rsp_return = qdict_get_qdict(rsp, "return");
> +    rsp_return = migrate_query(who);
>      g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
> -    qobject_unref(rsp);
> +    qobject_unref(rsp_return);
>  }
> 
>  static void wait_for_migration_complete(QTestState *who)
>  {
>      while (true) {
> -        QDict *rsp, *rsp_return;
>          bool completed;
> -        const char *status;
> +        char *status;
> 
> -        rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
> -        rsp_return = qdict_get_qdict(rsp, "return");
> -        status = qdict_get_str(rsp_return, "status");
> +        status = migrate_query_status(who);
>          completed = strcmp(status, "completed") == 0;
>          g_assert_cmpstr(status, !=,  "failed");
> -        qobject_unref(rsp);
> +        g_free(status);
>          if (completed) {
>              return;
>          }
> @@ -569,20 +595,16 @@ static void test_baddest(void)
>  {
>      QTestState *from, *to;
>      QDict *rsp, *rsp_return;
> -    const char *status;
> +    char *status;
>      bool failed;
> 
>      test_migrate_start(&from, &to, "tcp:0:0", true);
>      migrate(from, "tcp:0:0", NULL);
>      do {
> -        rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
> -        rsp_return = qdict_get_qdict(rsp, "return");
> -
> -        status = qdict_get_str(rsp_return, "status");
> -
> +        status = migrate_query_status(from);
>          g_assert(!strcmp(status, "setup") || !(strcmp(status, "failed")));
>          failed = !strcmp(status, "failed");
> -        qobject_unref(rsp);
> +        g_free(status);
>      } while (!failed);
> 
>      /* Is the machine currently running? */
> -- 
> 2.17.1
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer Peter Xu
  2018-07-05  9:01   ` Dr. David Alan Gilbert
@ 2018-07-05 12:59   ` Juan Quintela
  1 sibling, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 12:59 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> Firstly, renaming the old matching_page_sizes variable to
> matching_target_page_size, which suites more to what it did (it only
> checks against target page size rather than multiple page sizes).
> Meanwhile, simplify the check logic a bit, and enhance the comments.
> Should have no functional change.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

agree with dave suggestion, even:
same_target_page_size O:-)

> ---
>  migration/ram.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 23cea47090..fbeb23f750 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3580,7 +3580,7 @@ static int ram_load_postcopy(QEMUFile *f)
>  {
>      int flags = 0, ret = 0;
>      bool place_needed = false;
> -    bool matching_page_sizes = false;
> +    bool matching_target_page_size = false;
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      /* Temporary page that is later 'placed' */
>      void *postcopy_host_page = postcopy_get_tmp_page(mis);
> @@ -3620,7 +3620,7 @@ static int ram_load_postcopy(QEMUFile *f)
>                  ret = -EINVAL;
>                  break;
>              }
> -            matching_page_sizes = block->page_size == TARGET_PAGE_SIZE;
> +            matching_target_page_size = block->page_size == TARGET_PAGE_SIZE;
>              /*
>               * Postcopy requires that we place whole host pages atomically;
>               * these may be huge pages for RAMBlocks that are backed by
> @@ -3668,12 +3668,17 @@ static int ram_load_postcopy(QEMUFile *f)
>  
>          case RAM_SAVE_FLAG_PAGE:
>              all_zero = false;
> -            if (!place_needed || !matching_page_sizes) {
> +            if (!matching_target_page_size) {
> +                /* For huge pages, we always use temporary buffer */
>                  qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
>              } else {
> -                /* Avoids the qemu_file copy during postcopy, which is
> -                 * going to do a copy later; can only do it when we
> -                 * do this read in one go (matching page sizes)
> +                /*
> +                 * For small pages that matches target page size, we
> +                 * avoid the qemu_file copy.  Instead we directly use
> +                 * the buffer of QEMUFile to place the page.  Note: we
> +                 * cannot do any QEMUFile operation before using that
> +                 * buffer to make sure the buffer is valid when
> +                 * placing the page.
>                   */
>                  qemu_get_buffer_in_place(f, (uint8_t **)&place_source,
>                                           TARGET_PAGE_SIZE);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm Peter Xu
  2018-07-05  9:15   ` Dr. David Alan Gilbert
@ 2018-07-05 13:01   ` Juan Quintela
  1 sibling, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:01 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> We were checking against -EIO, assuming that it will cover all IO
> failures.  But actually it is not.  One example is that in
> qemu_loadvm_section_start_full() we can have tons of places that will
> return -EINVAL even if the error is caused by IO failures on the
> network.
>
> Let's loosen the recovery check logic here to cover all the error cases
> happened by removing the explicit check against -EIO.  After all we
> won't lose anything here if any other failure happened.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 3/9] migration: fix incorrect bitmap size calculation
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 3/9] migration: fix incorrect bitmap size calculation Peter Xu
  2018-07-05  9:38   ` Dr. David Alan Gilbert
@ 2018-07-05 13:01   ` Juan Quintela
  1 sibling, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:01 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> The calculation on size of received bitmap is incorrect for postcopy
> recovery.  Here we wanted to let the size to cover all the valid bits in
> the bitmap, we should use DIV_ROUND_UP() instead of a division.
>
> For example, a RAMBlock with size=4K (which contains only one single 4K
> page) will have nbits=1, then nbits/8=0, then the real bitmap won't be
> sent to source at all.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers Peter Xu
  2018-07-05  9:31   ` Balamuruhan S
  2018-07-05  9:59   ` Dr. David Alan Gilbert
@ 2018-07-05 13:03   ` Juan Quintela
  2 siblings, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:03 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> Separate the old postcopy UNIX socket test into three steps, provide a
> helper for each step.  With these helpers, we can do more compliated
> tests like postcopy recovery, while keep the codes shared.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags Peter Xu
  2018-07-05 10:18   ` Dr. David Alan Gilbert
@ 2018-07-05 13:05   ` Juan Quintela
  2018-07-06 10:36   ` Balamuruhan S
  2 siblings, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:05 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> For example, we can pass in '"resume": true' to resume a migration.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

sometime I miss a "hmp_to_qmp()" function O:-)


> ---
>  tests/migration-test.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 2155869b96..af82a04789 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -337,14 +337,14 @@ static void migrate_set_capability(QTestState *who, const char *capability,
>      qobject_unref(rsp);
>  }
>  
> -static void migrate(QTestState *who, const char *uri)
> +static void migrate(QTestState *who, const char *uri, const char *extra)
>  {
>      QDict *rsp;
>      gchar *cmd;
>  
>      cmd = g_strdup_printf("{ 'execute': 'migrate',"
> -                          "'arguments': { 'uri': '%s' } }",
> -                          uri);
> +                          "  'arguments': { 'uri': '%s' %s } }",
> +                          uri, extra ? extra : "");
>      rsp = qtest_qmp(who, cmd);
>      g_free(cmd);
>      g_assert(qdict_haskey(rsp, "return"));
> @@ -533,7 +533,7 @@ static void migrate_postcopy_prepare(QTestState **from_ptr,
>      /* Wait for the first serial output from the source */
>      wait_for_serial("src_serial");
>  
> -    migrate(from, uri);
> +    migrate(from, uri, NULL);
>      g_free(uri);
>  
>      wait_for_migration_pass(from);
> @@ -573,7 +573,7 @@ static void test_baddest(void)
>      bool failed;
>  
>      test_migrate_start(&from, &to, "tcp:0:0", true);
> -    migrate(from, "tcp:0:0");
> +    migrate(from, "tcp:0:0", NULL);
>      do {
>          rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
>          rsp_return = qdict_get_qdict(rsp, "return");
> @@ -615,7 +615,7 @@ static void test_precopy_unix(void)
>      /* Wait for the first serial output from the source */
>      wait_for_serial("src_serial");
>  
> -    migrate(from, uri);
> +    migrate(from, uri, NULL);
>  
>      wait_for_migration_pass(from);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers Peter Xu
  2018-07-05 10:23   ` Dr. David Alan Gilbert
  2018-07-05 10:59   ` Balamuruhan S
@ 2018-07-05 13:06   ` Juan Quintela
  2 siblings, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:06 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> Introduce helpers to query migration states and use it.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers
  2018-07-05 10:23   ` Dr. David Alan Gilbert
@ 2018-07-05 13:07     ` Juan Quintela
  0 siblings, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Peter Xu, qemu-devel

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Peter Xu (peterx@redhat.com) wrote:
>> Introduce helpers to query migration states and use it.
>> 
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>>  tests/migration-test.c | 64 ++++++++++++++++++++++++++++--------------
>>  1 file changed, 43 insertions(+), 21 deletions(-)
>> 
>> diff --git a/tests/migration-test.c b/tests/migration-test.c
>> index af82a04789..1d85ccbef1 100644
>> --- a/tests/migration-test.c
>> +++ b/tests/migration-test.c
>> @@ -168,6 +168,37 @@ static QDict *wait_command(QTestState *who, const char *command)
>>      return response;
>>  }
>>  
>> +/*
>> + * Note: caller is responsible to free the returned object via
>> + * qobject_unref() after use
>> + */
>> +static QDict *migrate_query(QTestState *who)
>> +{
>> +    QDict *rsp, *rsp_return;
>> +
>> +    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
>
> having a 'migrate_query' function that issues 'query-migrate' is a bit
> odd; I think I'd have called it query_migrate just to match; however
> that's just a nit-pick:

I guess he is trying to get almost everything to be migrate_<foo>.

Not that the whole file is consistent at all, but well.


>
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>> +    rsp_return = qdict_get_qdict(rsp, "return");
>> +    g_assert(rsp_return);
>> +    qobject_ref(rsp_return);
>> +    qobject_unref(rsp);
>> +
>> +    return rsp_return;
>> +}
>> +
>> +/*
>> + * Note: caller is responsible to free the returned object via
>> + * g_free() after use
>> + */
>> +static gchar *migrate_query_status(QTestState *who)
>> +{
>> +    QDict *rsp_return = migrate_query(who);
>> +    gchar *status = g_strdup(qdict_get_str(rsp_return, "status"));
>> +
>> +    g_assert(status);
>> +    qobject_unref(rsp_return);
>> +
>> +    return status;
>> +}
>>  
>>  /*
>>   * It's tricky to use qemu's migration event capability with qtest,
>> @@ -176,11 +207,10 @@ static QDict *wait_command(QTestState *who, const char *command)
>>  
>>  static uint64_t get_migration_pass(QTestState *who)
>>  {
>> -    QDict *rsp, *rsp_return, *rsp_ram;
>> +    QDict *rsp_return, *rsp_ram;
>>      uint64_t result;
>>  
>> -    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
>> -    rsp_return = qdict_get_qdict(rsp, "return");
>> +    rsp_return = migrate_query(who);
>>      if (!qdict_haskey(rsp_return, "ram")) {
>>          /* Still in setup */
>>          result = 0;
>> @@ -188,33 +218,29 @@ static uint64_t get_migration_pass(QTestState *who)
>>          rsp_ram = qdict_get_qdict(rsp_return, "ram");
>>          result = qdict_get_try_int(rsp_ram, "dirty-sync-count", 0);
>>      }
>> -    qobject_unref(rsp);
>> +    qobject_unref(rsp_return);
>>      return result;
>>  }
>>  
>>  static void read_blocktime(QTestState *who)
>>  {
>> -    QDict *rsp, *rsp_return;
>> +    QDict *rsp_return;
>>  
>> -    rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
>> -    rsp_return = qdict_get_qdict(rsp, "return");
>> +    rsp_return = migrate_query(who);
>>      g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
>> -    qobject_unref(rsp);
>> +    qobject_unref(rsp_return);
>>  }
>>  
>>  static void wait_for_migration_complete(QTestState *who)
>>  {
>>      while (true) {
>> -        QDict *rsp, *rsp_return;
>>          bool completed;
>> -        const char *status;
>> +        char *status;
>>  
>> -        rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
>> -        rsp_return = qdict_get_qdict(rsp, "return");
>> -        status = qdict_get_str(rsp_return, "status");
>> +        status = migrate_query_status(who);
>>          completed = strcmp(status, "completed") == 0;
>>          g_assert_cmpstr(status, !=,  "failed");
>> -        qobject_unref(rsp);
>> +        g_free(status);
>>          if (completed) {
>>              return;
>>          }
>> @@ -569,20 +595,16 @@ static void test_baddest(void)
>>  {
>>      QTestState *from, *to;
>>      QDict *rsp, *rsp_return;
>> -    const char *status;
>> +    char *status;
>>      bool failed;
>>  
>>      test_migrate_start(&from, &to, "tcp:0:0", true);
>>      migrate(from, "tcp:0:0", NULL);
>>      do {
>> -        rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
>> -        rsp_return = qdict_get_qdict(rsp, "return");
>> -
>> -        status = qdict_get_str(rsp_return, "status");
>> -
>> +        status = migrate_query_status(from);
>>          g_assert(!strcmp(status, "setup") || !(strcmp(status, "failed")));
>>          failed = !strcmp(status, "failed");
>> -        qobject_unref(rsp);
>> +        g_free(status);
>>      } while (!failed);
>>  
>>      /* Is the machine currently running? */
>> -- 
>> 2.17.1
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status()
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status() Peter Xu
  2018-07-05 10:27   ` Dr. David Alan Gilbert
@ 2018-07-05 13:07   ` Juan Quintela
  2018-07-06 10:41   ` Balamuruhan S
  2 siblings, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:07 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> It's generalized from wait_for_migration_complete() to allow us to wait
> for any migration status besides failure.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 8/9] tests: add postcopy recovery test
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 8/9] tests: add postcopy recovery test Peter Xu
  2018-07-05 10:30   ` Dr. David Alan Gilbert
@ 2018-07-05 13:08   ` Juan Quintela
  1 sibling, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:08 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> Test the postcopy recovery procedure by emulating a network failure
> using migrate-pause command.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 9/9] tests: hide stderr for postcopy recovery test
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 9/9] tests: hide stderr for " Peter Xu
  2018-07-05 10:36   ` Dr. David Alan Gilbert
@ 2018-07-05 13:09   ` Juan Quintela
  1 sibling, 0 replies; 46+ messages in thread
From: Juan Quintela @ 2018-07-05 13:09 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> We dumped something when network failure happens.  We should avoid those
> messages to be dumped when running the tests:
>
>   $ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
>   /x86_64/migration/postcopy/recovery: qemu-system-x86_64: check_section_footer: Read section footer failed: -5
>   qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
>   qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
>   OK
>
> After the patch:
>
>   $ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
>   /x86_64/migration/postcopy/recovery: OK
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers
  2018-07-05  9:31   ` Balamuruhan S
@ 2018-07-06  2:19     ` Peter Xu
  2018-07-06  6:17       ` Balamuruhan S
  0 siblings, 1 reply; 46+ messages in thread
From: Peter Xu @ 2018-07-06  2:19 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: qemu-devel

On Thu, Jul 05, 2018 at 03:01:44PM +0530, Balamuruhan S wrote:
> On Thu, Jul 05, 2018 at 11:17:50AM +0800, Peter Xu wrote:
> > Separate the old postcopy UNIX socket test into three steps, provide a
> > helper for each step.  With these helpers, we can do more compliated
> > tests like postcopy recovery, while keep the codes shared.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  tests/migration-test.c | 38 +++++++++++++++++++++++++++-----------
> >  1 file changed, 27 insertions(+), 11 deletions(-)
> > 
> > diff --git a/tests/migration-test.c b/tests/migration-test.c
> > index 3a85446f95..2155869b96 100644
> > --- a/tests/migration-test.c
> > +++ b/tests/migration-test.c
> > @@ -351,13 +351,19 @@ static void migrate(QTestState *who, const char *uri)
> >      qobject_unref(rsp);
> >  }
> > 
> > -static void migrate_start_postcopy(QTestState *who)
> > +static void migrate_postcopy_start(QTestState *from, QTestState *to)
> >  {
> >      QDict *rsp;
> > 
> > -    rsp = wait_command(who, "{ 'execute': 'migrate-start-postcopy' }");
> > +    rsp = wait_command(from, "{ 'execute': 'migrate-start-postcopy' }");
> >      g_assert(qdict_haskey(rsp, "return"));
> >      qobject_unref(rsp);
> > +
> > +    if (!got_stop) {
> > +        qtest_qmp_eventwait(from, "STOP");
> > +    }
> > +
> > +    qtest_qmp_eventwait(to, "RESUME");
> >  }
> > 
> >  static void test_migrate_start(QTestState **from, QTestState **to,
> > @@ -505,7 +511,8 @@ static void test_deprecated(void)
> >      qtest_quit(from);
> >  }
> > 
> > -static void test_postcopy(void)
> > +static void migrate_postcopy_prepare(QTestState **from_ptr,
> > +                                     QTestState **to_ptr)
> 
> if we have uri as one of the configurable argument, it would be still
> better, so that in future we can call migrate_postcopy_prepare() with
> any different uri.

Yes, though we'd better have a first user of it.  Now postcopy only
covers unix socket test.  We don't have a large matrix yet.

We can add that when we start to add more complicated tests for
postcopy.  Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers
  2018-07-06  2:19     ` Peter Xu
@ 2018-07-06  6:17       ` Balamuruhan S
  0 siblings, 0 replies; 46+ messages in thread
From: Balamuruhan S @ 2018-07-06  6:17 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

On Fri, Jul 06, 2018 at 10:19:51AM +0800, Peter Xu wrote:
> On Thu, Jul 05, 2018 at 03:01:44PM +0530, Balamuruhan S wrote:
> > On Thu, Jul 05, 2018 at 11:17:50AM +0800, Peter Xu wrote:
> > > Separate the old postcopy UNIX socket test into three steps, provide a
> > > helper for each step.  With these helpers, we can do more compliated
> > > tests like postcopy recovery, while keep the codes shared.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  tests/migration-test.c | 38 +++++++++++++++++++++++++++-----------
> > >  1 file changed, 27 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/tests/migration-test.c b/tests/migration-test.c
> > > index 3a85446f95..2155869b96 100644
> > > --- a/tests/migration-test.c
> > > +++ b/tests/migration-test.c
> > > @@ -351,13 +351,19 @@ static void migrate(QTestState *who, const char *uri)
> > >      qobject_unref(rsp);
> > >  }
> > > 
> > > -static void migrate_start_postcopy(QTestState *who)
> > > +static void migrate_postcopy_start(QTestState *from, QTestState *to)
> > >  {
> > >      QDict *rsp;
> > > 
> > > -    rsp = wait_command(who, "{ 'execute': 'migrate-start-postcopy' }");
> > > +    rsp = wait_command(from, "{ 'execute': 'migrate-start-postcopy' }");
> > >      g_assert(qdict_haskey(rsp, "return"));
> > >      qobject_unref(rsp);
> > > +
> > > +    if (!got_stop) {
> > > +        qtest_qmp_eventwait(from, "STOP");
> > > +    }
> > > +
> > > +    qtest_qmp_eventwait(to, "RESUME");
> > >  }
> > > 
> > >  static void test_migrate_start(QTestState **from, QTestState **to,
> > > @@ -505,7 +511,8 @@ static void test_deprecated(void)
> > >      qtest_quit(from);
> > >  }
> > > 
> > > -static void test_postcopy(void)
> > > +static void migrate_postcopy_prepare(QTestState **from_ptr,
> > > +                                     QTestState **to_ptr)
> > 
> > if we have uri as one of the configurable argument, it would be still
> > better, so that in future we can call migrate_postcopy_prepare() with
> > any different uri.
> 
> Yes, though we'd better have a first user of it.  Now postcopy only
> covers unix socket test.  We don't have a large matrix yet.
> 
> We can add that when we start to add more complicated tests for
> postcopy.  Thanks,

sure Peter.

Reviewed-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> 
> -- 
> Peter Xu
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
                   ` (8 preceding siblings ...)
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 9/9] tests: hide stderr for " Peter Xu
@ 2018-07-06  9:17 ` Dr. David Alan Gilbert
  2018-07-06 10:56   ` Dr. David Alan Gilbert
  2018-07-10  1:56 ` Balamuruhan S
  10 siblings, 1 reply; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-06  9:17 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> Based-on: <20180627132246.5576-1-peterx@redhat.com>
> 
> Based on the series to unbreak postcopy:
>   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
>   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> 
> This series introduce a new postcopy recovery test.  The new test
> actually helped me to identify two bugs there so fix them as well
> before 3.0 release.
> 
> Patch 1: a trivial cleanup for existing postcopy ram load, which I
>          found a bit confusing during debugging the problem.
> 
> Patch 2-3: two bug fixes that address different issues.  Please see
>            the commit log for more information.
> 
> Patch 4-9: add the postcopy recovery unit test.
> 
> Please review.  Thanks,

Queued

> Peter Xu (9):
>   migration: simplify check to use qemu file buffer
>   migration: loosen recovery check when load vm
>   migration: fix incorrect bitmap size calculation
>   tests: introduce migrate_postcopy_* helpers
>   tests: allow migrate() to take extra flags
>   tests: introduce migrate_query*() helpers
>   tests: introduce wait_for_migration_status()
>   tests: add postcopy recovery test
>   tests: hide stderr for postcopy recovery test
> 
>  migration/ram.c        |  21 +++--
>  migration/savevm.c     |  16 ++--
>  tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
>  3 files changed, 176 insertions(+), 59 deletions(-)
> 
> -- 
> 2.17.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags Peter Xu
  2018-07-05 10:18   ` Dr. David Alan Gilbert
  2018-07-05 13:05   ` Juan Quintela
@ 2018-07-06 10:36   ` Balamuruhan S
  2 siblings, 0 replies; 46+ messages in thread
From: Balamuruhan S @ 2018-07-06 10:36 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

On Thu, Jul 05, 2018 at 11:17:51AM +0800, Peter Xu wrote:
> For example, we can pass in '"resume": true' to resume a migration.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---

Reviewed-by: Balamuruhan S <bala24@linux.vnet.ibm.com>

>  tests/migration-test.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 2155869b96..af82a04789 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -337,14 +337,14 @@ static void migrate_set_capability(QTestState *who, const char *capability,
>      qobject_unref(rsp);
>  }
> 
> -static void migrate(QTestState *who, const char *uri)
> +static void migrate(QTestState *who, const char *uri, const char *extra)
>  {
>      QDict *rsp;
>      gchar *cmd;
> 
>      cmd = g_strdup_printf("{ 'execute': 'migrate',"
> -                          "'arguments': { 'uri': '%s' } }",
> -                          uri);
> +                          "  'arguments': { 'uri': '%s' %s } }",
> +                          uri, extra ? extra : "");
>      rsp = qtest_qmp(who, cmd);
>      g_free(cmd);
>      g_assert(qdict_haskey(rsp, "return"));
> @@ -533,7 +533,7 @@ static void migrate_postcopy_prepare(QTestState **from_ptr,
>      /* Wait for the first serial output from the source */
>      wait_for_serial("src_serial");
> 
> -    migrate(from, uri);
> +    migrate(from, uri, NULL);
>      g_free(uri);
> 
>      wait_for_migration_pass(from);
> @@ -573,7 +573,7 @@ static void test_baddest(void)
>      bool failed;
> 
>      test_migrate_start(&from, &to, "tcp:0:0", true);
> -    migrate(from, "tcp:0:0");
> +    migrate(from, "tcp:0:0", NULL);
>      do {
>          rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
>          rsp_return = qdict_get_qdict(rsp, "return");
> @@ -615,7 +615,7 @@ static void test_precopy_unix(void)
>      /* Wait for the first serial output from the source */
>      wait_for_serial("src_serial");
> 
> -    migrate(from, uri);
> +    migrate(from, uri, NULL);
> 
>      wait_for_migration_pass(from);
> 
> -- 
> 2.17.1
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status()
  2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status() Peter Xu
  2018-07-05 10:27   ` Dr. David Alan Gilbert
  2018-07-05 13:07   ` Juan Quintela
@ 2018-07-06 10:41   ` Balamuruhan S
  2 siblings, 0 replies; 46+ messages in thread
From: Balamuruhan S @ 2018-07-06 10:41 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

On Thu, Jul 05, 2018 at 11:17:53AM +0800, Peter Xu wrote:
> It's generalized from wait_for_migration_complete() to allow us to wait
> for any migration status besides failure.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---

Reviewed-by: Balamuruhan S <bala24@linux.vnet.ibm.com>

>  tests/migration-test.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 1d85ccbef1..761bf62ffe 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -231,14 +231,15 @@ static void read_blocktime(QTestState *who)
>      qobject_unref(rsp_return);
>  }
> 
> -static void wait_for_migration_complete(QTestState *who)
> +static void wait_for_migration_status(QTestState *who,
> +                                      const char *goal)
>  {
>      while (true) {
>          bool completed;
>          char *status;
> 
>          status = migrate_query_status(who);
> -        completed = strcmp(status, "completed") == 0;
> +        completed = strcmp(status, goal) == 0;
>          g_assert_cmpstr(status, !=,  "failed");
>          g_free(status);
>          if (completed) {
> @@ -248,6 +249,11 @@ static void wait_for_migration_complete(QTestState *who)
>      }
>  }
> 
> +static void wait_for_migration_complete(QTestState *who)
> +{
> +    wait_for_migration_status(who, "completed");
> +}
> +
>  static void wait_for_migration_pass(QTestState *who)
>  {
>      uint64_t initial_pass = get_migration_pass(who);
> -- 
> 2.17.1
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-06  9:17 ` [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Dr. David Alan Gilbert
@ 2018-07-06 10:56   ` Dr. David Alan Gilbert
  2018-07-06 11:45     ` Balamuruhan S
                       ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-06 10:56 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > 
> > Based on the series to unbreak postcopy:
> >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > 
> > This series introduce a new postcopy recovery test.  The new test
> > actually helped me to identify two bugs there so fix them as well
> > before 3.0 release.
> > 
> > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> >          found a bit confusing during debugging the problem.
> > 
> > Patch 2-3: two bug fixes that address different issues.  Please see
> >            the commit log for more information.
> > 
> > Patch 4-9: add the postcopy recovery unit test.
> > 
> > Please review.  Thanks,
> 
> Queued

Hi Peter,
  There's a problem in there somewhere;  I'm getting
an intermittent failure of the test if I run a make check -j 8    on my
laptop.  Just running two copies of tests/migration-test in parallel
sometimes triggers it (but not if I turn on QTEST_LOG!).
But it's always failing with:

  ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))

Dave

> > Peter Xu (9):
> >   migration: simplify check to use qemu file buffer
> >   migration: loosen recovery check when load vm
> >   migration: fix incorrect bitmap size calculation
> >   tests: introduce migrate_postcopy_* helpers
> >   tests: allow migrate() to take extra flags
> >   tests: introduce migrate_query*() helpers
> >   tests: introduce wait_for_migration_status()
> >   tests: add postcopy recovery test
> >   tests: hide stderr for postcopy recovery test
> > 
> >  migration/ram.c        |  21 +++--
> >  migration/savevm.c     |  16 ++--
> >  tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
> >  3 files changed, 176 insertions(+), 59 deletions(-)
> > 
> > -- 
> > 2.17.1
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-06 10:56   ` Dr. David Alan Gilbert
@ 2018-07-06 11:45     ` Balamuruhan S
  2018-07-06 12:46     ` Balamuruhan S
  2018-07-10  3:27     ` Peter Xu
  2 siblings, 0 replies; 46+ messages in thread
From: Balamuruhan S @ 2018-07-06 11:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel

On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > > 
> > > Based on the series to unbreak postcopy:
> > >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > > 
> > > This series introduce a new postcopy recovery test.  The new test
> > > actually helped me to identify two bugs there so fix them as well
> > > before 3.0 release.
> > > 
> > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > >          found a bit confusing during debugging the problem.
> > > 
> > > Patch 2-3: two bug fixes that address different issues.  Please see
> > >            the commit log for more information.
> > > 
> > > Patch 4-9: add the postcopy recovery unit test.
> > > 
> > > Please review.  Thanks,
> > 
> > Queued
> 
> Hi Peter,
>   There's a problem in there somewhere;  I'm getting
> an intermittent failure of the test if I run a make check -j 8    on my
> laptop.  Just running two copies of tests/migration-test in parallel
> sometimes triggers it (but not if I turn on QTEST_LOG!).
> But it's always failing with:
> 
>   ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))

Hi Peter and Dave,

I have tested postcopy migration pause/recover after applying this
patchset on upstream Qemu,

Observation 1:

We loose the target after triggering migrate_pause,

source

# ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
-machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
-device virtio-blk-pci,drive=rootdisk -drive \
file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio \
-net user -redir tcp:2001::22 

qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.

source Monitor
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_set_parameter max-postcopy-bandwidth 4096
(qemu) migrate -d tcp:127.0.0.1:4444
(qemu) migrate_start_postcopy
(qemu) migrate_pause

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
release-ram: off block: off return-path: off pause-before-switchover:
off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
late-block-activate: off 
Migration status: postcopy-paused
total time: 371289 milliseconds
expected downtime: 656414 milliseconds
setup: 93 milliseconds
transferred ram: 690856 kbytes
throughput: 46.65 mbps
remaining ram: 3716864 kbytes
total ram: 67109120 kbytes
duplicate: 16631167 pages
skipped: 0 pages
normal: 135905 pages
normal bytes: 543620 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
dirty pages rate: 626209 pages
postcopy request count: 395

source remains to be in postcopy-paused state as the target is lost.

target

# ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
-machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
-device virtio-blk-pci,drive=rootdisk -drive \
file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk
\
-monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio \
-net user -redir tcp:2001::22 -incoming tcp:127.0.0.1:4444

Error observed
check_section_footer: Read section footer failed: -5
qemu-system-ppc64: postcopy_ram_listen_thread: loadvm failed: -22
[  188.815436] Unable to handle kernel paging request for instruction
fetch

Target Monitor

(qemu) migrate_set_capability postcopy-ram on


Observation 2:
Unlike error observed by Dave in Qtest it hangs for me waiting for
the migration to complete, but the source remains to be in
migration-paused state.

# time QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64
# ./tests/migration-test
/ppc64/migration/deprecated: OK
/ppc64/migration/bad_dest: OK
/ppc64/migration/postcopy/unix: OK
/ppc64/migration/postcopy/recovery: ^C

real    21m55.176s
user    2m28.800s
sys 4m55.980s

-- Bala

> 
> Dave
> 
> > > Peter Xu (9):
> > >   migration: simplify check to use qemu file buffer
> > >   migration: loosen recovery check when load vm
> > >   migration: fix incorrect bitmap size calculation
> > >   tests: introduce migrate_postcopy_* helpers
> > >   tests: allow migrate() to take extra flags
> > >   tests: introduce migrate_query*() helpers
> > >   tests: introduce wait_for_migration_status()
> > >   tests: add postcopy recovery test
> > >   tests: hide stderr for postcopy recovery test
> > > 
> > >  migration/ram.c        |  21 +++--
> > >  migration/savevm.c     |  16 ++--
> > >  tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
> > >  3 files changed, 176 insertions(+), 59 deletions(-)
> > > 
> > > -- 
> > > 2.17.1
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-06 10:56   ` Dr. David Alan Gilbert
  2018-07-06 11:45     ` Balamuruhan S
@ 2018-07-06 12:46     ` Balamuruhan S
  2018-07-12  8:50       ` Dr. David Alan Gilbert
  2018-07-10  3:27     ` Peter Xu
  2 siblings, 1 reply; 46+ messages in thread
From: Balamuruhan S @ 2018-07-06 12:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: peterx, qemu-devel

On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > > 
> > > Based on the series to unbreak postcopy:
> > >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > > 
> > > This series introduce a new postcopy recovery test.  The new test
> > > actually helped me to identify two bugs there so fix them as well
> > > before 3.0 release.
> > > 
> > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > >          found a bit confusing during debugging the problem.
> > > 
> > > Patch 2-3: two bug fixes that address different issues.  Please see
> > >            the commit log for more information.
> > > 
> > > Patch 4-9: add the postcopy recovery unit test.
> > > 
> > > Please review.  Thanks,
> > 
> > Queued
> 
> Hi Peter,
>   There's a problem in there somewhere;  I'm getting
> an intermittent failure of the test if I run a make check -j 8    on my
> laptop.  Just running two copies of tests/migration-test in parallel
> sometimes triggers it (but not if I turn on QTEST_LOG!).
> But it's always failing with:
> 
>   ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))
> 
> Dave

Hi Peter, Dave,

I have applied this patchset in upstream Qemu to test postcopy
pause/recovery.

I observed error after triggering recovery command from source monitor
where the target is lost and the source remains to be in `postcopy-pause`
state.

Please find my observation below,

Source:

# ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
-drive file=/home/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio -net user \
-redir tcp:2000::22

qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.

Source Monitor:

(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_set_parameter max-postcopy-bandwidth 4096
(qemu) migrate -d tcp:127.0.0.1:4444
(qemu) migrate_start_postcopy
(qemu) migrate_pause
(qemu) migrate -r tcp:127.0.0.1:4446

After triggering recovery, target is lost with the error mentioned below
and source remains to be in `postcopy-paused` state

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off \
compress: off events: off postcopy-ram: on x-colo: off release-ram: off
block: off return-path: off pause-before-switchover: off x-multifd: off \
dirty-bitmaps: off
postcopy-blocktime: off late-block-activate: off 
Migration status: postcopy-recover
total time: 78818 milliseconds
expected downtime: 300 milliseconds
setup: 169 milliseconds
transferred ram: 177749 kbytes
throughput: 63.72 mbps
remaining ram: 28061376 kbytes
total ram: 67109120 kbytes
duplicate: 9742102 pages
skipped: 0 pages
normal: 22986 pages
normal bytes: 91944 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
dirty pages rate: 1273187 pages
postcopy request count: 236


Target:

# ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
-drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio -net user \
-redir tcp:2001::22 -incoming tcp:127.0.0.1:4444


qemu-system-ppc64: check_section_footer: Read section footer failed: -5
qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
qemu-system-ppc64: Not a migration stream
qemu-system-ppc64: load of migration failed: Invalid argument


Target Monitor:

(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_recover tcp:127.0.0.1:4446
(qemu) Connection closed by foreign host.

QTest:

Also with respect to Qtest, I have tested it and the recovery test
doesn't complete as it waits on the source for "completed" but due to this
issue source remains to be in `postcopy-paused`

`migrate_postcopy_complete(from, to);`

but it actually doesn't end.

As it did not complete, I cancelled it forcefully

# time QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64 ./tests/migration-test
/ppc64/migration/deprecated: OK
/ppc64/migration/bad_dest: OK
/ppc64/migration/postcopy/unix: OK
/ppc64/migration/postcopy/recovery: ^C

real    21m55.176s
user    2m28.800s
sys 4m55.980s

-- Bala
> 
> > > Peter Xu (9):
> > >   migration: simplify check to use qemu file buffer
> > >   migration: loosen recovery check when load vm
> > >   migration: fix incorrect bitmap size calculation
> > >   tests: introduce migrate_postcopy_* helpers
> > >   tests: allow migrate() to take extra flags
> > >   tests: introduce migrate_query*() helpers
> > >   tests: introduce wait_for_migration_status()
> > >   tests: add postcopy recovery test
> > >   tests: hide stderr for postcopy recovery test
> > > 
> > >  migration/ram.c        |  21 +++--
> > >  migration/savevm.c     |  16 ++--
> > >  tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
> > >  3 files changed, 176 insertions(+), 59 deletions(-)
> > > 
> > > -- 
> > > 2.17.1
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
                   ` (9 preceding siblings ...)
  2018-07-06  9:17 ` [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Dr. David Alan Gilbert
@ 2018-07-10  1:56 ` Balamuruhan S
  2018-07-10  3:07   ` Peter Xu
  10 siblings, 1 reply; 46+ messages in thread
From: Balamuruhan S @ 2018-07-10  1:56 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel

On Thu, Jul 05, 2018 at 11:17:46AM +0800, Peter Xu wrote:
> Based-on: <20180627132246.5576-1-peterx@redhat.com>
> 
> Based on the series to unbreak postcopy:
>   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
>   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> 
> This series introduce a new postcopy recovery test.  The new test
> actually helped me to identify two bugs there so fix them as well
> before 3.0 release.
> 
> Patch 1: a trivial cleanup for existing postcopy ram load, which I
>          found a bit confusing during debugging the problem.
> 
> Patch 2-3: two bug fixes that address different issues.  Please see
>            the commit log for more information.
> 
> Patch 4-9: add the postcopy recovery unit test.
> 
> Please review.  Thanks,

Hi Peter, Dave,

I am sorry, I have missed to include Peter's postcopy-recover fix patchset,

migration: delay postcopy paused state
migration: move income process out of multifd
migration: unbreak postcopy recovery
migration: unify incoming processing

Postcopy migration with pause and recover is working fine.

# QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64
# ./tests/migration-test
/ppc64/migration/deprecated: OK
/ppc64/migration/bad_dest: OK
/ppc64/migration/postcopy/unix: OK
/ppc64/migration/postcopy/recovery: OK
/ppc64/migration/precopy/unix: OK

But qtest patches in this patchset have to be rebased as commit
5fd4a9c97397bc0819a919de7a62ec972ec85260 (tests/migration: Skip tests
for ppc tcg) have gone in.

# git am ../postcopy_pause/4.patch 
Applying: tests: introduce migrate_postcopy_* helpers
error: patch failed: tests/migration-test.c:351
error: tests/migration-test.c: patch does not apply
Patch failed at 0001 tests: introduce migrate_postcopy_* helpers
The copy of the patch that failed is found in:
   /home/bala/qemu/.git/rebase-apply/patch
When you have resolved this problem, run "git am --resolved".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


I have manually reverted it to apply and test your patchset.

This Patchset is working without any issues.

Tested-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> 
> Peter Xu (9):
>   migration: simplify check to use qemu file buffer
>   migration: loosen recovery check when load vm
>   migration: fix incorrect bitmap size calculation
>   tests: introduce migrate_postcopy_* helpers
>   tests: allow migrate() to take extra flags
>   tests: introduce migrate_query*() helpers
>   tests: introduce wait_for_migration_status()
>   tests: add postcopy recovery test
>   tests: hide stderr for postcopy recovery test
> 
>  migration/ram.c        |  21 +++--
>  migration/savevm.c     |  16 ++--
>  tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
>  3 files changed, 176 insertions(+), 59 deletions(-)
> 
> -- 
> 2.17.1
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-10  1:56 ` Balamuruhan S
@ 2018-07-10  3:07   ` Peter Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Peter Xu @ 2018-07-10  3:07 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: qemu-devel

On Tue, Jul 10, 2018 at 07:26:40AM +0530, Balamuruhan S wrote:
> On Thu, Jul 05, 2018 at 11:17:46AM +0800, Peter Xu wrote:
> > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > 
> > Based on the series to unbreak postcopy:
> >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > 
> > This series introduce a new postcopy recovery test.  The new test
> > actually helped me to identify two bugs there so fix them as well
> > before 3.0 release.
> > 
> > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> >          found a bit confusing during debugging the problem.
> > 
> > Patch 2-3: two bug fixes that address different issues.  Please see
> >            the commit log for more information.
> > 
> > Patch 4-9: add the postcopy recovery unit test.
> > 
> > Please review.  Thanks,
> 
> Hi Peter, Dave,
> 
> I am sorry, I have missed to include Peter's postcopy-recover fix patchset,
> 
> migration: delay postcopy paused state
> migration: move income process out of multifd
> migration: unbreak postcopy recovery
> migration: unify incoming processing
> 
> Postcopy migration with pause and recover is working fine.
> 
> # QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64
> # ./tests/migration-test
> /ppc64/migration/deprecated: OK
> /ppc64/migration/bad_dest: OK
> /ppc64/migration/postcopy/unix: OK
> /ppc64/migration/postcopy/recovery: OK
> /ppc64/migration/precopy/unix: OK
> 
> But qtest patches in this patchset have to be rebased as commit
> 5fd4a9c97397bc0819a919de7a62ec972ec85260 (tests/migration: Skip tests
> for ppc tcg) have gone in.
> 
> # git am ../postcopy_pause/4.patch 
> Applying: tests: introduce migrate_postcopy_* helpers
> error: patch failed: tests/migration-test.c:351
> error: tests/migration-test.c: patch does not apply
> Patch failed at 0001 tests: introduce migrate_postcopy_* helpers
> The copy of the patch that failed is found in:
>    /home/bala/qemu/.git/rebase-apply/patch
> When you have resolved this problem, run "git am --resolved".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> 
> 
> I have manually reverted it to apply and test your patchset.
> 
> This Patchset is working without any issues.
> 
> Tested-by: Balamuruhan S <bala24@linux.vnet.ibm.com>

Thanks for the quick follow up!  I'll have a look today at the problem
that Dave reported.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-06 10:56   ` Dr. David Alan Gilbert
  2018-07-06 11:45     ` Balamuruhan S
  2018-07-06 12:46     ` Balamuruhan S
@ 2018-07-10  3:27     ` Peter Xu
  2018-07-10  8:53       ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 46+ messages in thread
From: Peter Xu @ 2018-07-10  3:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, Juan Quintela

On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > > 
> > > Based on the series to unbreak postcopy:
> > >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > > 
> > > This series introduce a new postcopy recovery test.  The new test
> > > actually helped me to identify two bugs there so fix them as well
> > > before 3.0 release.
> > > 
> > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > >          found a bit confusing during debugging the problem.
> > > 
> > > Patch 2-3: two bug fixes that address different issues.  Please see
> > >            the commit log for more information.
> > > 
> > > Patch 4-9: add the postcopy recovery unit test.
> > > 
> > > Please review.  Thanks,
> > 
> > Queued
> 
> Hi Peter,
>   There's a problem in there somewhere;  I'm getting
> an intermittent failure of the test if I run a make check -j 8    on my
> laptop.  Just running two copies of tests/migration-test in parallel
> sometimes triggers it (but not if I turn on QTEST_LOG!).
> But it's always failing with:
> 
>   ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))

Hmm, so this should be a race.  I suspect it's because destination VM
hasn't reached the correct state when sending the recovery command.

Could you help to try these two tiny patches to see whether it can fix
the problem?

================

commit d875ea1a98932174e3fa202859b65df26def174d
Author: Peter Xu <peterx@redhat.com>
Date:   Tue Jul 10 11:17:24 2018 +0800

    migration: show pause/recover state on dst host

    These two states will be missing when doing "query-migrate" on
    destination VM.  Add these states so that we can get the query results
    as expected.

    Signed-off-by: Peter Xu <peterx@redhat.com>

diff --git a/migration/migration.c b/migration/migration.c
index 0404c53215..8d56d56930 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -911,6 +911,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
     case MIGRATION_STATUS_FAILED:
     case MIGRATION_STATUS_COLO:
         info->has_status = true;

================

commit 9fa7fc773961cd0ea0b5f70a166def0d8aebf464
Author: Peter Xu <peterx@redhat.com>
Date:   Tue Jul 10 11:18:48 2018 +0800

    tests: don't send recovery cmd until dst pauses

    Signed-off-by: Peter Xu <peterx@redhat.com>

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 96e69dab99..45558446f1 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -646,6 +646,13 @@ static void test_postcopy_recovery(void)
      */
     migrate_pause(from);

+    /*
+     * Wait for destination side to reach postcopy-paused state.  The
+     * migrate-recover command can only succeed if destination machine
+     * is in the paused state
+     */
+    wait_for_migration_status(to, "postcopy-paused");
+
     /*
      * Create a new socket to emulate a new channel that is different
      * from the broken migration channel; tell the destination to

================

Thanks!

-- 
Peter Xu

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-10  3:27     ` Peter Xu
@ 2018-07-10  8:53       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-10  8:53 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > > > 
> > > > Based on the series to unbreak postcopy:
> > > >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > > >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > > > 
> > > > This series introduce a new postcopy recovery test.  The new test
> > > > actually helped me to identify two bugs there so fix them as well
> > > > before 3.0 release.
> > > > 
> > > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > > >          found a bit confusing during debugging the problem.
> > > > 
> > > > Patch 2-3: two bug fixes that address different issues.  Please see
> > > >            the commit log for more information.
> > > > 
> > > > Patch 4-9: add the postcopy recovery unit test.
> > > > 
> > > > Please review.  Thanks,
> > > 
> > > Queued
> > 
> > Hi Peter,
> >   There's a problem in there somewhere;  I'm getting
> > an intermittent failure of the test if I run a make check -j 8    on my
> > laptop.  Just running two copies of tests/migration-test in parallel
> > sometimes triggers it (but not if I turn on QTEST_LOG!).
> > But it's always failing with:
> > 
> >   ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))
> 
> Hmm, so this should be a race.  I suspect it's because destination VM
> hasn't reached the correct state when sending the recovery command.
> 
> Could you help to try these two tiny patches to see whether it can fix
> the problem?

Yes, this seems to work; even running 6 in parallel.

Dave

> ================
> 
> commit d875ea1a98932174e3fa202859b65df26def174d
> Author: Peter Xu <peterx@redhat.com>
> Date:   Tue Jul 10 11:17:24 2018 +0800
> 
>     migration: show pause/recover state on dst host
> 
>     These two states will be missing when doing "query-migrate" on
>     destination VM.  Add these states so that we can get the query results
>     as expected.
> 
>     Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 0404c53215..8d56d56930 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -911,6 +911,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
>      case MIGRATION_STATUS_CANCELLED:
>      case MIGRATION_STATUS_ACTIVE:
>      case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
>      case MIGRATION_STATUS_FAILED:
>      case MIGRATION_STATUS_COLO:
>          info->has_status = true;
> 
> ================
> 
> commit 9fa7fc773961cd0ea0b5f70a166def0d8aebf464
> Author: Peter Xu <peterx@redhat.com>
> Date:   Tue Jul 10 11:18:48 2018 +0800
> 
>     tests: don't send recovery cmd until dst pauses
> 
>     Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 96e69dab99..45558446f1 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -646,6 +646,13 @@ static void test_postcopy_recovery(void)
>       */
>      migrate_pause(from);
> 
> +    /*
> +     * Wait for destination side to reach postcopy-paused state.  The
> +     * migrate-recover command can only succeed if destination machine
> +     * is in the paused state
> +     */
> +    wait_for_migration_status(to, "postcopy-paused");
> +
>      /*
>       * Create a new socket to emulate a new channel that is different
>       * from the broken migration channel; tell the destination to
> 
> ================
> 
> Thanks!
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
  2018-07-06 12:46     ` Balamuruhan S
@ 2018-07-12  8:50       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 46+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-12  8:50 UTC (permalink / raw)
  To: Balamuruhan S; +Cc: peterx, qemu-devel

* Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > > > 
> > > > Based on the series to unbreak postcopy:
> > > >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > > >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > > > 
> > > > This series introduce a new postcopy recovery test.  The new test
> > > > actually helped me to identify two bugs there so fix them as well
> > > > before 3.0 release.
> > > > 
> > > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > > >          found a bit confusing during debugging the problem.
> > > > 
> > > > Patch 2-3: two bug fixes that address different issues.  Please see
> > > >            the commit log for more information.
> > > > 
> > > > Patch 4-9: add the postcopy recovery unit test.
> > > > 
> > > > Please review.  Thanks,
> > > 
> > > Queued
> > 
> > Hi Peter,
> >   There's a problem in there somewhere;  I'm getting
> > an intermittent failure of the test if I run a make check -j 8    on my
> > laptop.  Just running two copies of tests/migration-test in parallel
> > sometimes triggers it (but not if I turn on QTEST_LOG!).
> > But it's always failing with:
> > 
> >   ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))
> > 
> > Dave
> 
> Hi Peter, Dave,

Hi Bala,

> I have applied this patchset in upstream Qemu to test postcopy
> pause/recovery.

Are you still seeing this with the set that got merged into 3.0-rc0?
The second of your errors looks similar to problems with the race
we had before Peter fixed it; but the set that I merged passed a 'make
check' on a Power box.

Dave

> I observed error after triggering recovery command from source monitor
> where the target is lost and the source remains to be in `postcopy-pause`
> state.
> 
> Please find my observation below,
> 
> Source:
> 
> # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
> pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
> -drive file=/home/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> -monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio -net user \
> -redir tcp:2000::22
> 
> qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> 
> Source Monitor:
> 
> (qemu) migrate_set_capability postcopy-ram on
> (qemu) migrate_set_parameter max-postcopy-bandwidth 4096
> (qemu) migrate -d tcp:127.0.0.1:4444
> (qemu) migrate_start_postcopy
> (qemu) migrate_pause
> (qemu) migrate -r tcp:127.0.0.1:4446
> 
> After triggering recovery, target is lost with the error mentioned below
> and source remains to be in `postcopy-paused` state
> 
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> zero-blocks: off \
> compress: off events: off postcopy-ram: on x-colo: off release-ram: off
> block: off return-path: off pause-before-switchover: off x-multifd: off \
> dirty-bitmaps: off
> postcopy-blocktime: off late-block-activate: off 
> Migration status: postcopy-recover
> total time: 78818 milliseconds
> expected downtime: 300 milliseconds
> setup: 169 milliseconds
> transferred ram: 177749 kbytes
> throughput: 63.72 mbps
> remaining ram: 28061376 kbytes
> total ram: 67109120 kbytes
> duplicate: 9742102 pages
> skipped: 0 pages
> normal: 22986 pages
> normal bytes: 91944 kbytes
> dirty sync count: 2
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> dirty pages rate: 1273187 pages
> postcopy request count: 236
> 
> 
> Target:
> 
> # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
> pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
> -drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> -monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio -net user \
> -redir tcp:2001::22 -incoming tcp:127.0.0.1:4444
> 
> 
> qemu-system-ppc64: check_section_footer: Read section footer failed: -5
> qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> qemu-system-ppc64: Not a migration stream
> qemu-system-ppc64: load of migration failed: Invalid argument
> 
> 
> Target Monitor:
> 
> (qemu) migrate_set_capability postcopy-ram on
> (qemu) migrate_recover tcp:127.0.0.1:4446
> (qemu) Connection closed by foreign host.
> 
> QTest:
> 
> Also with respect to Qtest, I have tested it and the recovery test
> doesn't complete as it waits on the source for "completed" but due to this
> issue source remains to be in `postcopy-paused`
> 
> `migrate_postcopy_complete(from, to);`
> 
> but it actually doesn't end.
> 
> As it did not complete, I cancelled it forcefully
> 
> # time QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64 ./tests/migration-test
> /ppc64/migration/deprecated: OK
> /ppc64/migration/bad_dest: OK
> /ppc64/migration/postcopy/unix: OK
> /ppc64/migration/postcopy/recovery: ^C
> 
> real    21m55.176s
> user    2m28.800s
> sys 4m55.980s
> 
> -- Bala
> > 
> > > > Peter Xu (9):
> > > >   migration: simplify check to use qemu file buffer
> > > >   migration: loosen recovery check when load vm
> > > >   migration: fix incorrect bitmap size calculation
> > > >   tests: introduce migrate_postcopy_* helpers
> > > >   tests: allow migrate() to take extra flags
> > > >   tests: introduce migrate_query*() helpers
> > > >   tests: introduce wait_for_migration_status()
> > > >   tests: add postcopy recovery test
> > > >   tests: hide stderr for postcopy recovery test
> > > > 
> > > >  migration/ram.c        |  21 +++--
> > > >  migration/savevm.c     |  16 ++--
> > > >  tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
> > > >  3 files changed, 176 insertions(+), 59 deletions(-)
> > > > 
> > > > -- 
> > > > 2.17.1
> > > > 
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2018-07-12  8:50 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-05  3:17 [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Peter Xu
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 1/9] migration: simplify check to use qemu file buffer Peter Xu
2018-07-05  9:01   ` Dr. David Alan Gilbert
2018-07-05  9:11     ` Peter Xu
2018-07-05 12:59   ` Juan Quintela
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 2/9] migration: loosen recovery check when load vm Peter Xu
2018-07-05  9:15   ` Dr. David Alan Gilbert
2018-07-05  9:31     ` Peter Xu
2018-07-05 13:01   ` Juan Quintela
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 3/9] migration: fix incorrect bitmap size calculation Peter Xu
2018-07-05  9:38   ` Dr. David Alan Gilbert
2018-07-05 13:01   ` Juan Quintela
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 4/9] tests: introduce migrate_postcopy_* helpers Peter Xu
2018-07-05  9:31   ` Balamuruhan S
2018-07-06  2:19     ` Peter Xu
2018-07-06  6:17       ` Balamuruhan S
2018-07-05  9:59   ` Dr. David Alan Gilbert
2018-07-05 13:03   ` Juan Quintela
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 5/9] tests: allow migrate() to take extra flags Peter Xu
2018-07-05 10:18   ` Dr. David Alan Gilbert
2018-07-05 13:05   ` Juan Quintela
2018-07-06 10:36   ` Balamuruhan S
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 6/9] tests: introduce migrate_query*() helpers Peter Xu
2018-07-05 10:23   ` Dr. David Alan Gilbert
2018-07-05 13:07     ` Juan Quintela
2018-07-05 10:59   ` Balamuruhan S
2018-07-05 13:06   ` Juan Quintela
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 7/9] tests: introduce wait_for_migration_status() Peter Xu
2018-07-05 10:27   ` Dr. David Alan Gilbert
2018-07-05 13:07   ` Juan Quintela
2018-07-06 10:41   ` Balamuruhan S
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 8/9] tests: add postcopy recovery test Peter Xu
2018-07-05 10:30   ` Dr. David Alan Gilbert
2018-07-05 13:08   ` Juan Quintela
2018-07-05  3:17 ` [Qemu-devel] [PATCH for-3.0 9/9] tests: hide stderr for " Peter Xu
2018-07-05 10:36   ` Dr. David Alan Gilbert
2018-07-05 13:09   ` Juan Quintela
2018-07-06  9:17 ` [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes Dr. David Alan Gilbert
2018-07-06 10:56   ` Dr. David Alan Gilbert
2018-07-06 11:45     ` Balamuruhan S
2018-07-06 12:46     ` Balamuruhan S
2018-07-12  8:50       ` Dr. David Alan Gilbert
2018-07-10  3:27     ` Peter Xu
2018-07-10  8:53       ` Dr. David Alan Gilbert
2018-07-10  1:56 ` Balamuruhan S
2018-07-10  3:07   ` Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.