* [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
@ 2017-12-05  6:52 Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 01/28] migration: better error handling with QEMUFile Peter Xu
                   ` (30 more replies)
  0 siblings, 31 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

The tree is pushed here for reference and testing (the online tree
includes the monitor OOB series):

  https://github.com/xzpeter/qemu/tree/postcopy-recover-all

This version removes quite a few patches related to migrate-incoming;
instead, I introduced a new command "migrate-recover" that triggers the
recovery channel on the destination side, which simplifies the code.

To test these two series together, please check out the tree above and
build it.  Note: to test on a small, single host, one needs to disable
full-bandwidth postcopy migration, otherwise the migration will
complete very fast.  A simple patch like this helps:

diff --git a/migration/migration.c b/migration/migration.c
index 4de3b551fe..c0206023d7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * will notice we're in POSTCOPY_ACTIVE and not actually
      * wrap their state up here
      */
-    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
+    // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
     if (migrate_postcopy_ram()) {
         /* Ping just for debugging, helps line traces up */
         qemu_savevm_send_ping(ms->to_dst_file, 2);

This patch is already included in the above github tree.  Feel free to
drop it when testing on big machines and between real hosts.

Detailed Test Procedures (QMP only)
===================================

1. Start the source QEMU:

$qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
     -smp 4 -m 1G -qmp stdio \
     -name peter-vm,debug-threads=on \
     -netdev user,id=net0 \
     -device e1000,netdev=net0 \
     -global migration.x-max-bandwidth=4096 \
     -global migration.x-postcopy-ram=on \
     /images/fedora-25.qcow2

2. Start the destination QEMU:

$qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
     -smp 4 -m 1G -qmp stdio \
     -name peter-vm,debug-threads=on \
     -netdev user,id=net0 \
     -device e1000,netdev=net0 \
     -global migration.x-max-bandwidth=4096 \
     -global migration.x-postcopy-ram=on \
     -incoming tcp:0.0.0.0:5555 \
     /images/fedora-25.qcow2

3. On source, do the QMP handshake as normal:

  {"execute": "qmp_capabilities"}
  {"return": {}}

4. On destination, do the QMP handshake to enable OOB:

  {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
  {"return": {}}

5. On source, trigger the initial migrate command and switch to postcopy:

  {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } }
  {"return": {}}
  {"execute": "query-migrate"}
  {"return": {"expected-downtime": 300, "status": "active", ...}}
  {"execute": "migrate-start-postcopy"}
  {"return": {}}
  {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"}
  {"execute": "query-migrate"}
  {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}}

6. On source, manually trigger a "fake network down" using the
   "migrate_cancel" command:

  {"execute": "migrate_cancel"}
  {"return": {}}

  During postcopy, this does not really cancel the migration but
  pauses it.  On both sides, we should see this on stderr:

  qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.

  This means both sides are now in the postcopy-paused state.
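
  We can double-check the new state via query-migrate (a sketch based
  on the "postcopy-paused" status introduced by this series; other
  fields elided):

  {"execute": "query-migrate"}
  {"return": {"status": "postcopy-paused", ...}}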

7. (Optional) On the destination side, let's try to hang the main
   thread using the new x-oob-test command with a "lock=true" param:

   {"execute": "x-oob-test", "id": "lock-dispatcher-cmd",
    "arguments": { "lock": true } }

   After sending this command, we should not see any "return", because
   the main thread is already blocked.  But we can still use the
   monitor, since the monitor now has a dedicated IOThread.

8. On the destination side, provide a new incoming port using the new
   command "migrate-recover" (note that if step 7 is carried out, we
   _must_ use the OOB form, otherwise the command will hang.  With OOB,
   the command returns immediately):

  {"execute": "migrate-recover", "id": "recover-cmd",
   "arguments": { "uri": "tcp:localhost:5556" },
   "control": { "run-oob": true } }
  {"timestamp": {"seconds": 1512454976, "microseconds": 186053},
   "event": "MIGRATION", "data": {"status": "setup"}}
  {"return": {}, "id": "recover-cmd"}

   We can see that the command succeeds even if the main thread is
   locked up.

9. (Optional) This step is only needed if step 7 is carried out.  On
   the destination, let's unlock the main thread before resuming the
   migration, this time with "lock=false" (since a running system needs
   the main thread).  Note that we _must_ use an OOB command here too:

  {"execute": "x-oob-test", "id": "unlock-dispatcher",
   "arguments": { "lock": false }, "control": { "run-oob": true } }
  {"return": {}, "id": "unlock-dispatcher"}
  {"return": {}, "id": "lock-dispatcher-cmd"}

  Here the first "return" is the reply to the unlock command, and the
  second is the reply to the lock command.  After this, the main
  thread is released.

10. On source, resume the postcopy migration:

  {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5556", "resume": true }}
  {"return": {}}
  {"execute": "query-migrate"}
  {"return": {"status": "completed", ...}}

Here's the change log:

v5:
- add some more r-bs
- fix error path in ram_load_postcopy to always check on "ret" [Dave]
- move init/destroy of three new sems into migration object
  init/finalize functions
- dropped patch "migration: delay the postcopy-active state switch",
  meanwhile touch up patch 6 to check against
  POSTCOPY_INCOMING_RUNNING state when trying to switch to
  postcopy-pause state. [Dave]
- dropped the two patches that introduced QMP/HMP migrate-pause
  commands; instead, reuse migrate-cancel as the manual trigger of a
  postcopy pause.
- added a new patch to let migrate_cancel pause the migration if it's
  already in the postcopy phase.
- added a new command "migrate-recover" to re-assign the incoming port,
  instead of reusing migrate-incoming.
- since I now use the migrate-recover command instead of
  migrate-incoming itself, I dropped quite a few patches that are no
  longer relevant, so the series got smaller:
        migration: return incoming task tag for sockets
        migration: return incoming task tag for exec
        migration: return incoming task tag for fd
        migration: store listen task tag
        migration: allow migrate_incoming for paused VM

v4:
- fix two compile errors that patchew reported
- for QMP: do s/2.11/2.12/g
- fix migrate-incoming logic to be more strict

v3:
- add r-bs correspondingly
- in ram_load_postcopy() capture error if postcopy_place_page() failed
  [Dave]
- remove "break" if there is a "goto" before that [Dave]
- ram_dirty_bitmap_reload(): use PRIx64 where needed, add some more
  print sizes [Dave]
- remove RAMState.ramblock_to_sync, instead use local counter [Dave]
- init tag in tcp_start_incoming_migration() [Dave]
- more traces when transmitting the recv bitmap [Dave]
- postcopy_pause_incoming(): do shutdown before taking rp lock [Dave]
- add one more patch to postpone the state switch of postcopy-active [Dave]
- refactor the migrate_incoming handling according to the email
  discussion [Dave]
- add manual trigger to pause postcopy (two new patches added to
  introduce "migrate-pause" command for QMP/HMP). [Dave]

v2:
- rebased to Alexey's received bitmap v9
- add Dave's r-bs for patches: 2/5/6/8/9/13/14/15/16/20/21
- patch 1: use target page size to calc bitmap [Dave]
- patch 3: move trace_*() after EINTR check [Dave]
- patch 4: dropped since I can use bitmap_complement() [Dave]
- patch 7: check file error right after data is read in both
  qemu_loadvm_section_start_full() and qemu_loadvm_section_part_end(),
  meanwhile also check in check_section_footer() [Dave]
- patch 8/9: fix error_report/commit message in both patches [Dave]
- patch 10: dropped (new parameter "x-postcopy-fast")
- patch 11: split the "postcopy-paused" patch into two, one to
  introduce the new state, the other to implement the logic. Also,
  print something when paused [Dave]
- patch 17: removed do_resume label, introduced migration_prepare()
  [Dave]
- patch 18: removed do_pause label using a new loop [Dave]
- patch 20: removed incorrect comment [Dave]
- patch 21: use 256B buffer in qemu_savevm_send_recv_bitmap(), add
  trace in loadvm_handle_recv_bitmap() [Dave]
- patch 22: fix MIG_RP_MSG_RECV_BITMAP for (1) endianness and (2)
  32/64-bit machines. More info is in the updated commit message.
- patch 23: add one check on migration state [Dave]
- patch 24: use macro instead of magic 1 [Dave]
- patch 26: use more trace_*() instead of one, and use one sem to
  replace mutex+cond. [Dave]
- move sem init/destroy into migration_instance_init() and
  migration_instance_finalize (new function after rebase).
- patch 29: squashed most of this patch into:
  "migration: implement "postcopy-pause" src logic" [Dave]
- split the two fix patches out of the series
- fixed two places where I misused "wake/woke/woken". [Dave]
- add new patch "bitmap: provide to_le/from_le helpers" to solve the
  bitmap endianness issue [Dave]
- appended the migrate_incoming series to this series, since that one
  depends on the paused state.  Used explicit g_source_remove() for
  listening ports [Dan]

FUTURE TODO LIST
- support migrate_cancel during PAUSED/RECOVER state
- when anything goes wrong during PAUSED/RECOVER, switch back to the
  PAUSED state on both sides

As we all know, postcopy migration risks losing the VM if the network
breaks during the migration.  This series tries to solve the problem
by allowing the migration to pause at the failure point, and to do
recovery after the link is reconnected.

There was existing work on this issue from Md Haris Iqbal:

https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html

This series is a total rework of the issue, based on Alexey
Perevalov's received bitmap v8 series:

https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html

Two new states are added to support the migration (used on both
sides):

  MIGRATION_STATUS_POSTCOPY_PAUSED
  MIGRATION_STATUS_POSTCOPY_RECOVER

The MIGRATION_STATUS_POSTCOPY_PAUSED state is entered when a network
failure is detected.  It is a phase we may stay in for a long time,
until a recovery is triggered.  In this state, all the threads (on the
source: the send thread and the return-path thread; on the
destination: the ram-load thread and the page-fault thread) are
halted.
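
Conceptually, each halted thread blocks on a semaphore until the state
leaves postcopy-paused.  A minimal sketch of the pattern (it matches
the postcopy_pause() logic added in patch 5):

    /* Entered after an -EIO is detected on the migration channel */
    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                      MIGRATION_STATUS_POSTCOPY_PAUSED);
    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
        /* The recovery path resets the state and posts the semaphore */
        qemu_sem_wait(&s->postcopy_pause_sem);
    }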

The MIGRATION_STATUS_POSTCOPY_RECOVER state is short-lived.  When a
recovery is triggered, both the source and destination VMs jump into
this state and do whatever is needed to prepare the recovery
(currently the most important step is synchronizing the dirty bitmap;
please see the commit messages for more information).  After the
preparation is done, the source does the final handshake with the
destination, and both sides switch back to
MIGRATION_STATUS_POSTCOPY_ACTIVE again.

New commands/messages are defined as well to satisfy the need:

MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
delivering the received bitmaps.

MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final
handshake of postcopy recovery.

Here are some more details on how the whole failure/recovery routine
works (a wire-level summary follows the list):

- start migration
- ... (switch from precopy to postcopy)
- both sides are in "postcopy-active" state
- ... (a failure happens, e.g., the network is unplugged)
- both sides switch to "postcopy-paused" state
  - all the migration threads are stopped on both sides
- ... (both VMs hang)
- ... (the user triggers recovery using "migrate -r -d tcp:HOST:PORT"
  on the source side; "-r" means "recover")
- both sides switch to the "postcopy-recover" state
  - on source: the send thread and return-path thread are woken up
  - on dest: the ram-load thread is woken up; the fault thread stays
    paused
- the source calls the new SaveVMHandlers hook resume_prepare()
  (currently only RAM provides the hook):
  - ram_resume_prepare(): for each ramblock, fetch the received bitmap:
    - src sends MIG_CMD_RECV_BITMAP to dst
    - dst replies MIG_RP_MSG_RECV_BITMAP to src, with the bitmap data
      - src uses the received bitmap to rebuild its dirty bitmap
- the source does the final handshake with the destination
  - src sends MIG_CMD_RESUME to dst, telling it "src is ready"
    - when dst receives the command, the fault thread is woken up and
      dst switches back to "postcopy-active"
  - dst sends MIG_RP_MSG_RESUME_ACK to src, telling it "dst is ready"
    - when src receives the ack, its state switches to
      "postcopy-active"
- postcopy migration continues
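
In wire terms, the recovery preparation and final handshake condense
to this exchange (a summary of the steps above; one bitmap round trip
happens per ramblock):

  src -> dst : MIG_CMD_RECV_BITMAP      (request the received bitmap)
  dst -> src : MIG_RP_MSG_RECV_BITMAP   (the bitmap data)
  src -> dst : MIG_CMD_RESUME           ("src is ready")
  dst -> src : MIG_RP_MSG_RESUME_ACK    ("dst is ready")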

Testing:

As I said, it's still an extremely simple test. I used socat to create
a socket bridge:

  socat tcp-listen:6666 tcp-connect:localhost:5555 &

Then I did the migration via the bridge.  I emulated the network
failure by killing the socat process (bridge down), then tried to
recover the migration using the other channel (the default dst
channel).  It looks like this:

        port:6666    +------------------+
        +----------> | socat bridge [1] |-------+
        |            +------------------+       |
        |         (Original channel)            |
        |                                       | port: 5555
     +---------+  (Recovery channel)            +--->+---------+
     | src VM  |------------------------------------>| dst VM  |
     +---------+                                     +---------+
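
In shell terms, the failure emulation can be as simple as this (a
sketch; the ports follow the diagram above):

  socat tcp-listen:6666 tcp-connect:localhost:5555 &  # original channel
  BRIDGE_PID=$!
  # ... migrate to port 6666, switch to postcopy ...
  kill $BRIDGE_PID   # bridge down; both sides enter postcopy-paused
  # ... then recover via the other channel as described above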

Known issues/notes:

- currently the destination listening port still cannot be changed,
  i.e., for simplicity the recovery uses the same port on the
  destination.  (On the source, we can specify a new URI.)

- the patch "migration: let dst listen on port always" is still hacky;
  it just keeps the incoming accept open forever for now...

- some migration numbers might still be inaccurate, like the total
  migration time, etc.  (But I don't really think that matters much
  now.)

- the patches are very lightly tested.

- Dave reported a problem that may hang the destination main loop
  thread (one vcpu thread holds the BQL) and the rest.  I haven't
  encountered it yet, but that does not mean this series can survive
  it.

- other potential issues that I may have forgotten or unnoticed...

Anyway, the work is still at a preliminary stage.  Any suggestions and
comments are greatly welcome.  Thanks.

Peter Xu (28):
  migration: better error handling with QEMUFile
  migration: reuse mis->userfault_quit_fd
  migration: provide postcopy_fault_thread_notify()
  migration: new postcopy-pause state
  migration: implement "postcopy-pause" src logic
  migration: allow dst vm pause on postcopy
  migration: allow src return path to pause
  migration: allow send_rq to fail
  migration: allow fault thread to pause
  qmp: hmp: add migrate "resume" option
  migration: pass MigrationState to migrate_init()
  migration: rebuild channel on source
  migration: new state "postcopy-recover"
  migration: wakeup dst ram-load-thread for recover
  migration: new cmd MIG_CMD_RECV_BITMAP
  migration: new message MIG_RP_MSG_RECV_BITMAP
  migration: new cmd MIG_CMD_POSTCOPY_RESUME
  migration: new message MIG_RP_MSG_RESUME_ACK
  migration: introduce SaveVMHandlers.resume_prepare
  migration: synchronize dirty bitmap for resume
  migration: setup ramstate for resume
  migration: final handshake for the resume
  migration: free SocketAddress where allocated
  migration: init dst in migration_object_init too
  io: let watcher of the channel run in same ctx
  migration: allow migrate_cancel to pause postcopy
  qmp/migration: new command migrate-recover
  hmp/migration: add migrate_recover command

 hmp-commands.hx              |  28 ++-
 hmp.c                        |  14 +-
 hmp.h                        |   1 +
 include/migration/register.h |   2 +
 io/channel.c                 |   2 +-
 migration/migration.c        | 549 ++++++++++++++++++++++++++++++++++++++-----
 migration/migration.h        |  24 +-
 migration/postcopy-ram.c     | 110 +++++++--
 migration/postcopy-ram.h     |   2 +
 migration/ram.c              | 247 ++++++++++++++++++-
 migration/ram.h              |   3 +
 migration/savevm.c           | 233 +++++++++++++++++-
 migration/savevm.h           |   3 +
 migration/socket.c           |   4 +-
 migration/trace-events       |  21 ++
 qapi/migration.json          |  35 ++-
 16 files changed, 1172 insertions(+), 106 deletions(-)

-- 
2.14.3


* [Qemu-devel] [PATCH v5 01/28] migration: better error handling with QEMUFile
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05 11:40   ` Dr. David Alan Gilbert
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 02/28] migration: reuse mis->userfault_quit_fd Peter Xu
                   ` (29 subsequent siblings)
  30 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

If postcopy goes down for some reason, we can always see this on dst:

  qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000

However, in most cases that's not the real issue.  The problem is that
qemu_get_be16() has no way to show whether the returned data is valid
or not, and we _always_ assume it is valid.  That's possibly not wise.

The best approach to solve this would be to refactor the QEMUFile
interface so the APIs can return errors.  However, that needs quite a
bit of work and testing.  For now, let's explicitly check the validity
of the data before using it in all the places that call qemu_get_*().
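
The resulting pattern, repeated at each read site, is roughly this (a
condensed view of the hunks below):

    header_type = qemu_get_be16(rp);
    header_len = qemu_get_be16(rp);

    /* The values above may be garbage if the channel failed, so check
     * the stream error state before trusting them. */
    if (qemu_file_get_error(rp)) {
        /* bail out without using header_type/header_len */
    }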

This patch fixes most of the cases I can see.  Only with this can we
make sure we are processing valid data, and that we capture
channel-down events correctly.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c |  5 +++++
 migration/ram.c       | 21 +++++++++++++++++----
 migration/savevm.c    | 40 ++++++++++++++++++++++++++++++++++++++--
 3 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c0206023d7..eae34d0524 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1708,6 +1708,11 @@ static void *source_return_path_thread(void *opaque)
         header_type = qemu_get_be16(rp);
         header_len = qemu_get_be16(rp);
 
+        if (qemu_file_get_error(rp)) {
+            mark_source_rp_bad(ms);
+            goto out;
+        }
+
         if (header_type >= MIG_RP_MSG_MAX ||
             header_type == MIG_RP_MSG_INVALID) {
             error_report("RP: Received invalid message 0x%04x length 0x%04x",
diff --git a/migration/ram.c b/migration/ram.c
index 021d583b9b..f159c16f6a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2696,6 +2696,16 @@ static int ram_load_postcopy(QEMUFile *f)
         uint8_t ch;
 
         addr = qemu_get_be64(f);
+
+        /*
+         * If there is a QEMUFile error, we should stop here, as
+         * "addr" may be invalid
+         */
+        ret = qemu_file_get_error(f);
+        if (ret) {
+            break;
+        }
+
         flags = addr & ~TARGET_PAGE_MASK;
         addr &= TARGET_PAGE_MASK;
 
@@ -2776,9 +2786,15 @@ static int ram_load_postcopy(QEMUFile *f)
             error_report("Unknown combination of migration flags: %#x"
                          " (postcopy mode)", flags);
             ret = -EINVAL;
+            break;
+        }
+
+        /* Detect for any possible file errors */
+        if (!ret && qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
         }
 
-        if (place_needed) {
+        if (!ret && place_needed) {
             /* This gets called at the last target page in the host page */
             void *place_dest = host + TARGET_PAGE_SIZE - block->page_size;
 
@@ -2790,9 +2806,6 @@ static int ram_load_postcopy(QEMUFile *f)
                                           place_source, block);
             }
         }
-        if (!ret) {
-            ret = qemu_file_get_error(f);
-        }
     }
 
     return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index b7908f62be..8814793255 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1765,6 +1765,11 @@ static int loadvm_process_command(QEMUFile *f)
     cmd = qemu_get_be16(f);
     len = qemu_get_be16(f);
 
+    /* Check validity before continuing to process the commands */
+    if (qemu_file_get_error(f)) {
+        return qemu_file_get_error(f);
+    }
+
     trace_loadvm_process_command(cmd, len);
     if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
         error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
@@ -1830,6 +1835,7 @@ static int loadvm_process_command(QEMUFile *f)
  */
 static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
 {
+    int ret;
     uint8_t read_mark;
     uint32_t read_section_id;
 
@@ -1840,6 +1846,13 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
 
     read_mark = qemu_get_byte(f);
 
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        error_report("%s: Read section footer failed: %d",
+                     __func__, ret);
+        return false;
+    }
+
     if (read_mark != QEMU_VM_SECTION_FOOTER) {
         error_report("Missing section footer for %s", se->idstr);
         return false;
@@ -1875,6 +1888,13 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
     instance_id = qemu_get_be32(f);
     version_id = qemu_get_be32(f);
 
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        error_report("%s: Failed to read instance/version ID: %d",
+                     __func__, ret);
+        return ret;
+    }
+
     trace_qemu_loadvm_state_section_startfull(section_id, idstr,
             instance_id, version_id);
     /* Find savevm section */
@@ -1922,6 +1942,13 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
 
     section_id = qemu_get_be32(f);
 
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        error_report("%s: Failed to read section ID: %d",
+                     __func__, ret);
+        return ret;
+    }
+
     trace_qemu_loadvm_state_section_partend(section_id);
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (se->load_section_id == section_id) {
@@ -1989,8 +2016,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
     uint8_t section_type;
     int ret = 0;
 
-    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
-        ret = 0;
+    while (true) {
+        section_type = qemu_get_byte(f);
+
+        if (qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
+            break;
+        }
+
         trace_qemu_loadvm_state_section(section_type);
         switch (section_type) {
         case QEMU_VM_SECTION_START:
@@ -2014,6 +2047,9 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
                 goto out;
             }
             break;
+        case QEMU_VM_EOF:
+            /* This is the end of migration */
+            goto out;
         default:
             error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
-- 
2.14.3


* [Qemu-devel] [PATCH v5 02/28] migration: reuse mis->userfault_quit_fd
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 01/28] migration: better error handling with QEMUFile Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 03/28] migration: provide postcopy_fault_thread_notify() Peter Xu
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Previously it was only used to quit the page fault thread.  Let it be
something more useful: now we can use it to wake the page fault thread
for any reason, and it only means "quit" when fault_thread_quit is set.

Since we changed what it does, rename it to userfault_event_fd.
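
The resulting wake/quit protocol, in sketch form (it mirrors the hunks
below):

    /* Waker: set the quit flag only when asking the thread to exit,
     * then bump the eventfd to wake up the poll(). */
    uint64_t tmp64 = 1;
    atomic_set(&mis->fault_thread_quit, 1);    /* only when quitting */
    write(mis->userfault_event_fd, &tmp64, 8);

    /* Fault thread: consume the event, then check why it was woken. */
    uint64_t val;
    read(mis->userfault_event_fd, &val, 8);
    if (atomic_read(&mis->fault_thread_quit)) {
        /* quit the fault thread */
    }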

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h    |  6 ++++--
 migration/postcopy-ram.c | 29 ++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 663415fe48..6d36400975 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -36,6 +36,8 @@ struct MigrationIncomingState {
     bool           have_fault_thread;
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
+    /* Set this when we want the fault thread to quit */
+    bool           fault_thread_quit;
 
     bool           have_listen_thread;
     QemuThread     listen_thread;
@@ -43,8 +45,8 @@ struct MigrationIncomingState {
 
     /* For the kernel to send us notifications */
     int       userfault_fd;
-    /* To tell the fault_thread to quit */
-    int       userfault_quit_fd;
+    /* To notify the fault_thread to wake, e.g., when need to quit */
+    int       userfault_event_fd;
     QEMUFile *to_src_file;
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index bec6c2c66b..9ad4f20f82 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -387,17 +387,18 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
          * currently be at 0, we're going to increment it to 1
          */
         tmp64 = 1;
-        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+        atomic_set(&mis->fault_thread_quit, 1);
+        if (write(mis->userfault_event_fd, &tmp64, 8) == 8) {
             trace_postcopy_ram_incoming_cleanup_join();
             qemu_thread_join(&mis->fault_thread);
         } else {
             /* Not much we can do here, but may as well report it */
-            error_report("%s: incrementing userfault_quit_fd: %s", __func__,
+            error_report("%s: incrementing userfault_event_fd: %s", __func__,
                          strerror(errno));
         }
         trace_postcopy_ram_incoming_cleanup_closeuf();
         close(mis->userfault_fd);
-        close(mis->userfault_quit_fd);
+        close(mis->userfault_event_fd);
         mis->have_fault_thread = false;
     }
 
@@ -520,7 +521,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
         pfd[0].fd = mis->userfault_fd;
         pfd[0].events = POLLIN;
         pfd[0].revents = 0;
-        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].fd = mis->userfault_event_fd;
         pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
         pfd[1].revents = 0;
 
@@ -530,8 +531,18 @@ static void *postcopy_ram_fault_thread(void *opaque)
         }
 
         if (pfd[1].revents) {
-            trace_postcopy_ram_fault_thread_quit();
-            break;
+            uint64_t tmp64 = 0;
+
+            /* Consume the signal */
+            if (read(mis->userfault_event_fd, &tmp64, 8) != 8) {
+                /* Nothing obviously nicer than posting this error. */
+                error_report("%s: read() failed", __func__);
+            }
+
+            if (atomic_read(&mis->fault_thread_quit)) {
+                trace_postcopy_ram_fault_thread_quit();
+                break;
+            }
         }
 
         ret = read(mis->userfault_fd, &msg, sizeof(msg));
@@ -610,9 +621,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     }
 
     /* Now an eventfd we use to tell the fault-thread to quit */
-    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
-    if (mis->userfault_quit_fd == -1) {
-        error_report("%s: Opening userfault_quit_fd: %s", __func__,
+    mis->userfault_event_fd = eventfd(0, EFD_CLOEXEC);
+    if (mis->userfault_event_fd == -1) {
+        error_report("%s: Opening userfault_event_fd: %s", __func__,
                      strerror(errno));
         close(mis->userfault_fd);
         return -1;
-- 
2.14.3


* [Qemu-devel] [PATCH v5 03/28] migration: provide postcopy_fault_thread_notify()
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 01/28] migration: better error handling with QEMUFile Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 02/28] migration: reuse mis->userfault_quit_fd Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 04/28] migration: new postcopy-pause state Peter Xu
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

A general helper to notify the fault thread.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/postcopy-ram.c | 35 ++++++++++++++++++++---------------
 migration/postcopy-ram.h |  2 ++
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9ad4f20f82..032abfbf1a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -377,25 +377,15 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     trace_postcopy_ram_incoming_cleanup_entry();
 
     if (mis->have_fault_thread) {
-        uint64_t tmp64;
-
         if (qemu_ram_foreach_block(cleanup_range, mis)) {
             return -1;
         }
-        /*
-         * Tell the fault_thread to exit, it's an eventfd that should
-         * currently be at 0, we're going to increment it to 1
-         */
-        tmp64 = 1;
+        /* Let the fault thread quit */
         atomic_set(&mis->fault_thread_quit, 1);
-        if (write(mis->userfault_event_fd, &tmp64, 8) == 8) {
-            trace_postcopy_ram_incoming_cleanup_join();
-            qemu_thread_join(&mis->fault_thread);
-        } else {
-            /* Not much we can do here, but may as well report it */
-            error_report("%s: incrementing userfault_event_fd: %s", __func__,
-                         strerror(errno));
-        }
+        postcopy_fault_thread_notify(mis);
+        trace_postcopy_ram_incoming_cleanup_join();
+        qemu_thread_join(&mis->fault_thread);
+
         trace_postcopy_ram_incoming_cleanup_closeuf();
         close(mis->userfault_fd);
         close(mis->userfault_event_fd);
@@ -824,6 +814,21 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 /* ------------------------------------------------------------------------- */
 
+void postcopy_fault_thread_notify(MigrationIncomingState *mis)
+{
+    uint64_t tmp64 = 1;
+
+    /*
+     * Wakeup the fault_thread.  It's an eventfd that should currently
+     * be at 0, we're going to increment it to 1
+     */
+    if (write(mis->userfault_event_fd, &tmp64, 8) != 8) {
+        /* Not much we can do here, but may as well report it */
+        error_report("%s: incrementing failed: %s", __func__,
+                     strerror(errno));
+    }
+}
+
 /**
  * postcopy_discard_send_init: Called at the start of each RAMBlock before
  *   asking to discard individual ranges.
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 77ea0fd264..14f6cadcbd 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -114,4 +114,6 @@ PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
 
+void postcopy_fault_thread_notify(MigrationIncomingState *mis);
+
 #endif
-- 
2.14.3


* [Qemu-devel] [PATCH v5 04/28] migration: new postcopy-pause state
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (2 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 03/28] migration: provide postcopy_fault_thread_notify() Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 05/28] migration: implement "postcopy-pause" src logic Peter Xu
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Introduce a new state, "postcopy-paused", which can be used when the
postcopy migration is paused.  It is targeted at postcopy network
failure recovery.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 2 ++
 qapi/migration.json   | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index eae34d0524..dd270f8bc5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -530,6 +530,7 @@ static bool migration_is_setup_or_active(int state)
     switch (state) {
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
@@ -609,6 +610,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
diff --git a/qapi/migration.json b/qapi/migration.json
index 03f57c9616..ff336a4f5c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -89,6 +89,8 @@
 #
 # @postcopy-active: like active, but now in postcopy mode. (since 2.5)
 #
+# @postcopy-paused: during postcopy but paused. (since 2.12)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -106,7 +108,8 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed', 'colo',
+            'active', 'postcopy-active', 'postcopy-paused',
+            'completed', 'failed', 'colo',
             'pre-switchover', 'device' ] }
 
 ##
-- 
2.14.3


* [Qemu-devel] [PATCH v5 05/28] migration: implement "postcopy-pause" src logic
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (3 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 04/28] migration: new postcopy-pause state Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 06/28] migration: allow dst vm pause on postcopy Peter Xu
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Now when the network goes down during postcopy, the source side will
not fail the migration.  Instead we convert the status into this new
paused state, and wait for a rescue in the future.

If a recovery is detected, migration_thread() will reset its local
variables to prepare for that.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 98 +++++++++++++++++++++++++++++++++++++++++++++++---
 migration/migration.h  |  3 ++
 migration/trace-events |  1 +
 3 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index dd270f8bc5..f69a48ac63 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2159,6 +2159,80 @@ bool migrate_colo_enabled(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
 }
 
+typedef enum MigThrError {
+    /* No error detected */
+    MIG_THR_ERR_NONE = 0,
+    /* Detected error, but resumed successfully */
+    MIG_THR_ERR_RECOVERED = 1,
+    /* Detected fatal error, need to exit */
+    MIG_THR_ERR_FATAL = 2,
+} MigThrError;
+
+/*
+ * We don't return until we are in a safe state to continue current
+ * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
+ * MIG_THR_ERR_FATAL if an unrecoverable failure happened.
+ */
+static MigThrError postcopy_pause(MigrationState *s)
+{
+    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+    /* Current channel is possibly broken. Release it. */
+    assert(s->to_dst_file);
+    qemu_file_shutdown(s->to_dst_file);
+    qemu_fclose(s->to_dst_file);
+    s->to_dst_file = NULL;
+
+    error_report("Detected IO failure for postcopy. "
+                 "Migration paused.");
+
+    /*
+     * We wait until things fixed up. Then someone will setup the
+     * status back for us.
+     */
+    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        qemu_sem_wait(&s->postcopy_pause_sem);
+    }
+
+    trace_postcopy_pause_continued();
+
+    return MIG_THR_ERR_RECOVERED;
+}
+
+static MigThrError migration_detect_error(MigrationState *s)
+{
+    int ret;
+
+    /* Try to detect any file errors */
+    ret = qemu_file_get_error(s->to_dst_file);
+
+    if (!ret) {
+        /* Everything is fine */
+        return MIG_THR_ERR_NONE;
+    }
+
+    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
+        /*
+         * For postcopy, we allow the network to be down for a
+         * while. After that, it can be continued by a
+         * recovery phase.
+         */
+        return postcopy_pause(s);
+    } else {
+        /*
+         * For precopy (or postcopy with error outside IO), we fail
+         * with no time.
+         */
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+        trace_migration_thread_file_err();
+
+        /* Time to stop the migration, now. */
+        return MIG_THR_ERR_FATAL;
+    }
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
@@ -2183,6 +2257,7 @@ static void *migration_thread(void *opaque)
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
     bool enable_colo = migrate_colo_enabled();
+    MigThrError thr_error;
 
     rcu_register_thread();
 
@@ -2255,12 +2330,24 @@ static void *migration_thread(void *opaque)
             }
         }
 
-        if (qemu_file_get_error(s->to_dst_file)) {
-            migrate_set_state(&s->state, current_active_state,
-                              MIGRATION_STATUS_FAILED);
-            trace_migration_thread_file_err();
+        /*
+         * Try to detect any kind of failures, and see whether we
+         * should stop the migration now.
+         */
+        thr_error = migration_detect_error(s);
+        if (thr_error == MIG_THR_ERR_FATAL) {
+            /* Stop migration */
             break;
+        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
+            /*
+             * Just recovered from, e.g., a network failure; reset all
+             * the local variables. This is important to avoid
+             * breaking transferred_bytes and bandwidth calculation
+             */
+            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+            initial_bytes = 0;
         }
+
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         if (current_time >= initial_time + BUFFER_DELAY) {
             uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
@@ -2468,6 +2555,7 @@ static void migration_instance_finalize(Object *obj)
     g_free(params->tls_hostname);
     g_free(params->tls_creds);
     qemu_sem_destroy(&ms->pause_sem);
+    qemu_sem_destroy(&ms->postcopy_pause_sem);
 }
 
 static void migration_instance_init(Object *obj)
@@ -2496,6 +2584,8 @@ static void migration_instance_init(Object *obj)
     params->has_x_multifd_channels = true;
     params->has_x_multifd_page_count = true;
     params->has_xbzrle_cache_size = true;
+
+    qemu_sem_init(&ms->postcopy_pause_sem, 0);
 }
 
 /*
diff --git a/migration/migration.h b/migration/migration.h
index 6d36400975..36aaa13f50 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -156,6 +156,9 @@ struct MigrationState
     bool send_configuration;
     /* Whether we send section footer during migration */
     bool send_section_footer;
+
+    /* Needed by postcopy-pause state */
+    QemuSemaphore postcopy_pause_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/trace-events b/migration/trace-events
index 6f29fcc686..da1c63a933 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -99,6 +99,7 @@ migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
+postcopy_pause_continued(void) ""
 postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
-- 
2.14.3


* [Qemu-devel] [PATCH v5 06/28] migration: allow dst vm pause on postcopy
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (4 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 05/28] migration: implement "postcopy-pause" src logic Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-14 13:10   ` Dr. David Alan Gilbert
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 07/28] migration: allow src return path to pause Peter Xu
                   ` (24 subsequent siblings)
  30 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

When there is an IO error on the incoming channel (e.g., network
down), instead of bailing out immediately, we allow the dst vm to
switch to the new POSTCOPY_PAUSE state.  Currently it is still simple:
it waits on the new semaphore until someone pokes it for another
attempt.

One note: here on the ram loading thread we cannot check against the
POSTCOPY_ACTIVE state; we need the more specific
POSTCOPY_INCOMING_RUNNING state to make sure we have already loaded
all the device states.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |  1 +
 migration/migration.h  |  3 +++
 migration/savevm.c     | 63 ++++++++++++++++++++++++++++++++++++++++++++++++--
 migration/trace-events |  2 ++
 4 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f69a48ac63..6f0d48bbed 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -150,6 +150,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
         memset(&mis_current, 0, sizeof(MigrationIncomingState));
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
+        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
         once = true;
     }
     return &mis_current;
diff --git a/migration/migration.h b/migration/migration.h
index 36aaa13f50..55894ecb79 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -61,6 +61,9 @@ struct MigrationIncomingState {
     /* The coroutine we should enter (back) after failover */
     Coroutine *migration_incoming_co;
     QemuSemaphore colo_incoming_sem;
+
+    /* notify PAUSED postcopy incoming migrations to try to continue */
+    QemuSemaphore postcopy_pause_sem_dst;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index 8814793255..1e5fc4f332 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1529,8 +1529,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
  */
 static void *postcopy_ram_listen_thread(void *opaque)
 {
-    QEMUFile *f = opaque;
     MigrationIncomingState *mis = migration_incoming_get_current();
+    QEMUFile *f = mis->from_src_file;
     int load_res;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -1544,6 +1544,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
      */
     qemu_file_set_blocking(f, true);
     load_res = qemu_loadvm_state_main(f, mis);
+
+    /*
+     * This is tricky, but mis->from_src_file can change after this
+     * call returns, when a postcopy recovery happens.  In the future,
+     * we may want a wrapper for the QEMUFile handle.
+     */
+    f = mis->from_src_file;
+
     /* And non-blocking again so we don't block in any cleanup */
     qemu_file_set_blocking(f, false);
 
@@ -1626,7 +1634,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     /* Start up the listening thread and wait for it to signal ready */
     qemu_sem_init(&mis->listen_thread_sem, 0);
     qemu_thread_create(&mis->listen_thread, "postcopy/listen",
-                       postcopy_ram_listen_thread, mis->from_src_file,
+                       postcopy_ram_listen_thread, NULL,
                        QEMU_THREAD_DETACHED);
     qemu_sem_wait(&mis->listen_thread_sem);
     qemu_sem_destroy(&mis->listen_thread_sem);
@@ -2011,11 +2019,44 @@ void qemu_loadvm_state_cleanup(void)
     }
 }
 
+/* Return true if we should continue the migration, or false. */
+static bool postcopy_pause_incoming(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_incoming();
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+    assert(mis->from_src_file);
+    qemu_file_shutdown(mis->from_src_file);
+    qemu_fclose(mis->from_src_file);
+    mis->from_src_file = NULL;
+
+    assert(mis->to_src_file);
+    qemu_file_shutdown(mis->to_src_file);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_fclose(mis->to_src_file);
+    mis->to_src_file = NULL;
+    qemu_mutex_unlock(&mis->rp_mutex);
+
+    error_report("Detected IO failure for postcopy. "
+                 "Migration paused.");
+
+    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
+    }
+
+    trace_postcopy_pause_incoming_continued();
+
+    return true;
+}
+
 static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret = 0;
 
+retry:
     while (true) {
         section_type = qemu_get_byte(f);
 
@@ -2060,6 +2101,24 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 out:
     if (ret < 0) {
         qemu_file_set_error(f, ret);
+
+        /*
+         * Detect whether it is:
+         *
+         * 1. postcopy running (after receiving all device data, which
+         *    must be in POSTCOPY_INCOMING_RUNNING state.  Note that
+         *    POSTCOPY_INCOMING_LISTENING is still not enough, it's
+         *    still receiving device states).
+         * 2. network failure (-EIO)
+         *
+         * If so, we try to wait for a recovery.
+         */
+        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+            ret == -EIO && postcopy_pause_incoming(mis)) {
+            /* Reset f to point to the newly created channel */
+            f = mis->from_src_file;
+            goto retry;
+        }
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index da1c63a933..bed1646cd6 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -100,6 +100,8 @@ open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
 postcopy_pause_continued(void) ""
+postcopy_pause_incoming(void) ""
+postcopy_pause_incoming_continued(void) ""
 postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
-- 
2.14.3


* [Qemu-devel] [PATCH v5 07/28] migration: allow src return path to pause
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (5 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 06/28] migration: allow dst vm pause on postcopy Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 08/28] migration: allow send_rq to fail Peter Xu
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Let the thread pause for network issues.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 35 +++++++++++++++++++++++++++++++++--
 migration/migration.h  |  1 +
 migration/trace-events |  2 ++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6f0d48bbed..0dfbbe5673 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1689,6 +1689,18 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     }
 }
 
+/* Return true to retry, false to quit */
+static bool postcopy_pause_return_path_thread(MigrationState *s)
+{
+    trace_postcopy_pause_return_path();
+
+    qemu_sem_wait(&s->postcopy_pause_rp_sem);
+
+    trace_postcopy_pause_return_path_continued();
+
+    return true;
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1705,6 +1717,8 @@ static void *source_return_path_thread(void *opaque)
     int res;
 
     trace_source_return_path_thread_entry();
+
+retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -1796,13 +1810,28 @@ static void *source_return_path_thread(void *opaque)
             break;
         }
     }
-    if (qemu_file_get_error(rp)) {
+
+out:
+    res = qemu_file_get_error(rp);
+    if (res) {
+        if (res == -EIO) {
+            /*
+             * Maybe there is something we can do: it looks like a
+             * network down issue, and we pause for a recovery.
+             */
+            if (postcopy_pause_return_path_thread(ms)) {
+                /* Reload rp, reset the rest */
+                rp = ms->rp_state.from_dst_file;
+                ms->rp_state.error = false;
+                goto retry;
+            }
+        }
+
         trace_source_return_path_thread_bad_end();
         mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
-out:
     ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
     return NULL;
@@ -2557,6 +2586,7 @@ static void migration_instance_finalize(Object *obj)
     g_free(params->tls_creds);
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
+    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
 }
 
 static void migration_instance_init(Object *obj)
@@ -2587,6 +2617,7 @@ static void migration_instance_init(Object *obj)
     params->has_xbzrle_cache_size = true;
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
+    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
 }
 
 /*
diff --git a/migration/migration.h b/migration/migration.h
index 55894ecb79..ebb049f692 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -162,6 +162,7 @@ struct MigrationState
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
+    QemuSemaphore postcopy_pause_rp_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/trace-events b/migration/trace-events
index bed1646cd6..a4031cfe00 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -99,6 +99,8 @@ migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
+postcopy_pause_return_path(void) ""
+postcopy_pause_return_path_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
-- 
2.14.3


* [Qemu-devel] [PATCH v5 08/28] migration: allow send_rq to fail
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (6 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 07/28] migration: allow src return path to pause Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-14 13:21   ` Dr. David Alan Gilbert
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 09/28] migration: allow fault thread to pause Peter Xu
                   ` (22 subsequent siblings)
  30 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Previously we did not allow failures when sending data from the
destination to the source via the return path, even though errors can
happen along the way.  This patch allows migrate_send_rp_message() to
return an error when that happens, and further extends that to
migrate_send_rp_req_pages().
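
A caller-side sketch of how the new return value is meant to be used
(a hypothetical call site; the actual pause handling arrives in a
later patch):

    if (migrate_send_rp_req_pages(mis, rbname, start, len)) {
        /* The return path is broken; stop requesting pages and let
         * the pause/recovery logic take over. */
    }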

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 35 ++++++++++++++++++++++++++++-------
 migration/migration.h |  2 +-
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 0dfbbe5673..e3ce22dce4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -199,17 +199,35 @@ static void deferred_incoming_migration(Error **errp)
  * Send a message on the return channel back to the source
  * of the migration.
  */
-static void migrate_send_rp_message(MigrationIncomingState *mis,
-                                    enum mig_rp_message_type message_type,
-                                    uint16_t len, void *data)
+static int migrate_send_rp_message(MigrationIncomingState *mis,
+                                   enum mig_rp_message_type message_type,
+                                   uint16_t len, void *data)
 {
+    int ret = 0;
+
     trace_migrate_send_rp_message((int)message_type, len);
     qemu_mutex_lock(&mis->rp_mutex);
+
+    /*
+     * It's possible that the file handle got lost due to network
+     * failures.
+     */
+    if (!mis->to_src_file) {
+        ret = -EIO;
+        goto error;
+    }
+
     qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
     qemu_put_be16(mis->to_src_file, len);
     qemu_put_buffer(mis->to_src_file, data, len);
     qemu_fflush(mis->to_src_file);
+
+    /* It's possible that qemu file got error during sending */
+    ret = qemu_file_get_error(mis->to_src_file);
+
+error:
     qemu_mutex_unlock(&mis->rp_mutex);
+    return ret;
 }
 
 /* Request a range of pages from the source VM at the given
@@ -219,11 +237,12 @@ static void migrate_send_rp_message(MigrationIncomingState *mis,
  *   Start: Address offset within the RB
  *   Len: Length in bytes required - must be a multiple of pagesize
  */
-void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
-                               ram_addr_t start, size_t len)
+int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
+                              ram_addr_t start, size_t len)
 {
     uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
     size_t msglen = 12; /* start + len */
+    enum mig_rp_message_type msg_type;
 
     *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
     *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
@@ -235,10 +254,12 @@ void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
         bufc[msglen++] = rbname_len;
         memcpy(bufc + msglen, rbname, rbname_len);
         msglen += rbname_len;
-        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES_ID, msglen, bufc);
+        msg_type = MIG_RP_MSG_REQ_PAGES_ID;
     } else {
-        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
+        msg_type = MIG_RP_MSG_REQ_PAGES;
     }
+
+    return migrate_send_rp_message(mis, msg_type, msglen, bufc);
 }
 
 void qemu_start_incoming_migration(const char *uri, Error **errp)
diff --git a/migration/migration.h b/migration/migration.h
index ebb049f692..b63cdfbfdb 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -216,7 +216,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
-void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
+int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
 
 #endif
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 09/28] migration: allow fault thread to pause
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (7 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 08/28] migration: allow send_rq to fail Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 10/28] qmp: hmp: add migrate "resume" option Peter Xu
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Allow the fault thread to stop handling page faults temporarily. When a
network failure happens (and we expect to recover afterwards), we
should not let the fault thread continue sending requests to the
source; instead, it should halt until the connection is rebuilt.

When the destination main thread notices the failure, it kicks the
fault thread into the paused state.
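
The pause/kick flow can be modeled with a plain POSIX semaphore; a
standalone sketch (illustrative names, not QEMU code):

  #include <pthread.h>
  #include <semaphore.h>
  #include <stdatomic.h>
  #include <stdio.h>
  #include <unistd.h>

  static sem_t pause_sem_fault;           /* models postcopy_pause_sem_fault */
  static atomic_bool to_src_valid;        /* models mis->to_src_file != NULL */

  static void *fault_thread(void *arg)
  {
      (void)arg;
      for (int i = 0; i < 3; i++) {
          if (!atomic_load(&to_src_valid)) {
              printf("fault thread: paused\n");
              sem_wait(&pause_sem_fault); /* sleep until kicked */
              printf("fault thread: continued\n");
          }
          printf("fault thread: servicing page fault %d\n", i);
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t t;

      sem_init(&pause_sem_fault, 0, 0);
      atomic_store(&to_src_valid, false); /* simulate a broken return path */
      pthread_create(&t, NULL, fault_thread, NULL);
      sleep(1);                           /* let the thread hit the pause */
      atomic_store(&to_src_valid, true);  /* "channel rebuilt" */
      sem_post(&pause_sem_fault);         /* kick the fault thread */
      pthread_join(t, NULL);
      return 0;
  }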

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    |  1 +
 migration/migration.h    |  1 +
 migration/postcopy-ram.c | 50 ++++++++++++++++++++++++++++++++++++++++++++----
 migration/savevm.c       |  3 +++
 migration/trace-events   |  2 ++
 5 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e3ce22dce4..90ff49a271 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -151,6 +151,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
         qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
+        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
         once = true;
     }
     return &mis_current;
diff --git a/migration/migration.h b/migration/migration.h
index b63cdfbfdb..2751b2dffc 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -64,6 +64,7 @@ struct MigrationIncomingState {
 
     /* notify PAUSED postcopy incoming migrations to try to continue */
     QemuSemaphore postcopy_pause_sem_dst;
+    QemuSemaphore postcopy_pause_sem_fault;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 032abfbf1a..31c290c884 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -485,6 +485,17 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_fault_thread();
+
+    qemu_sem_wait(&mis->postcopy_pause_sem_fault);
+
+    trace_postcopy_pause_fault_thread_continued();
+
+    return true;
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -535,6 +546,22 @@ static void *postcopy_ram_fault_thread(void *opaque)
             }
         }
 
+        if (!mis->to_src_file) {
+            /*
+             * It's possible that someone has told us, via the event,
+             * that the return path is already broken. We should hold
+             * until the channel is rebuilt.
+             */
+            if (postcopy_pause_fault_thread(mis)) {
+                last_rb = NULL;
+                /* Continue to read the userfaultfd */
+            } else {
+                error_report("%s: paused but don't allow to continue",
+                             __func__);
+                break;
+            }
+        }
+
         ret = read(mis->userfault_fd, &msg, sizeof(msg));
         if (ret != sizeof(msg)) {
             if (errno == EAGAIN) {
@@ -574,18 +601,33 @@ static void *postcopy_ram_fault_thread(void *opaque)
                                                 qemu_ram_get_idstr(rb),
                                                 rb_offset);
 
+retry:
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
          */
         if (rb != last_rb) {
             last_rb = rb;
-            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                     rb_offset, qemu_ram_pagesize(rb));
+            ret = migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                            rb_offset, qemu_ram_pagesize(rb));
         } else {
             /* Save some space */
-            migrate_send_rp_req_pages(mis, NULL,
-                                     rb_offset, qemu_ram_pagesize(rb));
+            ret = migrate_send_rp_req_pages(mis, NULL,
+                                            rb_offset, qemu_ram_pagesize(rb));
+        }
+
+        if (ret) {
+            /* Maybe a network failure: try to wait for recovery */
+            if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
+                /* We got reconnected somehow, try to continue */
+                last_rb = NULL;
+                goto retry;
+            } else {
+                /* This is an unavoidable fault */
+                error_report("%s: migrate_send_rp_req_pages() returned %d",
+                             __func__, ret);
+                break;
+            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
diff --git a/migration/savevm.c b/migration/savevm.c
index 1e5fc4f332..ee03fb5c32 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2039,6 +2039,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     mis->to_src_file = NULL;
     qemu_mutex_unlock(&mis->rp_mutex);
 
+    /* Notify the fault thread about the invalidated file handle */
+    postcopy_fault_thread_notify(mis);
+
     error_report("Detected IO failure for postcopy. "
                  "Migration paused.");
 
diff --git a/migration/trace-events b/migration/trace-events
index a4031cfe00..32f02cbdcc 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -101,6 +101,8 @@ open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
 postcopy_pause_return_path(void) ""
 postcopy_pause_return_path_continued(void) ""
+postcopy_pause_fault_thread(void) ""
+postcopy_pause_fault_thread_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 10/28] qmp: hmp: add migrate "resume" option
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (8 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 09/28] migration: allow fault thread to pause Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 11/28] migration: pass MigrationState to migrate_init() Peter Xu
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

It will be used when we want to resume a paused migration.
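
For example, once the whole series is in place, resuming a paused
migration over a fresh channel would look like this (the port number is
only an example):

QMP:

  {"execute": "migrate",
   "arguments": { "uri": "tcp:localhost:5556", "resume": true } }

HMP:

  (qemu) migrate -r tcp:localhost:5556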

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hmp-commands.hx       | 7 ++++---
 hmp.c                 | 4 +++-
 migration/migration.c | 2 +-
 qapi/migration.json   | 5 ++++-
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 4afd57cf5f..ffcdc34652 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -928,13 +928,14 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s",
+        .params     = "[-d] [-b] [-i] [-r] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "(base image shared between src and destination)"
+                      "\n\t\t\t -r to resume a paused migration",
         .cmd        = hmp_migrate,
     },
 
diff --git a/hmp.c b/hmp.c
index 35a7041824..c7e1022283 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1921,10 +1921,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     bool detach = qdict_get_try_bool(qdict, "detach", false);
     bool blk = qdict_get_try_bool(qdict, "blk", false);
     bool inc = qdict_get_try_bool(qdict, "inc", false);
+    bool resume = qdict_get_try_bool(qdict, "resume", false);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc,
+                false, false, true, resume, &err);
     if (err) {
         hmp_handle_error(mon, &err);
         return;
diff --git a/migration/migration.c b/migration/migration.c
index 90ff49a271..ad4cbfc820 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1363,7 +1363,7 @@ bool migration_is_blocked(Error **errp)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 Error **errp)
+                 bool has_resume, bool resume, Error **errp)
 {
     Error *local_err = NULL;
     MigrationState *s = migrate_get_current();
diff --git a/qapi/migration.json b/qapi/migration.json
index ff336a4f5c..2ea12fcad4 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1012,6 +1012,8 @@
 # @detach: this argument exists only for compatibility reasons and
 #          is ignored by QEMU
 #
+# @resume: resume a paused migration, default "off". (since 2.12)
+#
 # Returns: nothing on success
 #
 # Since: 0.14.0
@@ -1033,7 +1035,8 @@
 #
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
+           '*detach': 'bool', '*resume': 'bool' } }
 
 ##
 # @migrate-incoming:
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 11/28] migration: pass MigrationState to migrate_init()
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (9 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 10/28] qmp: hmp: add migrate "resume" option Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 12/28] migration: rebuild channel on source Peter Xu
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Let the callers take the object, then pass it to migrate_init().
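
The call-site pattern after this change, condensed into a trivial
standalone sketch (illustrative stand-in types, not the QEMU ones):

  #include <stdio.h>

  typedef struct MigrationState {
      long total_time;
  } MigrationState;

  static MigrationState current_migration;

  static MigrationState *migrate_get_current(void)
  {
      return &current_migration;
  }

  /* After the patch: init no longer fetches the object itself */
  static void migrate_init(MigrationState *s)
  {
      s->total_time = 0;            /* reinitialise migration state */
  }

  int main(void)
  {
      /* Callers such as qemu_savevm_state() now do this two-step dance */
      MigrationState *ms = migrate_get_current();

      migrate_init(ms);
      printf("total_time reset to %ld\n", ms->total_time);
      return 0;
  }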

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 7 ++-----
 migration/migration.h | 2 +-
 migration/savevm.c    | 5 ++++-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ad4cbfc820..96b3c25505 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1265,10 +1265,8 @@ bool migration_is_idle(void)
     return false;
 }
 
-MigrationState *migrate_init(void)
+void migrate_init(MigrationState *s)
 {
-    MigrationState *s = migrate_get_current();
-
     /*
      * Reinitialise all migration state, except
      * parameters/capabilities that the user set, and
@@ -1294,7 +1292,6 @@ MigrationState *migrate_init(void)
     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-    return s;
 }
 
 static GSList *migration_blockers;
@@ -1402,7 +1399,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         migrate_set_block_incremental(s, true);
     }
 
-    s = migrate_init();
+    migrate_init(s);
 
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
diff --git a/migration/migration.h b/migration/migration.h
index 2751b2dffc..d052669e1c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -180,7 +180,7 @@ void migrate_fd_error(MigrationState *s, const Error *error);
 
 void migrate_fd_connect(MigrationState *s);
 
-MigrationState *migrate_init(void);
+void migrate_init(MigrationState *s);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index ee03fb5c32..0dda7ca2b4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1256,8 +1256,11 @@ void qemu_savevm_state_cleanup(void)
 static int qemu_savevm_state(QEMUFile *f, Error **errp)
 {
     int ret;
-    MigrationState *ms = migrate_init();
+    MigrationState *ms = migrate_get_current();
     MigrationStatus status;
+
+    migrate_init(ms);
+
     ms->to_dst_file = f;
 
     if (migration_is_blocked(errp)) {
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 12/28] migration: rebuild channel on source
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (10 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 11/28] migration: pass MigrationState to migrate_init() Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 13/28] migration: new state "postcopy-recover" Peter Xu
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

This patch detects the "resume" flag of the migrate command, and
rebuilds the channels only if the flag is set.
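
A condensed standalone model of the connect-time branching (illustrative
names, not QEMU code): a resumed migration runs unthrottled like
postcopy itself, skips the fresh-migration setup, and reuses the
return-path thread that survived the pause.

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  enum state { SETUP, POSTCOPY_PAUSED };

  struct mig {
      enum state state;
      int64_t rate_limit;
      bool rp_thread_exists;        /* the thread survives a pause */
  };

  static void fd_connect(struct mig *s)
  {
      bool resume = s->state == POSTCOPY_PAUSED;

      /* resumed migrations run unthrottled, as postcopy normally does */
      s->rate_limit = resume ? INT64_MAX : 4096;

      /* open the return path, but only spawn its thread the first time */
      if (!s->rp_thread_exists) {
          s->rp_thread_exists = true;
          printf("spawned return-path thread\n");
      }

      if (resume) {
          printf("resume: wake the migration thread, skip fresh setup\n");
          return;
      }
      printf("fresh migration: notify, set up multifd, spawn thread\n");
  }

  int main(void)
  {
      struct mig fresh = { SETUP, 0, false };
      struct mig paused = { POSTCOPY_PAUSED, 0, true };

      fd_connect(&fresh);
      fd_connect(&paused);
      return 0;
  }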

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 92 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 69 insertions(+), 23 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 96b3c25505..a061691fe5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1358,49 +1358,75 @@ bool migration_is_blocked(Error **errp)
     return false;
 }
 
-void qmp_migrate(const char *uri, bool has_blk, bool blk,
-                 bool has_inc, bool inc, bool has_detach, bool detach,
-                 bool has_resume, bool resume, Error **errp)
+/* Return true if we should continue the migration, or false on error */
+static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
+                            bool resume, Error **errp)
 {
     Error *local_err = NULL;
-    MigrationState *s = migrate_get_current();
-    const char *p;
+
+    if (resume) {
+        if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
+            error_setg(errp, "Cannot resume if there is no "
+                       "paused migration");
+            return false;
+        }
+        /* This is a resume, skip init status */
+        return true;
+    }
 
     if (migration_is_setup_or_active(s->state) ||
         s->state == MIGRATION_STATUS_CANCELLING ||
         s->state == MIGRATION_STATUS_COLO) {
         error_setg(errp, QERR_MIGRATION_ACTIVE);
-        return;
+        return false;
     }
+
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         error_setg(errp, "Guest is waiting for an incoming migration");
-        return;
+        return false;
     }
 
     if (migration_is_blocked(errp)) {
-        return;
+        return false;
     }
 
-    if ((has_blk && blk) || (has_inc && inc)) {
+    if (blk || blk_inc) {
         if (migrate_use_block() || migrate_use_block_incremental()) {
             error_setg(errp, "Command options are incompatible with "
                        "current migration capabilities");
-            return;
+            return false;
         }
         migrate_set_block_enabled(true, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
-            return;
+            return false;
         }
         s->must_remove_block_options = true;
     }
 
-    if (has_inc && inc) {
+    if (blk_inc) {
         migrate_set_block_incremental(s, true);
     }
 
     migrate_init(s);
 
+    return true;
+}
+
+void qmp_migrate(const char *uri, bool has_blk, bool blk,
+                 bool has_inc, bool inc, bool has_detach, bool detach,
+                 bool has_resume, bool resume, Error **errp)
+{
+    Error *local_err = NULL;
+    MigrationState *s = migrate_get_current();
+    const char *p;
+
+    if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
+                         has_resume && resume, errp)) {
+        /* Error detected, put into errp */
+        return;
+    }
+
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
@@ -1856,7 +1882,8 @@ out:
     return NULL;
 }
 
-static int open_return_path_on_source(MigrationState *ms)
+static int open_return_path_on_source(MigrationState *ms,
+                                      bool create_thread)
 {
 
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
@@ -1865,6 +1892,12 @@ static int open_return_path_on_source(MigrationState *ms)
     }
 
     trace_open_return_path_on_source();
+
+    if (!create_thread) {
+        /* We're done */
+        return 0;
+    }
+
     qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
 
@@ -2478,15 +2511,24 @@ static void *migration_thread(void *opaque)
 
 void migrate_fd_connect(MigrationState *s)
 {
-    s->expected_downtime = s->parameters.downtime_limit;
-    s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
+    int64_t rate_limit;
+    bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
 
-    qemu_file_set_blocking(s->to_dst_file, true);
-    qemu_file_set_rate_limit(s->to_dst_file,
-                             s->parameters.max_bandwidth / XFER_LIMIT_RATIO);
+    if (resume) {
+        /* This is a resumed migration */
+        rate_limit = INT64_MAX;
+    } else {
+        /* This is a fresh new migration */
+        rate_limit = s->parameters.max_bandwidth / XFER_LIMIT_RATIO;
+        s->expected_downtime = s->parameters.downtime_limit;
+        s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
 
-    /* Notify before starting migration thread */
-    notifier_list_notify(&migration_state_notifiers, s);
+        /* Notify before starting migration thread */
+        notifier_list_notify(&migration_state_notifiers, s);
+    }
+
+    qemu_file_set_rate_limit(s->to_dst_file, rate_limit);
+    qemu_file_set_blocking(s->to_dst_file, true);
 
     /*
      * Open the return path. For postcopy, it is used exclusively. For
@@ -2494,15 +2536,19 @@ void migrate_fd_connect(MigrationState *s)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_use_return_path()) {
-        if (open_return_path_on_source(s)) {
+        if (open_return_path_on_source(s, !resume)) {
             error_report("Unable to open return-path for postcopy");
-            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-                              MIGRATION_STATUS_FAILED);
+            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
             migrate_fd_cleanup(s);
             return;
         }
     }
 
+    if (resume) {
+        /* TODO: do the resume logic */
+        return;
+    }
+
     if (multifd_save_setup() != 0) {
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                           MIGRATION_STATUS_FAILED);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 13/28] migration: new state "postcopy-recover"
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (11 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 12/28] migration: rebuild channel on source Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 14/28] migration: wakeup dst ram-load-thread for recover Peter Xu
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Introducing a new migration state "postcopy-recover". If a migration
procedure is paused and the connection is rebuilt successfully
afterwards, we'll switch the source VM state from "postcopy-paused" to
the new state "postcopy-recover", then we'll do the resume logic in the
migration thread (along with the return path thread).

This patch only does the state switch on the source side. A follow-up
patch will handle the state switching on the destination side using the
same status bit.
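
The retry loop at the heart of the new postcopy_pause() can be modeled
in a few lines of standalone C (illustrative only; the real code blocks
on postcopy_pause_sem and re-checks s->state):

  #include <stdbool.h>
  #include <stdio.h>

  /* pretend the first resume attempt hits another I/O error */
  static bool do_resume(int attempt)
  {
      return attempt >= 2;
  }

  static const char *pause_loop(void)
  {
      for (int attempt = 1; ; attempt++) {
          printf("attempt %d: paused, broken channel released\n", attempt);
          /* ... block here until a recover procedure wakes us up ... */
          if (do_resume(attempt)) {
              return "recovered";   /* MIG_THR_ERR_RECOVERED */
          }
          /* resume failed: pausing again beats throwing data away */
      }
  }

  int main(void)
  {
      printf("result: %s\n", pause_loop());
      return 0;
  }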

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 76 ++++++++++++++++++++++++++++++++++++++-------------
 qapi/migration.json   |  4 ++-
 2 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index a061691fe5..e00f4d0ad0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -554,6 +554,7 @@ static bool migration_is_setup_or_active(int state)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
@@ -634,6 +635,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
@@ -2250,6 +2252,13 @@ typedef enum MigThrError {
     MIG_THR_ERR_FATAL = 2,
 } MigThrError;
 
+/* Return zero if success, or <0 for error */
+static int postcopy_do_resume(MigrationState *s)
+{
+    /* TODO: do the resume logic */
+    return 0;
+}
+
 /*
  * We don't return until we are in a safe state to continue current
  * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
@@ -2258,29 +2267,55 @@ typedef enum MigThrError {
 static MigThrError postcopy_pause(MigrationState *s)
 {
     assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
-    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                      MIGRATION_STATUS_POSTCOPY_PAUSED);
 
-    /* Current channel is possibly broken. Release it. */
-    assert(s->to_dst_file);
-    qemu_file_shutdown(s->to_dst_file);
-    qemu_fclose(s->to_dst_file);
-    s->to_dst_file = NULL;
+    while (true) {
+        migrate_set_state(&s->state, s->state,
+                          MIGRATION_STATUS_POSTCOPY_PAUSED);
 
-    error_report("Detected IO failure for postcopy. "
-                 "Migration paused.");
+        /* Current channel is possibly broken. Release it. */
+        assert(s->to_dst_file);
+        qemu_file_shutdown(s->to_dst_file);
+        qemu_fclose(s->to_dst_file);
+        s->to_dst_file = NULL;
 
-    /*
-     * We wait until things fixed up. Then someone will setup the
-     * status back for us.
-     */
-    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
-        qemu_sem_wait(&s->postcopy_pause_sem);
-    }
+        error_report("Detected IO failure for postcopy. "
+                     "Migration paused.");
+
+        /*
+         * We wait until things are fixed up. Then someone will set the
+         * status back for us.
+         */
+        while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+            qemu_sem_wait(&s->postcopy_pause_sem);
+        }
 
-    trace_postcopy_pause_continued();
+        if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+            /* Woken up by a recover procedure. Give it a shot */
+
+            /*
+             * Firstly, let's wake up the return path now, with a new
+             * return path channel.
+             */
+            qemu_sem_post(&s->postcopy_pause_rp_sem);
 
-    return MIG_THR_ERR_RECOVERED;
+            /* Do the resume logic */
+            if (postcopy_do_resume(s) == 0) {
+                /* Let's continue! */
+                trace_postcopy_pause_continued();
+                return MIG_THR_ERR_RECOVERED;
+            } else {
+                /*
+                 * Something went wrong during the recovery, so let's
+                 * pause again. Pause is always better than throwing
+                 * data away.
+                 */
+                continue;
+            }
+        } else {
+            /* This is not right... Time to quit. */
+            return MIG_THR_ERR_FATAL;
+        }
+    }
 }
 
 static MigThrError migration_detect_error(MigrationState *s)
@@ -2545,7 +2580,10 @@ void migrate_fd_connect(MigrationState *s)
     }
 
     if (resume) {
-        /* TODO: do the resume logic */
+        /* Wakeup the main migration thread to do the recovery */
+        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+        qemu_sem_post(&s->postcopy_pause_sem);
         return;
     }
 
diff --git a/qapi/migration.json b/qapi/migration.json
index 2ea12fcad4..5dfac0681d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -91,6 +91,8 @@
 #
 # @postcopy-paused: during postcopy but paused. (since 2.12)
 #
+# @postcopy-recover: trying to recover from a paused postcopy. (since 2.12)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -109,7 +111,7 @@
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
             'active', 'postcopy-active', 'postcopy-paused',
-            'completed', 'failed', 'colo',
+            'postcopy-recover', 'completed', 'failed', 'colo',
             'pre-switchover', 'device' ] }
 
 ##
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 14/28] migration: wakeup dst ram-load-thread for recover
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (12 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 13/28] migration: new state "postcopy-recover" Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 15/28] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

On the destination side, we cannot wake up all the threads when we get
reconnected. The first thing to do is to wake up the main load thread,
so that we can continue to receive valid messages from the source again
and reply when needed.

At this point, we switch the destination VM state from postcopy-paused
to the new postcopy-recover state.

Now we are finally ready to do the resume logic.
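
A tiny standalone model of this staged wakeup (POSIX semaphores,
illustrative names): reconnection posts only the load thread's
semaphore, while the fault thread's stays down until the source
explicitly resumes (see patch 17):

  #include <semaphore.h>
  #include <stdio.h>

  static sem_t pause_sem_dst;      /* main load thread */
  static sem_t pause_sem_fault;    /* fault thread */

  static void on_reconnect(void)
  {
      printf("state: postcopy-paused -> postcopy-recover\n");
      sem_post(&pause_sem_dst);    /* wake the main load thread only */
      /* pause_sem_fault stays down until MIG_CMD_POSTCOPY_RESUME arrives */
  }

  int main(void)
  {
      int dst, fault;

      sem_init(&pause_sem_dst, 0, 0);
      sem_init(&pause_sem_fault, 0, 0);
      on_reconnect();
      sem_getvalue(&pause_sem_dst, &dst);
      sem_getvalue(&pause_sem_fault, &fault);
      printf("load thread kicked: %d, fault thread still paused: %d\n",
             dst, fault == 0);
      return 0;
  }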

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e00f4d0ad0..5e2e9cb117 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -422,8 +422,34 @@ static void migration_incoming_process(void)
 
 void migration_fd_process_incoming(QEMUFile *f)
 {
-    migration_incoming_setup(f);
-    migration_incoming_process();
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        /* Resumed from a paused postcopy migration */
+
+        mis->from_src_file = f;
+        /* Postcopy has standalone thread to do vm load */
+        qemu_file_set_blocking(f, true);
+
+        /* Re-configure the return path */
+        mis->to_src_file = qemu_file_get_return_path(f);
+
+        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+
+        /*
+         * Here, we only wake up the main loading thread (while the
+         * fault thread will still be waiting), so that we can receive
+         * commands from the source now, and answer them if needed. The
+         * fault thread will be woken up later, once we are sure that
+         * the source is ready to reply to page requests.
+         */
+        qemu_sem_post(&mis->postcopy_pause_sem_dst);
+    } else {
+        /* New incoming migration */
+        migration_incoming_setup(f);
+        migration_incoming_process();
+    }
 }
 
 void migration_ioc_process_incoming(QIOChannel *ioc)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 15/28] migration: new cmd MIG_CMD_RECV_BITMAP
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (13 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 14/28] migration: wakeup dst ram-load-thread for recover Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 16/28] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Add a new VM command, MIG_CMD_RECV_BITMAP, to request the received
bitmap of one ramblock.
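
The payload is just a counted string. A standalone sketch of the
encoding ("pc.ram" is only an example block name):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  /* MIG_CMD_RECV_BITMAP payload: 1 length byte + the ramblock name */
  static size_t encode_recv_bitmap(uint8_t *buf, const char *name)
  {
      size_t len = strlen(name);   /* must fit in one byte, i.e. < 256 */

      buf[0] = (uint8_t)len;
      memcpy(buf + 1, name, len);
      return len + 1;              /* total command payload length */
  }

  int main(void)
  {
      uint8_t buf[256];
      size_t n = encode_recv_bitmap(buf, "pc.ram");

      printf("payload: %zu bytes, length byte = %u\n", n, buf[0]);
      return 0;
  }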

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.h     |  1 +
 migration/trace-events |  2 ++
 3 files changed, 64 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 0dda7ca2b4..bf87dfb591 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -78,6 +78,7 @@ enum qemu_vm_cmd {
                                       were previously sent during
                                       precopy but are dirty. */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
+    MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
 };
 
@@ -95,6 +96,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
                                    .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
+    [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
 };
 
@@ -953,6 +955,19 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
 }
 
+void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
+{
+    size_t len;
+    char buf[256];
+
+    trace_savevm_send_recv_bitmap(block_name);
+
+    buf[0] = len = strlen(block_name);
+    memcpy(buf + 1, block_name, len);
+
+    qemu_savevm_command_send(f, MIG_CMD_RECV_BITMAP, len + 1, (uint8_t *)buf);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -1760,6 +1775,49 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
     return ret;
 }
 
+/*
+ * Handle the request from the source for the received bitmap of one
+ * ramblock on the destination. Payload format:
+ *
+ * len (1 byte) + ramblock_name (<255 bytes)
+ */
+static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
+                                     uint16_t len)
+{
+    QEMUFile *file = mis->from_src_file;
+    RAMBlock *rb;
+    char block_name[256];
+    size_t cnt;
+
+    cnt = qemu_get_counted_string(file, block_name);
+    if (!cnt) {
+        error_report("%s: failed to read block name", __func__);
+        return -EINVAL;
+    }
+
+    /* Validate before using the data */
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    if (len != cnt + 1) {
+        error_report("%s: invalid payload length (%d)", __func__, len);
+        return -EINVAL;
+    }
+
+    rb = qemu_ram_block_by_name(block_name);
+    if (!rb) {
+        error_report("%s: block '%s' not found", __func__, block_name);
+        return -EINVAL;
+    }
+
+    /* TODO: send the bitmap back to source */
+
+    trace_loadvm_handle_recv_bitmap(block_name);
+
+    return 0;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * 0           just a normal return
@@ -1833,6 +1891,9 @@ static int loadvm_process_command(QEMUFile *f)
 
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case MIG_CMD_RECV_BITMAP:
+        return loadvm_handle_recv_bitmap(mis, len);
     }
 
     return 0;
diff --git a/migration/savevm.h b/migration/savevm.h
index 295c4a1f2c..8126b1cc14 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
+void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len,
diff --git a/migration/trace-events b/migration/trace-events
index 32f02cbdcc..55c0412aaa 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -12,6 +12,7 @@ loadvm_state_cleanup(void) ""
 loadvm_handle_cmd_packaged(unsigned int length) "%u"
 loadvm_handle_cmd_packaged_main(int ret) "%d"
 loadvm_handle_cmd_packaged_received(int ret) "%d"
+loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
@@ -34,6 +35,7 @@ savevm_send_open_return_path(void) ""
 savevm_send_ping(uint32_t val) "0x%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
+savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 16/28] migration: new message MIG_RP_MSG_RECV_BITMAP
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (14 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 15/28] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 17/28] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Introducing a new return path message MIG_RP_MSG_RECV_BITMAP to send
the received bitmap of a ramblock back to the source.

This is the reply message to MIG_CMD_RECV_BITMAP: it contains not only
the header (including the ramblock name), but is also appended with the
whole received bitmap of that ramblock on the destination side.

When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
it parses it and converts it to the dirty bitmap by inverting the bits.

One thing to mention is that, when we send the received bitmap, we do
two extra things:

- convert the bitmap to little endian, to support hosts that use
  different endianness on src/dst.

- align it properly to 8 bytes, to support hosts that use different
  word sizes (32/64 bits) on src/dst.
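
To make the padding rule concrete, a standalone sketch of the
on-the-wire bitmap size computation (assuming 4 KiB target pages; the
stream additionally carries a be64 size header and a be64 end mark):

  #include <stdint.h>
  #include <stdio.h>

  #define ROUND_UP(n, d) (((n) + (d) - 1) / (d) * (d))

  static uint64_t bitmap_wire_size(uint64_t ramblock_bytes)
  {
      uint64_t nbits = ramblock_bytes >> 12;  /* one bit per 4 KiB page */
      uint64_t size = nbits / 8;              /* bitmap size in bytes */

      /* pad to 8 bytes so 32-bit and 64-bit hosts agree on the length */
      return ROUND_UP(size, 8);
  }

  int main(void)
  {
      /* a 1 GiB ramblock: 262144 pages -> a 32768-byte bitmap */
      printf("%llu\n", (unsigned long long)bitmap_wire_size(1ULL << 30));
      return 0;
  }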

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |  68 +++++++++++++++++++++++
 migration/migration.h  |   2 +
 migration/ram.c        | 144 +++++++++++++++++++++++++++++++++++++++++++++++++
 migration/ram.h        |   3 ++
 migration/savevm.c     |   2 +-
 migration/trace-events |   3 ++
 6 files changed, 221 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5e2e9cb117..a8c93df40d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -93,6 +93,7 @@ enum mig_rp_message_type {
 
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
+    MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
 
     MIG_RP_MSG_MAX
 };
@@ -501,6 +502,45 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
     migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
 }
 
+void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
+                                 char *block_name)
+{
+    char buf[512];
+    int len;
+    int64_t res;
+
+    /*
+     * First, we send the header part. It contains only the len of
+     * idstr, and the idstr itself.
+     */
+    len = strlen(block_name);
+    buf[0] = len;
+    memcpy(buf + 1, block_name, len);
+
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: MIG_RP_MSG_RECV_BITMAP only used for recovery",
+                     __func__);
+        return;
+    }
+
+    migrate_send_rp_message(mis, MIG_RP_MSG_RECV_BITMAP, len + 1, buf);
+
+    /*
+     * Next, we dump the received bitmap to the stream.
+     *
+     * TODO: currently we are safe since we are the only one that is
+     * using the to_src_file handle (fault thread is still paused),
+     * and it's OK even without taking the mutex. However the best way is
+     * to take the lock before sending the message header, and release
+     * the lock after sending the bitmap.
+     */
+    qemu_mutex_lock(&mis->rp_mutex);
+    res = ramblock_recv_bitmap_send(mis->to_src_file, block_name);
+    qemu_mutex_unlock(&mis->rp_mutex);
+
+    trace_migrate_send_rp_recv_bitmap(block_name, res);
+}
+
 MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 {
     MigrationCapabilityStatusList *head = NULL;
@@ -1730,6 +1770,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
+    [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1774,6 +1815,19 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
     return true;
 }
 
+static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
+{
+    RAMBlock *block = qemu_ram_block_by_name(block_name);
+
+    if (!block) {
+        error_report("%s: invalid block name '%s'", __func__, block_name);
+        return -EINVAL;
+    }
+
+    /* Fetch the received bitmap and refresh the dirty bitmap */
+    return ram_dirty_bitmap_reload(s, block);
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1879,6 +1933,20 @@ retry:
             migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
             break;
 
+        case MIG_RP_MSG_RECV_BITMAP:
+            if (header_len < 1) {
+                error_report("%s: missing block name", __func__);
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            /* Format: len (1B) + idstr (<255B). This ends the idstr. */
+            buf[buf[0] + 1] = '\0';
+            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            break;
+
         default:
             break;
         }
diff --git a/migration/migration.h b/migration/migration.h
index d052669e1c..f879c93542 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -219,5 +219,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
 int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
+void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
+                                 char *block_name);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index f159c16f6a..601fa47754 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -180,6 +180,70 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
                       nr);
 }
 
+#define  RAMBLOCK_RECV_BITMAP_ENDING  (0x0123456789abcdefULL)
+
+/*
+ * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
+ *
+ * Returns the number of bytes sent (>0) on success, or <0 on error.
+ */
+int64_t ramblock_recv_bitmap_send(QEMUFile *file,
+                                  const char *block_name)
+{
+    RAMBlock *block = qemu_ram_block_by_name(block_name);
+    unsigned long *le_bitmap, nbits;
+    uint64_t size;
+
+    if (!block) {
+        error_report("%s: invalid block name: %s", __func__, block_name);
+        return -1;
+    }
+
+    nbits = block->used_length >> TARGET_PAGE_BITS;
+
+    /*
+     * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
+     * machines we may need 4 more bytes for padding (see below
+     * comment). So extend it a bit beforehand.
+     */
+    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
+
+    /*
+     * Always use little endian when sending the bitmap. This is
+     * required when the source and destination VMs are not using the
+     * same endianness. (Note: big endian won't work.)
+     */
+    bitmap_to_le(le_bitmap, block->receivedmap, nbits);
+
+    /* Size of the bitmap, in bytes */
+    size = nbits / 8;
+
+    /*
+     * size is always aligned to 8 bytes for 64bit machines, but it
+     * may not be true for 32bit machines. We need this padding to
+     * make sure the migration can survive even between 32bit and
+     * 64bit machines.
+     */
+    size = ROUND_UP(size, 8);
+
+    qemu_put_be64(file, size);
+    qemu_put_buffer(file, (const uint8_t *)le_bitmap, size);
+    /*
+     * Mark as an end, in case the middle part is screwed up due to
+     * some "misterious" reason.
+     */
+    qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING);
+    qemu_fflush(file);
+
+    g_free(le_bitmap);    /* bitmap_new() memory is g_malloc'ed */
+
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    return size + sizeof(size);
+}
+
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -2992,6 +3056,86 @@ static bool ram_has_postcopy(void *opaque)
     return migrate_postcopy_ram();
 }
 
+/*
+ * Read the received bitmap, invert it as the initial dirty bitmap.
+ * This is only used when the postcopy migration is paused but we want
+ * to resume from a middle point.
+ */
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
+{
+    int ret = -EINVAL;
+    QEMUFile *file = s->rp_state.from_dst_file;
+    unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
+    uint64_t local_size = nbits / 8;
+    uint64_t size, end_mark;
+
+    trace_ram_dirty_bitmap_reload_begin(block->idstr);
+
+    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: incorrect state %s", __func__,
+                     MigrationStatus_str(s->state));
+        return -EINVAL;
+    }
+
+    /*
+     * Note: see comments in ramblock_recv_bitmap_send() on why we
+     * need the endianness conversion, and the padding.
+     */
+    local_size = ROUND_UP(local_size, 8);
+
+    /* Add paddings */
+    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
+
+    size = qemu_get_be64(file);
+
+    /* The size of the bitmap should match with our ramblock */
+    if (size != local_size) {
+        error_report("%s: ramblock '%s' bitmap size mismatch "
+                     "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
+                     block->idstr, size, local_size);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
+    end_mark = qemu_get_be64(file);
+
+    ret = qemu_file_get_error(file);
+    if (ret || size != local_size) {
+        error_report("%s: read bitmap failed for ramblock '%s': %d"
+                     " (size 0x%"PRIx64", got: 0x%"PRIx64")",
+                     __func__, block->idstr, ret, local_size, size);
+        ret = -EIO;
+        goto out;
+    }
+
+    if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
+        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIu64,
+                     __func__, block->idstr, end_mark);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /*
+     * Endianness conversion. We are in postcopy (though paused).
+     * The dirty bitmap won't change. We can directly modify it.
+     */
+    bitmap_from_le(block->bmap, le_bitmap, nbits);
+
+    /*
+     * What we received is the "received bitmap". Invert it as the initial
+     * dirty bitmap for this ramblock.
+     */
+    bitmap_complement(block->bmap, block->bmap, nbits);
+
+    trace_ram_dirty_bitmap_reload_complete(block->idstr);
+
+    ret = 0;
+out:
+    g_free(le_bitmap);
+    return ret;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/migration/ram.h b/migration/ram.h
index 64d81e9f1d..10a459cc89 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -61,5 +61,8 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
+int64_t ramblock_recv_bitmap_send(QEMUFile *file,
+                                  const char *block_name);
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index bf87dfb591..57ffa1e224 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1811,7 +1811,7 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
         return -EINVAL;
     }
 
-    /* TODO: send the bitmap back to source */
+    migrate_send_rp_recv_bitmap(mis, block_name);
 
     trace_loadvm_handle_recv_bitmap(block_name);
 
diff --git a/migration/trace-events b/migration/trace-events
index 55c0412aaa..3dcf8a93d9 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -79,6 +79,8 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
+ram_dirty_bitmap_reload_begin(char *str) "%s"
+ram_dirty_bitmap_reload_complete(char *str) "%s"
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
@@ -90,6 +92,7 @@ migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
 migration_completion_postcopy_end(void) ""
 migration_completion_postcopy_end_after_complete(void) ""
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 17/28] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (15 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 16/28] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 18/28] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Introducing this new command to be sent when the source VM is ready to
resume the paused migration.  What the destination does here is
basically release the fault thread to continue servicing page faults.
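
The destination-side handling boils down to a guarded state flip; a
standalone model (illustrative names, not QEMU code):

  #include <stdio.h>

  enum st { POSTCOPY_ACTIVE, POSTCOPY_RECOVER };

  static int handle_resume(enum st *state)
  {
      if (*state != POSTCOPY_RECOVER) {
          fprintf(stderr, "illegal resume received, ignoring\n");
          return 0;                /* don't fail the whole load for this */
      }
      *state = POSTCOPY_ACTIVE;
      /* here the real code posts postcopy_pause_sem_fault to release
       * the fault thread so it can request pages again */
      return 0;
  }

  int main(void)
  {
      enum st s = POSTCOPY_RECOVER;

      handle_resume(&s);
      printf("state now: %s\n",
             s == POSTCOPY_ACTIVE ? "postcopy-active" : "postcopy-recover");
      return 0;
  }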

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.c     | 35 +++++++++++++++++++++++++++++++++++
 migration/savevm.h     |  1 +
 migration/trace-events |  2 ++
 3 files changed, 38 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 57ffa1e224..c3682f0d47 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -77,6 +77,7 @@ enum qemu_vm_cmd {
     MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
                                       were previously sent during
                                       precopy but are dirty. */
+    MIG_CMD_POSTCOPY_RESUME,       /* resume postcopy on dest */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
     MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
@@ -95,6 +96,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RUN]     = { .len =  0, .name = "POSTCOPY_RUN" },
     [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
                                    .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
+    [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
     [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
@@ -955,6 +957,12 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
 }
 
+void qemu_savevm_send_postcopy_resume(QEMUFile *f)
+{
+    trace_savevm_send_postcopy_resume();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RESUME, 0, NULL);
+}
+
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
 {
     size_t len;
@@ -1727,6 +1735,30 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
     return LOADVM_QUIT;
 }
 
+static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
+{
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: illegal resume received", __func__);
+        /* Don't fail the load, only for this. */
+        return 0;
+    }
+
+    /*
+     * This means the source VM is ready to resume the postcopy
+     * migration.  It's time to switch state and release the fault
+     * thread to continue servicing page faults.
+     */
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    qemu_sem_post(&mis->postcopy_pause_sem_fault);
+
+    trace_loadvm_postcopy_handle_resume();
+
+    /* TODO: Tell source that "we are ready" */
+
+    return 0;
+}
+
 /**
  * Immediately following this command is a blob of data containing an embedded
  * chunk of migration stream; read it and load it.
@@ -1892,6 +1924,9 @@ static int loadvm_process_command(QEMUFile *f)
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
 
+    case MIG_CMD_POSTCOPY_RESUME:
+        return loadvm_postcopy_handle_resume(mis);
+
     case MIG_CMD_RECV_BITMAP:
         return loadvm_handle_recv_bitmap(mis, len);
     }
diff --git a/migration/savevm.h b/migration/savevm.h
index 8126b1cc14..a5f3879191 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
+void qemu_savevm_send_postcopy_resume(QEMUFile *f);
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
diff --git a/migration/trace-events b/migration/trace-events
index 3dcf8a93d9..4b60865194 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -18,6 +18,7 @@ loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
 loadvm_postcopy_handle_run_cpu_sync(void) ""
 loadvm_postcopy_handle_run_vmstart(void) ""
+loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
@@ -35,6 +36,7 @@ savevm_send_open_return_path(void) ""
 savevm_send_ping(uint32_t val) "0x%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
+savevm_send_postcopy_resume(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_header(void) ""
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 18/28] migration: new message MIG_RP_MSG_RESUME_ACK
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (16 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 17/28] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 19/28] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Create a new message to reply to MIG_CMD_POSTCOPY_RESUME. One uint32_t
is used as the payload to let the source know whether the destination
is ready to continue the migration.
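
A hypothetical sketch of what travels on the return path for this
message, pieced together from the be16 type/len header that
source_return_path_thread() reads (the struct is illustrative only,
not a type that exists in the tree):

    #include <stdint.h>

    /* Illustrative only -- not a type that exists in the tree. */
    struct rp_resume_ack_on_wire {
        uint16_t type;    /* MIG_RP_MSG_RESUME_ACK, big endian */
        uint16_t len;     /* 4, big endian */
        uint32_t value;   /* MIGRATION_RESUME_ACK_VALUE (1), big endian */
    };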

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  | 37 +++++++++++++++++++++++++++++++++++++
 migration/migration.h  |  3 +++
 migration/savevm.c     |  3 ++-
 migration/trace-events |  1 +
 4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index a8c93df40d..ed4024633c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -94,6 +94,7 @@ enum mig_rp_message_type {
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
     MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
+    MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
 
     MIG_RP_MSG_MAX
 };
@@ -541,6 +542,14 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
     trace_migrate_send_rp_recv_bitmap(block_name, res);
 }
 
+void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RP_MSG_RESUME_ACK, sizeof(buf), &buf);
+}
+
 MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 {
     MigrationCapabilityStatusList *head = NULL;
@@ -1771,6 +1780,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
     [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
+    [MIG_RP_MSG_RESUME_ACK]     = { .len =  4, .name = "RESUME_ACK" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1828,6 +1838,25 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
     return ram_dirty_bitmap_reload(s, block);
 }
 
+static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
+{
+    trace_source_return_path_thread_resume_ack(value);
+
+    if (value != MIGRATION_RESUME_ACK_VALUE) {
+        error_report("%s: illegal resume_ack value %"PRIu32,
+                     __func__, value);
+        return -1;
+    }
+
+    /* Now both sides are active. */
+    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+    /* TODO: notify send thread that time to continue send pages */
+
+    return 0;
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1947,6 +1976,14 @@ retry:
             }
             break;
 
+        case MIG_RP_MSG_RESUME_ACK:
+            tmp32 = ldl_be_p(buf);
+            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            break;
+
         default:
             break;
         }
diff --git a/migration/migration.h b/migration/migration.h
index f879c93542..11fbfebba1 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -22,6 +22,8 @@
 #include "hw/qdev.h"
 #include "io/channel.h"
 
+#define  MIGRATION_RESUME_ACK_VALUE  (1)
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -221,5 +223,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
                                  char *block_name);
+void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index c3682f0d47..7ab3fd41dc 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1754,7 +1754,8 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
 
     trace_loadvm_postcopy_handle_resume();
 
-    /* TODO: Tell source that "we are ready" */
+    /* Tell source that "we are ready" */
+    migrate_send_rp_resume_ack(mis, MIGRATION_RESUME_ACK_VALUE);
 
     return 0;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 4b60865194..2bf8301293 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -120,6 +120,7 @@ source_return_path_thread_entry(void) ""
 source_return_path_thread_loop_top(void) ""
 source_return_path_thread_pong(uint32_t val) "0x%x"
 source_return_path_thread_shut(uint32_t val) "0x%x"
+source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 migrate_global_state_post_load(const char *state) "loaded state: %s"
 migrate_global_state_pre_save(const char *state) "saved state: %s"
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 19/28] migration: introduce SaveVMHandlers.resume_prepare
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (17 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 18/28] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 20/28] migration: synchronize dirty bitmap for resume Peter Xu
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

This is a hook function to be called when a postcopy migration wants
to resume from a failure. Each module should provide its own recovery
logic here, to run before we switch back to the postcopy-active state.
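
A hypothetical example of how a module would opt in (the names
foo_resume_prepare and savevm_foo_handlers are made up; the RAM module
does exactly this in a later patch of the series):

    static int foo_resume_prepare(MigrationState *s, void *opaque)
    {
        /* Re-initialize whatever state this module needs to resume */
        return 0;
    }

    static SaveVMHandlers savevm_foo_handlers = {
        /* ... the usual save/load hooks ... */
        .resume_prepare = foo_resume_prepare,
    };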

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/migration/register.h |  2 ++
 migration/migration.c        | 20 +++++++++++++++++++-
 migration/savevm.c           | 25 +++++++++++++++++++++++++
 migration/savevm.h           |  1 +
 migration/trace-events       |  1 +
 5 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index f4f7bdc177..128124f008 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -42,6 +42,8 @@ typedef struct SaveVMHandlers {
     LoadStateHandler *load_state;
     int (*load_setup)(QEMUFile *f, void *opaque);
     int (*load_cleanup)(void *opaque);
+    /* Called when postcopy migration wants to resume from failure */
+    int (*resume_prepare)(MigrationState *s, void *opaque);
 } SaveVMHandlers;
 
 int register_savevm_live(DeviceState *dev,
diff --git a/migration/migration.c b/migration/migration.c
index ed4024633c..40cb9a29cf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2386,7 +2386,25 @@ typedef enum MigThrError {
 /* Return zero if success, or <0 for error */
 static int postcopy_do_resume(MigrationState *s)
 {
-    /* TODO: do the resume logic */
+    int ret;
+
+    /*
+     * Call all the resume_prepare() hooks, so that modules can be
+     * ready for the migration resume.
+     */
+    ret = qemu_savevm_state_resume_prepare(s);
+    if (ret) {
+        error_report("%s: resume_prepare() failure detected: %d",
+                     __func__, ret);
+        return ret;
+    }
+
+    /*
+     * TODO: handshake with dest using MIG_CMD_RESUME,
+     * MIG_RP_MSG_RESUME_ACK, then switch source state to
+     * "postcopy-active"
+     */
+
     return 0;
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 7ab3fd41dc..0aec5b0b6b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1028,6 +1028,31 @@ void qemu_savevm_state_setup(QEMUFile *f)
     }
 }
 
+int qemu_savevm_state_resume_prepare(MigrationState *s)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    trace_savevm_state_resume_prepare();
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->resume_prepare) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        ret = se->ops->resume_prepare(s, se->opaque);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 /*
  * this function has three return values:
  *   negative: there was one error, and we have -errno.
diff --git a/migration/savevm.h b/migration/savevm.h
index a5f3879191..3193f04cca 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -31,6 +31,7 @@
 
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_state_resume_prepare(MigrationState *s);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
diff --git a/migration/trace-events b/migration/trace-events
index 2bf8301293..eadabf03e8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -39,6 +39,7 @@ savevm_send_postcopy_run(void) ""
 savevm_send_postcopy_resume(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
+savevm_state_resume_prepare(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
 savevm_state_cleanup(void) ""
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 20/28] migration: synchronize dirty bitmap for resume
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (18 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 19/28] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
@ 2017-12-05  6:52 ` Peter Xu
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 21/28] migration: setup ramstate " Peter Xu
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

This patch implements the first part of core RAM resume logic for
postcopy. ram_resume_prepare() is provided for the work.

When the migration is interrupted by a network failure, the dirty
bitmap on the source side becomes meaningless, because even if a dirty
bit is cleared, it is still possible that the sent page was lost on
the way to the destination. So instead of continuing the migration
with the old dirty bitmap on the source, we ask the destination side
to send back its received bitmap, then invert it to form our initial
dirty bitmap.

The source side send thread will issue the MIG_CMD_RECV_BITMAP
requests, once per ramblock, to ask for the received bitmap. On the
destination side, MIG_RP_MSG_RECV_BITMAP will be issued, along with
the requested bitmap. The data will be received on the return-path
thread of the source, and the main migration thread will be notified
when all the ramblock bitmaps have been synchronized.
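
As a minimal generic-C sketch of the inversion idea (illustrative
only, not the code in this patch):

    #include <stddef.h>

    /*
     * Every page that the destination has NOT yet received must be
     * considered dirty again when the migration resumes.
     */
    static void recv_bitmap_to_dirty(unsigned long *dirty,
                                     const unsigned long *recvd,
                                     size_t nwords)
    {
        size_t i;

        for (i = 0; i < nwords; i++) {
            dirty[i] = ~recvd[i];
        }
    }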

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c  |  2 ++
 migration/migration.h  |  1 +
 migration/ram.c        | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  4 ++++
 4 files changed, 54 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 40cb9a29cf..2305a08c26 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2839,6 +2839,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
+    qemu_sem_destroy(&ms->rp_state.rp_sem);
 }
 
 static void migration_instance_init(Object *obj)
@@ -2870,6 +2871,7 @@ static void migration_instance_init(Object *obj)
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
+    qemu_sem_init(&ms->rp_state.rp_sem, 0);
 }
 
 /*
diff --git a/migration/migration.h b/migration/migration.h
index 11fbfebba1..82dd7d9820 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -108,6 +108,7 @@ struct MigrationState
         QEMUFile     *from_dst_file;
         QemuThread    rp_thread;
         bool          error;
+        QemuSemaphore rp_sem;
     } rp_state;
 
     double mbps;
diff --git a/migration/ram.c b/migration/ram.c
index 601fa47754..8fa47a0353 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -49,6 +49,7 @@
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
 #include "migration/block.h"
+#include "savevm.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -3056,6 +3057,38 @@ static bool ram_has_postcopy(void *opaque)
     return migrate_postcopy_ram();
 }
 
+/* Sync all the dirty bitmap with destination VM.  */
+static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
+{
+    RAMBlock *block;
+    QEMUFile *file = s->to_dst_file;
+    int ramblock_count = 0;
+
+    trace_ram_dirty_bitmap_sync_start();
+
+    RAMBLOCK_FOREACH(block) {
+        qemu_savevm_send_recv_bitmap(file, block->idstr);
+        trace_ram_dirty_bitmap_request(block->idstr);
+        ramblock_count++;
+    }
+
+    trace_ram_dirty_bitmap_sync_wait();
+
+    /* Wait until all the ramblocks' dirty bitmap synced */
+    while (ramblock_count--) {
+        qemu_sem_wait(&s->rp_state.rp_sem);
+    }
+
+    trace_ram_dirty_bitmap_sync_complete();
+
+    return 0;
+}
+
+static void ram_dirty_bitmap_reload_notify(MigrationState *s)
+{
+    qemu_sem_post(&s->rp_state.rp_sem);
+}
+
 /*
  * Read the received bitmap, revert it as the initial dirty bitmap.
  * This is only used when the postcopy migration is paused but wants
@@ -3130,12 +3163,25 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     trace_ram_dirty_bitmap_reload_complete(block->idstr);
 
+    /*
+     * We succeeded to sync bitmap for current ramblock. If this is
+     * the last one to sync, we need to notify the main send thread.
+     */
+    ram_dirty_bitmap_reload_notify(s);
+
     ret = 0;
 out:
     free(le_bitmap);
     return ret;
 }
 
+static int ram_resume_prepare(MigrationState *s, void *opaque)
+{
+    RAMState *rs = *(RAMState **)opaque;
+
+    return ram_dirty_bitmap_sync_all(s, rs);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
@@ -3147,6 +3193,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_cleanup = ram_save_cleanup,
     .load_setup = ram_load_setup,
     .load_cleanup = ram_load_cleanup,
+    .resume_prepare = ram_resume_prepare,
 };
 
 void ram_mig_init(void)
diff --git a/migration/trace-events b/migration/trace-events
index eadabf03e8..804f18d492 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -82,8 +82,12 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
+ram_dirty_bitmap_request(char *str) "%s"
 ram_dirty_bitmap_reload_begin(char *str) "%s"
 ram_dirty_bitmap_reload_complete(char *str) "%s"
+ram_dirty_bitmap_sync_start(void) ""
+ram_dirty_bitmap_sync_wait(void) ""
+ram_dirty_bitmap_sync_complete(void) ""
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 21/28] migration: setup ramstate for resume
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (19 preceding siblings ...)
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 20/28] migration: synchronize dirty bitmap for resume Peter Xu
@ 2017-12-05  6:53 ` Peter Xu
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 22/28] migration: final handshake for the resume Peter Xu
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

After we have updated the dirty bitmaps of the ramblocks, we also need
to update the critical fields in RAMState to make sure it is ready for
a resume.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c        | 37 ++++++++++++++++++++++++++++++++++++-
 migration/trace-events |  1 +
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 8fa47a0353..358d229721 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2250,6 +2250,33 @@ static int ram_init_all(RAMState **rsp)
     return 0;
 }
 
+static void ram_state_resume_prepare(RAMState *rs)
+{
+    RAMBlock *block;
+    long pages = 0;
+
+    /*
+     * Postcopy is not using xbzrle/compression, so no need for that.
+     * Also, since source are already halted, we don't need to care
+     * about dirty page logging as well.
+     */
+
+    RAMBLOCK_FOREACH(block) {
+        pages += bitmap_count_one(block->bmap,
+                                  block->used_length >> TARGET_PAGE_BITS);
+    }
+
+    /* This may not be aligned with current bitmaps. Recalculate. */
+    rs->migration_dirty_pages = pages;
+
+    rs->last_seen_block = NULL;
+    rs->last_sent_block = NULL;
+    rs->last_page = 0;
+    rs->last_version = ram_list.version;
+
+    trace_ram_state_resume_prepare(pages);
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -3178,8 +3205,16 @@ out:
 static int ram_resume_prepare(MigrationState *s, void *opaque)
 {
     RAMState *rs = *(RAMState **)opaque;
+    int ret;
 
-    return ram_dirty_bitmap_sync_all(s, rs);
+    ret = ram_dirty_bitmap_sync_all(s, rs);
+    if (ret) {
+        return ret;
+    }
+
+    ram_state_resume_prepare(rs);
+
+    return 0;
 }
 
 static SaveVMHandlers savevm_ram_handlers = {
diff --git a/migration/trace-events b/migration/trace-events
index 804f18d492..98c2e4de58 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -88,6 +88,7 @@ ram_dirty_bitmap_reload_complete(char *str) "%s"
 ram_dirty_bitmap_sync_start(void) ""
 ram_dirty_bitmap_sync_wait(void) ""
 ram_dirty_bitmap_sync_complete(void) ""
+ram_state_resume_prepare(long v) "%ld"
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 22/28] migration: final handshake for the resume
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (20 preceding siblings ...)
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 21/28] migration: setup ramstate " Peter Xu
@ 2017-12-05  6:53 ` Peter Xu
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 23/28] migration: free SocketAddress where allocated Peter Xu
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Finish the last step of the recovery: the final handshake.

First, the source sends one MIG_CMD_RESUME to the destination, telling
it that the source is ready to resume.

Then the destination replies with MIG_RP_MSG_RESUME_ACK to the source,
telling it that the destination is ready to resume (after switching to
the postcopy-active state).

When the source receives the RESUME_ACK, it switches its own state to
postcopy-active, and the recovery is complete.
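
Putting the whole handshake together (an informal sketch of the
message flow described above):

    source                                 destination
    ------                                 -----------
    (postcopy-recover)                     (postcopy-recover)
    MIG_CMD_POSTCOPY_RESUME  ----------->
                                           switch to postcopy-active
              <-----------  MIG_RP_MSG_RESUME_ACK (value = 1)
    switch to postcopy-active
    (page sending continues)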

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 2305a08c26..7f73a413b2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1852,7 +1852,8 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
     migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
-    /* TODO: notify send thread that time to continue send pages */
+    /* Notify send thread that time to continue send pages */
+    qemu_sem_post(&s->rp_state.rp_sem);
 
     return 0;
 }
@@ -2383,6 +2384,21 @@ typedef enum MigThrError {
     MIG_THR_ERR_FATAL = 2,
 } MigThrError;
 
+static int postcopy_resume_handshake(MigrationState *s)
+{
+    qemu_savevm_send_postcopy_resume(s->to_dst_file);
+
+    while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        qemu_sem_wait(&s->rp_state.rp_sem);
+    }
+
+    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
+    return -1;
+}
+
 /* Return zero if success, or <0 for error */
 static int postcopy_do_resume(MigrationState *s)
 {
@@ -2400,10 +2416,14 @@ static int postcopy_do_resume(MigrationState *s)
     }
 
     /*
-     * TODO: handshake with dest using MIG_CMD_RESUME,
-     * MIG_RP_MSG_RESUME_ACK, then switch source state to
-     * "postcopy-active"
+     * Last handshake with destination on the resume (destination will
+     * switch to postcopy-active afterwards)
      */
+    ret = postcopy_resume_handshake(s);
+    if (ret) {
+        error_report("%s: handshake failed: %d", __func__, ret);
+        return ret;
+    }
 
     return 0;
 }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 23/28] migration: free SocketAddress where allocated
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (21 preceding siblings ...)
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 22/28] migration: final handshake for the resume Peter Xu
@ 2017-12-05  6:53 ` Peter Xu
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 24/28] migration: init dst in migration_object_init too Peter Xu
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Freeing the SocketAddress struct in socket_start_incoming_migration is
slightly confusing. Let's free the address in the same context where we
allocated it.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/socket.c b/migration/socket.c
index dee869044a..4879f11e0f 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -172,7 +172,6 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
 
     if (qio_channel_socket_listen_sync(listen_ioc, saddr, errp) < 0) {
         object_unref(OBJECT(listen_ioc));
-        qapi_free_SocketAddress(saddr);
         return;
     }
 
@@ -181,7 +180,6 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
                           socket_accept_incoming_migration,
                           listen_ioc,
                           (GDestroyNotify)object_unref);
-    qapi_free_SocketAddress(saddr);
 }
 
 void tcp_start_incoming_migration(const char *host_port, Error **errp)
@@ -192,10 +190,12 @@ void tcp_start_incoming_migration(const char *host_port, Error **errp)
         socket_start_incoming_migration(saddr, &err);
     }
     error_propagate(errp, err);
+    qapi_free_SocketAddress(saddr);
 }
 
 void unix_start_incoming_migration(const char *path, Error **errp)
 {
     SocketAddress *saddr = unix_build_address(path);
     socket_start_incoming_migration(saddr, errp);
+    qapi_free_SocketAddress(saddr);
 }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 24/28] migration: init dst in migration_object_init too
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (22 preceding siblings ...)
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 23/28] migration: free SocketAddress where allocated Peter Xu
@ 2017-12-05  6:53 ` Peter Xu
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 25/28] io: let watcher of the channel run in same ctx Peter Xu
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Though we may not always need it, we now init both the src/dst
migration objects in migration_object_init() so that even the incoming
migration object will be thread safe (it was not before).

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7f73a413b2..8762fad9be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -104,6 +104,7 @@ enum mig_rp_message_type {
    dynamic creation of migration */
 
 static MigrationState *current_migration;
+static MigrationIncomingState *current_incoming;
 
 static bool migration_object_check(MigrationState *ms, Error **errp);
 static int migration_maybe_pause(MigrationState *s,
@@ -119,6 +120,18 @@ void migration_object_init(void)
     assert(!current_migration);
     current_migration = MIGRATION_OBJ(object_new(TYPE_MIGRATION));
 
+    /*
+     * Init the migrate incoming object as well no matter whether
+     * we'll use it or not.
+     */
+    assert(!current_incoming);
+    current_incoming = g_new0(MigrationIncomingState, 1);
+    current_incoming->state = MIGRATION_STATUS_NONE;
+    qemu_mutex_init(&current_incoming->rp_mutex);
+    qemu_event_init(&current_incoming->main_thread_load_event, false);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+
     if (!migration_object_check(current_migration, &err)) {
         error_report_err(err);
         exit(1);
@@ -144,19 +157,8 @@ MigrationState *migrate_get_current(void)
 
 MigrationIncomingState *migration_incoming_get_current(void)
 {
-    static bool once;
-    static MigrationIncomingState mis_current;
-
-    if (!once) {
-        mis_current.state = MIGRATION_STATUS_NONE;
-        memset(&mis_current, 0, sizeof(MigrationIncomingState));
-        qemu_mutex_init(&mis_current.rp_mutex);
-        qemu_event_init(&mis_current.main_thread_load_event, false);
-        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
-        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
-        once = true;
-    }
-    return &mis_current;
+    assert(current_incoming);
+    return current_incoming;
 }
 
 void migration_incoming_state_destroy(void)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 25/28] io: let watcher of the channel run in same ctx
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (23 preceding siblings ...)
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 24/28] migration: init dst in migration_object_init too Peter Xu
@ 2017-12-05  6:53 ` Peter Xu
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 26/28] migration: allow migrate_cancel to pause postcopy Peter Xu
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

The per-thread gcontext is only used by IOThreads (please refer to the
callers of g_main_context_push_thread_default), so this patch only
affects code that runs in an IOThread.  It lets the watcher object run
in the same context as the caller that added the watcher.

This patch is critical to make sure that the migration stream accept()
procedure will also run in the monitor IOThread rather than in the
default main thread, so it can survive even if the main thread hangs.
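
For reference, a minimal sketch of the GLib pattern this relies on (a
hypothetical IOThread body, not code from this series):

    #include <glib.h>

    /*
     * A hypothetical IOThread body: any watcher added with
     * qio_channel_add_watch() from inside this thread now runs here,
     * because ctx is this thread's thread-default context.
     */
    static void *iothread_run(void *opaque)
    {
        gboolean *stopping = opaque;
        GMainContext *ctx = g_main_context_new();

        g_main_context_push_thread_default(ctx);
        while (!*stopping) {
            g_main_context_iteration(ctx, TRUE);
        }
        g_main_context_pop_thread_default(ctx);
        g_main_context_unref(ctx);
        return NULL;
    }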

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 io/channel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io/channel.c b/io/channel.c
index ec4b86de7c..d6018ddfb6 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -312,7 +312,7 @@ guint qio_channel_add_watch(QIOChannel *ioc,
 
     g_source_set_callback(source, (GSourceFunc)func, user_data, notify);
 
-    id = g_source_attach(source, NULL);
+    id = g_source_attach(source, g_main_context_get_thread_default());
     g_source_unref(source);
 
     return id;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 26/28] migration: allow migrate_cancel to pause postcopy
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (24 preceding siblings ...)
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 25/28] io: let watcher of the channel run in same ctx Peter Xu
@ 2017-12-05  6:53 ` Peter Xu
  2017-12-19 10:58   ` Dr. David Alan Gilbert
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 27/28] qmp/migration: new command migrate-recover Peter Xu
                   ` (4 subsequent siblings)
  30 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

In the past it was allowed to even cancel a postcopy migration, but
that does not really make sense, and no one should be using it, since
cancelling a migration during postcopy means crashing the VM right
away.

Let's re-use this command as a way to pause the postcopy migration
when we detect that we are in the postcopy phase.  This can be used
when we want to trigger the postcopy recovery manually.
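
The HMP flow, illustratively, becomes:

    (qemu) migrate_cancel

During precopy this still cancels the migration; during postcopy it
now merely pauses the stream, so a recovery can be triggered later.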

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hmp-commands.hx       |  8 ++++++--
 migration/migration.c | 18 +++++++++++++++++-
 qapi/migration.json   |  3 ++-
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ffcdc34652..32fdd52212 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -952,14 +952,18 @@ ETEXI
         .name       = "migrate_cancel",
         .args_type  = "",
         .params     = "",
-        .help       = "cancel the current VM migration",
+        .help       = "cancel the current VM precopy migration, or "
+                      "pause the migration if it's in postcopy state, "
+                      "so that a postcopy recovery can be carried out later.",
         .cmd        = hmp_migrate_cancel,
     },
 
 STEXI
 @item migrate_cancel
 @findex migrate_cancel
-Cancel the current VM migration.
+If during a precopy, this command cancels the migration.  If during
+postcopy, this command pauses the migration (so that a postcopy
+recovery can be carried out afterward).
 ETEXI
 
     {
diff --git a/migration/migration.c b/migration/migration.c
index 8762fad9be..0c1a783df2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1535,7 +1535,23 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 void qmp_migrate_cancel(Error **errp)
 {
-    migrate_fd_cancel(migrate_get_current());
+    MigrationState *ms = migrate_get_current();
+    int ret;
+
+    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        /*
+         * Cancelling a postcopy migration does not really make sense.
+         * Here instead we pause the migration only, so another
+         * recovery can take place if needed.
+         */
+        ret = qemu_file_shutdown(ms->to_dst_file);
+        if (ret) {
+            error_setg(errp, "Failed to pause migration stream.");
+        }
+        return;
+    }
+
+    migrate_fd_cancel(ms);
 }
 
 void qmp_migrate_continue(MigrationStatus state, Error **errp)
diff --git a/qapi/migration.json b/qapi/migration.json
index 5dfac0681d..e15bb516cd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -881,7 +881,8 @@
 ##
 # @migrate_cancel:
 #
-# Cancel the current executing migration process.
+# Cancel the current executing migration process for precopy.  For
+# postcopy, it'll pause the migration instead.
 #
 # Returns: nothing on success
 #
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 27/28] qmp/migration: new command migrate-recover
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (25 preceding siblings ...)
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 26/28] migration: allow migrate_cancel to pause postcopy Peter Xu
@ 2017-12-05  6:53 ` Peter Xu
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 28/28] hmp/migration: add migrate_recover command Peter Xu
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

This is the first allow-oob=true command.  It's used on the
destination side when the postcopy migration is paused and ready for a
recovery.  After execution, a new migration channel will be
established for postcopy to continue.
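
A typical invocation on the paused destination, using the OOB control
form from the monitor OOB series this depends on (URI and id are
illustrative):

  {"execute": "migrate-recover", "id": "recover-cmd",
   "arguments": { "uri": "tcp:localhost:5556" },
   "control": { "run-oob": true } }
  {"return": {}, "id": "recover-cmd"}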

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 18 ++++++++++++++++++
 qapi/migration.json   | 20 ++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 0c1a783df2..b37c4ef669 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1423,6 +1423,24 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
     once = false;
 }
 
+void qmp_migrate_recover(const char *uri, Error **errp)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        error_setg(errp, "Migrate recover can only be run "
+                   "when postcopy is paused.");
+        return;
+    }
+
+    /*
+     * Note that this call will never start a real migration; it will
+     * only re-setup the migration stream and poke existing migration
+     * to continue using that newly established channel.
+     */
+    qemu_start_incoming_migration(uri, errp);
+}
+
 bool migration_is_blocked(Error **errp)
 {
     if (qemu_savevm_state_blocked(errp)) {
diff --git a/qapi/migration.json b/qapi/migration.json
index e15bb516cd..9fcf980750 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1173,3 +1173,23 @@
 # Since: 2.9
 ##
 { 'command': 'xen-colo-do-checkpoint' }
+
+##
+# @migrate-recover:
+#
+# Provide a recovery migration stream URI.
+#
+# @uri: the URI to be used for the recovery of migration stream.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "migrate-recover",
+#      "arguments": { "uri": "tcp:192.168.1.200:12345" } }
+# <- { "return": {} }
+#
+# Since: 2.12
+##
+{ 'command': 'migrate-recover', 'data': { 'uri': 'str' },
+  'allow-oob': true }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH v5 28/28] hmp/migration: add migrate_recover command
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (26 preceding siblings ...)
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 27/28] qmp/migration: new command migrate-recover Peter Xu
@ 2017-12-05  6:53 ` Peter Xu
  2017-12-05  6:55 ` [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert, peterx

Sister command to migrate-recover in QMP.
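
Example usage from the HMP monitor on the paused destination (the URI
here is illustrative):

    (qemu) migrate_recover tcp:0.0.0.0:5556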

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hmp-commands.hx | 13 +++++++++++++
 hmp.c           | 10 ++++++++++
 hmp.h           |  1 +
 3 files changed, 24 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 32fdd52212..79e28c5228 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -992,7 +992,20 @@ STEXI
 @findex migrate_incoming
 Continue an incoming migration using the @var{uri} (that has the same syntax
 as the -incoming option).
+ETEXI
 
+    {
+        .name       = "migrate_recover",
+        .args_type  = "uri:s",
+        .params     = "uri",
+        .help       = "Continue a paused incoming postcopy migration",
+        .cmd        = hmp_migrate_recover,
+    },
+
+STEXI
+@item migrate_recover @var{uri}
+@findex migrate_recover
+Continue a paused incoming postcopy migration using the @var{uri}.
 ETEXI
 
     {
diff --git a/hmp.c b/hmp.c
index c7e1022283..4e2a88bbc2 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1519,6 +1519,16 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_migrate_recover(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *uri = qdict_get_str(qdict, "uri");
+
+    qmp_migrate_recover(uri, &err);
+
+    hmp_handle_error(mon, &err);
+}
+
 /* Kept for backwards compatibility */
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict)
 {
diff --git a/hmp.h b/hmp.h
index a6f56b1f29..81c6837550 100644
--- a/hmp.h
+++ b/hmp.h
@@ -70,6 +70,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
+void hmp_migrate_recover(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (27 preceding siblings ...)
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 28/28] hmp/migration: add migrate_recover command Peter Xu
@ 2017-12-05  6:55 ` Peter Xu
  2017-12-05 18:43 ` Dr. David Alan Gilbert
  2018-01-11 16:59 ` Dr. David Alan Gilbert
  30 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-05  6:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov, Dr . David Alan Gilbert

On Tue, Dec 05, 2017 at 02:52:39PM +0800, Peter Xu wrote:
> Tree is pushed here for better reference and testing (online tree
> includes monitor OOB series):
> 
>   https://github.com/xzpeter/qemu/tree/postcopy-recover-all

Sorry, this series now depends on the OOB series.  Hello,
Patchew, hope you see this soon enough:

Based-on: <20171205055200.16305-1-peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/28] migration: better error handling with QEMUFile
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 01/28] migration: better error handling with QEMUFile Peter Xu
@ 2017-12-05 11:40   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-05 11:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> If the postcopy goes down for some reason, we can always see this on dst:
> 
>   qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000
> 
> However in most cases that's not the real issue. The problem is that
> qemu_get_be16() has no way to show whether the returned data is valid or
> not, and we are _always_ assuming it is valid. That's possibly not wise.
> 
> The best approach to solve this would be refactoring the QEMUFile
> interface to allow the APIs to return an error when there is one.
> However, that needs quite a bit of work and testing. For now, let's
> explicitly check the validity of the data before using it, in all the
> places that call qemu_get_*().
> 
> This patch tries to fix most of the cases I can see. Only with this
> can we make sure we are processing valid data, and that we can
> capture the channel-down events correctly.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c |  5 +++++
>  migration/ram.c       | 21 +++++++++++++++++----
>  migration/savevm.c    | 40 ++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 60 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index c0206023d7..eae34d0524 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1708,6 +1708,11 @@ static void *source_return_path_thread(void *opaque)
>          header_type = qemu_get_be16(rp);
>          header_len = qemu_get_be16(rp);
>  
> +        if (qemu_file_get_error(rp)) {
> +            mark_source_rp_bad(ms);
> +            goto out;
> +        }
> +
>          if (header_type >= MIG_RP_MSG_MAX ||
>              header_type == MIG_RP_MSG_INVALID) {
>              error_report("RP: Received invalid message 0x%04x length 0x%04x",
> diff --git a/migration/ram.c b/migration/ram.c
> index 021d583b9b..f159c16f6a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2696,6 +2696,16 @@ static int ram_load_postcopy(QEMUFile *f)
>          uint8_t ch;
>  
>          addr = qemu_get_be64(f);
> +
> +        /*
> +         * If qemu file error, we should stop here, and then "addr"
> +         * may be invalid
> +         */
> +        ret = qemu_file_get_error(f);
> +        if (ret) {
> +            break;
> +        }
> +
>          flags = addr & ~TARGET_PAGE_MASK;
>          addr &= TARGET_PAGE_MASK;
>  
> @@ -2776,9 +2786,15 @@ static int ram_load_postcopy(QEMUFile *f)
>              error_report("Unknown combination of migration flags: %#x"
>                           " (postcopy mode)", flags);
>              ret = -EINVAL;
> +            break;
> +        }
> +
> +        /* Detect for any possible file errors */
> +        if (!ret && qemu_file_get_error(f)) {
> +            ret = qemu_file_get_error(f);
>          }
>  
> -        if (place_needed) {
> +        if (!ret && place_needed) {
>              /* This gets called at the last target page in the host page */
>              void *place_dest = host + TARGET_PAGE_SIZE - block->page_size;
>  
> @@ -2790,9 +2806,6 @@ static int ram_load_postcopy(QEMUFile *f)
>                                            place_source, block);
>              }
>          }
> -        if (!ret) {
> -            ret = qemu_file_get_error(f);
> -        }
>      }
>  
>      return ret;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b7908f62be..8814793255 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1765,6 +1765,11 @@ static int loadvm_process_command(QEMUFile *f)
>      cmd = qemu_get_be16(f);
>      len = qemu_get_be16(f);
>  
> +    /* Check validity before continue processing of cmds */
> +    if (qemu_file_get_error(f)) {
> +        return qemu_file_get_error(f);
> +    }
> +
>      trace_loadvm_process_command(cmd, len);
>      if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
>          error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
> @@ -1830,6 +1835,7 @@ static int loadvm_process_command(QEMUFile *f)
>   */
>  static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
>  {
> +    int ret;
>      uint8_t read_mark;
>      uint32_t read_section_id;
>  
> @@ -1840,6 +1846,13 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
>  
>      read_mark = qemu_get_byte(f);
>  
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        error_report("%s: Read section footer failed: %d",
> +                     __func__, ret);
> +        return false;
> +    }
> +
>      if (read_mark != QEMU_VM_SECTION_FOOTER) {
>          error_report("Missing section footer for %s", se->idstr);
>          return false;
> @@ -1875,6 +1888,13 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
>      instance_id = qemu_get_be32(f);
>      version_id = qemu_get_be32(f);
>  
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        error_report("%s: Failed to read instance/version ID: %d",
> +                     __func__, ret);
> +        return ret;
> +    }
> +
>      trace_qemu_loadvm_state_section_startfull(section_id, idstr,
>              instance_id, version_id);
>      /* Find savevm section */
> @@ -1922,6 +1942,13 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>  
>      section_id = qemu_get_be32(f);
>  
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        error_report("%s: Failed to read section ID: %d",
> +                     __func__, ret);
> +        return ret;
> +    }
> +
>      trace_qemu_loadvm_state_section_partend(section_id);
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          if (se->load_section_id == section_id) {
> @@ -1989,8 +2016,14 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>      uint8_t section_type;
>      int ret = 0;
>  
> -    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
> -        ret = 0;
> +    while (true) {
> +        section_type = qemu_get_byte(f);
> +
> +        if (qemu_file_get_error(f)) {
> +            ret = qemu_file_get_error(f);
> +            break;
> +        }
> +
>          trace_qemu_loadvm_state_section(section_type);
>          switch (section_type) {
>          case QEMU_VM_SECTION_START:
> @@ -2014,6 +2047,9 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>                  goto out;
>              }
>              break;
> +        case QEMU_VM_EOF:
> +            /* This is the end of migration */
> +            goto out;
>          default:
>              error_report("Unknown savevm section type %d", section_type);
>              ret = -EINVAL;
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (28 preceding siblings ...)
  2017-12-05  6:55 ` [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
@ 2017-12-05 18:43 ` Dr. David Alan Gilbert
  2017-12-06  2:39   ` Peter Xu
  2018-01-11 16:59 ` Dr. David Alan Gilbert
  30 siblings, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-05 18:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> Tree is pushed here for better reference and testing (online tree
> includes monitor OOB series):
> 
>   https://github.com/xzpeter/qemu/tree/postcopy-recover-all
> 
> This version removed quite a few patches related to migrate-incoming,
> instead I introduced a new command "migrate-recover" to trigger the
> recovery channel on destination side to simplify the code.
> 
> To test this two series altogether, please checkout above tree and
> build.  Note: to test on small and single host, one need to disable
> full bandwidth postcopy migration otherwise it'll complete very fast.
> Basically a simple patch like this would help:
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 4de3b551fe..c0206023d7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * will notice we're in POSTCOPY_ACTIVE and not actually
>       * wrap their state up here
>       */
> -    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
> +    // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
>      if (migrate_postcopy_ram()) {
>          /* Ping just for debugging, helps line traces up */
>          qemu_savevm_send_ping(ms->to_dst_file, 2);
> 
> This patch is included already in above github tree.  Please feel free
> to drop this patch when want to test on big machines and between real
> hosts.
> 
> Detailed Test Procedures (QMP only)
> ===================================
> 
> 1. start source QEMU.
> 
> $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
>      -smp 4 -m 1G -qmp stdio \
>      -name peter-vm,debug-threads=on \
>      -netdev user,id=net0 \
>      -device e1000,netdev=net0 \
>      -global migration.x-max-bandwidth=4096 \
>      -global migration.x-postcopy-ram=on \
>      /images/fedora-25.qcow2

I suspect -snapshot isn't doing the right thing to the storage when
combined with the migration - I'm assuming the destination isn't using
the same temporary file.
(Also any reason for specifying split irqchip?)

> 2. start destination QEMU.
> 
> $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
>      -smp 4 -m 1G -qmp stdio \
>      -name peter-vm,debug-threads=on \
>      -netdev user,id=net0 \
>      -device e1000,netdev=net0 \
>      -global migration.x-max-bandwidth=4096 \
>      -global migration.x-postcopy-ram=on \
>      -incoming tcp:0.0.0.0:5555 \
>      /images/fedora-25.qcow2
> 
> 3. On source, do QMP handshake as normal:
> 
>   {"execute": "qmp_capabilities"}
>   {"return": {}}
> 
> 4. On destination, do QMP handshake to enable OOB:
> 
>   {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
>   {"return": {}}
> 
> 5. On source, trigger initial migrate command, switch to postcopy:
> 
>   {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } }
>   {"return": {}}
>   {"execute": "query-migrate"}
>   {"return": {"expected-downtime": 300, "status": "active", ...}}
>   {"execute": "migrate-start-postcopy"}
>   {"return": {}}
>   {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"}
>   {"execute": "query-migrate"}
>   {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}}
> 
> 6. On source, manually trigger a "fake network down" using
>    "migrate-cancel" command:
> 
>   {"execute": "migrate_cancel"}
>   {"return": {}}
> 
>   During postcopy, it'll not really cancel the migration, but pause
>   it.  On both sides, we should see this on stderr:
> 
>   qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
> 
>   It means now both sides are in postcopy-pause state.
> 
> 7. (Optional) On destination side, let's try to hang the main thread
>    using the new x-oob-test command, providing a "lock=true" param:
> 
>    {"execute": "x-oob-test", "id": "lock-dispatcher-cmd",
>     "arguments": { "lock": true } }
> 
>    After sending this command, we should not see any "return", because
>    the main thread is blocked already.  But we can still use the monitor
>    since the monitor now has a dedicated IOThread.
> 
> 8. On destination side, provide a new incoming port using the new
>    command "migrate-recover" (note that if step 7 is carried out, we
>    _must_ use OOB form, otherwise the command will hang.  With OOB,
>    this command will return immediately):
> 
>   {"execute": "migrate-recover", "id": "recover-cmd",
>    "arguments": { "uri": "tcp:localhost:5556" },
>    "control": { "run-oob": true } }
>   {"timestamp": {"seconds": 1512454976, "microseconds": 186053},
>    "event": "MIGRATION", "data": {"status": "setup"}}
>   {"return": {}, "id": "recover-cmd"}
> 
>    We can see that the command will succeed even if the main thread is
>    locked up.
> 
> 9. (Optional) This step is only needed if step 7 is carried out. On
>    destination, let's unlock the main thread before resuming the
>    migration, this time with "lock=false" to unlock the main thread
>    (since system running needs the main thread). Note that we _must_
>    use OOB command here too:
> 
>   {"execute": "x-oob-test", "id": "unlock-dispatcher",
>    "arguments": { "lock": false }, "control": { "run-oob": true } }
>   {"return": {}, "id": "unlock-dispatcher"}
>   {"return": {}, "id": "lock-dispatcher-cmd"}
> 
>   Here the first "return" is the reply to the unlock command, the
>   second "return" is the reply to the lock command.  After this
>   command, main thread is released.
> 
> 10. On source, resume the postcopy migration:
> 
>   {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5556", "resume": true }}
>   {"return": {}}
>   {"execute": "query-migrate"}
>   {"return": {"status": "completed", ...}}

The use of x-oob-test to lock things is a bit different to reality
and that means the ordering is different.
When the destination is blocked by a page request, that page won't
become unstuck until sometime after (10) happens and delivers the page
to the target.

You could try an 'info cpu' on the destination at (7) - although it's
not guaranteed to lock, depending on whether the page needed has arrived.

Dave

> Here's the change log:
> 
> v5:
> - add some more r-bs
> - fix error path in ram_load_postcopy to always check on "ret" [Dave]
> - move init/destroy of three new sems into migration object
>   init/finalize functions
> - dropped patch "migration: delay the postcopy-active state switch",
>   meanwhile touch up patch 6 to check against
>   POSTCOPY_INCOMING_RUNNING state when trying to switch to
>   postcopy-pause state. [Dave]
> - drop two patches that introduce qmp/hmp of migrate-pause, instead
>   re-use migrate-cancel to do manual trigger of postcopy recovery.
> - add a new patch to let migrate_cancel to pause migration if it's
>   already in postcopy phase.
> - add a new command "migrate-recover" to re-assign the incoming port,
>   instead of reusing migrate-incoming.
> - since now I used migrate-recover command instead of migrate-incoming
>   itself, I dropped quite a few patches that are not really relevant
>   now, so the series got smaller:
>         migration: return incoming task tag for sockets
>         migration: return incoming task tag for exec
>         migration: return incoming task tag for fd
>         migration: store listen task tag
>         migration: allow migrate_incoming for paused VM
> 
> v4:
> - fix two compile errors that patchew reported
> - for QMP: do s/2.11/2.12/g
> - fix migrate-incoming logic to be more strict
> 
> v3:
> - add r-bs correspondingly
> - in ram_load_postcopy() capture error if postcopy_place_page() failed
>   [Dave]
> - remove "break" if there is a "goto" before that [Dave]
> - ram_dirty_bitmap_reload(): use PRIx64 where needed, add some more
>   print sizes [Dave]
> - remove RAMState.ramblock_to_sync, instead use local counter [Dave]
> - init tag in tcp_start_incoming_migration() [Dave]
> - more traces when transmitting the recv bitmap [Dave]
> - postcopy_pause_incoming(): do shutdown before taking rp lock [Dave]
> - add one more patch to postpone the state switch of postcopy-active [Dave]
> - refactor the migrate_incoming handling according to the email
>   discussion [Dave]
> - add manual trigger to pause postcopy (two new patches added to
>   introduce "migrate-pause" command for QMP/HMP). [Dave]
> 
> v2:
> - rebased to alexey's received bitmap v9
> - add Dave's r-bs for patches: 2/5/6/8/9/13/14/15/16/20/21
> - patch 1: use target page size to calc bitmap [Dave]
> - patch 3: move trace_*() after EINTR check [Dave]
> - patch 4: dropped since I can use bitmap_complement() [Dave]
> - patch 7: check file error right after data is read in both
>   qemu_loadvm_section_start_full() and qemu_loadvm_section_part_end(),
>   meanwhile also check in check_section_footer() [Dave]
> - patch 8/9: fix error_report/commit message in both patches [Dave]
> - patch 10: dropped (new parameter "x-postcopy-fast")
> - patch 11: split the "postcopy-paused" patch into two, one to
>   introduce the new state, the other to implement the logic. Also,
>   print something when paused [Dave]
> - patch 17: removed do_resume label, introduced migration_prepare()
>   [Dave]
> - patch 18: removed do_pause label using a new loop [Dave]
> - patch 20: removed incorrect comment [Dave]
> - patch 21: use 256B buffer in qemu_savevm_send_recv_bitmap(), add
>   trace in loadvm_handle_recv_bitmap() [Dave]
> - patch 22: fix MIG_RP_MSG_RECV_BITMAP for (1) endianness (2) 32/64-bit
>   machines. More info in the commit message update.
> - patch 23: add one check on migration state [Dave]
> - patch 24: use macro instead of magic 1 [Dave]
> - patch 26: use more trace_*() instead of one, and use one sem to
>   replace mutex+cond. [Dave]
> - move sem init/destroy into migration_instance_init() and
>   migration_instance_finalize (new function after rebase).
> - patch 29: squashed this patch most into:
>   "migration: implement "postcopy-pause" src logic" [Dave]
> - split the two fix patches out of the series
> - fixed two places where I misused "wake/woke/woken". [Dave]
> - add new patch "bitmap: provide to_le/from_le helpers" to solve the
>   bitmap endianness issue [Dave]
> - appended migrate_incoming series to this series, since that one is
>   depending on the paused state.  Using explicit g_source_remove() for
>   listening ports [Dan]
> 
> FUTURE TODO LIST
> - support migrate_cancel during PAUSED/RECOVER state
> - when anything wrong happens during PAUSED/RECOVER, switching back to
>   PAUSED state on both sides
> 
> As we all know, postcopy migration has a potential risk of losing
> the VM if the network is broken during the migration. This series
> tries to solve the problem by allowing the migration to pause at the
> failure point, and to do recovery after the link is reconnected.
> 
> There was existing work on this issue from Md Haris Iqbal:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html
> 
> This series is a total re-work of the issue, based on Alexey
> Perevalov's received bitmap v8 series:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html
> 
> Two new states are added to support the migration (used on both
> sides):
> 
>   MIGRATION_STATUS_POSTCOPY_PAUSED
>   MIGRATION_STATUS_POSTCOPY_RECOVER
> 
> The MIGRATION_STATUS_POSTCOPY_PAUSED state will be set when a
> network failure is detected. It is a phase that we may stay in for a
> long time, and we'll remain there until a recovery is triggered.  In
> this state, all the threads (on source: send thread, return-path
> thread; destination: ram-load thread, page-fault thread) will be
> halted.
> 
> The MIGRATION_STATUS_POSTCOPY_RECOVER state is short. If we trigger
> a recovery, both the source and destination VMs will jump into this
> stage and do whatever is needed to prepare the recovery (e.g.,
> currently the most important thing is to synchronize the dirty bitmap,
> please see commit messages for more information). After the
> preparation is done, the source will do the final handshake with the
> destination, then both sides will switch back to
> MIGRATION_STATUS_POSTCOPY_ACTIVE again.
> 
> New commands/messages are defined as well to satisfy the need:
> 
> MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
> delivering received bitmaps
> 
> MIG_CMD_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do the final
> handshake of postcopy recovery.
> 
> Here are some more details on how the whole failure/recovery routine
> happens:
> 
> - start migration
> - ... (switch from precopy to postcopy)
> - both sides are in "postcopy-active" state
> - ... (failure happened, e.g., network unplugged)
> - both sides switch to "postcopy-paused" state
>   - all the migration threads are stopped on both sides
> - ... (both VMs hanged)
> - ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on
>   source side, "-r" means "recover")
> - both sides switch to "postcopy-recover" state
>   - on source: send-thread, return-path-thread will be woken up
>   - on dest: ram-load-thread woken up, fault-thread still paused
> - source calls new savevmhandler hook resume_prepare() (currently,
>   only ram is providing the hook):
>   - ram_resume_prepare(): for each ramblock, fetch recved bitmap by:
>     - src sends MIG_CMD_RECV_BITMAP to dst
>     - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data
>       - src uses the recved bitmap to rebuild dirty bitmap
> - source does final handshake with destination
>   - src sends MIG_CMD_RESUME to dst, telling "src is ready"
>     - when dst receives the command, the fault thread will be woken
>       up, meanwhile dst switches back to "postcopy-active"
>   - dst sends MIG_RP_MSG_RESUME_ACK to src, telling "dst is ready"
>     - when src receives the ack, state switches to "postcopy-active"
> - postcopy migration continued
> 
> Testing:
> 
> As I said, it's still an extremely simple test. I used socat to create
> a socket bridge:
> 
>   socat tcp-listen:6666 tcp-connect:localhost:5555 &
> 
> Then do the migration via the bridge. I emulated the network failure
> by killing the socat process (bridge down), then tried to recover the
> migration using the other channel (the default dst channel). It looks
> like:
> 
>         port:6666    +------------------+
>         +----------> | socat bridge [1] |-------+
>         |            +------------------+       |
>         |         (Original channel)            |
>         |                                       | port: 5555
>      +---------+  (Recovery channel)            +--->+---------+
>      | src VM  |------------------------------------>| dst VM  |
>      +---------+                                     +---------+
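
For reference, that smoke test boils down to roughly the following
sequence (a sketch only, using the ports from the diagram above):

  socat tcp-listen:6666 tcp-connect:localhost:5555 &
  SOCAT_PID=$!
  # ... migrate via tcp:localhost:6666, switch to postcopy ...
  kill $SOCAT_PID            # emulate the network failure
  # both sides enter postcopy-paused; then on the source monitor:
  # (qemu) migrate -r -d tcp:localhost:5555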
> 
> Known issues/notes:
> 
> - currently the destination listening port still cannot change, i.e.,
>   the recovery has to use the same port on the destination, for
>   simplicity. (on the source, we can specify a new URI)
> 
> - the patch: "migration: let dst listen on port always" is still
>   hacky; it just keeps the incoming accept open forever for now...
> 
> - some migration numbers might still be inaccurate, like total
>   migration time, etc. (But I don't really think that matters much
>   now)
> 
> - the patches are very lightly tested.
> 
> - Dave reported one problem that may hang the destination main loop
>   thread (one vcpu thread holds the BQL) and the rest. I haven't
>   encountered it yet, but that does not mean this series can survive
>   it.
> 
> - other potential issues that I may have forgotten or unnoticed...
> 
> Anyway, the work is still in a preliminary stage. Any suggestions and
> comments are greatly welcomed.  Thanks.
> 
> Peter Xu (28):
>   migration: better error handling with QEMUFile
>   migration: reuse mis->userfault_quit_fd
>   migration: provide postcopy_fault_thread_notify()
>   migration: new postcopy-pause state
>   migration: implement "postcopy-pause" src logic
>   migration: allow dst vm pause on postcopy
>   migration: allow src return path to pause
>   migration: allow send_rq to fail
>   migration: allow fault thread to pause
>   qmp: hmp: add migrate "resume" option
>   migration: pass MigrationState to migrate_init()
>   migration: rebuild channel on source
>   migration: new state "postcopy-recover"
>   migration: wakeup dst ram-load-thread for recover
>   migration: new cmd MIG_CMD_RECV_BITMAP
>   migration: new message MIG_RP_MSG_RECV_BITMAP
>   migration: new cmd MIG_CMD_POSTCOPY_RESUME
>   migration: new message MIG_RP_MSG_RESUME_ACK
>   migration: introduce SaveVMHandlers.resume_prepare
>   migration: synchronize dirty bitmap for resume
>   migration: setup ramstate for resume
>   migration: final handshake for the resume
>   migration: free SocketAddress where allocated
>   migration: init dst in migration_object_init too
>   io: let watcher of the channel run in same ctx
>   migration: allow migrate_cancel to pause postcopy
>   qmp/migration: new command migrate-recover
>   hmp/migration: add migrate_recover command
> 
>  hmp-commands.hx              |  28 ++-
>  hmp.c                        |  14 +-
>  hmp.h                        |   1 +
>  include/migration/register.h |   2 +
>  io/channel.c                 |   2 +-
>  migration/migration.c        | 549 ++++++++++++++++++++++++++++++++++++++-----
>  migration/migration.h        |  24 +-
>  migration/postcopy-ram.c     | 110 +++++++--
>  migration/postcopy-ram.h     |   2 +
>  migration/ram.c              | 247 ++++++++++++++++++-
>  migration/ram.h              |   3 +
>  migration/savevm.c           | 233 +++++++++++++++++-
>  migration/savevm.h           |   3 +
>  migration/socket.c           |   4 +-
>  migration/trace-events       |  21 ++
>  qapi/migration.json          |  35 ++-
>  16 files changed, 1172 insertions(+), 106 deletions(-)
> 
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
  2017-12-05 18:43 ` Dr. David Alan Gilbert
@ 2017-12-06  2:39   ` Peter Xu
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Xu @ 2017-12-06  2:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

On Tue, Dec 05, 2017 at 06:43:42PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Tree is pushed here for better reference and testing (online tree
> > includes monitor OOB series):
> > 
> >   https://github.com/xzpeter/qemu/tree/postcopy-recover-all
> > 
> > This version removed quite a few patches related to migrate-incoming,
> > instead I introduced a new command "migrate-recover" to trigger the
> > recovery channel on destination side to simplify the code.
> > 
> > To test this two series altogether, please checkout above tree and
> > build.  Note: to test on small and single host, one need to disable
> > full bandwidth postcopy migration otherwise it'll complete very fast.
> > Basically a simple patch like this would help:
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 4de3b551fe..c0206023d7 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> >       * will notice we're in POSTCOPY_ACTIVE and not actually
> >       * wrap their state up here
> >       */
> > -    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
> > +    // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
> >      if (migrate_postcopy_ram()) {
> >          /* Ping just for debugging, helps line traces up */
> >          qemu_savevm_send_ping(ms->to_dst_file, 2);
> > 
> > This patch is included already in above github tree.  Please feel free
> > to drop this patch when want to test on big machines and between real
> > hosts.
> > 
> > Detailed Test Procedures (QMP only)
> > ===================================
> > 
> > 1. start source QEMU.
> > 
> > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
> >      -smp 4 -m 1G -qmp stdio \
> >      -name peter-vm,debug-threads=on \
> >      -netdev user,id=net0 \
> >      -device e1000,netdev=net0 \
> >      -global migration.x-max-bandwidth=4096 \
> >      -global migration.x-postcopy-ram=on \
> >      /images/fedora-25.qcow2
> 
> I suspect -snapshot isn't doing the right thing to the storage when
> combined with the migration - I'm assuming the destination isn't using
> the same temporary file.
> (Also any reason for specifying split irqchip?)

Ah yes.  Sorry we should not use "-snapshot" here.  Please remove it.

I think my smoke test just didn't try to fetch anything on that temp
storage so nothing went wrong.

And, no reason for split irqchip - I just fetched this command line
from a setup where I was testing IOMMUs. :-) Please feel free to remove
it too if you want.

(so basically I was just pasting my smoke test command lines, not
 really the command lines required to run the tests)
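
With those two dropped, a trimmed source command line for this test
would be (the destination side is the same, plus
"-incoming tcp:0.0.0.0:5555"):

$qemu -M q35 -enable-kvm \
     -smp 4 -m 1G -qmp stdio \
     -name peter-vm,debug-threads=on \
     -netdev user,id=net0 \
     -device e1000,netdev=net0 \
     -global migration.x-max-bandwidth=4096 \
     -global migration.x-postcopy-ram=on \
     /images/fedora-25.qcow2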

> 
> > 2. start destination QEMU.
> > 
> > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
> >      -smp 4 -m 1G -qmp stdio \
> >      -name peter-vm,debug-threads=on \
> >      -netdev user,id=net0 \
> >      -device e1000,netdev=net0 \
> >      -global migration.x-max-bandwidth=4096 \
> >      -global migration.x-postcopy-ram=on \
> >      -incoming tcp:0.0.0.0:5555 \
> >      /images/fedora-25.qcow2
> > 
> > 3. On source, do QMP handshake as normal:
> > 
> >   {"execute": "qmp_capabilities"}
> >   {"return": {}}
> > 
> > 4. On destination, do QMP handshake to enable OOB:
> > 
> >   {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
> >   {"return": {}}
> > 
> > 5. On source, trigger initial migrate command, switch to postcopy:
> > 
> >   {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } }
> >   {"return": {}}
> >   {"execute": "query-migrate"}
> >   {"return": {"expected-downtime": 300, "status": "active", ...}}
> >   {"execute": "migrate-start-postcopy"}
> >   {"return": {}}
> >   {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"}
> >   {"execute": "query-migrate"}
> >   {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}}
> > 
> > 6. On source, manually trigger a "fake network down" using
> >    "migrate-cancel" command:
> > 
> >   {"execute": "migrate_cancel"}
> >   {"return": {}}
> > 
> >   During postcopy, it'll not really cancel the migration, but pause
> >   it.  On both sides, we should see this on stderr:
> > 
> >   qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
> > 
> >   It means now both sides are in postcopy-pause state.
> > 
> > 7. (Optional) On destination side, let's try to hang the main thread
> >    using the new x-oob-test command, providing a "lock=true" param:
> > 
> >    {"execute": "x-oob-test", "id": "lock-dispatcher-cmd",
> >     "arguments": { "lock": true } }
> > 
> >    After sending this command, we should not see any "return", because
> >    main thread is blocked already.  But we can still use the monitor
> >    since the monitor now has dedicated IOThread.
> > 
> > 8. On destination side, provide a new incoming port using the new
> >    command "migrate-recover" (note that if step 7 is carried out, we
> >    _must_ use OOB form, otherwise the command will hang.  With OOB,
> >    this command will return immediately):
> > 
> >   {"execute": "migrate-recover", "id": "recover-cmd",
> >    "arguments": { "uri": "tcp:localhost:5556" },
> >    "control": { "run-oob": true } }
> >   {"timestamp": {"seconds": 1512454976, "microseconds": 186053},
> >    "event": "MIGRATION", "data": {"status": "setup"}}
> >   {"return": {}, "id": "recover-cmd"}
> > 
> >    We can see that the command will succeed even if the main thread
> >    is locked up.
> > 
> > 9. (Optional) This step is only needed if step 7 is carried out. On
> >    destination, let's unlock the main thread before resuming the
> >    migration, this time with "lock=false" to unlock the main thread
> >    (since a running system needs the main thread). Note that we _must_
> >    use OOB command here too:
> > 
> >   {"execute": "x-oob-test", "id": "unlock-dispatcher",
> >    "arguments": { "lock": false }, "control": { "run-oob": true } }
> >   {"return": {}, "id": "unlock-dispatcher"}
> >   {"return": {}, "id": "lock-dispatcher-cmd"}
> > 
> >   Here the first "return" is the reply to the unlock command; the
> >   second "return" is the reply to the lock command.  After this
> >   command, the main thread is released.
> > 
> > 10. On source, resume the postcopy migration:
> > 
> >   {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5556", "resume": true }}
> >   {"return": {}}
> >   {"execute": "query-migrate"}
> >   {"return": {"status": "completed", ...}}
> 
> The use of x-oob-test to lock things is a bit different to reality
> and that means the ordering is different.
> When the destination is blocked by a page request, that page won't
> become unstuck until sometime after (10) happens and delivers the page
> to the target.
> 
> You could try an 'info cpu' on the destination at (7) - although it's
> not guaranteed to lock, depending whether the page needed has arrived.

Yes, info cpus (or "query-cpus" in QMP) would work too.  The
"return" will be delayed until the resuming command is sent, but it's
the same thing - here I just want to make sure the main thread is
totally hung, so I can check whether the new accept() port and the
whole workflow work even with that.

Btw, IMHO "info cpus" should guarantee a block; if not, we can just do
something in the guest to make sure it hangs, so that at least one vcpu
must be waiting for a page.  Thanks!
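
For instance, something as trivial as this, run inside the guest,
should do it (a sketch only - any workload touching lots of guest
memory works, since in postcopy every not-yet-received page will
fault):

  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
      size_t size = 512UL << 20;      /* 512MB; adjust to guest RAM */
      char *buf = malloc(size);

      if (!buf) {
          return 1;
      }
      for (;;) {
          memset(buf, 0x5a, size);    /* keep touching pages */
      }
  }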

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v5 06/28] migration: allow dst vm pause on postcopy
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 06/28] migration: allow dst vm pause on postcopy Peter Xu
@ 2017-12-14 13:10   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-14 13:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> When there is an IO error on the incoming channel (e.g., network down),
> instead of bailing out immediately, we allow the dst vm to switch to the
> new POSTCOPY_PAUSE state. Currently it is still simple - it waits on the
> new semaphore until someone pokes it for another attempt.
> 
> One note is that here, on the ram loading thread, we cannot detect the
> POSTCOPY_ACTIVE state; we need to detect the more specific
> POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all
> the device states.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c  |  1 +
>  migration/migration.h  |  3 +++
>  migration/savevm.c     | 63 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  migration/trace-events |  2 ++
>  4 files changed, 67 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index f69a48ac63..6f0d48bbed 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -150,6 +150,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
>          memset(&mis_current, 0, sizeof(MigrationIncomingState));
>          qemu_mutex_init(&mis_current.rp_mutex);
>          qemu_event_init(&mis_current.main_thread_load_event, false);
> +        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
>          once = true;
>      }
>      return &mis_current;
> diff --git a/migration/migration.h b/migration/migration.h
> index 36aaa13f50..55894ecb79 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -61,6 +61,9 @@ struct MigrationIncomingState {
>      /* The coroutine we should enter (back) after failover */
>      Coroutine *migration_incoming_co;
>      QemuSemaphore colo_incoming_sem;
> +
> +    /* notify PAUSED postcopy incoming migrations to try to continue */
> +    QemuSemaphore postcopy_pause_sem_dst;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 8814793255..1e5fc4f332 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1529,8 +1529,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
>   */
>  static void *postcopy_ram_listen_thread(void *opaque)
>  {
> -    QEMUFile *f = opaque;
>      MigrationIncomingState *mis = migration_incoming_get_current();
> +    QEMUFile *f = mis->from_src_file;
>      int load_res;
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> @@ -1544,6 +1544,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
>       */
>      qemu_file_set_blocking(f, true);
>      load_res = qemu_loadvm_state_main(f, mis);
> +
> +    /*
> +     * This is tricky, but, mis->from_src_file can change after it
> +     * returns, when postcopy recovery happened. In the future, we may
> +     * want a wrapper for the QEMUFile handle.
> +     */
> +    f = mis->from_src_file;
> +
>      /* And non-blocking again so we don't block in any cleanup */
>      qemu_file_set_blocking(f, false);
>  
> @@ -1626,7 +1634,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>      /* Start up the listening thread and wait for it to signal ready */
>      qemu_sem_init(&mis->listen_thread_sem, 0);
>      qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> -                       postcopy_ram_listen_thread, mis->from_src_file,
> +                       postcopy_ram_listen_thread, NULL,
>                         QEMU_THREAD_DETACHED);
>      qemu_sem_wait(&mis->listen_thread_sem);
>      qemu_sem_destroy(&mis->listen_thread_sem);
> @@ -2011,11 +2019,44 @@ void qemu_loadvm_state_cleanup(void)
>      }
>  }
>  
> +/* Return true if we should continue the migration, or false. */
> +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> +{
> +    trace_postcopy_pause_incoming();
> +
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                      MIGRATION_STATUS_POSTCOPY_PAUSED);
> +
> +    assert(mis->from_src_file);
> +    qemu_file_shutdown(mis->from_src_file);
> +    qemu_fclose(mis->from_src_file);
> +    mis->from_src_file = NULL;
> +
> +    assert(mis->to_src_file);
> +    qemu_file_shutdown(mis->to_src_file);
> +    qemu_mutex_lock(&mis->rp_mutex);
> +    qemu_fclose(mis->to_src_file);
> +    mis->to_src_file = NULL;
> +    qemu_mutex_unlock(&mis->rp_mutex);
> +
> +    error_report("Detected IO failure for postcopy. "
> +                 "Migration paused.");
> +
> +    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> +    }
> +
> +    trace_postcopy_pause_incoming_continued();
> +
> +    return true;
> +}
> +
>  static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  {
>      uint8_t section_type;
>      int ret = 0;
>  
> +retry:
>      while (true) {
>          section_type = qemu_get_byte(f);
>  
> @@ -2060,6 +2101,24 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
>  out:
>      if (ret < 0) {
>          qemu_file_set_error(f, ret);
> +
> +        /*
> +         * Detect whether it is:
> +         *
> +         * 1. postcopy running (after receiving all device data, which
> +         *    must be in POSTCOPY_INCOMING_RUNNING state.  Note that
> +         *    POSTCOPY_INCOMING_LISTENING is still not enough, it's
> +         *    still receiving device states).
> +         * 2. network failure (-EIO)
> +         *
> +         * If so, we try to wait for a recovery.
> +         */
> +        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
> +            ret == -EIO && postcopy_pause_incoming(mis)) {
> +            /* Reset f to point to the newly created channel */
> +            f = mis->from_src_file;
> +            goto retry;
> +        }
>      }
>      return ret;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index da1c63a933..bed1646cd6 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -100,6 +100,8 @@ open_return_path_on_source(void) ""
>  open_return_path_on_source_continue(void) ""
>  postcopy_start(void) ""
>  postcopy_pause_continued(void) ""
> +postcopy_pause_incoming(void) ""
> +postcopy_pause_incoming_continued(void) ""
>  postcopy_start_set_run(void) ""
>  source_return_path_thread_bad_end(void) ""
>  source_return_path_thread_end(void) ""
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v5 08/28] migration: allow send_rq to fail
  2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 08/28] migration: allow send_rq to fail Peter Xu
@ 2017-12-14 13:21   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-14 13:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> So far we do not allow failures to happen when sending data from
> destination to source via the return path. However it is possible
> that there can be errors along the way.  This patch allows
> migrate_send_rp_message() to return an error when that happens, and
> further extends it to migrate_send_rp_req_pages().
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 35 ++++++++++++++++++++++++++++-------
>  migration/migration.h |  2 +-
>  2 files changed, 29 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 0dfbbe5673..e3ce22dce4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -199,17 +199,35 @@ static void deferred_incoming_migration(Error **errp)
>   * Send a message on the return channel back to the source
>   * of the migration.
>   */
> -static void migrate_send_rp_message(MigrationIncomingState *mis,
> -                                    enum mig_rp_message_type message_type,
> -                                    uint16_t len, void *data)
> +static int migrate_send_rp_message(MigrationIncomingState *mis,
> +                                   enum mig_rp_message_type message_type,
> +                                   uint16_t len, void *data)
>  {
> +    int ret = 0;
> +
>      trace_migrate_send_rp_message((int)message_type, len);
>      qemu_mutex_lock(&mis->rp_mutex);
> +
> +    /*
> +     * It's possible that the file handle got lost due to network
> +     * failures.
> +     */
> +    if (!mis->to_src_file) {
> +        ret = -EIO;
> +        goto error;
> +    }
> +
>      qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
>      qemu_put_be16(mis->to_src_file, len);
>      qemu_put_buffer(mis->to_src_file, data, len);
>      qemu_fflush(mis->to_src_file);
> +
> +    /* It's possible that qemu file got error during sending */
> +    ret = qemu_file_get_error(mis->to_src_file);
> +
> +error:
>      qemu_mutex_unlock(&mis->rp_mutex);
> +    return ret;
>  }
>  
>  /* Request a range of pages from the source VM at the given
> @@ -219,11 +237,12 @@ static void migrate_send_rp_message(MigrationIncomingState *mis,
>   *   Start: Address offset within the RB
>   *   Len: Length in bytes required - must be a multiple of pagesize
>   */
> -void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> -                               ram_addr_t start, size_t len)
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> +                              ram_addr_t start, size_t len)
>  {
>      uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
>      size_t msglen = 12; /* start + len */
> +    enum mig_rp_message_type msg_type;
>  
>      *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
>      *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
> @@ -235,10 +254,12 @@ void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
>          bufc[msglen++] = rbname_len;
>          memcpy(bufc + msglen, rbname, rbname_len);
>          msglen += rbname_len;
> -        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES_ID, msglen, bufc);
> +        msg_type = MIG_RP_MSG_REQ_PAGES_ID;
>      } else {
> -        migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
> +        msg_type = MIG_RP_MSG_REQ_PAGES;
>      }
> +
> +    return migrate_send_rp_message(mis, msg_type, msglen, bufc);
>  }
>  
>  void qemu_start_incoming_migration(const char *uri, Error **errp)
> diff --git a/migration/migration.h b/migration/migration.h
> index ebb049f692..b63cdfbfdb 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -216,7 +216,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
>                            uint32_t value);
>  void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
> -void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
> +int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
>                                ram_addr_t start, size_t len);
>  
>  #endif
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v5 26/28] migration: allow migrate_cancel to pause postcopy
  2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 26/28] migration: allow migrate_cancel to pause postcopy Peter Xu
@ 2017-12-19 10:58   ` Dr. David Alan Gilbert
  2018-01-24  8:28     ` Peter Xu
  0 siblings, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-19 10:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> It was allowed in the past to even cancel a postcopy migration, but
> it does not really make sense, and no one should be using it, since
> cancelling a migration during postcopy means crashing the VM in no
> time.
> 
> Let's just re-use this command as a way to pause the postcopy
> migration when we detect that we are in the postcopy phase.  This can
> be used when we want to trigger the postcopy recovery manually.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Yes, OK, this is a little odd without having any flags or anything, but
it's essentially the same reason that cancel exists - to stop a
migration we know is broken for some reason.

(I could argue about whether this should be special-cased in
migrate_fd_cancel instead, but that's just a preference).
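
Something like this is the shape I mean (a rough sketch only, not what
the patch does, and with error handling omitted):

  static void migrate_fd_cancel(MigrationState *s)
  {
      if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
          /* in postcopy: pause the stream rather than really cancel */
          qemu_file_shutdown(s->to_dst_file);
          return;
      }
      /* ... existing cancel path ... */
  }

  void qmp_migrate_cancel(Error **errp)
  {
      migrate_fd_cancel(migrate_get_current());
  }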


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>  hmp-commands.hx       |  8 ++++++--
>  migration/migration.c | 18 +++++++++++++++++-
>  qapi/migration.json   |  3 ++-
>  3 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index ffcdc34652..32fdd52212 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -952,14 +952,18 @@ ETEXI
>          .name       = "migrate_cancel",
>          .args_type  = "",
>          .params     = "",
> -        .help       = "cancel the current VM migration",
> +        .help       = "cancel the current VM precopy migration, or "
> +                      "pause the migration if it's in postcopy state, "
> +                      "so that a postcopy recovery can be carried out later.",
>          .cmd        = hmp_migrate_cancel,
>      },
>  
>  STEXI
>  @item migrate_cancel
>  @findex migrate_cancel
> -Cancel the current VM migration.
> +If during a precopy, this command cancels the migration.  If during
> +postcopy, this command pauses the migration (so that a postcopy
> +recovery can be carried out afterward).
>  ETEXI
>  
>      {
> diff --git a/migration/migration.c b/migration/migration.c
> index 8762fad9be..0c1a783df2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1535,7 +1535,23 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>  
>  void qmp_migrate_cancel(Error **errp)
>  {
> -    migrate_fd_cancel(migrate_get_current());
> +    MigrationState *ms = migrate_get_current();
> +    int ret;
> +
> +    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        /*
> +         * Cancelling a postcopy migration does not really make sense.
> +         * Here instead we pause the migration only, so another
> +         * recovery can take place if needed.
> +         */
> +        ret = qemu_file_shutdown(ms->to_dst_file);
> +        if (ret) {
> +            error_setg(errp, "Failed to pause migration stream.");
> +        }
> +        return;
> +    }
> +
> +    migrate_fd_cancel(ms);
>  }
>  
>  void qmp_migrate_continue(MigrationStatus state, Error **errp)
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5dfac0681d..e15bb516cd 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -881,7 +881,8 @@
>  ##
>  # @migrate_cancel:
>  #
> -# Cancel the current executing migration process.
> +# Cancel the current executing migration process for precopy.  For
> +# postcopy, it'll pause the migration instead.
>  #
>  # Returns: nothing on success
>  #
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
  2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
                   ` (29 preceding siblings ...)
  2017-12-05 18:43 ` Dr. David Alan Gilbert
@ 2018-01-11 16:59 ` Dr. David Alan Gilbert
  2018-01-12  9:27   ` Peter Xu
  30 siblings, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2018-01-11 16:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> Tree is pushed here for better reference and testing (online tree
> includes monitor OOB series):
> 
>   https://github.com/xzpeter/qemu/tree/postcopy-recover-all
> 
> This version removed quite a few patches related to migrate-incoming,
> instead I introduced a new command "migrate-recover" to trigger the
> recovery channel on destination side to simplify the code.

I've got this setup on a couple of my test hosts, and I'm using
iptables to try breaking the connection.

See below for where I got stuck.

> To test this two series altogether, please checkout above tree and
> build.  Note: to test on small and single host, one need to disable
> full bandwidth postcopy migration otherwise it'll complete very fast.
> Basically a simple patch like this would help:
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 4de3b551fe..c0206023d7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       * will notice we're in POSTCOPY_ACTIVE and not actually
>       * wrap their state up here
>       */
> -    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
> +    // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
>      if (migrate_postcopy_ram()) {
>          /* Ping just for debugging, helps line traces up */
>          qemu_savevm_send_ping(ms->to_dst_file, 2);
> 
> This patch is included already in above github tree.  Please feel free
> to drop this patch when want to test on big machines and between real
> hosts.
> 
> Detailed Test Procedures (QMP only)
> ===================================
> 
> 1. start source QEMU.
> 
> $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
>      -smp 4 -m 1G -qmp stdio \
>      -name peter-vm,debug-threads=on \
>      -netdev user,id=net0 \
>      -device e1000,netdev=net0 \
>      -global migration.x-max-bandwidth=4096 \
>      -global migration.x-postcopy-ram=on \
>      /images/fedora-25.qcow2
>
> 2. start destination QEMU.
> 
> $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
>      -smp 4 -m 1G -qmp stdio \
>      -name peter-vm,debug-threads=on \
>      -netdev user,id=net0 \
>      -device e1000,netdev=net0 \
>      -global migration.x-max-bandwidth=4096 \
>      -global migration.x-postcopy-ram=on \
>      -incoming tcp:0.0.0.0:5555 \
>      /images/fedora-25.qcow2

I'm using:
./x86_64-softmmu/qemu-system-x86_64 -nographic -M pc,accel=kvm -smp 4 -m 16G -drive file=/home/vms/rhel71.qcow2,id=d,cache=none,if=none -device virtio-blk,drive=d -vnc 0:0 -incoming tcp:0:8888 -chardev socket,port=4000,host=0,id=mon,server,nowait,telnet -mon chardev=mon,id=mon,mode=control -nographic -chardev stdio,mux=on,id=monh -mon chardev=monh,mode=readline --device isa-serial,chardev=monh
and I've got both the HMP on the stdio, and the QMP via telnet.

> 
> 3. On source, do QMP handshake as normal:
> 
>   {"execute": "qmp_capabilities"}
>   {"return": {}}
> 
> 4. On destination, do QMP handshake to enable OOB:
> 
>   {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
>   {"return": {}}
> 
> 5. On source, trigger initial migrate command, switch to postcopy:
> 
>   {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } }
>   {"return": {}}
>   {"execute": "query-migrate"}
>   {"return": {"expected-downtime": 300, "status": "active", ...}}
>   {"execute": "migrate-start-postcopy"}
>   {"return": {}}
>   {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"}
>   {"execute": "query-migrate"}
>   {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}}
> 
> 6. On source, manually trigger a "fake network down" using
>    "migrate-cancel" command:
> 
>   {"execute": "migrate_cancel"}
>   {"return": {}}

Before I do that, I'm breaking the network connection by running on the
source:
iptables -A INPUT -p tcp --source-port 8888 -j DROP
iptables -A INPUT -p tcp --destination-port 8888 -j DROP
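
To restore the link again before attempting the recovery, the same
rules can simply be deleted:

iptables -D INPUT -p tcp --source-port 8888 -j DROP
iptables -D INPUT -p tcp --destination-port 8888 -j DROP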

>   During postcopy, it'll not really cancel the migration, but pause
>   it.  On both sides, we should see this on stderr:
> 
>   qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
> 
>   It means now both sides are in postcopy-pause state.

Now, here we start to have a problem; I do the migrate-cancel on the
source, which works and goes into pause; but remember the network is
broken, so the destination hasn't received the news.

> 7. (Optional) On destination side, let's try to hang the main thread
>    using the new x-oob-test command, providing a "lock=true" param:
> 
>    {"execute": "x-oob-test", "id": "lock-dispatcher-cmd",
>     "arguments": { "lock": true } }
> 
>    After sending this command, we should not see any "return", because
>    main thread is blocked already.  But we can still use the monitor
>    since the monitor now has dedicated IOThread.
> 
> 8. On destination side, provide a new incoming port using the new
>    command "migrate-recover" (note that if step 7 is carried out, we
>    _must_ use OOB form, otherwise the command will hang.  With OOB,
>    this command will return immediately):
> 
>   {"execute": "migrate-recover", "id": "recover-cmd",
>    "arguments": { "uri": "tcp:localhost:5556" },
>    "control": { "run-oob": true } }
>   {"timestamp": {"seconds": 1512454976, "microseconds": 186053},
>    "event": "MIGRATION", "data": {"status": "setup"}}
>   {"return": {}, "id": "recover-cmd"}
> 
>    We can see that the command will succeed even if the main thread
>    is locked up.

Because the destination didn't get the news of the pause, I get:
{"id": "recover-cmd", "error": {"class": "GenericError", "desc": "Migrate recover can only be run when postcopy is paused."}}

and I can't explicitly cause a cancel on the destination:
{"id": "cancel-cmd", "error": {"class": "GenericError", "desc": "The command migrate_cancel does not support OOB"}}

So I think we need a way out of this on the destination.

Dave

> [snip: remainder of the cover letter (steps 9-10, changelog, patch
> list and diffstat), quoted in full earlier in the thread]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
  2018-01-11 16:59 ` Dr. David Alan Gilbert
@ 2018-01-12  9:27   ` Peter Xu
  2018-01-12 12:27     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2018-01-12  9:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

On Thu, Jan 11, 2018 at 04:59:32PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Tree is pushed here for better reference and testing (online tree
> > includes monitor OOB series):
> > 
> >   https://github.com/xzpeter/qemu/tree/postcopy-recover-all
> > 
> > This version removed quite a few patches related to migrate-incoming,
> > instead I introduced a new command "migrate-recover" to trigger the
> > recovery channel on destination side to simplify the code.
> 
> I've got this setup on a couple of my test hosts, and I'm using
> iptables to try breaking the connection.
> 
> See below for where I got stuck.
> 
> > To test this two series altogether, please checkout above tree and
> > build.  Note: to test on small and single host, one need to disable
> > full bandwidth postcopy migration otherwise it'll complete very fast.
> > Basically a simple patch like this would help:
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 4de3b551fe..c0206023d7 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1904,7 +1904,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> >       * will notice we're in POSTCOPY_ACTIVE and not actually
> >       * wrap their state up here
> >       */
> > -    qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
> > +    // qemu_file_set_rate_limit(ms->to_dst_file, INT64_MAX);
> >      if (migrate_postcopy_ram()) {
> >          /* Ping just for debugging, helps line traces up */
> >          qemu_savevm_send_ping(ms->to_dst_file, 2);
> > 
> > This patch is included already in above github tree.  Please feel free
> > to drop this patch when want to test on big machines and between real
> > hosts.
> > 
> > Detailed Test Procedures (QMP only)
> > ===================================
> > 
> > 1. start source QEMU.
> > 
> > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
> >      -smp 4 -m 1G -qmp stdio \
> >      -name peter-vm,debug-threads=on \
> >      -netdev user,id=net0 \
> >      -device e1000,netdev=net0 \
> >      -global migration.x-max-bandwidth=4096 \
> >      -global migration.x-postcopy-ram=on \
> >      /images/fedora-25.qcow2
> >
> > 2. start destination QEMU.
> > 
> > $qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
> >      -smp 4 -m 1G -qmp stdio \
> >      -name peter-vm,debug-threads=on \
> >      -netdev user,id=net0 \
> >      -device e1000,netdev=net0 \
> >      -global migration.x-max-bandwidth=4096 \
> >      -global migration.x-postcopy-ram=on \
> >      -incoming tcp:0.0.0.0:5555 \
> >      /images/fedora-25.qcow2
> 
> I'm using:
> ./x86_64-softmmu/qemu-system-x86_64 -nographic -M pc,accel=kvm -smp 4 -m 16G -drive file=/home/vms/rhel71.qcow2,id=d,cache=none,if=none -device virtio-blk,drive=d -vnc 0:0 -incoming tcp:0:8888 -chardev socket,port=4000,host=0,id=mon,server,nowait,telnet -mon chardev=mon,id=mon,mode=control -nographic -chardev stdio,mux=on,id=monh -mon chardev=monh,mode=readline --device isa-serial,chardev=monh
> and I've got both the HMP on the stdio, and the QMP via a telnet
> 
> > 
> > [...]
> > 
> > 6. On source, manually trigger a "fake network down" using
> >    "migrate-cancel" command:
> > 
> >   {"execute": "migrate_cancel"}
> >   {"return": {}}
> 
> Before I do that, I'm breaking the network connection by running on the
> source:
> iptables -A INPUT -p tcp --source-port 8888 -j DROP
> iptables -A INPUT -p tcp --destination-port 8888 -j DROP

This is tricky... I think TCP keepalive may help, but for sure I
think we do need a way to cancel the migration on both sides.  Please
see below comment.
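
Just for reference, enabling keepalive on the migration socket could
look like the snippet below (plain POSIX sockets rather than QEMU
code, and the timeout values are arbitrary):

  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  static void enable_keepalive(int fd)
  {
      int on = 1, idle = 30, intvl = 5, cnt = 4;

      /* Start probing after 30s of idle, probe every 5s, give up
       * after 4 lost probes: a dead peer is detected in roughly
       * 50s instead of the default two hours. */
      setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
      setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
      setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
      setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
  }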

> 
> >   During postcopy, it'll not really cancel the migration, but pause
> >   it.  On both sides, we should see this on stderr:
> > 
> >   qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
> > 
> >   It means now both sides are in postcopy-pause state.
> 
> Now, here we start to have a problem; I do the migrate-cancel on the
> source, that works and goes into pause; but remember the network is
> broken, so the destination hasn't received the news.
> 
> > 7. (Optional) On destination side, let's try to hang the main thread
> >    using the new x-oob-test command, providing a "lock=true" param:
> > 
> >    {"execute": "x-oob-test", "id": "lock-dispatcher-cmd",
> >     "arguments": { "lock": true } }
> > 
> >    After sending this command, we should not see any "return", because
> >    main thread is blocked already.  But we can still use the monitor
> >    since the monitor now has dedicated IOThread.
> > 
> > 8. On destination side, provide a new incoming port using the new
> >    command "migrate-recover" (note that if step 7 is carried out, we
> >    _must_ use OOB form, otherwise the command will hang.  With OOB,
> >    this command will return immediately):
> > 
> >   {"execute": "migrate-recover", "id": "recover-cmd",
> >    "arguments": { "uri": "tcp:localhost:5556" },
> >    "control": { "run-oob": true } }
> >   {"timestamp": {"seconds": 1512454976, "microseconds": 186053},
> >    "event": "MIGRATION", "data": {"status": "setup"}}
> >   {"return": {}, "id": "recover-cmd"}
> > 
> >    We can see that the command will success even if main thread is
> >    locked up.
> 
> Because the destination didn't get the news of the pause, I get:
> {"id": "recover-cmd", "error": {"class": "GenericError", "desc": "Migrate recover can only be run when postcopy is paused."}}

This is normal since we didn't fail on destination, while...

> 
> and I can't explicitly cause a cancel on the destination:
> {"id": "cancel-cmd", "error": {"class": "GenericError", "desc": "The command migrate_cancel does not support OOB"}}

... this is not normal.  I have two questions:

1. Have you provided

  "control": {"run-oob": true}

  field when sending command "migrate_cancel"?  Just to mention that
  we shouldn't run migrate_cancel in OOB mode.  Otherwise it may be a
  monitor-oob bug.

2. Do we need to support "migrate_cancel" on destination?

For (2), I think we need it, but for now it only works on source for
sure.  So I think maybe I should add that support.

> 
> So I think we need a way out of this on the destination.

So that's my 2nd question.  How about we do this: migrate_cancel will
cancel incoming migration if:

        a. there is one incoming migration in progress, and
        b. postcopy is enabled
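
In code it could be as simple as the sketch below (the function and
field names follow my memory of migration.c and are illustrative
only, not the final API):

  void qmp_migrate_cancel(Error **errp)
  {
      MigrationIncomingState *mis = migration_incoming_get_current();

      if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
          /*
           * Don't tear anything down; only shut down the broken
           * channel, so the destination drops into postcopy-pause
           * and can later be resumed via migrate-recover.
           */
          qemu_file_shutdown(mis->from_src_file);
          return;
      }

      /* Otherwise keep the existing source-side behavior. */
      migrate_fd_cancel(migrate_get_current());
  }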

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
  2018-01-12  9:27   ` Peter Xu
@ 2018-01-12 12:27     ` Dr. David Alan Gilbert
  2018-01-24  6:19       ` Peter Xu
  0 siblings, 1 reply; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2018-01-12 12:27 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Jan 11, 2018 at 04:59:32PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> [...]
> > 
> > Because the destination didn't get the news of the pause, I get:
> > {"id": "recover-cmd", "error": {"class": "GenericError", "desc": "Migrate recover can only be run when postcopy is paused."}}
> 
> This is normal since we didn't fail on destination, while...
> 
> > 
> > and I can't explicitly cause a cancel on the destination:
> > {"id": "cancel-cmd", "error": {"class": "GenericError", "desc": "The command migrate_cancel does not support OOB"}}
> 
> ... this is not normal.  I have two questions:
> 
> 1. Have you provided
> 
>   "control": {"run-oob": true}
> 
>   field when sending command "migrate_cancel"?  Just to mention that
>   we shouldn't run migrate_cancel in OOB mode.  Otherwise it may be a
>   monitor-oob bug.

Yes, I probably did and probably shouldn't have.

> 2. Do we need to support "migrate_cancel" on destination?
> 
> For (2), I think we need it, but for now it only works on source for
> sure.  So I think maybe I should add that support.
> 
> > 
> > So I think we need a way out of this on the destination.
> 
> So that's my 2nd question.  How about we do this: migrate_cancel will
> cancel incoming migration if:
> 
>         a. there is one incoming migration in progress, and
>         b. postcopy is enabled

Yes, I think that should work; but it should only 'cancel' in the same
way that it causes it to go to 'paused' mode.

One other problem I've hit is that it seems easy to 'upset' the OOB
monitor; for example if I do:

{"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
and repeat it:
{"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }

it gives me an error, that's OK, but then if I disconnect and reconnect
the monitor a few times it gets really upset; I've had it:
  a) Disconnect immediately when the telnet connects
  b) I've also had it not respond to any commands
  c) I've also seen a hang at system_powerdown where:

    the main thread is in:
        #0  0x00007f37aa4d3ef7 in pthread_join (threadid=139876803868416, thread_return=thread_return@entry=0x7ffc174367b0) at pthread_join.c:92
        #1  0x000055644e5c1f5f in qemu_thread_join (thread=<optimized out>) at /home/dgilbert/peter/qemu/util/qemu-thread-posix.c:547
        #2  0x000055644e30c688 in iothread_stop (iothread=<optimized out>) at /home/dgilbert/peter/qemu/iothread.c:91
        #3  0x000055644e21f122 in monitor_cleanup () at /home/dgilbert/peter/qemu/monitor.c:4517
        #4  0x000055644e1e1925 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dgilbert/peter/qemu/vl.c:4924

    and the monitor thread is in:
        #0  0x00007fdd93de871f in accept4 (fd=fd@entry=10, addr=..., addr@entry=..., addr_len=addr_len@entry=0x7fdd80004430, flags=flags@entry=524288)
            at ../sysdeps/unix/sysv/linux/accept4.c:37
        #1  0x000055645f42d9ec in qemu_accept (s=10, addr=addr@entry=0x7fdd800043b0, addrlen=addrlen@entry=0x7fdd80004430)
            at /home/dgilbert/peter/qemu/util/osdep.c:431
        #2  0x000055645f3ea7a1 in qio_channel_socket_accept (ioc=0x556460610f10, errp=errp@entry=0x0) at /home/dgilbert/peter/qemu/io/channel-socket.c:340
        #3  0x000055645f3db6aa in tcp_chr_accept (channel=0x556460610f10, cond=<optimized out>, opaque=<optimized out>)
            at /home/dgilbert/peter/qemu/chardev/char-socket.c:746
        #4  0x00007fdd94b2479a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
        #5  0x00007fdd94b24ae8 in g_main_context_iterate.isra.24 () at /lib64/libglib-2.0.so.0
        #6  0x00007fdd94b24dba in g_main_loop_run () at /lib64/libglib-2.0.so.0
        #7  0x000055645f17f516 in iothread_run (opaque=0x55646063e8c0) at /home/dgilbert/peter/qemu/iothread.c:69
        #8  0x00007fdd9c0fcdc5 in start_thread (arg=0x7fdd8cf79700) at pthread_create.c:308

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
  2018-01-12 12:27     ` Dr. David Alan Gilbert
@ 2018-01-24  6:19       ` Peter Xu
  2018-01-24  9:05         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2018-01-24  6:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

On Fri, Jan 12, 2018 at 12:27:42PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> [...]
> 
> > 2. Do we need to support "migrate_cancel" on destination?
> > 
> > For (2), I think we need it, but for now it only works on source for
> > sure.  So I think maybe I should add that support.
> > 
> > > 
> > > So I think we need a way out of this on the destination.
> > 
> > So that's my 2nd question.  How about we do this: migrate_cancel will
> > cancel incoming migration if:
> > 
> >         a. there is one incoming migration in progress, and
> >         b. postcopy is enabled
> 
> Yes, I think that should work; but it should only 'cancel' in the same
> way that it causes it to go to 'paused' mode.

Yes.

> 
> One other problem I've hit is that it seems easy to 'upset' the OOB
> monitor; for example if I do:
> 
> {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
> and repeat it:
> {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
> 
> it gives me an error,

Is the error like this?

{"id": 1, "error": {"class": "CommandNotFound",
 "desc": "Capabilities negotiation is already complete, command ignored"}}

I think an error is by design?  Say, we only allow the QMP negotiation
to happen once per session IMHO.

> that's OK, but then if I disconnect and reconnect the monitor a few
> times it gets really upset; I've had it:
>   a) Disconnect immediately when the telnet connects
>   b) I've also had it not respond to any commands
>   c) I've also seen a hang at system_powerdown where:
> 
>     the main thread is in:
>         #0  0x00007f37aa4d3ef7 in pthread_join (threadid=139876803868416, thread_return=thread_return@entry=0x7ffc174367b0) at pthread_join.c:92
>         #1  0x000055644e5c1f5f in qemu_thread_join (thread=<optimized out>) at /home/dgilbert/peter/qemu/util/qemu-thread-posix.c:547
>         #2  0x000055644e30c688 in iothread_stop (iothread=<optimized out>) at /home/dgilbert/peter/qemu/iothread.c:91
>         #3  0x000055644e21f122 in monitor_cleanup () at /home/dgilbert/peter/qemu/monitor.c:4517
>         #4  0x000055644e1e1925 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dgilbert/peter/qemu/vl.c:4924
> 
>     and the monitor thread is in:
>         #0  0x00007fdd93de871f in accept4 (fd=fd@entry=10, addr=..., addr@entry=..., addr_len=addr_len@entry=0x7fdd80004430, flags=flags@entry=524288)
>             at ../sysdeps/unix/sysv/linux/accept4.c:37
>         #1  0x000055645f42d9ec in qemu_accept (s=10, addr=addr@entry=0x7fdd800043b0, addrlen=addrlen@entry=0x7fdd80004430)
>             at /home/dgilbert/peter/qemu/util/osdep.c:431
>         #2  0x000055645f3ea7a1 in qio_channel_socket_accept (ioc=0x556460610f10, errp=errp@entry=0x0) at /home/dgilbert/peter/qemu/io/channel-socket.c:340
>         #3  0x000055645f3db6aa in tcp_chr_accept (channel=0x556460610f10, cond=<optimized out>, opaque=<optimized out>)
>             at /home/dgilbert/peter/qemu/chardev/char-socket.c:746
>         #4  0x00007fdd94b2479a in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
>         #5  0x00007fdd94b24ae8 in g_main_context_iterate.isra.24 () at /lib64/libglib-2.0.so.0
>         #6  0x00007fdd94b24dba in g_main_loop_run () at /lib64/libglib-2.0.so.0
>         #7  0x000055645f17f516 in iothread_run (opaque=0x55646063e8c0) at /home/dgilbert/peter/qemu/iothread.c:69
>         #8  0x00007fdd9c0fcdc5 in start_thread (arg=0x7fdd8cf79700) at pthread_create.c:308

Hmm, this seems to be another, more general problem with how we do
accept().  It seems that we are doing accept() synchronously now,
even in a GMainLoop, assuming that it will always return fast enough
since we have been notified of a read event on the listening port.
But that can be untrue if the client disconnects very quickly, I
guess.

I think doing async accept() might help?  Maybe Dan would know
better.
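
To illustrate (a rough sketch only, not the real chardev code; it
assumes a glib watch on a listening fd that was made non-blocking at
setup time with fcntl(lfd, F_SETFL, ... | O_NONBLOCK)):

  #include <errno.h>
  #include <glib.h>
  #include <sys/socket.h>
  #include <unistd.h>

  static gboolean listener_ready(GIOChannel *chan, GIOCondition cond,
                                 gpointer opaque)
  {
      int lfd = g_io_channel_unix_get_fd(chan);
      int cfd = accept(lfd, NULL, NULL);

      if (cfd < 0) {
          if (errno == EAGAIN || errno == EWOULDBLOCK ||
              errno == ECONNABORTED || errno == EINTR) {
              /* The client vanished between the poll() wakeup and
               * the accept(); with a blocking listener this is
               * exactly where we would hang. */
              return TRUE;
          }
          return TRUE;  /* real error: log it, but keep listening */
      }

      /* ... hand cfd over to a new monitor session here ... */
      close(cfd);
      return TRUE;
  }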

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v5 26/28] migration: allow migrate_cancel to pause postcopy
  2017-12-19 10:58   ` Dr. David Alan Gilbert
@ 2018-01-24  8:28     ` Peter Xu
  2018-01-24  9:06       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Xu @ 2018-01-24  8:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

On Tue, Dec 19, 2017 at 10:58:59AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > > It was allowed in the past to even cancel a postcopy migration, but it
> > > does not really make sense, and no one should be using it, since
> > > cancelling a migration during postcopy means crashing the VM in no time.
> > 
> > > Let's just re-use this command as a way to pause the postcopy
> > > migration when we detect that we are in a postcopy migration.  This can
> > > be used when we want to trigger the postcopy recovery manually.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> > Yes, OK, this is a little odd without having any flags or anything, but
> > it's essentially the same reason that cancel exists - to stop a
> > migration that we know is broken for some reason.
> 
> (I could argue whether this should be special cased in migrate_fd_cancel
> instead, but that's just a preference).
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Firstly, thanks for the r-b.

Now, after learning about your iptables test, I think we have reached
a consensus that we need to provide a command (for example, reuse
migrate-cancel) to allow the destination to shut down its incoming
migration port too.  At the same time, if we want to make sure the
command always works on the destination, we'd better also make that
command OOB-capable.

However, I'm not sure whether I can make migrate-cancel OOB-capable,
since we recently added bdrv_invalidate_cache_all() into
migrate_fd_cancel().  That invalidation needs a mutex which might
block.  I don't know whether that means migrate-cancel is no longer
suitable as an OOB command.

So maybe now I can do this: first, drop this patch; then add a new
command to do the shutdown explicitly (allowing either src/dst to shut
down its migration fd) and keep migrate-cancel untouched.  In that
case, I can make sure the new command will be OOB-compatible.
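
As a sketch, the QAPI for such a command could be as simple as (the
command name is illustrative only):

  { 'command': 'migrate-pause', 'allow-oob': true }

with the handler doing little more than a qemu_file_shutdown() on the
migration stream of whichever side it runs on.  Since shutdown() on a
socket does not block, that should be safe from the OOB context.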

What do you think?

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery
  2018-01-24  6:19       ` Peter Xu
@ 2018-01-24  9:05         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2018-01-24  9:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> [...]
> 
> > 
> > One other problem I've hit is that it seems easy to 'upset' the OOB
> > monitor; for example if I do:
> > 
> > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
> > and repeat it:
> > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
> > 
> > it gives me an error,
> 
> Is the error like this?
> 
> {"id": 1, "error": {"class": "CommandNotFound",
>  "desc": "Capabilities negotiation is already complete, command ignored"}}
> 
> I think an error is by design?  Say, we only allow the QMP negotiation
> to happen once per session IMHO.

I can't remember which error it was, but yes, I'm OK with that error
happening; it's what happens next that's the problem (the a, b, c below).

Dave

> > that's OK, but then if I disconnect and reconnect the monitor a few
> > times it gets really upset; I've had it:
> >   a) Disconnect immediately when the telnet connects
> >   b) I've also had it not respond to any commands
> >   c) I've also seen a hang at system_powerdown where:
> > 
> [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v5 26/28] migration: allow migrate_cancel to pause postcopy
  2018-01-24  8:28     ` Peter Xu
@ 2018-01-24  9:06       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2018-01-24  9:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Andrea Arcangeli, Daniel P . Berrange, Juan Quintela,
	Alexey Perevalov

* Peter Xu (peterx@redhat.com) wrote:
> [...]
> 
> So maybe now I can do this: first, drop this patch; then add a new
> command to do the shutdown explicitly (allowing either src/dst to shut
> down its migration fd) and keep migrate-cancel untouched.  In that
> case, I can make sure the new command will be OOB-compatible.
> 
> What do you think?

Yes, I'm fine with that; misusing another command did feel a bit odd.

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


end of thread

Thread overview: 43+ messages
2017-12-05  6:52 [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 01/28] migration: better error handling with QEMUFile Peter Xu
2017-12-05 11:40   ` Dr. David Alan Gilbert
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 02/28] migration: reuse mis->userfault_quit_fd Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 03/28] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 04/28] migration: new postcopy-pause state Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 05/28] migration: implement "postcopy-pause" src logic Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 06/28] migration: allow dst vm pause on postcopy Peter Xu
2017-12-14 13:10   ` Dr. David Alan Gilbert
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 07/28] migration: allow src return path to pause Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 08/28] migration: allow send_rq to fail Peter Xu
2017-12-14 13:21   ` Dr. David Alan Gilbert
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 09/28] migration: allow fault thread to pause Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 10/28] qmp: hmp: add migrate "resume" option Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 11/28] migration: pass MigrationState to migrate_init() Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 12/28] migration: rebuild channel on source Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 13/28] migration: new state "postcopy-recover" Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 14/28] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 15/28] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 16/28] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 17/28] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 18/28] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 19/28] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-12-05  6:52 ` [Qemu-devel] [PATCH v5 20/28] migration: synchronize dirty bitmap for resume Peter Xu
2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 21/28] migration: setup ramstate " Peter Xu
2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 22/28] migration: final handshake for the resume Peter Xu
2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 23/28] migration: free SocketAddress where allocated Peter Xu
2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 24/28] migration: init dst in migration_object_init too Peter Xu
2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 25/28] io: let watcher of the channel run in same ctx Peter Xu
2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 26/28] migration: allow migrate_cancel to pause postcopy Peter Xu
2017-12-19 10:58   ` Dr. David Alan Gilbert
2018-01-24  8:28     ` Peter Xu
2018-01-24  9:06       ` Dr. David Alan Gilbert
2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 27/28] qmp/migration: new command migrate-recover Peter Xu
2017-12-05  6:53 ` [Qemu-devel] [PATCH v5 28/28] hmp/migration: add migrate_recover command Peter Xu
2017-12-05  6:55 ` [Qemu-devel] [PATCH v5 00/28] Migration: postcopy failure recovery Peter Xu
2017-12-05 18:43 ` Dr. David Alan Gilbert
2017-12-06  2:39   ` Peter Xu
2018-01-11 16:59 ` Dr. David Alan Gilbert
2018-01-12  9:27   ` Peter Xu
2018-01-12 12:27     ` Dr. David Alan Gilbert
2018-01-24  6:19       ` Peter Xu
2018-01-24  9:05         ` Dr. David Alan Gilbert
