* [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery
@ 2018-05-02 10:47 Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 01/24] migration: let incoming side use thread context Peter Xu
                   ` (23 more replies)
  0 siblings, 24 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Tree is pushed here for better reference and testing:

  https://github.com/xzpeter/qemu/tree/postcopy-recovery-support

Note that OOB is still off by default, so the old test scripts need
this extra line (instead of "-qmp stdio") for OOB to work:

  -chardev stdio,id=char0 -mon chardev=char0,mode=control,x-oob=on

After Dave's postcopy shared memory work, extra work is needed for the
postcopy recovery series to support shared memory (e.g., DPDK).  That
is a TODO item to follow up after this series.

Please review.  Thanks.

v8:
- rebase to master
- fix trace_ram_state_resume_prepare() to take uint64_t [Dave]
- add a patch to introduce mgmt_lock, then take it in migrate-pause
  command to protect the QEMUFile [Dave]

Detailed Test Procedures (QMP only)
===================================

1. start source QEMU.

$qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
     -smp 4 -m 1G \
     -chardev stdio,id=char0 -mon chardev=char0,mode=control,x-oob=on \
     -name peter-vm,debug-threads=on \
     -netdev user,id=net0 \
     -device e1000,netdev=net0 \
     -global migration.x-max-bandwidth=4096 \
     -global migration.x-postcopy-ram=on \
     /images/fedora-25.qcow2

2. start destination QEMU.

$qemu -M q35,kernel-irqchip=split -enable-kvm -snapshot \
     -smp 4 -m 1G \
     -chardev stdio,id=char0 -mon chardev=char0,mode=control,x-oob=on \
     -name peter-vm,debug-threads=on \
     -netdev user,id=net0 \
     -device e1000,netdev=net0 \
     -global migration.x-max-bandwidth=4096 \
     -global migration.x-postcopy-ram=on \
     -incoming tcp:0.0.0.0:5555 \
     /images/fedora-25.qcow2

3. On source, do QMP handshake as normal:

  {"execute": "qmp_capabilities"}
  {"return": {}}

4. On destination, do QMP handshake to enable OOB:

  {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
  {"return": {}}

5. On source, trigger initial migrate command, switch to postcopy:

  {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5555" } }
  {"return": {}}
  {"execute": "query-migrate"}
  {"return": {"expected-downtime": 300, "status": "active", ...}}
  {"execute": "migrate-start-postcopy"}
  {"return": {}}
  {"timestamp": {"seconds": 1512454728, "microseconds": 768096}, "event": "STOP"}
  {"execute": "query-migrate"}
  {"return": {"expected-downtime": 44472, "status": "postcopy-active", ...}}

6. On source, manually trigger a "fake network down" using the
   "migrate_cancel" command:

  {"execute": "migrate_cancel"}
  {"return": {}}

  During postcopy, this does not really cancel the migration but
  pauses it.  On both sides, we should see this on stderr:

  qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.

  It means both sides are now in the postcopy-paused state.

7. (Optional) On destination side, let's try to hang the main thread
   using the new x-oob-test command, providing a "lock=true" param:

   {"execute": "x-oob-test", "id": "lock-dispatcher-cmd",
    "arguments": { "lock": true } }

   After sending this command, we should not see any "return", because
   the main thread is already blocked.  But we can still use the
   monitor since the monitor now has a dedicated IOThread.

8. On destination side, provide a new incoming port using the new
   command "migrate-recover" (note that if step 7 is carried out, we
   _must_ use OOB form, otherwise the command will hang.  With OOB,
   this command will return immediately):

  {"execute": "migrate-recover", "id": "recover-cmd",
   "arguments": { "uri": "tcp:localhost:5556" },
   "control": { "run-oob": true } }
  {"timestamp": {"seconds": 1512454976, "microseconds": 186053},
   "event": "MIGRATION", "data": {"status": "setup"}}
  {"return": {}, "id": "recover-cmd"}

   We can see that the command succeeds even if the main thread is
   locked up.

9. (Optional) This step is only needed if step 7 is carried out. On
   destination, let's unlock the main thread before resuming the
   migration, this time with "lock=false" to unlock the main thread
   (since system running needs the main thread). Note that we _must_
   use OOB command here too:

  {"execute": "x-oob-test", "id": "unlock-dispatcher",
   "arguments": { "lock": false }, "control": { "run-oob": true } }
  {"return": {}, "id": "unlock-dispatcher"}
  {"return": {}, "id": "lock-dispatcher-cmd"}

  Here the first "return" is the reply to the unlock command, the
  second "return" is the reply to the lock command.  After this
  command, main thread is released.

10. On source, resume the postcopy migration:

  {"execute": "migrate", "arguments": { "uri": "tcp:localhost:5556", "resume": true }}
  {"return": {}}
  {"execute": "query-migrate"}
  {"return": {"status": "completed", ...}}

==================

As we all know, postcopy migration carries a risk of losing the VM if
the network breaks during the migration. This series tries to solve
the problem by allowing the migration to pause at the failure point,
and to recover after the link is reconnected.

There was existing work on this issue from Md Haris Iqbal:

https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html

This series is a complete rework of the issue, based on Alexey
Perevalov's recved bitmap v8 series:

https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html

Two new statuses are added to support the migration (used on both
sides):

  MIGRATION_STATUS_POSTCOPY_PAUSED
  MIGRATION_STATUS_POSTCOPY_RECOVER

The MIGRATION_STATUS_POSTCOPY_PAUSED state is set when a network
failure is detected. It can last a long time: once the failure is
detected we stay in this state until a recovery is triggered.  In this
state, all the threads (on source: send thread, return-path thread;
destination: ram-load thread, page-fault thread) are halted.

The MIGRATION_STATUS_POSTCOPY_RECOVER state is short. When a recovery
is triggered, both the source and destination VMs jump into this
state and do whatever is needed to prepare the recovery (currently the
most important thing is to synchronize the dirty bitmap, please see
the commit messages for more information). After the preparation is
done, the source does the final handshake with the destination, then
both sides switch back to MIGRATION_STATUS_POSTCOPY_ACTIVE.
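
The state transitions described above can be sketched as a small
table-free state machine. This is only an illustration, not code from
the series: the enum and function names are hypothetical, and only the
three postcopy states and the order of the transitions come from the
text. Transitions that the text does not mention are simply rejected
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; only the three states come from the cover letter. */
typedef enum {
    PC_ACTIVE,   /* "postcopy-active"  */
    PC_PAUSED,   /* "postcopy-paused"  */
    PC_RECOVER,  /* "postcopy-recover" */
} PostcopyState;

typedef enum {
    EV_IO_FAILURE,      /* network failure detected */
    EV_RECOVER_REQUEST, /* user triggers a recovery */
    EV_HANDSHAKE_DONE,  /* final handshake completed */
} PostcopyEvent;

/* Apply one event; returns true and updates *st if the move is allowed. */
static bool postcopy_step(PostcopyState *st, PostcopyEvent ev)
{
    assert(st != NULL);

    switch (*st) {
    case PC_ACTIVE:
        if (ev == EV_IO_FAILURE) {
            *st = PC_PAUSED;
            return true;
        }
        break;
    case PC_PAUSED:
        if (ev == EV_RECOVER_REQUEST) {
            *st = PC_RECOVER;
            return true;
        }
        break;
    case PC_RECOVER:
        if (ev == EV_HANDSHAKE_DONE) {
            *st = PC_ACTIVE;
            return true;
        }
        break;
    }
    /* Any other combination leaves the state unchanged. */
    return false;
}
```

Both source and destination would walk the same three states, each
driven by its own events.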

New commands/messages are defined as well to satisfy the need:

MIG_CMD_RECV_BITMAP & MIG_RP_MSG_RECV_BITMAP are introduced for
delivering received bitmaps.

MIG_CMD_POSTCOPY_RESUME & MIG_RP_MSG_RESUME_ACK are introduced to do
the final handshake of postcopy recovery.

Here are some more details on how the whole failure/recovery routine
works:

- start migration
- ... (switch from precopy to postcopy)
- both sides are in "postcopy-active" state
- ... (failure happens, e.g., network unplugged)
- both sides switch to "postcopy-paused" state
  - all the migration threads are stopped on both sides
- ... (both VMs hang)
- ... (user triggers recovery using "migrate -r -d tcp:HOST:PORT" on
  source side, "-r" means "recover")
- both sides switch to "postcopy-recover" state
  - on source: send-thread, return-path-thread will be woken up
  - on dest: ram-load-thread woken up, fault-thread still paused
- source calls new SaveVMHandlers hook resume_prepare() (currently,
  only ram is providing the hook):
  - ram_resume_prepare(): for each ramblock, fetch recved bitmap by:
    - src sends MIG_CMD_RECV_BITMAP to dst
    - dst replies MIG_RP_MSG_RECV_BITMAP to src, with bitmap data
      - src uses the recved bitmap to rebuild dirty bitmap
- source does final handshake with destination
  - src sends MIG_CMD_POSTCOPY_RESUME to dst, telling "src is ready"
    - when dst receives the command, fault thread will be woken up,
      meanwhile, dst switches back to "postcopy-active"
  - dst sends MIG_RP_MSG_RESUME_ACK to src, telling "dst is ready"
    - when src receives the ack, state switches to "postcopy-active"
- postcopy migration continues
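
The "rebuild dirty bitmap" step in the list above can be illustrated
with a minimal sketch. It assumes one bit per page and that a page
still needs sending exactly when the destination has not received it,
so the dirty bitmap is the complement of the received bitmap. The
function name and the plain uint64_t layout are hypothetical and do
not reflect the ramblock structures used by the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: rebuild a dirty bitmap from the destination's
 * received bitmap, one bit per page.  A page is dirty (still to be
 * sent) exactly when the destination has not received it.
 * Returns the number of dirty pages.
 */
static size_t rebuild_dirty_bitmap(const uint64_t *recved, uint64_t *dirty,
                                   size_t nr_pages)
{
    size_t nr_words = (nr_pages + 63) / 64;
    size_t dirty_pages = 0;

    assert(recved && dirty);

    for (size_t i = 0; i < nr_words; i++) {
        uint64_t word = ~recved[i];
        size_t base = i * 64;

        /* Mask off the padding bits in a partial last word. */
        if (nr_pages - base < 64) {
            word &= (UINT64_C(1) << (nr_pages - base)) - 1;
        }
        dirty[i] = word;

        /* Count set bits by clearing the lowest one each round. */
        for (; word; word &= word - 1) {
            dirty_pages++;
        }
    }
    return dirty_pages;
}
```

With four pages and a received bitmap of 0x5 (pages 0 and 2
received), the sketch marks pages 1 and 3 as dirty.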

Testing:

As I said, it's still an extremely simple test. I used socat to create
a socket bridge:

  socat tcp-listen:6666 tcp-connect:localhost:5555 &

Then I did the migration via the bridge. I emulated the network
failure by killing the socat process (bridge down), then tried to
recover the migration using the other channel (default dst channel).
It looks like:

        port:6666    +------------------+
        +----------> | socat bridge [1] |-------+
        |            +------------------+       |
        |         (Original channel)            |
        |                                       | port: 5555
     +---------+  (Recovery channel)            +--->+---------+
     | src VM  |------------------------------------>| dst VM  |
     +---------+                                     +---------+

Known issues/notes:

- currently destination listening port still cannot change. E.g., the
  recovery should be using the same port on destination for
  simplicity. (on source, we can specify new URL)

- the patch: "migration: let dst listen on port always" is still
  hacky, it just kept the incoming accept open forever for now...

- some migration numbers might still be inaccurate, like total
  migration time, etc. (But I don't really think that matters much
  now)

- the patches are very lightly tested.

- Dave reported one problem that may hang the destination main loop
  thread (one vcpu thread holds the BQL) and the rest. I haven't hit
  it yet, but that does not mean this series is immune to it.

- other potential issues that I may have forgotten or unnoticed...

Anyway, the work is still at a preliminary stage. Any suggestions and
comments are very welcome.  Thanks.

Peter Xu (24):
  migration: let incoming side use thread context
  migration: new postcopy-pause state
  migration: implement "postcopy-pause" src logic
  migration: allow dst vm pause on postcopy
  migration: allow src return path to pause
  migration: allow fault thread to pause
  qmp: hmp: add migrate "resume" option
  migration: rebuild channel on source
  migration: new state "postcopy-recover"
  migration: wakeup dst ram-load-thread for recover
  migration: new cmd MIG_CMD_RECV_BITMAP
  migration: new message MIG_RP_MSG_RECV_BITMAP
  migration: new cmd MIG_CMD_POSTCOPY_RESUME
  migration: new message MIG_RP_MSG_RESUME_ACK
  migration: introduce SaveVMHandlers.resume_prepare
  migration: synchronize dirty bitmap for resume
  migration: setup ramstate for resume
  migration: final handshake for the resume
  migration: init dst in migration_object_init too
  qmp/migration: new command migrate-recover
  hmp/migration: add migrate_recover command
  migration: introduce lock for to_dst_file
  migration/qmp: add command migrate-pause
  migration/hmp: add migrate_pause command

 qapi/migration.json          |  48 +++-
 hmp.h                        |   2 +
 include/migration/register.h |   2 +
 migration/migration.h        |  21 ++
 migration/ram.h              |   3 +
 migration/savevm.h           |   3 +
 hmp.c                        |  23 +-
 migration/channel.c          |   3 +-
 migration/exec.c             |   9 +-
 migration/fd.c               |   9 +-
 migration/migration.c        | 546 +++++++++++++++++++++++++++++++++++++++----
 migration/postcopy-ram.c     |  54 ++++-
 migration/ram.c              | 234 +++++++++++++++++++
 migration/savevm.c           | 191 ++++++++++++++-
 migration/socket.c           |   7 +-
 hmp-commands.hx              |  34 ++-
 migration/trace-events       |  21 ++
 17 files changed, 1136 insertions(+), 74 deletions(-)

-- 
2.14.3


* [Qemu-devel] [PATCH v8 01/24] migration: let incoming side use thread context
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-08 14:36   ` Juan Quintela
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 02/24] migration: new postcopy-pause state Peter Xu
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Incoming migration currently runs in the main thread with the default
gcontext.  With the new qio_channel_add_watch_full() we can now let it
run in the thread's own gcontext (if there is one).

On its own this patch changes nothing.  But when an incoming migration
is run in another iothread (e.g., by the upcoming migrate-recover
command), this patch binds the incoming logic to that iothread
instead of the main thread (which may already have page faulted and
hung).

RDMA is not considered for now since it's not even using the QIO watch
framework at all.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/exec.c   | 9 ++++-----
 migration/fd.c     | 9 ++++-----
 migration/socket.c | 7 ++++---
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/migration/exec.c b/migration/exec.c
index 0bc5a427dd..9d0f82f1f0 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -65,9 +65,8 @@ void exec_start_incoming_migration(const char *command, Error **errp)
     }
 
     qio_channel_set_name(ioc, "migration-exec-incoming");
-    qio_channel_add_watch(ioc,
-                          G_IO_IN,
-                          exec_accept_incoming_migration,
-                          NULL,
-                          NULL);
+    qio_channel_add_watch_full(ioc, G_IO_IN,
+                               exec_accept_incoming_migration,
+                               NULL, NULL,
+                               g_main_context_get_thread_default());
 }
diff --git a/migration/fd.c b/migration/fd.c
index cd06182d1e..9a380bbbc4 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -66,9 +66,8 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
     }
 
     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming");
-    qio_channel_add_watch(ioc,
-                          G_IO_IN,
-                          fd_accept_incoming_migration,
-                          NULL,
-                          NULL);
+    qio_channel_add_watch_full(ioc, G_IO_IN,
+                               fd_accept_incoming_migration,
+                               NULL, NULL,
+                               g_main_context_get_thread_default());
 }
diff --git a/migration/socket.c b/migration/socket.c
index 122d8ccfbe..8d4cf76295 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -160,9 +160,10 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
         return;
     }
 
-    qio_net_listener_set_client_func(listener,
-                                     socket_accept_incoming_migration,
-                                     NULL, NULL);
+    qio_net_listener_set_client_func_full(listener,
+                                          socket_accept_incoming_migration,
+                                          NULL, NULL,
+                                          g_main_context_get_thread_default());
 }
 
 void tcp_start_incoming_migration(const char *host_port, Error **errp)
-- 
2.14.3


* [Qemu-devel] [PATCH v8 02/24] migration: new postcopy-pause state
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 01/24] migration: let incoming side use thread context Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-08 15:16   ` Juan Quintela
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 03/24] migration: implement "postcopy-pause" src logic Peter Xu
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce a new state "postcopy-paused", which is used when the
postcopy migration is paused. It is intended for postcopy network
failure recovery.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json   | 5 ++++-
 migration/migration.c | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index f3974c6807..97891aec89 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -89,6 +89,8 @@
 #
 # @postcopy-active: like active, but now in postcopy mode. (since 2.5)
 #
+# @postcopy-paused: during postcopy but paused. (since 2.12)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -106,7 +108,8 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed', 'colo',
+            'active', 'postcopy-active', 'postcopy-paused',
+            'completed', 'failed', 'colo',
             'pre-switchover', 'device' ] }
 
 ##
diff --git a/migration/migration.c b/migration/migration.c
index 0bdb28e144..57ff1010e5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -569,6 +569,7 @@ static bool migration_is_setup_or_active(int state)
     switch (state) {
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
@@ -649,6 +650,7 @@ static void fill_source_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
-- 
2.14.3


* [Qemu-devel] [PATCH v8 03/24] migration: implement "postcopy-pause" src logic
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 01/24] migration: let incoming side use thread context Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 02/24] migration: new postcopy-pause state Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 04/24] migration: allow dst vm pause on postcopy Peter Xu
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Now when the network goes down during postcopy, the source side will
not fail the migration. Instead we switch the status to the new paused
state and wait for a later recovery.

If a recovery is detected, migration_thread() will reset its local
variables to prepare for that.
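
The pause-and-wait behaviour can be sketched outside QEMU with POSIX
threads and semaphores. This is a minimal illustration under stated
assumptions, not the patch's code: MigSketch and the mig_sketch_*()
functions are hypothetical names, and C11 atomics stand in for the
state handling of the real migration code.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

enum { ST_POSTCOPY_ACTIVE, ST_POSTCOPY_PAUSED };

/* Hypothetical stand-in for the state and semaphore used for pausing. */
typedef struct {
    atomic_int state;
    sem_t pause_sem;
} MigSketch;

static void mig_sketch_init(MigSketch *m)
{
    int ret;

    atomic_init(&m->state, ST_POSTCOPY_ACTIVE);
    ret = sem_init(&m->pause_sem, 0, 0);
    assert(ret == 0);
    (void)ret;
}

/* Migration thread, on I/O failure: enter paused state, block until resumed. */
static void mig_sketch_pause(MigSketch *m)
{
    atomic_store(&m->state, ST_POSTCOPY_PAUSED);
    /* Re-check the state after every wakeup; a post alone does not resume. */
    while (atomic_load(&m->state) == ST_POSTCOPY_PAUSED) {
        sem_wait(&m->pause_sem);
    }
}

/* Recovery path: change the state first, then wake the waiter. */
static void mig_sketch_resume(MigSketch *m)
{
    atomic_store(&m->state, ST_POSTCOPY_ACTIVE);
    sem_post(&m->pause_sem);
}

/* Thread entry point that simulates hitting an I/O failure immediately. */
static void *mig_sketch_thread(void *opaque)
{
    mig_sketch_pause(opaque);
    return NULL;
}
```

The waiter loops on the state rather than trusting a single wakeup, so
a stray post (or an interrupted sem_wait) cannot let it continue while
the state is still paused.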

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h  |  3 ++
 migration/migration.c  | 99 +++++++++++++++++++++++++++++++++++++++++++++++---
 migration/trace-events |  1 +
 3 files changed, 97 insertions(+), 6 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 7c69598c54..9bf3418973 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -194,6 +194,9 @@ struct MigrationState
     bool send_configuration;
     /* Whether we send section footer during migration */
     bool send_section_footer;
+
+    /* Needed by postcopy-pause state */
+    QemuSemaphore postcopy_pause_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/migration.c b/migration/migration.c
index 57ff1010e5..b9dac8c3b9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2240,6 +2240,80 @@ bool migrate_colo_enabled(void)
     return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO];
 }
 
+typedef enum MigThrError {
+    /* No error detected */
+    MIG_THR_ERR_NONE = 0,
+    /* Detected error, but resumed successfully */
+    MIG_THR_ERR_RECOVERED = 1,
+    /* Detected fatal error, need to exit */
+    MIG_THR_ERR_FATAL = 2,
+} MigThrError;
+
+/*
+ * We don't return until we are in a safe state to continue current
+ * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
+ * MIG_THR_ERR_FATAL if unrecovery failure happened.
+ */
+static MigThrError postcopy_pause(MigrationState *s)
+{
+    assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+    /* Current channel is possibly broken. Release it. */
+    assert(s->to_dst_file);
+    qemu_file_shutdown(s->to_dst_file);
+    qemu_fclose(s->to_dst_file);
+    s->to_dst_file = NULL;
+
+    error_report("Detected IO failure for postcopy. "
+                 "Migration paused.");
+
+    /*
+     * We wait until things fixed up. Then someone will setup the
+     * status back for us.
+     */
+    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        qemu_sem_wait(&s->postcopy_pause_sem);
+    }
+
+    trace_postcopy_pause_continued();
+
+    return MIG_THR_ERR_RECOVERED;
+}
+
+static MigThrError migration_detect_error(MigrationState *s)
+{
+    int ret;
+
+    /* Try to detect any file errors */
+    ret = qemu_file_get_error(s->to_dst_file);
+
+    if (!ret) {
+        /* Everything is fine */
+        return MIG_THR_ERR_NONE;
+    }
+
+    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
+        /*
+         * For postcopy, we allow the network to be down for a
+         * while. After that, it can be continued by a
+         * recovery phase.
+         */
+        return postcopy_pause(s);
+    } else {
+        /*
+         * For precopy (or postcopy with error outside IO), we fail
+         * with no time.
+         */
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+        trace_migration_thread_file_err();
+
+        /* Time to stop the migration, now. */
+        return MIG_THR_ERR_FATAL;
+    }
+}
+
 static void migration_calculate_complete(MigrationState *s)
 {
     uint64_t bytes = qemu_ftell(s->to_dst_file);
@@ -2396,6 +2470,7 @@ static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    MigThrError thr_error;
 
     rcu_register_thread();
 
@@ -2445,13 +2520,22 @@ static void *migration_thread(void *opaque)
             }
         }
 
-        if (qemu_file_get_error(s->to_dst_file)) {
-            if (migration_is_setup_or_active(s->state)) {
-                migrate_set_state(&s->state, s->state,
-                                  MIGRATION_STATUS_FAILED);
-            }
-            trace_migration_thread_file_err();
+        /*
+         * Try to detect any kind of failures, and see whether we
+         * should stop the migration now.
+         */
+        thr_error = migration_detect_error(s);
+        if (thr_error == MIG_THR_ERR_FATAL) {
+            /* Stop migration */
             break;
+        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
+            /*
+             * Just recovered from a e.g. network failure, reset all
+             * the local variables. This is important to avoid
+             * breaking transferred_bytes and bandwidth calculation
+             */
+            s->iteration_start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+            s->iteration_initial_bytes = 0;
         }
 
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -2609,6 +2693,7 @@ static void migration_instance_finalize(Object *obj)
     g_free(params->tls_hostname);
     g_free(params->tls_creds);
     qemu_sem_destroy(&ms->pause_sem);
+    qemu_sem_destroy(&ms->postcopy_pause_sem);
     error_free(ms->error);
 }
 
@@ -2638,6 +2723,8 @@ static void migration_instance_init(Object *obj)
     params->has_x_multifd_channels = true;
     params->has_x_multifd_page_count = true;
     params->has_xbzrle_cache_size = true;
+
+    qemu_sem_init(&ms->postcopy_pause_sem, 0);
 }
 
 /*
diff --git a/migration/trace-events b/migration/trace-events
index d6be74b7a7..409b4b8be3 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -99,6 +99,7 @@ migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
+postcopy_pause_continued(void) ""
 postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
-- 
2.14.3


* [Qemu-devel] [PATCH v8 04/24] migration: allow dst vm pause on postcopy
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (2 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 03/24] migration: implement "postcopy-pause" src logic Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 05/24] migration: allow src return path to pause Peter Xu
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

When there is an IO error on the incoming channel (e.g., network down),
instead of bailing out immediately, we allow the dst vm to switch to the
new POSTCOPY_PAUSED state. Currently it is still simple - it waits on
the new semaphore until someone pokes it for another attempt.

One note is that in the ram loading thread it is not enough to check
for the POSTCOPY_ACTIVE state; we need to check for the more specific
POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all
the device states.
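
The retry decision can be sketched with a simulated channel. This is
an illustration only: Chan, chan_read_section() and load_main() are
hypothetical names, the "next" pointer stands in for the channel that
a recovery installs, and the real code blocks in the paused state
where this sketch just switches channels.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stream: yields `good` sections, then ends or fails. */
typedef struct Chan {
    int good;           /* sections that can still be read */
    int err_after;      /* result once exhausted; 0 means clean end */
    struct Chan *next;  /* channel installed by a simulated recovery */
} Chan;

/* Read one section: 1 on success, 0 on end of stream, -errno on error. */
static int chan_read_section(Chan *c)
{
    if (c->good > 0) {
        c->good--;
        return 1;
    }
    return c->err_after;
}

/*
 * Load sections until end of stream.  On -EIO while the guest is
 * already running, switch to the recovery channel and retry instead
 * of failing.  Returns 0 on success or a negative errno; *loaded
 * counts the sections read across all channels.
 */
static int load_main(Chan *c, bool guest_running, int *loaded)
{
    assert(c && loaded);
    *loaded = 0;

    for (;;) {
        int ret = chan_read_section(c);

        if (ret > 0) {
            (*loaded)++;
            continue;
        }
        if (ret == -EIO && guest_running && c->next) {
            /* The real code would wait here until it is poked. */
            c = c->next;
            continue;
        }
        /* Clean end, non-IO error, or IO error before the guest runs. */
        return ret;
    }
}
```

Only the combination of an IO error and an already running guest leads
to a retry; every other error is returned to the caller unchanged.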

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h  |  3 +++
 migration/migration.c  |  1 +
 migration/savevm.c     | 63 ++++++++++++++++++++++++++++++++++++++++++++++++--
 migration/trace-events |  2 ++
 4 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 9bf3418973..ad96cc2c85 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -73,6 +73,9 @@ struct MigrationIncomingState {
      * live migration, to calculate vCPU block time
      * */
     struct PostcopyBlocktimeContext *blocktime_ctx;
+
+    /* notify PAUSED postcopy incoming migrations to try to continue */
+    QemuSemaphore postcopy_pause_sem_dst;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/migration.c b/migration/migration.c
index b9dac8c3b9..5ad3a79354 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -159,6 +159,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
                                                    sizeof(struct PostCopyFD));
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
+        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
 
         init_dirty_bitmap_incoming_migration();
 
diff --git a/migration/savevm.c b/migration/savevm.c
index e2be02afe4..8ad99b1eaa 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1564,8 +1564,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
  */
 static void *postcopy_ram_listen_thread(void *opaque)
 {
-    QEMUFile *f = opaque;
     MigrationIncomingState *mis = migration_incoming_get_current();
+    QEMUFile *f = mis->from_src_file;
     int load_res;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -1579,6 +1579,14 @@ static void *postcopy_ram_listen_thread(void *opaque)
      */
     qemu_file_set_blocking(f, true);
     load_res = qemu_loadvm_state_main(f, mis);
+
+    /*
+     * This is tricky, but, mis->from_src_file can change after it
+     * returns, when postcopy recovery happened. In the future, we may
+     * want a wrapper for the QEMUFile handle.
+     */
+    f = mis->from_src_file;
+
     /* And non-blocking again so we don't block in any cleanup */
     qemu_file_set_blocking(f, false);
 
@@ -1668,7 +1676,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     /* Start up the listening thread and wait for it to signal ready */
     qemu_sem_init(&mis->listen_thread_sem, 0);
     qemu_thread_create(&mis->listen_thread, "postcopy/listen",
-                       postcopy_ram_listen_thread, mis->from_src_file,
+                       postcopy_ram_listen_thread, NULL,
                        QEMU_THREAD_DETACHED);
     qemu_sem_wait(&mis->listen_thread_sem);
     qemu_sem_destroy(&mis->listen_thread_sem);
@@ -2055,11 +2063,44 @@ void qemu_loadvm_state_cleanup(void)
     }
 }
 
+/* Return true if we should continue the migration, or false. */
+static bool postcopy_pause_incoming(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_incoming();
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+    assert(mis->from_src_file);
+    qemu_file_shutdown(mis->from_src_file);
+    qemu_fclose(mis->from_src_file);
+    mis->from_src_file = NULL;
+
+    assert(mis->to_src_file);
+    qemu_file_shutdown(mis->to_src_file);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_fclose(mis->to_src_file);
+    mis->to_src_file = NULL;
+    qemu_mutex_unlock(&mis->rp_mutex);
+
+    error_report("Detected IO failure for postcopy. "
+                 "Migration paused.");
+
+    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        qemu_sem_wait(&mis->postcopy_pause_sem_dst);
+    }
+
+    trace_postcopy_pause_incoming_continued();
+
+    return true;
+}
+
 static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret = 0;
 
+retry:
     while (true) {
         section_type = qemu_get_byte(f);
 
@@ -2104,6 +2145,24 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 out:
     if (ret < 0) {
         qemu_file_set_error(f, ret);
+
+        /*
+         * Detect whether it is:
+         *
+         * 1. postcopy running (after receiving all device data, which
+         *    must be in POSTCOPY_INCOMING_RUNNING state.  Note that
+         *    POSTCOPY_INCOMING_LISTENING is still not enough, it's
+         *    still receiving device states).
+         * 2. network failure (-EIO)
+         *
+         * If so, we try to wait for a recovery.
+         */
+        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+            ret == -EIO && postcopy_pause_incoming(mis)) {
+            /* Reset f to point to the newly created channel */
+            f = mis->from_src_file;
+            goto retry;
+        }
     }
     return ret;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 409b4b8be3..e23ec019be 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -100,6 +100,8 @@ open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
 postcopy_pause_continued(void) ""
+postcopy_pause_incoming(void) ""
+postcopy_pause_incoming_continued(void) ""
 postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
-- 
2.14.3

* [Qemu-devel] [PATCH v8 05/24] migration: allow src return path to pause
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (3 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 04/24] migration: allow dst vm pause on postcopy Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 06/24] migration: allow fault thread " Peter Xu
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Let the source return path thread pause on a network failure (-EIO) and
wait for a recovery, instead of quitting.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
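The pause-and-retry pattern in this patch can be sketched in isolation.
What follows is a minimal single-threaded model, not the QEMU code: the
Channel and Source types, the `wakeups` counter (a toy stand-in for
postcopy_pause_rp_sem) and return_path_run() are all hypothetical names.
Instead of blocking, the function returns RP_PAUSED where the real thread
would sleep in qemu_sem_wait(); calling it again after a wakeup plays the
role of "goto retry".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy channel: hands out `pending` messages, then reports `fail_with`. */
typedef struct {
    int pending;
    int fail_with;   /* 0 for a clean end, -EIO for a broken link */
    int error;       /* last error seen, 0 while healthy */
} Channel;

typedef struct {
    Channel *from_dst;   /* current return path; swapped on recovery */
    int wakeups;         /* toy counter standing in for the pause semaphore */
    int handled;         /* messages processed so far */
    bool paused;         /* true while waiting for a wakeup */
} Source;

typedef enum { RP_DONE, RP_PAUSED, RP_FAILED } RpResult;

/* Read one message; on exhaustion record the channel's final status. */
static bool channel_read(Channel *ch)
{
    if (ch->pending == 0) {
        ch->error = ch->fail_with;
        return false;
    }
    ch->pending--;
    return true;
}

/* Run until the loop finishes, fails hard, or would block while paused. */
static RpResult return_path_run(Source *s)
{
    for (;;) {
        if (s->paused) {
            if (s->wakeups == 0) {
                return RP_PAUSED;        /* would sleep on the semaphore */
            }
            s->wakeups--;
            s->paused = false;
        }

        Channel *rp = s->from_dst;       /* reload: recovery may swap it */

        while (channel_read(rp)) {
            s->handled++;
        }
        if (rp->error == 0) {
            return RP_DONE;
        }
        if (rp->error != -EIO) {
            return RP_FAILED;            /* not a network failure: give up */
        }
        s->paused = true;                /* -EIO: pause and wait for recovery */
    }
}
```

Only -EIO leads to a pause; any other error ends the loop, which mirrors
the check on the value returned by qemu_file_get_error() in the diff
below. The channel pointer is reloaded after each wakeup because the old
one is assumed to be dead.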
 migration/migration.h  |  1 +
 migration/migration.c  | 35 +++++++++++++++++++++++++++++++++--
 migration/trace-events |  2 ++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index ad96cc2c85..db28a66a92 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -200,6 +200,7 @@ struct MigrationState
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
+    QemuSemaphore postcopy_pause_rp_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/migration.c b/migration/migration.c
index 5ad3a79354..09dbf5350c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1774,6 +1774,18 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     }
 }
 
+/* Return true to retry, false to quit */
+static bool postcopy_pause_return_path_thread(MigrationState *s)
+{
+    trace_postcopy_pause_return_path();
+
+    qemu_sem_wait(&s->postcopy_pause_rp_sem);
+
+    trace_postcopy_pause_return_path_continued();
+
+    return true;
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1790,6 +1802,8 @@ static void *source_return_path_thread(void *opaque)
     int res;
 
     trace_source_return_path_thread_entry();
+
+retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -1881,13 +1895,28 @@ static void *source_return_path_thread(void *opaque)
             break;
         }
     }
-    if (qemu_file_get_error(rp)) {
+
+out:
+    res = qemu_file_get_error(rp);
+    if (res) {
+        if (res == -EIO) {
+            /*
+             * Maybe there is something we can do: it looks like a
+             * network down issue, and we pause for a recovery.
+             */
+            if (postcopy_pause_return_path_thread(ms)) {
+                /* Reload rp, reset the rest */
+                rp = ms->rp_state.from_dst_file;
+                ms->rp_state.error = false;
+                goto retry;
+            }
+        }
+
         trace_source_return_path_thread_bad_end();
         mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
-out:
     ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
     return NULL;
@@ -2695,6 +2724,7 @@ static void migration_instance_finalize(Object *obj)
     g_free(params->tls_creds);
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
+    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
     error_free(ms->error);
 }
 
@@ -2726,6 +2756,7 @@ static void migration_instance_init(Object *obj)
     params->has_xbzrle_cache_size = true;
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
+    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
 }
 
 /*
diff --git a/migration/trace-events b/migration/trace-events
index e23ec019be..cd971bf9fe 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -99,6 +99,8 @@ migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
+postcopy_pause_return_path(void) ""
+postcopy_pause_return_path_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
-- 
2.14.3

* [Qemu-devel] [PATCH v8 06/24] migration: allow fault thread to pause
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (4 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 05/24] migration: allow src return path to pause Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 07/24] qmp: hmp: add migrate "resume" option Peter Xu
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Allow the fault thread to stop handling page faults temporarily. When a
network failure happens (and if we expect a recovery afterwards), we
should not allow the fault thread to continue sending things to the
source; instead, it should halt until the connection is rebuilt.

When the destination main thread notices the failure, it kicks the fault
thread so that it switches to the paused state.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
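One detail of this patch is easy to miss: after every pause the fault
thread clears last_rb, so the next page request names its RAM block in
full again instead of using the short "same block as before" form. A
minimal sketch of that logic follows. The Dest type, the `channel_up`
flag and both function names are hypothetical, and the stated reason
(the rebuilt channel is assumed to carry no memory of the previously
named block) is an inference from the code, not something the patch says.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *last_rb;   /* block named in the previous request */
    bool channel_up;       /* false while the return path is broken */
} Dest;

/*
 * Decide which block name goes on the wire for one page request.
 * On success returns 0 and stores the name in *name_out, or NULL when
 * the name can be omitted.  Returns -EIO while the channel is down.
 */
static int send_req_pages(Dest *d, const char *rb, const char **name_out)
{
    if (!d->channel_up) {
        return -EIO;
    }
    if (rb != d->last_rb) {
        d->last_rb = rb;
        *name_out = rb;        /* new block: send the full name */
    } else {
        *name_out = NULL;      /* same block: save some space */
    }
    return 0;
}

/* Called once the channel is rebuilt and the fault thread is woken. */
static void resume_after_pause(Dest *d)
{
    d->channel_up = true;
    d->last_rb = NULL;         /* force the next request to carry a name */
}
```

Without the reset, the first request after a reconnect could be sent in
the short form even though the peer never saw the name on the new
channel.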
 migration/migration.h    |  1 +
 migration/migration.c    |  1 +
 migration/postcopy-ram.c | 54 ++++++++++++++++++++++++++++++++++++++++++++----
 migration/savevm.c       |  3 +++
 migration/trace-events   |  2 ++
 5 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index db28a66a92..83ebc18b1e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -76,6 +76,7 @@ struct MigrationIncomingState {
 
     /* notify PAUSED postcopy incoming migrations to try to continue */
     QemuSemaphore postcopy_pause_sem_dst;
+    QemuSemaphore postcopy_pause_sem_fault;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/migration.c b/migration/migration.c
index 09dbf5350c..9d184b3f36 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -160,6 +160,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
         qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
+        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
 
         init_dirty_bitmap_incoming_migration();
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 8ceeaa2a93..658b750a8e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -830,6 +830,17 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
                                       affected_cpu);
 }
 
+static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_fault_thread();
+
+    qemu_sem_wait(&mis->postcopy_pause_sem_fault);
+
+    trace_postcopy_pause_fault_thread_continued();
+
+    return true;
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -880,6 +891,22 @@ static void *postcopy_ram_fault_thread(void *opaque)
             break;
         }
 
+        if (!mis->to_src_file) {
+            /*
+             * Possibly someone tells us that the return path is
+             * broken already using the event. We should hold until
+             * the channel is rebuilt.
+             */
+            if (postcopy_pause_fault_thread(mis)) {
+                mis->last_rb = NULL;
+                /* Continue to read the userfaultfd */
+            } else {
+                error_report("%s: paused but don't allow to continue",
+                             __func__);
+                break;
+            }
+        }
+
         if (pfd[1].revents) {
             uint64_t tmp64 = 0;
 
@@ -942,18 +969,37 @@ static void *postcopy_ram_fault_thread(void *opaque)
                     (uintptr_t)(msg.arg.pagefault.address),
                                 msg.arg.pagefault.feat.ptid, rb);
 
+retry:
             /*
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
             if (rb != mis->last_rb) {
                 mis->last_rb = rb;
-                migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                         rb_offset, qemu_ram_pagesize(rb));
+                ret = migrate_send_rp_req_pages(mis,
+                                                qemu_ram_get_idstr(rb),
+                                                rb_offset,
+                                                qemu_ram_pagesize(rb));
             } else {
                 /* Save some space */
-                migrate_send_rp_req_pages(mis, NULL,
-                                         rb_offset, qemu_ram_pagesize(rb));
+                ret = migrate_send_rp_req_pages(mis,
+                                                NULL,
+                                                rb_offset,
+                                                qemu_ram_pagesize(rb));
+            }
+
+            if (ret) {
+                /* May be network failure, try to wait for recovery */
+                if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
+                    /* We got reconnected somehow, try to continue */
+                    mis->last_rb = NULL;
+                    goto retry;
+                } else {
+                    /* This is an unavoidable fault */
+                    error_report("%s: migrate_send_rp_req_pages() get %d",
+                                 __func__, ret);
+                    break;
+                }
             }
         }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 8ad99b1eaa..6ee69d8283 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2083,6 +2083,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     mis->to_src_file = NULL;
     qemu_mutex_unlock(&mis->rp_mutex);
 
+    /* Notify the fault thread for the invalidated file handle */
+    postcopy_fault_thread_notify(mis);
+
     error_report("Detected IO failure for postcopy. "
                  "Migration paused.");
 
diff --git a/migration/trace-events b/migration/trace-events
index cd971bf9fe..7f836499d1 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -101,6 +101,8 @@ open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
 postcopy_pause_return_path(void) ""
 postcopy_pause_return_path_continued(void) ""
+postcopy_pause_fault_thread(void) ""
+postcopy_pause_fault_thread_continued(void) ""
 postcopy_pause_continued(void) ""
 postcopy_pause_incoming(void) ""
 postcopy_pause_incoming_continued(void) ""
-- 
2.14.3

* [Qemu-devel] [PATCH v8 07/24] qmp: hmp: add migrate "resume" option
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (5 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 06/24] migration: allow fault thread " Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 08/24] migration: rebuild channel on source Peter Xu
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Add a "resume" option to the migrate command (QMP argument "resume",
HMP flag "-r"). It will be used when we want to resume a paused
migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json   | 5 ++++-
 hmp.c                 | 4 +++-
 migration/migration.c | 2 +-
 hmp-commands.hx       | 7 ++++---
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 97891aec89..17b78c9938 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1031,6 +1031,8 @@
 # @detach: this argument exists only for compatibility reasons and
 #          is ignored by QEMU
 #
+# @resume: resume one paused migration, default "off". (since 2.12)
+#
 # Returns: nothing on success
 #
 # Since: 0.14.0
@@ -1052,7 +1054,8 @@
 #
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
+           '*detach': 'bool', '*resume': 'bool' } }
 
 ##
 # @migrate-incoming:
diff --git a/hmp.c b/hmp.c
index 898e25f3e1..59d5341590 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1919,10 +1919,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     bool detach = qdict_get_try_bool(qdict, "detach", false);
     bool blk = qdict_get_try_bool(qdict, "blk", false);
     bool inc = qdict_get_try_bool(qdict, "inc", false);
+    bool resume = qdict_get_try_bool(qdict, "resume", false);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
-    qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err);
+    qmp_migrate(uri, !!blk, blk, !!inc, inc,
+                false, false, true, resume, &err);
     if (err) {
         hmp_handle_error(mon, &err);
         return;
diff --git a/migration/migration.c b/migration/migration.c
index 9d184b3f36..9faf22d3ae 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1408,7 +1408,7 @@ bool migration_is_blocked(Error **errp)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
                  bool has_inc, bool inc, bool has_detach, bool detach,
-                 Error **errp)
+                 bool has_resume, bool resume, Error **errp)
 {
     Error *local_err = NULL;
     MigrationState *s = migrate_get_current();
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 35d862a5d2..078ded20cd 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -897,13 +897,14 @@ ETEXI
 
     {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s",
+        .params     = "[-d] [-b] [-i] [-r] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
 		      "\n\t\t\t -b for migration without shared storage with"
 		      " full copy of disk\n\t\t\t -i for migration without "
 		      "shared storage with incremental copy of disk "
-		      "(base image shared between src and destination)",
+		      "(base image shared between src and destination)"
+                      "\n\t\t\t -r to resume a paused migration",
         .cmd        = hmp_migrate,
     },
 
-- 
2.14.3

* [Qemu-devel] [PATCH v8 08/24] migration: rebuild channel on source
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (6 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 07/24] qmp: hmp: add migrate "resume" option Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 09/24] migration: new state "postcopy-recover" Peter Xu
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This patch detects the "resume" flag of the migrate command and rebuilds
the channels only if the flag is set.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
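The gatekeeping done by migrate_prepare() can be summarised as a small
decision function. This is only a sketch under simplifying assumptions:
the State enum is a reduced, hypothetical subset of MigrationStatus, the
three-way Prep result replaces the bool that the real function returns,
and the block-migration and blocker checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    ST_NONE,
    ST_ACTIVE,
    ST_POSTCOPY_ACTIVE,
    ST_POSTCOPY_PAUSED,
    ST_COMPLETED
} State;

typedef enum { PREP_ERROR, PREP_FRESH, PREP_RESUME } Prep;

/* A paused postcopy still counts as an ongoing migration. */
static bool is_setup_or_active(State st)
{
    return st == ST_ACTIVE || st == ST_POSTCOPY_ACTIVE ||
           st == ST_POSTCOPY_PAUSED;
}

/* Decide what a "migrate" request should do in the given state. */
static Prep migrate_prepare_sketch(State st, bool resume, const char **err)
{
    *err = NULL;

    if (resume) {
        if (st != ST_POSTCOPY_PAUSED) {
            *err = "Cannot resume if there is no paused migration";
            return PREP_ERROR;
        }
        return PREP_RESUME;      /* keep existing state, skip init */
    }

    if (is_setup_or_active(st)) {
        *err = "A migration is already in progress";
        return PREP_ERROR;
    }
    return PREP_FRESH;           /* the real code calls migrate_init() here */
}
```

The two paths are deliberately asymmetric: a resume is valid only in the
paused state, while a plain migrate is rejected in that same state
because a paused migration is still treated as active.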
 migration/migration.c | 91 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 70 insertions(+), 21 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9faf22d3ae..78b0046e59 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1406,49 +1406,75 @@ bool migration_is_blocked(Error **errp)
     return false;
 }
 
-void qmp_migrate(const char *uri, bool has_blk, bool blk,
-                 bool has_inc, bool inc, bool has_detach, bool detach,
-                 bool has_resume, bool resume, Error **errp)
+/* Returns true if continue to migrate, or false if error detected */
+static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
+                            bool resume, Error **errp)
 {
     Error *local_err = NULL;
-    MigrationState *s = migrate_get_current();
-    const char *p;
+
+    if (resume) {
+        if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
+            error_setg(errp, "Cannot resume if there is no "
+                       "paused migration");
+            return false;
+        }
+        /* This is a resume, skip init status */
+        return true;
+    }
 
     if (migration_is_setup_or_active(s->state) ||
         s->state == MIGRATION_STATUS_CANCELLING ||
         s->state == MIGRATION_STATUS_COLO) {
         error_setg(errp, QERR_MIGRATION_ACTIVE);
-        return;
+        return false;
     }
+
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         error_setg(errp, "Guest is waiting for an incoming migration");
-        return;
+        return false;
     }
 
     if (migration_is_blocked(errp)) {
-        return;
+        return false;
     }
 
-    if ((has_blk && blk) || (has_inc && inc)) {
+    if (blk || blk_inc) {
         if (migrate_use_block() || migrate_use_block_incremental()) {
             error_setg(errp, "Command options are incompatible with "
                        "current migration capabilities");
-            return;
+            return false;
         }
         migrate_set_block_enabled(true, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
-            return;
+            return false;
         }
         s->must_remove_block_options = true;
     }
 
-    if (has_inc && inc) {
+    if (blk_inc) {
         migrate_set_block_incremental(s, true);
     }
 
     migrate_init(s);
 
+    return true;
+}
+
+void qmp_migrate(const char *uri, bool has_blk, bool blk,
+                 bool has_inc, bool inc, bool has_detach, bool detach,
+                 bool has_resume, bool resume, Error **errp)
+{
+    Error *local_err = NULL;
+    MigrationState *s = migrate_get_current();
+    const char *p;
+
+    if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
+                         has_resume && resume, errp)) {
+        /* Error detected, put into errp */
+        return;
+    }
+
     if (strstart(uri, "tcp:", &p)) {
         tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
@@ -1923,7 +1949,8 @@ out:
     return NULL;
 }
 
-static int open_return_path_on_source(MigrationState *ms)
+static int open_return_path_on_source(MigrationState *ms,
+                                      bool create_thread)
 {
 
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
@@ -1932,6 +1959,12 @@ static int open_return_path_on_source(MigrationState *ms)
     }
 
     trace_open_return_path_on_source();
+
+    if (!create_thread) {
+        /* We're done */
+        return 0;
+    }
+
     qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
 
@@ -2588,6 +2621,9 @@ static void *migration_thread(void *opaque)
 
 void migrate_fd_connect(MigrationState *s, Error *error_in)
 {
+    int64_t rate_limit;
+    bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
+
     s->expected_downtime = s->parameters.downtime_limit;
     s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
     if (error_in) {
@@ -2596,12 +2632,21 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         return;
     }
 
-    qemu_file_set_blocking(s->to_dst_file, true);
-    qemu_file_set_rate_limit(s->to_dst_file,
-                             s->parameters.max_bandwidth / XFER_LIMIT_RATIO);
+    if (resume) {
+        /* This is a resumed migration */
+        rate_limit = INT64_MAX;
+    } else {
+        /* This is a fresh new migration */
+        rate_limit = s->parameters.max_bandwidth / XFER_LIMIT_RATIO;
+        s->expected_downtime = s->parameters.downtime_limit;
+        s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
 
-    /* Notify before starting migration thread */
-    notifier_list_notify(&migration_state_notifiers, s);
+        /* Notify before starting migration thread */
+        notifier_list_notify(&migration_state_notifiers, s);
+    }
+
+    qemu_file_set_rate_limit(s->to_dst_file, rate_limit);
+    qemu_file_set_blocking(s->to_dst_file, true);
 
     /*
      * Open the return path. For postcopy, it is used exclusively. For
@@ -2609,15 +2654,19 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_use_return_path()) {
-        if (open_return_path_on_source(s)) {
+        if (open_return_path_on_source(s, !resume)) {
             error_report("Unable to open return-path for postcopy");
-            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-                              MIGRATION_STATUS_FAILED);
+            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
             migrate_fd_cleanup(s);
             return;
         }
     }
 
+    if (resume) {
+        /* TODO: do the resume logic */
+        return;
+    }
+
     if (multifd_save_setup() != 0) {
         migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                           MIGRATION_STATUS_FAILED);
-- 
2.14.3

* [Qemu-devel] [PATCH v8 09/24] migration: new state "postcopy-recover"
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (7 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 08/24] migration: rebuild channel on source Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 10/24] migration: wakeup dst ram-load-thread for recover Peter Xu
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce a new migration state "postcopy-recover". If a migration is
paused and the connection is later rebuilt successfully, we switch the
source VM state from "postcopy-paused" to the new state
"postcopy-recover", then do the resume logic in the migration thread
(along with the return path thread).

This patch only does the state switch on the source side. A follow-up
patch will handle the state switch on the destination side using the
same status bit.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
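The loop that postcopy_pause() becomes in this patch boils down to a
small state machine. The sketch below models one wakeup of the paused
migration thread. All names are hypothetical; try_set_state() imitates
the compare-and-swap behaviour of migrate_set_state() but returns a bool
for testing, and `resume_ok` stands in for the result of
postcopy_do_resume(), which is still a stub at this point in the series.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ST_POSTCOPY_ACTIVE,
    ST_POSTCOPY_PAUSED,
    ST_POSTCOPY_RECOVER,
    ST_CANCELLING
} State;

typedef enum { THR_RECOVERED, THR_FATAL, THR_STILL_PAUSED } Result;

/* Switch state only if it is still `from`, like a compare-and-swap. */
static bool try_set_state(State *st, State from, State to)
{
    if (*st != from) {
        return false;
    }
    *st = to;
    return true;
}

/* What the paused migration thread does each time it is woken up. */
static Result on_wakeup(State *st, bool resume_ok)
{
    if (*st == ST_POSTCOPY_PAUSED) {
        return THR_STILL_PAUSED;     /* nothing changed: keep waiting */
    }
    if (*st != ST_POSTCOPY_RECOVER) {
        return THR_FATAL;            /* e.g. cancelled while paused */
    }
    if (resume_ok) {
        return THR_RECOVERED;
    }
    /* Recovery failed: pause again rather than throw data away. */
    *st = ST_POSTCOPY_PAUSED;
    return THR_STILL_PAUSED;
}
```

A failed recovery is not fatal in this model; it simply returns the
machine to the paused state so that another resume can be attempted.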
 qapi/migration.json   |  4 ++-
 migration/migration.c | 76 ++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 17b78c9938..25a6776af6 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -91,6 +91,8 @@
 #
 # @postcopy-paused: during postcopy but paused. (since 2.12)
 #
+# @postcopy-recover: trying to recover from a paused postcopy. (since 2.12)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -109,7 +111,7 @@
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
             'active', 'postcopy-active', 'postcopy-paused',
-            'completed', 'failed', 'colo',
+            'postcopy-recover', 'completed', 'failed', 'colo',
             'pre-switchover', 'device' ] }
 
 ##
diff --git a/migration/migration.c b/migration/migration.c
index 78b0046e59..304e8bf2af 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -572,6 +572,7 @@ static bool migration_is_setup_or_active(int state)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
@@ -653,6 +654,7 @@ static void fill_source_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
          /* TODO add some postcopy stats */
         info->has_status = true;
         info->has_total_time = true;
@@ -2313,6 +2315,13 @@ typedef enum MigThrError {
     MIG_THR_ERR_FATAL = 2,
 } MigThrError;
 
+/* Return zero if success, or <0 for error */
+static int postcopy_do_resume(MigrationState *s)
+{
+    /* TODO: do the resume logic */
+    return 0;
+}
+
 /*
  * We don't return until we are in a safe state to continue current
  * postcopy migration.  Returns MIG_THR_ERR_RECOVERED if recovered, or
@@ -2321,29 +2330,55 @@ typedef enum MigThrError {
 static MigThrError postcopy_pause(MigrationState *s)
 {
     assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
-    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                      MIGRATION_STATUS_POSTCOPY_PAUSED);
 
-    /* Current channel is possibly broken. Release it. */
-    assert(s->to_dst_file);
-    qemu_file_shutdown(s->to_dst_file);
-    qemu_fclose(s->to_dst_file);
-    s->to_dst_file = NULL;
+    while (true) {
+        migrate_set_state(&s->state, s->state,
+                          MIGRATION_STATUS_POSTCOPY_PAUSED);
 
-    error_report("Detected IO failure for postcopy. "
-                 "Migration paused.");
+        /* Current channel is possibly broken. Release it. */
+        assert(s->to_dst_file);
+        qemu_file_shutdown(s->to_dst_file);
+        qemu_fclose(s->to_dst_file);
+        s->to_dst_file = NULL;
 
-    /*
-     * We wait until things fixed up. Then someone will setup the
-     * status back for us.
-     */
-    while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
-        qemu_sem_wait(&s->postcopy_pause_sem);
-    }
+        error_report("Detected IO failure for postcopy. "
+                     "Migration paused.");
 
-    trace_postcopy_pause_continued();
+        /*
+         * We wait until things fixed up. Then someone will setup the
+         * status back for us.
+         */
+        while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+            qemu_sem_wait(&s->postcopy_pause_sem);
+        }
+
+        if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+            /* Woken up by a recover procedure. Give it a shot */
 
-    return MIG_THR_ERR_RECOVERED;
+            /*
+             * Firstly, let's wake up the return path now, with a new
+             * return path channel.
+             */
+            qemu_sem_post(&s->postcopy_pause_rp_sem);
+
+            /* Do the resume logic */
+            if (postcopy_do_resume(s) == 0) {
+                /* Let's continue! */
+                trace_postcopy_pause_continued();
+                return MIG_THR_ERR_RECOVERED;
+            } else {
+                /*
+                 * Something wrong happened during the recovery, let's
+                 * pause again. Pause is always better than throwing
+                 * data away.
+                 */
+                continue;
+            }
+        } else {
+            /* This is not right... Time to quit. */
+            return MIG_THR_ERR_FATAL;
+        }
+    }
 }
 
 static MigThrError migration_detect_error(MigrationState *s)
@@ -2663,7 +2698,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
     }
 
     if (resume) {
-        /* TODO: do the resume logic */
+        /* Wakeup the main migration thread to do the recovery */
+        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+        qemu_sem_post(&s->postcopy_pause_sem);
         return;
     }
 
-- 
2.14.3

* [Qemu-devel] [PATCH v8 10/24] migration: wakeup dst ram-load-thread for recover
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (8 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 09/24] migration: new state "postcopy-recover" Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 11/24] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

On the destination side, we cannot wake up all the threads when we get
reconnected. The first thing to do is to wake up the main load thread,
so that we can continue to receive valid messages from source again and
reply when needed.

At this point, we switch the destination VM state from postcopy-paused
to postcopy-recover.

Now we are finally ready to do the resume logic.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
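The ordering constraint described above (wake the main load thread
first, leave the fault thread asleep) can be modelled with two counters.
This is a single-threaded sketch with hypothetical names: the Incoming
struct and its `sem_dst` / `sem_fault` fields are toy counters standing
in for postcopy_pause_sem_dst and postcopy_pause_sem_fault, and the
fresh-migration branch is reduced to recording the channel.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_NONE, ST_POSTCOPY_PAUSED, ST_POSTCOPY_RECOVER } State;

typedef struct {
    State state;
    bool has_channel;   /* stands in for from_src_file / to_src_file */
    int sem_dst;        /* pending wakeups for the main load thread */
    int sem_fault;      /* pending wakeups for the fault thread */
} Incoming;

typedef enum { IN_NEW, IN_RESUMED } InKind;

/* Handle a newly accepted connection on the destination. */
static InKind process_incoming(Incoming *mis)
{
    if (mis->state == ST_POSTCOPY_PAUSED) {
        /* Resumed from a paused postcopy: install the new channel. */
        mis->has_channel = true;
        mis->state = ST_POSTCOPY_RECOVER;
        /*
         * Wake only the main load thread.  The fault thread stays
         * asleep until the source is known to answer page requests.
         */
        mis->sem_dst++;
        return IN_RESUMED;
    }

    /* Fresh incoming migration: normal setup path. */
    mis->has_channel = true;
    return IN_NEW;
}
```

After a resume the model leaves `sem_fault` at zero, which is the point
of the patch: page requests must not flow until a later step decides the
source is ready for them.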
 migration/migration.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 304e8bf2af..32f6d686b7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -440,8 +440,34 @@ static void migration_incoming_process(void)
 
 void migration_fd_process_incoming(QEMUFile *f)
 {
-    migration_incoming_setup(f);
-    migration_incoming_process();
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        /* Resumed from a paused postcopy migration */
+
+        mis->from_src_file = f;
+        /* Postcopy has standalone thread to do vm load */
+        qemu_file_set_blocking(f, true);
+
+        /* Re-configure the return path */
+        mis->to_src_file = qemu_file_get_return_path(f);
+
+        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+
+        /*
+         * Here, we only wake up the main loading thread (while the
+         * fault thread will still be waiting), so that we can receive
+         * commands from source now, and answer it if needed. The
+         * fault thread will be woken up afterwards until we are sure
+         * that source is ready to reply to page requests.
+         */
+        qemu_sem_post(&mis->postcopy_pause_sem_dst);
+    } else {
+        /* New incoming migration */
+        migration_incoming_setup(f);
+        migration_incoming_process();
+    }
 }
 
 void migration_ioc_process_incoming(QIOChannel *ioc)
-- 
2.14.3

* [Qemu-devel] [PATCH v8 11/24] migration: new cmd MIG_CMD_RECV_BITMAP
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (9 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 10/24] migration: wakeup dst ram-load-thread for recover Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 12/24] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Add a new vm command MIG_CMD_RECV_BITMAP to request the received bitmap
of one ramblock.
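
The payload is a counted string: one length byte followed by the
ramblock name without a trailing NUL (see the comment above the handler
in the diff).  A minimal standalone sketch of that encoding, with
hypothetical helper names (demo_encode_block_name and
demo_decode_block_name are not QEMU functions), might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode name as: len (1 byte) + name bytes (no trailing NUL).
 * out must hold at least 256 bytes.  Returns the payload length,
 * or 0 if the name is empty or longer than 255 bytes.
 */
static size_t demo_encode_block_name(uint8_t *out, const char *name)
{
    size_t len = strlen(name);

    if (len == 0 || len > 255) {
        return 0;
    }
    out[0] = (uint8_t)len;
    memcpy(out + 1, name, len);
    return len + 1;
}

/*
 * Decode a payload produced above into a NUL-terminated string.
 * name must hold at least 256 bytes.  Returns the name length, or 0
 * if the payload length does not match the length byte.
 */
static size_t demo_decode_block_name(char *name, const uint8_t *in,
                                     size_t payload_len)
{
    assert(name && in);
    if (payload_len < 2 || payload_len != (size_t)in[0] + 1) {
        return 0;
    }
    memcpy(name, in + 1, in[0]);
    name[in[0]] = '\0';
    return in[0];
}
```

Checking the payload length against the length byte mirrors the
"len != cnt + 1" test in the handler, so a truncated or oversized
command is rejected before the name is used.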

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.h     |  1 +
 migration/savevm.c     | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  2 ++
 3 files changed, 64 insertions(+)

diff --git a/migration/savevm.h b/migration/savevm.h
index cf4f0d37ca..b8cee00d41 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -47,6 +47,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
+void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len,
diff --git a/migration/savevm.c b/migration/savevm.c
index 6ee69d8283..9f4a95d411 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -81,6 +81,7 @@ enum qemu_vm_cmd {
                                       were previously sent during
                                       precopy but are dirty. */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
+    MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
 };
 
@@ -98,6 +99,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
                                    .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
+    [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
 };
 
@@ -956,6 +958,19 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
 }
 
+void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
+{
+    size_t len;
+    char buf[256];
+
+    trace_savevm_send_recv_bitmap(block_name);
+
+    buf[0] = len = strlen(block_name);
+    memcpy(buf + 1, block_name, len);
+
+    qemu_savevm_command_send(f, MIG_CMD_RECV_BITMAP, len + 1, (uint8_t *)buf);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -1801,6 +1816,49 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
     return ret;
 }
 
+/*
+ * Handle a request from the source for the recved_bitmap of one
+ * ramblock on the destination. Payload format:
+ *
+ * len (1 byte) + ramblock_name (<255 bytes)
+ */
+static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
+                                     uint16_t len)
+{
+    QEMUFile *file = mis->from_src_file;
+    RAMBlock *rb;
+    char block_name[256];
+    size_t cnt;
+
+    cnt = qemu_get_counted_string(file, block_name);
+    if (!cnt) {
+        error_report("%s: failed to read block name", __func__);
+        return -EINVAL;
+    }
+
+    /* Validate before using the data */
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    if (len != cnt + 1) {
+        error_report("%s: invalid payload length (%d)", __func__, len);
+        return -EINVAL;
+    }
+
+    rb = qemu_ram_block_by_name(block_name);
+    if (!rb) {
+        error_report("%s: block '%s' not found", __func__, block_name);
+        return -EINVAL;
+    }
+
+    /* TODO: send the bitmap back to source */
+
+    trace_loadvm_handle_recv_bitmap(block_name);
+
+    return 0;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * 0           just a normal return
@@ -1874,6 +1932,9 @@ static int loadvm_process_command(QEMUFile *f)
 
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
+
+    case MIG_CMD_RECV_BITMAP:
+        return loadvm_handle_recv_bitmap(mis, len);
     }
 
     return 0;
diff --git a/migration/trace-events b/migration/trace-events
index 7f836499d1..5bee6d525a 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -12,6 +12,7 @@ loadvm_state_cleanup(void) ""
 loadvm_handle_cmd_packaged(unsigned int length) "%u"
 loadvm_handle_cmd_packaged_main(int ret) "%d"
 loadvm_handle_cmd_packaged_received(int ret) "%d"
+loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
@@ -34,6 +35,7 @@ savevm_send_open_return_path(void) ""
 savevm_send_ping(uint32_t val) "0x%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
+savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-- 
2.14.3

* [Qemu-devel] [PATCH v8 12/24] migration: new message MIG_RP_MSG_RECV_BITMAP
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (10 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 11/24] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 13/24] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce a new return path message MIG_RP_MSG_RECV_BITMAP to send the
received bitmap of a ramblock back to the source.

This is the reply to MIG_CMD_RECV_BITMAP.  It contains not only the
header (including the ramblock name), but also the whole received bitmap
of that ramblock on the destination side, appended after the header.

When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP),
it parses it and converts it to the dirty bitmap by inverting the bits.
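
To make the inversion concrete: a page the destination has already
received is clean from the source's point of view, and every other page
still has to be sent.  The standalone sketch below works on a plain byte
array with bit i stored in byte i / 8, least significant bit first; that
layout and the helper name demo_received_to_dirty are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * dirty = ~received for the first nbits bits.  Bits past nbits in the
 * last byte are cleared so that padding never shows up as dirty pages.
 */
static void demo_received_to_dirty(uint8_t *dirty, const uint8_t *received,
                                   size_t nbits)
{
    size_t full = nbits / 8;
    size_t rem = nbits % 8;

    assert(dirty && received);
    for (size_t i = 0; i < full; i++) {
        dirty[i] = (uint8_t)~received[i];
    }
    if (rem) {
        uint8_t mask = (uint8_t)((1u << rem) - 1);
        dirty[full] = (uint8_t)(~received[full] & mask);
    }
}
```

With 10 pages of which pages 0-3 and 8 were received, only pages 4-7
and 9 end up marked dirty.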

One thing to mention is that, when we send the recv bitmap, we do two
extra things:

- convert the bitmap to little endian, to support hosts that use
  different endianness on src/dst.

- pad the bitmap size to a multiple of 8 bytes, to support hosts that
  use different word sizes (32/64 bits) on src/dst.
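
The two points above can be sketched in standalone C as follows.  The
helper names are hypothetical, and the sketch rounds the bit count up to
whole bytes before padding to 8 bytes, which is a choice made here for
illustration rather than a statement about what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes on the wire for nbits bits, padded up to a multiple of 8. */
static uint64_t demo_bitmap_wire_size(uint64_t nbits)
{
    uint64_t bytes = (nbits + 7) / 8;      /* round bits up to bytes */

    return (bytes + 7) & ~(uint64_t)7;     /* round bytes up to 8 */
}

/* Store one 64-bit bitmap word as little-endian bytes on any host. */
static void demo_put_le64(uint8_t *out, uint64_t word)
{
    assert(out);
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(word >> (8 * i));
    }
}

/* Load a 64-bit bitmap word back from little-endian bytes. */
static uint64_t demo_get_le64(const uint8_t *in)
{
    uint64_t word = 0;

    assert(in);
    for (int i = 0; i < 8; i++) {
        word |= (uint64_t)in[i] << (8 * i);
    }
    return word;
}
```

Because both sides compute the same padded size from their own ramblock
length, a 32-bit host (whose bitmap is a multiple of 4 bytes) and a
64-bit host (a multiple of 8 bytes) agree on how many bytes to expect.
For example, a 1G ramblock with 4K pages has 262144 bits, which is
32768 bytes on the wire.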

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h  |   2 +
 migration/ram.h        |   3 ++
 migration/migration.c  |  68 +++++++++++++++++++++++
 migration/ram.c        | 144 +++++++++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.c     |   2 +-
 migration/trace-events |   3 ++
 6 files changed, 221 insertions(+), 1 deletion(-)

diff --git a/migration/migration.h b/migration/migration.h
index 83ebc18b1e..b5f89da402 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -259,6 +259,8 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
 int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
+void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
+                                 char *block_name);
 
 void dirty_bitmap_mig_before_vm_start(void);
 void init_dirty_bitmap_incoming_migration(void);
diff --git a/migration/ram.h b/migration/ram.h
index 5030be110a..73d979e9de 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -63,5 +63,8 @@ int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
+int64_t ramblock_recv_bitmap_send(QEMUFile *file,
+                                  const char *block_name);
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 32f6d686b7..4276d18130 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -95,6 +95,7 @@ enum mig_rp_message_type {
 
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
+    MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
 
     MIG_RP_MSG_MAX
 };
@@ -519,6 +520,45 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
     migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
 }
 
+void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
+                                 char *block_name)
+{
+    char buf[512];
+    int len;
+    int64_t res;
+
+    /*
+     * First, we send the header part. It contains only the len of
+     * idstr, and the idstr itself.
+     */
+    len = strlen(block_name);
+    buf[0] = len;
+    memcpy(buf + 1, block_name, len);
+
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: MIG_RP_MSG_RECV_BITMAP only used for recovery",
+                     __func__);
+        return;
+    }
+
+    migrate_send_rp_message(mis, MIG_RP_MSG_RECV_BITMAP, len + 1, buf);
+
+    /*
+     * Next, we dump the received bitmap to the stream.
+     *
+     * TODO: currently we are safe since we are the only one that is
+     * using the to_src_file handle (fault thread is still paused),
+     * and it's ok even not taking the mutex. However the best way is
+     * to take the lock before sending the message header, and release
+     * the lock after sending the bitmap.
+     */
+    qemu_mutex_lock(&mis->rp_mutex);
+    res = ramblock_recv_bitmap_send(mis->to_src_file, block_name);
+    qemu_mutex_unlock(&mis->rp_mutex);
+
+    trace_migrate_send_rp_recv_bitmap(block_name, res);
+}
+
 MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 {
     MigrationCapabilityStatusList *head = NULL;
@@ -1797,6 +1837,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
+    [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1841,6 +1882,19 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
     return true;
 }
 
+static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
+{
+    RAMBlock *block = qemu_ram_block_by_name(block_name);
+
+    if (!block) {
+        error_report("%s: invalid block name '%s'", __func__, block_name);
+        return -EINVAL;
+    }
+
+    /* Fetch the received bitmap and refresh the dirty bitmap */
+    return ram_dirty_bitmap_reload(s, block);
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -1946,6 +2000,20 @@ retry:
             migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
             break;
 
+        case MIG_RP_MSG_RECV_BITMAP:
+            if (header_len < 1) {
+                error_report("%s: missing block name", __func__);
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            /* Format: len (1B) + idstr (<255B). This ends the idstr. */
+            buf[buf[0] + 1] = '\0';
+            if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) {
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            break;
+
         default:
             break;
         }
diff --git a/migration/ram.c b/migration/ram.c
index 912810c18e..3512383991 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -187,6 +187,70 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
                       nr);
 }
 
+#define  RAMBLOCK_RECV_BITMAP_ENDING  (0x0123456789abcdefULL)
+
+/*
+ * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes).
+ *
+ * Returns >0 if success with sent bytes, or <0 if error.
+ */
+int64_t ramblock_recv_bitmap_send(QEMUFile *file,
+                                  const char *block_name)
+{
+    RAMBlock *block = qemu_ram_block_by_name(block_name);
+    unsigned long *le_bitmap, nbits;
+    uint64_t size;
+
+    if (!block) {
+        error_report("%s: invalid block name: %s", __func__, block_name);
+        return -1;
+    }
+
+    nbits = block->used_length >> TARGET_PAGE_BITS;
+
+    /*
+     * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit
+     * machines we may need 4 more bytes for padding (see below
+     * comment). So extend it a bit before hand.
+     */
+    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
+
+    /*
+     * Always use little endian when sending the bitmap. This is
+     * required when source and destination VMs are not using the
+     * same endianness. (Note: big endian won't work.)
+     */
+    bitmap_to_le(le_bitmap, block->receivedmap, nbits);
+
+    /* Size of the bitmap, in bytes */
+    size = nbits / 8;
+
+    /*
+     * size is always aligned to 8 bytes for 64bit machines, but it
+     * may not be true for 32bit machines. We need this padding to
+     * make sure the migration can survive even between 32bit and
+     * 64bit machines.
+     */
+    size = ROUND_UP(size, 8);
+
+    qemu_put_be64(file, size);
+    qemu_put_buffer(file, (const uint8_t *)le_bitmap, size);
+    /*
+     * Mark as an end, in case the middle part is screwed up due to
+     * some "mysterious" reason.
+     */
+    qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING);
+    qemu_fflush(file);
+
+    g_free(le_bitmap);
+
+    if (qemu_file_get_error(file)) {
+        return qemu_file_get_error(file);
+    }
+
+    return size + sizeof(size);
+}
+
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -3100,6 +3164,86 @@ static bool ram_has_postcopy(void *opaque)
     return migrate_postcopy_ram();
 }
 
+/*
+ * Read the received bitmap, invert it to form the initial dirty bitmap.
+ * This is only used when the postcopy migration is paused but wants
+ * to resume from a middle point.
+ */
+int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
+{
+    int ret = -EINVAL;
+    QEMUFile *file = s->rp_state.from_dst_file;
+    unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
+    uint64_t local_size = nbits / 8;
+    uint64_t size, end_mark;
+
+    trace_ram_dirty_bitmap_reload_begin(block->idstr);
+
+    if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: incorrect state %s", __func__,
+                     MigrationStatus_str(s->state));
+        return -EINVAL;
+    }
+
+    /*
+     * Note: see comments in ramblock_recv_bitmap_send() on why we
+     * need the endianness conversion and the padding.
+     */
+    local_size = ROUND_UP(local_size, 8);
+
+    /* Add padding */
+    le_bitmap = bitmap_new(nbits + BITS_PER_LONG);
+
+    size = qemu_get_be64(file);
+
+    /* The size of the bitmap should match with our ramblock */
+    if (size != local_size) {
+        error_report("%s: ramblock '%s' bitmap size mismatch "
+                     "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
+                     block->idstr, size, local_size);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
+    end_mark = qemu_get_be64(file);
+
+    ret = qemu_file_get_error(file);
+    if (ret || size != local_size) {
+        error_report("%s: read bitmap failed for ramblock '%s': %d"
+                     " (size 0x%"PRIx64", got: 0x%"PRIx64")",
+                     __func__, block->idstr, ret, local_size, size);
+        ret = -EIO;
+        goto out;
+    }
+
+    if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
+        error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
+                     __func__, block->idstr, end_mark);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /*
+     * Endianness conversion. We are in postcopy (though paused), so
+     * the dirty bitmap won't change. We can directly modify it.
+     */
+    bitmap_from_le(block->bmap, le_bitmap, nbits);
+
+    /*
+     * What we received is the "received bitmap". Invert it to form the
+     * initial dirty bitmap for this ramblock.
+     */
+    bitmap_complement(block->bmap, block->bmap, nbits);
+
+    trace_ram_dirty_bitmap_reload_complete(block->idstr);
+
+    ret = 0;
+out:
+    g_free(le_bitmap);
+    return ret;
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
diff --git a/migration/savevm.c b/migration/savevm.c
index 9f4a95d411..7176b350d5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1852,7 +1852,7 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
         return -EINVAL;
     }
 
-    /* TODO: send the bitmap back to source */
+    migrate_send_rp_recv_bitmap(mis, block_name);
 
     trace_loadvm_handle_recv_bitmap(block_name);
 
diff --git a/migration/trace-events b/migration/trace-events
index 5bee6d525a..72e57089f3 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -79,6 +79,8 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
+ram_dirty_bitmap_reload_begin(char *str) "%s"
+ram_dirty_bitmap_reload_complete(char *str) "%s"
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
@@ -90,6 +92,7 @@ migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
 migrate_pending(uint64_t size, uint64_t max, uint64_t pre, uint64_t compat, uint64_t post) "pending size %" PRIu64 " max %" PRIu64 " (pre = %" PRIu64 " compat=%" PRIu64 " post=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size %"PRIi64
 migration_completion_file_err(void) ""
 migration_completion_postcopy_end(void) ""
 migration_completion_postcopy_end_after_complete(void) ""
-- 
2.14.3

* [Qemu-devel] [PATCH v8 13/24] migration: new cmd MIG_CMD_POSTCOPY_RESUME
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (11 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 12/24] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 14/24] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Introduce this new command, to be sent when the source VM is ready to
resume the paused migration.  What the destination does here is
basically release the fault thread so it can continue servicing page
faults.
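
As a rough model of the destination-side behaviour: a resume is only
honoured while recovering, and an unexpected resume is ignored rather
than treated as a load failure.  The standalone sketch below uses
hypothetical names, and a plain counter stands in for posting the
semaphore that the fault thread waits on; it is not the QEMU handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the destination migration states. */
enum demo_resume_state {
    DEMO_RESUME_PAUSED,
    DEMO_RESUME_RECOVER,
    DEMO_RESUME_ACTIVE,
};

struct demo_incoming {
    enum demo_resume_state state;
    unsigned fault_wakeups;   /* stands in for a semaphore post */
};

/*
 * Handle a resume command.  Returns true if the fault thread was
 * released, false if the command arrived in the wrong state and was
 * ignored.
 */
static bool demo_handle_resume(struct demo_incoming *mis)
{
    assert(mis);
    if (mis->state != DEMO_RESUME_RECOVER) {
        return false;   /* unexpected resume: ignore, do not fail */
    }
    mis->state = DEMO_RESUME_ACTIVE;
    mis->fault_wakeups++;
    return true;
}
```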

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/savevm.h     |  1 +
 migration/savevm.c     | 35 +++++++++++++++++++++++++++++++++++
 migration/trace-events |  2 ++
 3 files changed, 38 insertions(+)

diff --git a/migration/savevm.h b/migration/savevm.h
index b8cee00d41..b24ff073ad 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -47,6 +47,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
+void qemu_savevm_send_postcopy_resume(QEMUFile *f);
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
diff --git a/migration/savevm.c b/migration/savevm.c
index 7176b350d5..a7e793eef7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -80,6 +80,7 @@ enum qemu_vm_cmd {
     MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
                                       were previously sent during
                                       precopy but are dirty. */
+    MIG_CMD_POSTCOPY_RESUME,       /* resume postcopy on dest */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
     MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
@@ -98,6 +99,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RUN]     = { .len =  0, .name = "POSTCOPY_RUN" },
     [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
                                    .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
+    [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
     [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
@@ -958,6 +960,12 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
 }
 
+void qemu_savevm_send_postcopy_resume(QEMUFile *f)
+{
+    trace_savevm_send_postcopy_resume();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RESUME, 0, NULL);
+}
+
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
 {
     size_t len;
@@ -1768,6 +1776,30 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
     return LOADVM_QUIT;
 }
 
+static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
+{
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        error_report("%s: illegal resume received", __func__);
+        /* Don't fail the load, only for this. */
+        return 0;
+    }
+
+    /*
+     * This means source VM is ready to resume the postcopy migration.
+     * It's time to switch state and release the fault thread to
+     * continue servicing page faults.
+     */
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    qemu_sem_post(&mis->postcopy_pause_sem_fault);
+
+    trace_loadvm_postcopy_handle_resume();
+
+    /* TODO: Tell source that "we are ready" */
+
+    return 0;
+}
+
 /**
  * Immediately following this command is a blob of data containing an embedded
  * chunk of migration stream; read it and load it.
@@ -1933,6 +1965,9 @@ static int loadvm_process_command(QEMUFile *f)
     case MIG_CMD_POSTCOPY_RAM_DISCARD:
         return loadvm_postcopy_ram_handle_discard(mis, len);
 
+    case MIG_CMD_POSTCOPY_RESUME:
+        return loadvm_postcopy_handle_resume(mis);
+
     case MIG_CMD_RECV_BITMAP:
         return loadvm_handle_recv_bitmap(mis, len);
     }
diff --git a/migration/trace-events b/migration/trace-events
index 72e57089f3..29cc10f299 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -18,6 +18,7 @@ loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
 loadvm_postcopy_handle_run_cpu_sync(void) ""
 loadvm_postcopy_handle_run_vmstart(void) ""
+loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
@@ -35,6 +36,7 @@ savevm_send_open_return_path(void) ""
 savevm_send_ping(uint32_t val) "0x%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
+savevm_send_postcopy_resume(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_header(void) ""
-- 
2.14.3

* [Qemu-devel] [PATCH v8 14/24] migration: new message MIG_RP_MSG_RESUME_ACK
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (12 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 13/24] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 15/24] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Create a new message to reply to MIG_CMD_POSTCOPY_RESUME. One uint32_t
is used as the payload to let the source know whether the destination is
ready to continue the migration.
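
The payload travels as a big-endian 32-bit value, and the source treats
anything other than the agreed ack value as an error.  A minimal
standalone sketch, with hypothetical helper names and a
DEMO_RESUME_ACK_VALUE that mirrors MIGRATION_RESUME_ACK_VALUE (1) from
this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RESUME_ACK_VALUE 1u   /* mirrors MIGRATION_RESUME_ACK_VALUE */

/* Write v as 4 big-endian bytes, independent of host byte order. */
static void demo_put_be32(uint8_t *out, uint32_t v)
{
    assert(out);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read 4 big-endian bytes back into a host-order value. */
static uint32_t demo_get_be32(const uint8_t *in)
{
    assert(in);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* True if the 4-byte payload carries the expected ack value. */
static bool demo_resume_ack_ok(const uint8_t *payload)
{
    return demo_get_be32(payload) == DEMO_RESUME_ACK_VALUE;
}
```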

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h  |  3 +++
 migration/migration.c  | 37 +++++++++++++++++++++++++++++++++++++
 migration/savevm.c     |  3 ++-
 migration/trace-events |  1 +
 4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/migration/migration.h b/migration/migration.h
index b5f89da402..e8f0afb8eb 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -24,6 +24,8 @@
 
 struct PostcopyBlocktimeContext;
 
+#define  MIGRATION_RESUME_ACK_VALUE  (1)
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -261,6 +263,7 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
 void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
                                  char *block_name);
+void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 void dirty_bitmap_mig_before_vm_start(void);
 void init_dirty_bitmap_incoming_migration(void);
diff --git a/migration/migration.c b/migration/migration.c
index 4276d18130..ac1780e8f0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -96,6 +96,7 @@ enum mig_rp_message_type {
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
     MIG_RP_MSG_RECV_BITMAP,  /* send recved_bitmap back to source */
+    MIG_RP_MSG_RESUME_ACK,   /* tell source that we are ready to resume */
 
     MIG_RP_MSG_MAX
 };
@@ -559,6 +560,14 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
     trace_migrate_send_rp_recv_bitmap(block_name, res);
 }
 
+void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RP_MSG_RESUME_ACK, sizeof(buf), &buf);
+}
+
 MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp)
 {
     MigrationCapabilityStatusList *head = NULL;
@@ -1838,6 +1847,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
     [MIG_RP_MSG_RECV_BITMAP]    = { .len = -1, .name = "RECV_BITMAP" },
+    [MIG_RP_MSG_RESUME_ACK]     = { .len =  4, .name = "RESUME_ACK" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1895,6 +1905,25 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
     return ram_dirty_bitmap_reload(s, block);
 }
 
+static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
+{
+    trace_source_return_path_thread_resume_ack(value);
+
+    if (value != MIGRATION_RESUME_ACK_VALUE) {
+        error_report("%s: illegal resume_ack value %"PRIu32,
+                     __func__, value);
+        return -1;
+    }
+
+    /* Now both sides are active. */
+    migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+    /* TODO: notify send thread that it is time to continue sending pages */
+
+    return 0;
+}
+
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -2014,6 +2043,14 @@ retry:
             }
             break;
 
+        case MIG_RP_MSG_RESUME_ACK:
+            tmp32 = ldl_be_p(buf);
+            if (migrate_handle_rp_resume_ack(ms, tmp32)) {
+                mark_source_rp_bad(ms);
+                goto out;
+            }
+            break;
+
         default:
             break;
         }
diff --git a/migration/savevm.c b/migration/savevm.c
index a7e793eef7..6a2d77cbf3 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1795,7 +1795,8 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
 
     trace_loadvm_postcopy_handle_resume();
 
-    /* TODO: Tell source that "we are ready" */
+    /* Tell source that "we are ready" */
+    migrate_send_rp_resume_ack(mis, MIGRATION_RESUME_ACK_VALUE);
 
     return 0;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 29cc10f299..40a7217829 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -120,6 +120,7 @@ source_return_path_thread_entry(void) ""
 source_return_path_thread_loop_top(void) ""
 source_return_path_thread_pong(uint32_t val) "0x%x"
 source_return_path_thread_shut(uint32_t val) "0x%x"
+source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 migrate_global_state_post_load(const char *state) "loaded state: %s"
 migrate_global_state_pre_save(const char *state) "saved state: %s"
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
-- 
2.14.3

* [Qemu-devel] [PATCH v8 15/24] migration: introduce SaveVMHandlers.resume_prepare
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (13 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 14/24] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 16/24] migration: synchronize dirty bitmap for resume Peter Xu
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This is a hook function to be called when a postcopy migration wants to
resume from a failure. Each module should provide its own recovery logic
here, to run before we switch to the postcopy-active state.
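
One way to picture how such optional hooks get driven: walk the
registered handlers, skip modules that do not implement the hook, and
stop at the first failure.  The standalone sketch below uses a
simplified, hypothetical hook signature (the real one also takes a
MigrationState pointer) and hypothetical names throughout; it is an
assumption about the general pattern, not the code in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical handler entry. */
struct demo_handlers {
    const char *name;
    int (*resume_prepare)(void *opaque);   /* optional, may be NULL */
    void *opaque;
};

/* Example hook: counts how often it was called via opaque. */
static int demo_hook_count(void *opaque)
{
    (*(int *)opaque)++;
    return 0;
}

/* Example hook: always fails. */
static int demo_hook_fail(void *opaque)
{
    (void)opaque;
    return -1;
}

/*
 * Call every registered resume_prepare hook in order.  Returns 0 if
 * all hooks succeed, or the first negative return value otherwise.
 */
static int demo_resume_prepare_all(const struct demo_handlers *h, size_t n)
{
    assert(h || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret;

        if (!h[i].resume_prepare) {
            continue;   /* module has nothing to do on resume */
        }
        ret = h[i].resume_prepare(h[i].opaque);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

Stopping at the first failure matches how postcopy_do_resume() in the
diff reports the error and returns without continuing the resume.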

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/migration/register.h |  2 ++
 migration/savevm.h           |  1 +
 migration/migration.c        | 20 +++++++++++++++++++-
 migration/savevm.c           | 25 +++++++++++++++++++++++++
 migration/trace-events       |  1 +
 5 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index f6f12f9b1a..d287f4c317 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -64,6 +64,8 @@ typedef struct SaveVMHandlers {
     LoadStateHandler *load_state;
     int (*load_setup)(QEMUFile *f, void *opaque);
     int (*load_cleanup)(void *opaque);
+    /* Called when postcopy migration wants to resume from failure */
+    int (*resume_prepare)(MigrationState *s, void *opaque);
 } SaveVMHandlers;
 
 int register_savevm_live(DeviceState *dev,
diff --git a/migration/savevm.h b/migration/savevm.h
index b24ff073ad..a5e65b8ae3 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -31,6 +31,7 @@
 
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_state_resume_prepare(MigrationState *s);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
diff --git a/migration/migration.c b/migration/migration.c
index ac1780e8f0..a8c50783f7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2449,7 +2449,25 @@ typedef enum MigThrError {
 /* Return zero if success, or <0 for error */
 static int postcopy_do_resume(MigrationState *s)
 {
-    /* TODO: do the resume logic */
+    int ret;
+
+    /*
+     * Call all the resume_prepare() hooks, so that modules can be
+     * ready for the migration resume.
+     */
+    ret = qemu_savevm_state_resume_prepare(s);
+    if (ret) {
+        error_report("%s: resume_prepare() failure detected: %d",
+                     __func__, ret);
+        return ret;
+    }
+
+    /*
+     * TODO: handshake with dest using MIG_CMD_RESUME,
+     * MIG_RP_MSG_RESUME_ACK, then switch source state to
+     * "postcopy-active"
+     */
+
     return 0;
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 6a2d77cbf3..8e7653badc 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1031,6 +1031,31 @@ void qemu_savevm_state_setup(QEMUFile *f)
     }
 }
 
+int qemu_savevm_state_resume_prepare(MigrationState *s)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    trace_savevm_state_resume_prepare();
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->resume_prepare) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        ret = se->ops->resume_prepare(s, se->opaque);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 /*
  * this function has three return values:
  *   negative: there was one error, and we have -errno.
diff --git a/migration/trace-events b/migration/trace-events
index 40a7217829..be36fbccfe 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -39,6 +39,7 @@ savevm_send_postcopy_run(void) ""
 savevm_send_postcopy_resume(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
+savevm_state_resume_prepare(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
 savevm_state_cleanup(void) ""
-- 
2.14.3


* [Qemu-devel] [PATCH v8 16/24] migration: synchronize dirty bitmap for resume
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (14 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 15/24] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 17/24] migration: setup ramstate " Peter Xu
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This patch implements the first part of core RAM resume logic for
postcopy. ram_resume_prepare() is provided for the work.

When the migration is interrupted by a network failure, the dirty bitmap
on the source side becomes meaningless: even if a dirty bit has been
cleared, the page that was sent may still have been lost on the way to
the destination. So instead of continuing the migration with the old
dirty bitmap on the source, we ask the destination side to send back its
received bitmap, then invert it to get our initial dirty bitmap.
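
To make the inversion step concrete, here is a minimal standalone
sketch, assuming a plain array of unsigned long as the bitmap.  The
demo_* names are hypothetical; the real code works on the ramblock
bitmaps and also has to deal with the wire format of the received
bitmap, which is left out here.

```c
#include <stddef.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Turn a "received" bitmap into a "dirty" bitmap covering nbits pages:
 * every page the destination has not received must be sent again.
 * Bits beyond nbits in the last word are cleared.
 */
void demo_received_to_dirty(unsigned long *dirty,
                            const unsigned long *received, size_t nbits)
{
    size_t full = nbits / DEMO_BITS_PER_LONG;
    size_t rest = nbits % DEMO_BITS_PER_LONG;

    for (size_t i = 0; i < full; i++) {
        dirty[i] = ~received[i];
    }
    if (rest) {
        unsigned long mask = (1UL << rest) - 1;

        dirty[full] = ~received[full] & mask;
    }
}
```

Masking the tail word matters: without it, the padding bits past the
end of the block would show up as dirty pages that do not exist.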

The source side send thread will issue MIG_CMD_RECV_BITMAP requests,
once per ramblock, to ask for the received bitmap. The destination side
replies to each request with MIG_RP_MSG_RECV_BITMAP, along with the
requested bitmap. The data is received on the return-path thread of the
source, and the main migration thread is notified when all the ramblock
bitmaps are synchronized.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h  |  1 +
 migration/migration.c  |  2 ++
 migration/ram.c        | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events |  4 ++++
 4 files changed, 54 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index e8f0afb8eb..2f0dc34ae9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -135,6 +135,7 @@ struct MigrationState
         QEMUFile     *from_dst_file;
         QemuThread    rp_thread;
         bool          error;
+        QemuSemaphore rp_sem;
     } rp_state;
 
     double mbps;
diff --git a/migration/migration.c b/migration/migration.c
index a8c50783f7..960827d3ea 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2962,6 +2962,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
+    qemu_sem_destroy(&ms->rp_state.rp_sem);
     error_free(ms->error);
 }
 
@@ -2994,6 +2995,7 @@ static void migration_instance_init(Object *obj)
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
+    qemu_sem_init(&ms->rp_state.rp_sem, 0);
 }
 
 /*
diff --git a/migration/ram.c b/migration/ram.c
index 3512383991..118460109b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -51,6 +51,7 @@
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
 #include "migration/block.h"
+#include "savevm.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -3164,6 +3165,38 @@ static bool ram_has_postcopy(void *opaque)
     return migrate_postcopy_ram();
 }
 
+/* Sync all the dirty bitmap with destination VM.  */
+static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
+{
+    RAMBlock *block;
+    QEMUFile *file = s->to_dst_file;
+    int ramblock_count = 0;
+
+    trace_ram_dirty_bitmap_sync_start();
+
+    RAMBLOCK_FOREACH(block) {
+        qemu_savevm_send_recv_bitmap(file, block->idstr);
+        trace_ram_dirty_bitmap_request(block->idstr);
+        ramblock_count++;
+    }
+
+    trace_ram_dirty_bitmap_sync_wait();
+
+    /* Wait until all the ramblocks' dirty bitmap synced */
+    while (ramblock_count--) {
+        qemu_sem_wait(&s->rp_state.rp_sem);
+    }
+
+    trace_ram_dirty_bitmap_sync_complete();
+
+    return 0;
+}
+
+static void ram_dirty_bitmap_reload_notify(MigrationState *s)
+{
+    qemu_sem_post(&s->rp_state.rp_sem);
+}
+
 /*
  * Read the received bitmap, revert it as the initial dirty bitmap.
  * This is only used when the postcopy migration is paused but wants
@@ -3238,12 +3271,25 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
 
     trace_ram_dirty_bitmap_reload_complete(block->idstr);
 
+    /*
+     * We succeeded in syncing the bitmap for the current ramblock. Notify
+     * the main send thread, which waits for one post per ramblock.
+     */
+    ram_dirty_bitmap_reload_notify(s);
+
     ret = 0;
 out:
     free(le_bitmap);
     return ret;
 }
 
+static int ram_resume_prepare(MigrationState *s, void *opaque)
+{
+    RAMState *rs = *(RAMState **)opaque;
+
+    return ram_dirty_bitmap_sync_all(s, rs);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
     .save_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
@@ -3255,6 +3301,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .save_cleanup = ram_save_cleanup,
     .load_setup = ram_load_setup,
     .load_cleanup = ram_load_cleanup,
+    .resume_prepare = ram_resume_prepare,
 };
 
 void ram_mig_init(void)
diff --git a/migration/trace-events b/migration/trace-events
index be36fbccfe..53243e17ec 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -82,8 +82,12 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p"
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx"
+ram_dirty_bitmap_request(char *str) "%s"
 ram_dirty_bitmap_reload_begin(char *str) "%s"
 ram_dirty_bitmap_reload_complete(char *str) "%s"
+ram_dirty_bitmap_sync_start(void) ""
+ram_dirty_bitmap_sync_wait(void) ""
+ram_dirty_bitmap_sync_complete(void) ""
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.14.3


* [Qemu-devel] [PATCH v8 17/24] migration: setup ramstate for resume
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (15 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 16/24] migration: synchronize dirty bitmap for resume Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-08 10:54   ` Dr. David Alan Gilbert
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 18/24] migration: final handshake for the resume Peter Xu
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

After we have updated the dirty bitmaps of the ramblocks, we also need
to update the critical fields in RAMState to make sure it is ready for a
resume.
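
The most important of those fields is the dirty page counter, which has
to match the freshly reloaded bitmaps.  A minimal sketch of that
recalculation, assuming one bitmap per block with one bit per page
(DemoBlock and the demo_* functions are hypothetical names for this
example, not QEMU APIs):

```c
#include <stddef.h>
#include <stdint.h>
#include <limits.h>

/* Hypothetical, simplified stand-in for a RAMBlock. */
typedef struct DemoBlock {
    const unsigned long *bmap;  /* dirty bitmap, one bit per page */
    size_t npages;              /* number of valid bits in bmap */
} DemoBlock;

/* Count the set bits among the first nbits bits of a bitmap. */
uint64_t demo_bitmap_count_one(const unsigned long *bmap, size_t nbits)
{
    const size_t bpl = sizeof(unsigned long) * CHAR_BIT;
    uint64_t count = 0;

    for (size_t i = 0; i < nbits; i++) {
        if (bmap[i / bpl] & (1UL << (i % bpl))) {
            count++;
        }
    }
    return count;
}

/* Sum the dirty pages of all blocks to rebuild the global counter. */
uint64_t demo_total_dirty_pages(const DemoBlock *blocks, size_t n)
{
    uint64_t pages = 0;

    for (size_t i = 0; i < n; i++) {
        pages += demo_bitmap_count_one(blocks[i].bmap, blocks[i].npages);
    }
    return pages;
}
```

The bit-by-bit loop is chosen for clarity only; a real implementation
would count whole words at a time.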

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c        | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events |  1 +
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 118460109b..2821e15cc2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2291,6 +2291,41 @@ static int ram_init_all(RAMState **rsp)
     return 0;
 }
 
+static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out)
+{
+    RAMBlock *block;
+    uint64_t pages = 0;
+
+    /*
+     * Postcopy is not using xbzrle/compression, so no need for that.
+     * Also, since the source is already halted, we don't need to care
+     * about dirty page logging either.
+     */
+
+    RAMBLOCK_FOREACH(block) {
+        pages += bitmap_count_one(block->bmap,
+                                  block->used_length >> TARGET_PAGE_BITS);
+    }
+
+    /* This may not be aligned with current bitmaps. Recalculate. */
+    rs->migration_dirty_pages = pages;
+
+    rs->last_seen_block = NULL;
+    rs->last_sent_block = NULL;
+    rs->last_page = 0;
+    rs->last_version = ram_list.version;
+    /*
+     * Disable the bulk stage, otherwise we'll resend the whole RAM no
+     * matter what we have sent.
+     */
+    rs->ram_bulk_stage = false;
+
+    /* Update RAMState cache of output QEMUFile */
+    rs->f = out;
+
+    trace_ram_state_resume_prepare(pages);
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
@@ -3286,8 +3321,16 @@ out:
 static int ram_resume_prepare(MigrationState *s, void *opaque)
 {
     RAMState *rs = *(RAMState **)opaque;
+    int ret;
 
-    return ram_dirty_bitmap_sync_all(s, rs);
+    ret = ram_dirty_bitmap_sync_all(s, rs);
+    if (ret) {
+        return ret;
+    }
+
+    ram_state_resume_prepare(rs, s->to_dst_file);
+
+    return 0;
 }
 
 static SaveVMHandlers savevm_ram_handlers = {
diff --git a/migration/trace-events b/migration/trace-events
index 53243e17ec..46c5ca1dba 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -88,6 +88,7 @@ ram_dirty_bitmap_reload_complete(char *str) "%s"
 ram_dirty_bitmap_sync_start(void) ""
 ram_dirty_bitmap_sync_wait(void) ""
 ram_dirty_bitmap_sync_complete(void) ""
+ram_state_resume_prepare(uint64_t v) "%" PRIu64
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.14.3


* [Qemu-devel] [PATCH v8 18/24] migration: final handshake for the resume
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (16 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 17/24] migration: setup ramstate " Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 19/24] migration: init dst in migration_object_init too Peter Xu
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Finish the last step: do the final handshake for the recovery.

First, the source sends one MIG_CMD_RESUME to the destination, telling
it that the source is ready to resume.

Then, the destination replies with MIG_RP_MSG_RESUME_ACK to the source,
telling it that the destination is ready to resume (after switching to
the postcopy-active state).

When the source receives the RESUME_ACK, it switches its state to
postcopy-active, and the recovery is complete.
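
The state changes of the handshake can be modelled as two small
handlers, one per side.  This is only a sketch under the assumption
that both sides sit in a "recover" state when the handshake starts; the
enum values and demo_* functions are hypothetical and do not touch any
real channel.

```c
/* Hypothetical, reduced set of migration states for this model. */
typedef enum {
    DEMO_POSTCOPY_PAUSED,
    DEMO_POSTCOPY_RECOVER,
    DEMO_POSTCOPY_ACTIVE,
} DemoStatus;

/* Hypothetical message tags standing in for the wire messages. */
typedef enum {
    DEMO_CMD_RESUME,
    DEMO_RP_MSG_RESUME_ACK,
} DemoMsg;

/*
 * Destination side: on RESUME, switch to active first, then reply.
 * Returns 0 and fills *reply on success, -1 on an unexpected state.
 */
int demo_dst_handle_resume(DemoStatus *dst, DemoMsg *reply)
{
    if (*dst != DEMO_POSTCOPY_RECOVER) {
        return -1;
    }
    *dst = DEMO_POSTCOPY_ACTIVE;
    *reply = DEMO_RP_MSG_RESUME_ACK;
    return 0;
}

/*
 * Source side: on RESUME_ACK, switch to active so that the send thread
 * can continue.  Returns -1 if no recovery was in progress.
 */
int demo_src_handle_resume_ack(DemoStatus *src)
{
    if (*src != DEMO_POSTCOPY_RECOVER) {
        return -1;
    }
    *src = DEMO_POSTCOPY_ACTIVE;
    return 0;
}
```

The destination goes active before the source does, so by the time the
source starts sending pages again the other end is ready for them.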

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 960827d3ea..eec84ccc3a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1919,7 +1919,8 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
     migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
-    /* TODO: notify send thread that time to continue send pages */
+    /* Notify send thread that it is time to continue sending pages */
+    qemu_sem_post(&s->rp_state.rp_sem);
 
     return 0;
 }
@@ -2446,6 +2447,21 @@ typedef enum MigThrError {
     MIG_THR_ERR_FATAL = 2,
 } MigThrError;
 
+static int postcopy_resume_handshake(MigrationState *s)
+{
+    qemu_savevm_send_postcopy_resume(s->to_dst_file);
+
+    while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        qemu_sem_wait(&s->rp_state.rp_sem);
+    }
+
+    if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
+    return -1;
+}
+
 /* Return zero if success, or <0 for error */
 static int postcopy_do_resume(MigrationState *s)
 {
@@ -2463,10 +2479,14 @@ static int postcopy_do_resume(MigrationState *s)
     }
 
     /*
-     * TODO: handshake with dest using MIG_CMD_RESUME,
-     * MIG_RP_MSG_RESUME_ACK, then switch source state to
-     * "postcopy-active"
+     * Last handshake with destination on the resume (destination will
+     * switch to postcopy-active afterwards)
      */
+    ret = postcopy_resume_handshake(s);
+    if (ret) {
+        error_report("%s: handshake failed: %d", __func__, ret);
+        return ret;
+    }
 
     return 0;
 }
-- 
2.14.3


* [Qemu-devel] [PATCH v8 19/24] migration: init dst in migration_object_init too
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (17 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 18/24] migration: final handshake for the resume Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 20/24] qmp/migration: new command migrate-recover Peter Xu
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Though we may not need it, we now init both the src and dst migration
objects in migration_object_init(), so that even the incoming migration
object becomes thread safe (previously it was not, since it was lazily
initialized on first use).

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index eec84ccc3a..c059579fee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -106,6 +106,7 @@ enum mig_rp_message_type {
    dynamic creation of migration */
 
 static MigrationState *current_migration;
+static MigrationIncomingState *current_incoming;
 
 static bool migration_object_check(MigrationState *ms, Error **errp);
 static int migration_maybe_pause(MigrationState *s,
@@ -121,6 +122,22 @@ void migration_object_init(void)
     assert(!current_migration);
     current_migration = MIGRATION_OBJ(object_new(TYPE_MIGRATION));
 
+    /*
+     * Init the migrate incoming object as well no matter whether
+     * we'll use it or not.
+     */
+    assert(!current_incoming);
+    current_incoming = g_new0(MigrationIncomingState, 1);
+    current_incoming->state = MIGRATION_STATUS_NONE;
+    current_incoming->postcopy_remote_fds =
+        g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
+    qemu_mutex_init(&current_incoming->rp_mutex);
+    qemu_event_init(&current_incoming->main_thread_load_event, false);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+
+    init_dirty_bitmap_incoming_migration();
+
     if (!migration_object_check(current_migration, &err)) {
         error_report_err(err);
         exit(1);
@@ -151,24 +168,8 @@ MigrationState *migrate_get_current(void)
 
 MigrationIncomingState *migration_incoming_get_current(void)
 {
-    static bool once;
-    static MigrationIncomingState mis_current;
-
-    if (!once) {
-        mis_current.state = MIGRATION_STATUS_NONE;
-        memset(&mis_current, 0, sizeof(MigrationIncomingState));
-        mis_current.postcopy_remote_fds = g_array_new(FALSE, TRUE,
-                                                   sizeof(struct PostCopyFD));
-        qemu_mutex_init(&mis_current.rp_mutex);
-        qemu_event_init(&mis_current.main_thread_load_event, false);
-        qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0);
-        qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0);
-
-        init_dirty_bitmap_incoming_migration();
-
-        once = true;
-    }
-    return &mis_current;
+    assert(current_incoming);
+    return current_incoming;
 }
 
 void migration_incoming_state_destroy(void)
-- 
2.14.3


* [Qemu-devel] [PATCH v8 20/24] qmp/migration: new command migrate-recover
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (18 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 19/24] migration: init dst in migration_object_init too Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 21/24] hmp/migration: add migrate_recover command Peter Xu
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

This is the first allow-oob=true command.  It is used on the destination
side when the postcopy migration is paused and ready for a recovery.
After it is executed, a new migration channel will be established for
postcopy to continue.
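
Since the command can run out-of-band, two callers may race to trigger
the recovery.  The guard against that can be sketched with a
compare-and-swap on a flag.  The example below uses C11 atomics rather
than QEMU's own atomic helpers, and DemoIncoming plus the demo_*
functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the incoming migration state. */
typedef struct DemoIncoming {
    atomic_bool recover_triggered;
} DemoIncoming;

/*
 * Try to claim the one allowed recovery attempt.  Returns true for
 * exactly one caller until the flag is re-armed, false for the rest.
 */
bool demo_try_trigger_recover(DemoIncoming *mis)
{
    bool expected = false;

    return atomic_compare_exchange_strong(&mis->recover_triggered,
                                          &expected, true);
}

/* Re-arm the flag, e.g. when the migration enters the paused state. */
void demo_rearm_recover(DemoIncoming *mis)
{
    atomic_store(&mis->recover_triggered, false);
}
```

Clearing the flag on every pause is what allows one recovery per
failure rather than one recovery per migration.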

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json   | 20 ++++++++++++++++++++
 migration/migration.h |  1 +
 migration/migration.c | 24 ++++++++++++++++++++++++
 migration/savevm.c    |  3 +++
 4 files changed, 48 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index 25a6776af6..a1c2c238ab 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1191,3 +1191,23 @@
 # Since: 2.9
 ##
 { 'command': 'xen-colo-do-checkpoint' }
+
+##
+# @migrate-recover:
+#
+# Provide a recovery migration stream URI.
+#
+# @uri: the URI to be used for the recovery of migration stream.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "migrate-recover",
+#      "arguments": { "uri": "tcp:192.168.1.200:12345" } }
+# <- { "return": {} }
+#
+# Since: 2.12
+##
+{ 'command': 'migrate-recover', 'data': { 'uri': 'str' },
+  'allow-oob': true }
diff --git a/migration/migration.h b/migration/migration.h
index 2f0dc34ae9..012bcd352b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -77,6 +77,7 @@ struct MigrationIncomingState {
     struct PostcopyBlocktimeContext *blocktime_ctx;
 
     /* notify PAUSED postcopy incoming migrations to try to continue */
+    bool postcopy_recover_triggered;
     QemuSemaphore postcopy_pause_sem_dst;
     QemuSemaphore postcopy_pause_sem_fault;
 };
diff --git a/migration/migration.c b/migration/migration.c
index c059579fee..03d1fc7bc3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1470,6 +1470,30 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
     once = false;
 }
 
+void qmp_migrate_recover(const char *uri, Error **errp)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        error_setg(errp, "Migrate recover can only be run "
+                   "when postcopy is paused.");
+        return;
+    }
+
+    if (atomic_cmpxchg(&mis->postcopy_recover_triggered,
+                       false, true) == true) {
+        error_setg(errp, "Migrate recovery is triggered already");
+        return;
+    }
+
+    /*
+     * Note that this call will never start a real migration; it will
+     * only re-setup the migration stream and poke existing migration
+     * to continue using that newly established channel.
+     */
+    qemu_start_incoming_migration(uri, errp);
+}
+
 bool migration_is_blocked(Error **errp)
 {
     if (qemu_savevm_state_blocked(errp)) {
diff --git a/migration/savevm.c b/migration/savevm.c
index 8e7653badc..4251125831 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2190,6 +2190,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 {
     trace_postcopy_pause_incoming();
 
+    /* Clear the triggered bit to allow one recovery */
+    mis->postcopy_recover_triggered = false;
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
-- 
2.14.3


* [Qemu-devel] [PATCH v8 21/24] hmp/migration: add migrate_recover command
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (19 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 20/24] qmp/migration: new command migrate-recover Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 22/24] migration: introduce lock for to_dst_file Peter Xu
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Sister command to migrate-recover in QMP.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hmp.h           |  1 +
 hmp.c           | 10 ++++++++++
 hmp-commands.hx | 13 +++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/hmp.h b/hmp.h
index 4e2ec375b0..b6b56c8161 100644
--- a/hmp.h
+++ b/hmp.h
@@ -68,6 +68,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
+void hmp_migrate_recover(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
diff --git a/hmp.c b/hmp.c
index 59d5341590..4ac2126969 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1517,6 +1517,16 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_migrate_recover(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    const char *uri = qdict_get_str(qdict, "uri");
+
+    qmp_migrate_recover(uri, &err);
+
+    hmp_handle_error(mon, &err);
+}
+
 /* Kept for backwards compatibility */
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict)
 {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 078ded20cd..8facba7b4d 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -957,7 +957,20 @@ STEXI
 @findex migrate_incoming
 Continue an incoming migration using the @var{uri} (that has the same syntax
 as the -incoming option).
+ETEXI
 
+    {
+        .name       = "migrate_recover",
+        .args_type  = "uri:s",
+        .params     = "uri",
+        .help       = "Continue a paused incoming postcopy migration",
+        .cmd        = hmp_migrate_recover,
+    },
+
+STEXI
+@item migrate_recover @var{uri}
+@findex migrate_recover
+Continue a paused incoming postcopy migration using the @var{uri}.
 ETEXI
 
     {
-- 
2.14.3


* [Qemu-devel] [PATCH v8 22/24] migration: introduce lock for to_dst_file
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (20 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 21/24] hmp/migration: add migrate_recover command Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-08 11:31   ` Dr. David Alan Gilbert
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 23/24] migration/qmp: add command migrate-pause Peter Xu
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 24/24] migration/hmp: add migrate_pause command Peter Xu
  23 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Let's introduce a lock for the to_dst_file QEMUFile pointer, since we
are going to operate on it from multiple threads.
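
The locking pattern is to keep the critical section as short as
possible: detach the pointer under the lock, then do the slow close
outside of it.  A minimal sketch with a pthread mutex and a stdio FILE
standing in for the QEMUFile (DemoState and the demo_* functions are
hypothetical names for this example):

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for the migration state. */
typedef struct DemoState {
    pthread_mutex_t file_lock;  /* protects the 'file' pointer only */
    FILE *file;
} DemoState;

/*
 * Detach the file pointer under the lock and hand it to the caller.
 * Nothing that can block for long happens inside the critical section.
 */
FILE *demo_detach_file(DemoState *s)
{
    FILE *tmp;

    pthread_mutex_lock(&s->file_lock);
    tmp = s->file;
    s->file = NULL;
    pthread_mutex_unlock(&s->file_lock);

    return tmp;
}

/* Release the file: the potentially slow close runs without the lock. */
void demo_release_file(DemoState *s)
{
    FILE *tmp = demo_detach_file(s);

    if (tmp) {
        fclose(tmp);
    }
}
```

Any other thread that takes the lock sees either a valid pointer or
NULL, never a file that is halfway through being closed.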

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h |  6 ++++++
 migration/channel.c   |  3 ++-
 migration/migration.c | 22 +++++++++++++++++++---
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 012bcd352b..f6b9e774f9 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -114,6 +114,12 @@ struct MigrationState
     QemuThread thread;
     QEMUBH *cleanup_bh;
     QEMUFile *to_dst_file;
+    /*
+     * Protects to_dst_file pointer.  We need to make sure we won't
+     * yield or hang during the critical section, since this lock will
+     * be used in OOB command handler.
+     */
+    QemuMutex qemu_file_lock;
 
     /* bytes already send at the beggining of current interation */
     uint64_t iteration_initial_bytes;
diff --git a/migration/channel.c b/migration/channel.c
index c5eaf0fa0e..716192bf75 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -74,8 +74,9 @@ void migration_channel_connect(MigrationState *s,
         } else {
             QEMUFile *f = qemu_fopen_channel_output(ioc);
 
+            qemu_mutex_lock(&s->qemu_file_lock);
             s->to_dst_file = f;
-
+            qemu_mutex_unlock(&s->qemu_file_lock);
         }
     }
     migrate_fd_connect(s, error);
diff --git a/migration/migration.c b/migration/migration.c
index 03d1fc7bc3..25f26052d2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1229,6 +1229,7 @@ static void migrate_fd_cleanup(void *opaque)
 
     if (s->to_dst_file) {
         Error *local_err = NULL;
+        QEMUFile *tmp;
 
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -1241,8 +1242,15 @@ static void migrate_fd_cleanup(void *opaque)
         if (multifd_save_cleanup(&local_err) != 0) {
             error_report_err(local_err);
         }
-        qemu_fclose(s->to_dst_file);
+        qemu_mutex_lock(&s->qemu_file_lock);
+        tmp = s->to_dst_file;
         s->to_dst_file = NULL;
+        qemu_mutex_unlock(&s->qemu_file_lock);
+        /*
+         * Close the file handle without the lock to make sure the
+         * critical section won't block for long.
+         */
+        qemu_fclose(tmp);
     }
 
     assert((s->state != MIGRATION_STATUS_ACTIVE) &&
@@ -2526,14 +2534,20 @@ static MigThrError postcopy_pause(MigrationState *s)
     assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
     while (true) {
+        QEMUFile *file;
+
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
 
         /* Current channel is possibly broken. Release it. */
         assert(s->to_dst_file);
-        qemu_file_shutdown(s->to_dst_file);
-        qemu_fclose(s->to_dst_file);
+        qemu_mutex_lock(&s->qemu_file_lock);
+        file = s->to_dst_file;
         s->to_dst_file = NULL;
+        qemu_mutex_unlock(&s->qemu_file_lock);
+
+        qemu_file_shutdown(file);
+        qemu_fclose(file);
 
         error_report("Detected IO failure for postcopy. "
                      "Migration paused.");
@@ -3002,6 +3016,7 @@ static void migration_instance_finalize(Object *obj)
     MigrationParameters *params = &ms->parameters;
 
     qemu_mutex_destroy(&ms->error_mutex);
+    qemu_mutex_destroy(&ms->qemu_file_lock);
     g_free(params->tls_hostname);
     g_free(params->tls_creds);
     qemu_sem_destroy(&ms->pause_sem);
@@ -3041,6 +3056,7 @@ static void migration_instance_init(Object *obj)
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
+    qemu_mutex_init(&ms->qemu_file_lock);
 }
 
 /*
-- 
2.14.3


* [Qemu-devel] [PATCH v8 23/24] migration/qmp: add command migrate-pause
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (21 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 22/24] migration: introduce lock for to_dst_file Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  2018-05-08 13:20   ` Dr. David Alan Gilbert
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 24/24] migration/hmp: add migrate_pause command Peter Xu
  23 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Add a QMP command that pauses an ongoing migration.  Currently it only
supports postcopy.  The command works on either side of the migration:
when it is triggered on one side, the other side is interrupted as well,
since it is notified by the disconnect event.

However, it is still possible that the other side is not notified, for
example when the network is completely broken or after a firewall
configuration change.  In that case, the same command must also be run
on the other side so that both sides enter the paused state.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json   | 16 ++++++++++++++++
 migration/migration.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index a1c2c238ab..bf403d2dd2 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1211,3 +1211,19 @@
 ##
 { 'command': 'migrate-recover', 'data': { 'uri': 'str' },
   'allow-oob': true }
+
+##
+# @migrate-pause:
+#
+# Pause a migration.  Currently it only supports postcopy.
+#
+# Returns: nothing.
+#
+# Example:
+#
+# -> { "execute": "migrate-pause" }
+# <- { "return": {} }
+#
+# Since: 2.12
+##
+{ 'command': 'migrate-pause', 'allow-oob': true }
diff --git a/migration/migration.c b/migration/migration.c
index 25f26052d2..03e4950976 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1502,6 +1502,35 @@ void qmp_migrate_recover(const char *uri, Error **errp)
     qemu_start_incoming_migration(uri, errp);
 }
 
+void qmp_migrate_pause(Error **errp)
+{
+    MigrationState *ms = migrate_get_current();
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int ret;
+
+    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        /* Source side, during postcopy */
+        qemu_mutex_lock(&ms->qemu_file_lock);
+        ret = qemu_file_shutdown(ms->to_dst_file);
+        qemu_mutex_unlock(&ms->qemu_file_lock);
+        if (ret) {
+            error_setg(errp, "Failed to pause source migration");
+        }
+        return;
+    }
+
+    if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        ret = qemu_file_shutdown(mis->from_src_file);
+        if (ret) {
+            error_setg(errp, "Failed to pause destination migration");
+        }
+        return;
+    }
+
+    error_setg(errp, "migrate-pause is currently only supported "
+               "during postcopy-active state");
+}
+
 bool migration_is_blocked(Error **errp)
 {
     if (qemu_savevm_state_blocked(errp)) {
-- 
2.14.3

* [Qemu-devel] [PATCH v8 24/24] migration/hmp: add migrate_pause command
  2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
                   ` (22 preceding siblings ...)
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 23/24] migration/qmp: add command migrate-pause Peter Xu
@ 2018-05-02 10:47 ` Peter Xu
  23 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2018-05-02 10:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli, Dr . David Alan Gilbert, peterx

Wrapper for QMP command "migrate-pause".

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hmp.h           |  1 +
 hmp.c           |  9 +++++++++
 hmp-commands.hx | 14 ++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/hmp.h b/hmp.h
index b6b56c8161..20f27439d3 100644
--- a/hmp.h
+++ b/hmp.h
@@ -69,6 +69,7 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
 void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
 void hmp_migrate_recover(Monitor *mon, const QDict *qdict);
+void hmp_migrate_pause(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
diff --git a/hmp.c b/hmp.c
index 4ac2126969..35a23d8cb8 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1527,6 +1527,15 @@ void hmp_migrate_recover(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_migrate_pause(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+
+    qmp_migrate_pause(&err);
+
+    hmp_handle_error(mon, &err);
+}
+
 /* Kept for backwards compatibility */
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict)
 {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8facba7b4d..579d501bd4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -971,6 +971,20 @@ STEXI
 @item migrate_recover @var{uri}
 @findex migrate_recover
 Continue a paused incoming postcopy migration using the @var{uri}.
+ETEXI
+
+    {
+        .name       = "migrate_pause",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Pause an ongoing migration (postcopy-only)",
+        .cmd        = hmp_migrate_pause,
+    },
+
+STEXI
+@item migrate_pause
+@findex migrate_pause
+Pause an ongoing migration.  Currently it only supports postcopy.
 ETEXI
 
     {
-- 
2.14.3

* Re: [Qemu-devel] [PATCH v8 17/24] migration: setup ramstate for resume
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 17/24] migration: setup ramstate " Peter Xu
@ 2018-05-08 10:54   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 32+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-08 10:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> After we updated the dirty bitmaps of ramblocks, we also need to update
> the critical fields in RAMState to make sure it is ready for a resume.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/ram.c        | 45 ++++++++++++++++++++++++++++++++++++++++++++-
>  migration/trace-events |  1 +
>  2 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 118460109b..2821e15cc2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2291,6 +2291,41 @@ static int ram_init_all(RAMState **rsp)
>      return 0;
>  }
>  
> +static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out)
> +{
> +    RAMBlock *block;
> +    uint64_t pages = 0;
> +
> +    /*
> +     * Postcopy is not using xbzrle/compression, so no need for that.
> +     * Also, since source are already halted, we don't need to care
> +     * about dirty page logging as well.
> +     */
> +
> +    RAMBLOCK_FOREACH(block) {
> +        pages += bitmap_count_one(block->bmap,
> +                                  block->used_length >> TARGET_PAGE_BITS);
> +    }
> +
> +    /* This may not be aligned with current bitmaps. Recalculate. */
> +    rs->migration_dirty_pages = pages;
> +
> +    rs->last_seen_block = NULL;
> +    rs->last_sent_block = NULL;
> +    rs->last_page = 0;
> +    rs->last_version = ram_list.version;
> +    /*
> +     * Disable the bulk stage, otherwise we'll resend the whole RAM no
> +     * matter what we have sent.
> +     */
> +    rs->ram_bulk_stage = false;
> +
> +    /* Update RAMState cache of output QEMUFile */
> +    rs->f = out;
> +
> +    trace_ram_state_resume_prepare(pages);
> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
> @@ -3286,8 +3321,16 @@ out:
>  static int ram_resume_prepare(MigrationState *s, void *opaque)
>  {
>      RAMState *rs = *(RAMState **)opaque;
> +    int ret;
>  
> -    return ram_dirty_bitmap_sync_all(s, rs);
> +    ret = ram_dirty_bitmap_sync_all(s, rs);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    ram_state_resume_prepare(rs, s->to_dst_file);
> +
> +    return 0;
>  }
>  
>  static SaveVMHandlers savevm_ram_handlers = {
> diff --git a/migration/trace-events b/migration/trace-events
> index 53243e17ec..46c5ca1dba 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -88,6 +88,7 @@ ram_dirty_bitmap_reload_complete(char *str) "%s"
>  ram_dirty_bitmap_sync_start(void) ""
>  ram_dirty_bitmap_sync_wait(void) ""
>  ram_dirty_bitmap_sync_complete(void) ""
> +ram_state_resume_prepare(uint64_t v) "%ld"

That should be "%" PRId64

But otherwise looks OK, so with that change:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v8 22/24] migration: introduce lock for to_dst_file
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 22/24] migration: introduce lock for to_dst_file Peter Xu
@ 2018-05-08 11:31   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 32+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-08 11:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Let's introduce a lock for that QEMUFile since we are going to operate
> on it in multiple threads.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.h |  6 ++++++
>  migration/channel.c   |  3 ++-
>  migration/migration.c | 22 +++++++++++++++++++---
>  3 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index 012bcd352b..f6b9e774f9 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -114,6 +114,12 @@ struct MigrationState
>      QemuThread thread;
>      QEMUBH *cleanup_bh;
>      QEMUFile *to_dst_file;
> +    /*
> +     * Protects to_dst_file pointer.  We need to make sure we won't
> +     * yield or hang during the critical section, since this lock will
> +     * be used in OOB command handler.
> +     */
> +    QemuMutex qemu_file_lock;

So what are the rules on accessing to_dst_file?
You only seem to be taking the lock when closing or setting
to_dst_file.  Given the problem we were trying to fix, I think that
is OK, but it needs a comment to say why it's safe.

Dave

>      /* bytes already send at the beggining of current interation */
>      uint64_t iteration_initial_bytes;
> diff --git a/migration/channel.c b/migration/channel.c
> index c5eaf0fa0e..716192bf75 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -74,8 +74,9 @@ void migration_channel_connect(MigrationState *s,
>          } else {
>              QEMUFile *f = qemu_fopen_channel_output(ioc);
>  
> +            qemu_mutex_lock(&s->qemu_file_lock);
>              s->to_dst_file = f;
> -
> +            qemu_mutex_unlock(&s->qemu_file_lock);
>          }
>      }
>      migrate_fd_connect(s, error);
> diff --git a/migration/migration.c b/migration/migration.c
> index 03d1fc7bc3..25f26052d2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1229,6 +1229,7 @@ static void migrate_fd_cleanup(void *opaque)
>  
>      if (s->to_dst_file) {
>          Error *local_err = NULL;
> +        QEMUFile *tmp;
>  
>          trace_migrate_fd_cleanup();
>          qemu_mutex_unlock_iothread();
> @@ -1241,8 +1242,15 @@ static void migrate_fd_cleanup(void *opaque)
>          if (multifd_save_cleanup(&local_err) != 0) {
>              error_report_err(local_err);
>          }
> -        qemu_fclose(s->to_dst_file);
> +        qemu_mutex_lock(&s->qemu_file_lock);
> +        tmp = s->to_dst_file;
>          s->to_dst_file = NULL;
> +        qemu_mutex_unlock(&s->qemu_file_lock);
> +        /*
> +         * Close the file handle without the lock to make sure the
> +         * critical section won't block for long.
> +         */
> +        qemu_fclose(tmp);
>      }
>  
>      assert((s->state != MIGRATION_STATUS_ACTIVE) &&
> @@ -2526,14 +2534,20 @@ static MigThrError postcopy_pause(MigrationState *s)
>      assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
>  
>      while (true) {
> +        QEMUFile *file;
> +
>          migrate_set_state(&s->state, s->state,
>                            MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
>          /* Current channel is possibly broken. Release it. */
>          assert(s->to_dst_file);
> -        qemu_file_shutdown(s->to_dst_file);
> -        qemu_fclose(s->to_dst_file);
> +        qemu_mutex_lock(&s->qemu_file_lock);
> +        file = s->to_dst_file;
>          s->to_dst_file = NULL;
> +        qemu_mutex_unlock(&s->qemu_file_lock);
> +
> +        qemu_file_shutdown(file);
> +        qemu_fclose(file);
>  
>          error_report("Detected IO failure for postcopy. "
>                       "Migration paused.");
> @@ -3002,6 +3016,7 @@ static void migration_instance_finalize(Object *obj)
>      MigrationParameters *params = &ms->parameters;
>  
>      qemu_mutex_destroy(&ms->error_mutex);
> +    qemu_mutex_destroy(&ms->qemu_file_lock);
>      g_free(params->tls_hostname);
>      g_free(params->tls_creds);
>      qemu_sem_destroy(&ms->pause_sem);
> @@ -3041,6 +3056,7 @@ static void migration_instance_init(Object *obj)
>      qemu_sem_init(&ms->postcopy_pause_sem, 0);
>      qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
>      qemu_sem_init(&ms->rp_state.rp_sem, 0);
> +    qemu_mutex_init(&ms->qemu_file_lock);
>  }
>  
>  /*
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v8 23/24] migration/qmp: add command migrate-pause
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 23/24] migration/qmp: add command migrate-pause Peter Xu
@ 2018-05-08 13:20   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 32+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-08 13:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Alexey Perevalov, Daniel P . Berrange, Juan Quintela,
	Andrea Arcangeli

* Peter Xu (peterx@redhat.com) wrote:
> Add a QMP command that pauses an ongoing migration.  Currently it only
> supports postcopy.  The command works on either side of the migration:
> when it is triggered on one side, the other side is interrupted as well,
> since it is notified by the disconnect event.
> 
> However, it is still possible that the other side is not notified, for
> example when the network is completely broken or after a firewall
> configuration change.  In that case, the same command must also be run
> on the other side so that both sides enter the paused state.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(I don't really like the name 'pause' but I can't think of a better one)

> ---
>  qapi/migration.json   | 16 ++++++++++++++++
>  migration/migration.c | 29 +++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index a1c2c238ab..bf403d2dd2 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1211,3 +1211,19 @@
>  ##
>  { 'command': 'migrate-recover', 'data': { 'uri': 'str' },
>    'allow-oob': true }
> +
> +##
> +# @migrate-pause:
> +#
> +# Pause a migration.  Currently it only supports postcopy.
> +#
> +# Returns: nothing.
> +#
> +# Example:
> +#
> +# -> { "execute": "migrate-pause" }
> +# <- { "return": {} }
> +#
> +# Since: 2.12
> +##
> +{ 'command': 'migrate-pause', 'allow-oob': true }
> diff --git a/migration/migration.c b/migration/migration.c
> index 25f26052d2..03e4950976 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1502,6 +1502,35 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>      qemu_start_incoming_migration(uri, errp);
>  }
>  
> +void qmp_migrate_pause(Error **errp)
> +{
> +    MigrationState *ms = migrate_get_current();
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    int ret;
> +
> +    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        /* Source side, during postcopy */
> +        qemu_mutex_lock(&ms->qemu_file_lock);
> +        ret = qemu_file_shutdown(ms->to_dst_file);
> +        qemu_mutex_unlock(&ms->qemu_file_lock);
> +        if (ret) {
> +            error_setg(errp, "Failed to pause source migration");
> +        }
> +        return;
> +    }
> +
> +    if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +        ret = qemu_file_shutdown(mis->from_src_file);
> +        if (ret) {
> +            error_setg(errp, "Failed to pause destination migration");
> +        }
> +        return;
> +    }
> +
> +    error_setg(errp, "migrate-pause is currently only supported "
> +               "during postcopy-active state");
> +}
> +
>  bool migration_is_blocked(Error **errp)
>  {
>      if (qemu_savevm_state_blocked(errp)) {
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v8 01/24] migration: let incoming side use thread context
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 01/24] migration: let incoming side use thread context Peter Xu
@ 2018-05-08 14:36   ` Juan Quintela
  0 siblings, 0 replies; 32+ messages in thread
From: Juan Quintela @ 2018-05-08 14:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Alexey Perevalov, Daniel P . Berrange,
	Andrea Arcangeli, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> The old incoming migration is running in main thread and default
> gcontext.  With the new qio_channel_add_watch_full() we can now let it
> run in the thread's own gcontext (if there is one).
>
> Currently this patch does nothing alone.  But when any of the incoming
> migration is run in another iothread (e.g., the upcoming migrate-recover
> command), this patch will bind the incoming logic to the iothread
> instead of the main thread (which may already get page faulted and
> hanged).
>
> RDMA is not considered for now since it's not even using the QIO watch
> framework at all.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

* Re: [Qemu-devel] [PATCH v8 02/24] migration: new postcopy-pause state
  2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 02/24] migration: new postcopy-pause state Peter Xu
@ 2018-05-08 15:16   ` Juan Quintela
  2018-05-09  4:05     ` Peter Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Juan Quintela @ 2018-05-08 15:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Alexey Perevalov, Daniel P . Berrange,
	Andrea Arcangeli, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> Introducing a new state "postcopy-paused", which can be used when the
> postcopy migration is paused. It is targeted for postcopy network
> failure recovery.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

> +# @postcopy-paused: during postcopy but paused. (since 2.12)
> +#

since 2.13
will fix it when I queue it.

* Re: [Qemu-devel] [PATCH v8 02/24] migration: new postcopy-pause state
  2018-05-08 15:16   ` Juan Quintela
@ 2018-05-09  4:05     ` Peter Xu
  2018-05-09  6:59       ` Juan Quintela
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2018-05-09  4:05 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Alexey Perevalov, Daniel P . Berrange,
	Andrea Arcangeli, Dr . David Alan Gilbert

On Tue, May 08, 2018 at 05:16:15PM +0200, Juan Quintela wrote:
> Peter Xu <peterx@redhat.com> wrote:
> > Introducing a new state "postcopy-paused", which can be used when the
> > postcopy migration is paused. It is targeted for postcopy network
> > failure recovery.
> >
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> > +# @postcopy-paused: during postcopy but paused. (since 2.12)
> > +#
> 
> since 2.13
> will fix it when I queue it.

Thanks, Juan.

Do you want to queue only the first patches or the whole series?  Do
you want me to repost to fix the minor issues (e.g., what Dave pointed
out in the other thread, and the s/2.12/2.13/ stuff)?

-- 
Peter Xu

* Re: [Qemu-devel] [PATCH v8 02/24] migration: new postcopy-pause state
  2018-05-09  4:05     ` Peter Xu
@ 2018-05-09  6:59       ` Juan Quintela
  0 siblings, 0 replies; 32+ messages in thread
From: Juan Quintela @ 2018-05-09  6:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Alexey Perevalov, Daniel P . Berrange,
	Andrea Arcangeli, Dr . David Alan Gilbert

Peter Xu <peterx@redhat.com> wrote:
> On Tue, May 08, 2018 at 05:16:15PM +0200, Juan Quintela wrote:
>> Peter Xu <peterx@redhat.com> wrote:
>> > Introducing a new state "postcopy-paused", which can be used when the
>> > postcopy migration is paused. It is targeted for postcopy network
>> > failure recovery.
>> >
>> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> 
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> 
>> > +# @postcopy-paused: during postcopy but paused. (since 2.12)
>> > +#
>> 
>> since 2.13
>> will fix it when I queue it.
>
> Thanks, Juan.
>
> Do you want to queue only the first patches or the whole series?  Do
> you want me to repost to fix the minor issues (e.g., what Dave pointed
> out in the other thread, and the s/2.12/2.13/ stuff)?

I will fix the trivial stuff by hand.

Thanks, Juan.

Thread overview: 32+ messages
2018-05-02 10:47 [Qemu-devel] [PATCH v8 00/24] Migration: postcopy failure recovery Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 01/24] migration: let incoming side use thread context Peter Xu
2018-05-08 14:36   ` Juan Quintela
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 02/24] migration: new postcopy-pause state Peter Xu
2018-05-08 15:16   ` Juan Quintela
2018-05-09  4:05     ` Peter Xu
2018-05-09  6:59       ` Juan Quintela
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 03/24] migration: implement "postcopy-pause" src logic Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 04/24] migration: allow dst vm pause on postcopy Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 05/24] migration: allow src return path to pause Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 06/24] migration: allow fault thread " Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 07/24] qmp: hmp: add migrate "resume" option Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 08/24] migration: rebuild channel on source Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 09/24] migration: new state "postcopy-recover" Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 10/24] migration: wakeup dst ram-load-thread for recover Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 11/24] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 12/24] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 13/24] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 14/24] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 15/24] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 16/24] migration: synchronize dirty bitmap for resume Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 17/24] migration: setup ramstate " Peter Xu
2018-05-08 10:54   ` Dr. David Alan Gilbert
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 18/24] migration: final handshake for the resume Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 19/24] migration: init dst in migration_object_init too Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 20/24] qmp/migration: new command migrate-recover Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 21/24] hmp/migration: add migrate_recover command Peter Xu
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 22/24] migration: introduce lock for to_dst_file Peter Xu
2018-05-08 11:31   ` Dr. David Alan Gilbert
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 23/24] migration/qmp: add command migrate-pause Peter Xu
2018-05-08 13:20   ` Dr. David Alan Gilbert
2018-05-02 10:47 ` [Qemu-devel] [PATCH v8 24/24] migration/hmp: add migrate_pause command Peter Xu