From: zhanghailiang <zhang.zhanghailiang@huawei.com>
To: qemu-devel@nongnu.org
Cc: xiecl.fnst@cn.fujitsu.com, lizhijian@cn.fujitsu.com,
	quintela@redhat.com, armbru@redhat.com, yunhong.jiang@intel.com,
	eddie.dong@intel.com, peter.huangpeng@huawei.com,
	dgilbert@redhat.com,
	zhanghailiang <zhang.zhanghailiang@huawei.com>,
	arei.gonglei@huawei.com, stefanha@redhat.com,
	amit.shah@redhat.com, zhangchen.fnst@cn.fujitsu.com,
	hongyang.yang@easystack.cn
Subject: [Qemu-devel] [PATCH COLO-Frame v15 07/38] COLO: Implement colo checkpoint protocol
Date: Mon, 22 Feb 2016 10:40:01 +0800
Message-ID: <1456108832-24212-8-git-send-email-zhang.zhanghailiang@huawei.com>
In-Reply-To: <1456108832-24212-1-git-send-email-zhang.zhanghailiang@huawei.com>

We need a user-defined communication protocol to control the checkpoint
process.

A new checkpoint request is initiated by the Primary VM, and the interaction
proceeds as shown below:
Checkpoint synchronizing points:

                   Primary               Secondary
                                            initial work
'checkpoint-ready'    <-------------------- @

'checkpoint-request'  @ -------------------->
                                            Suspend (Only in hybrid mode)
'checkpoint-reply'    <-------------------- @
                      Suspend&Save state
'vmstate-send'        @ -------------------->
                      Send state            Receive state
'vmstate-received'    <-------------------- @
                      Release packets       Load state
'vmstate-loaded'      <-------------------- @
                      Resume                Resume (Only in hybrid mode)

                      Start Comparing (Only in hybrid mode)
NOTE:
 1) '@' marks the side that sends the message.
 2) Each sync-point is synchronized between the two sides with only
    one handshake (single direction) to keep latency low.
    If stricter synchronization is required, a sync-point in the
    opposite direction should be added.
 3) Since sync-points are single-direction, the remote side may
    have run well ahead by the time this side receives the sync-point.
 4) For now, we only support 'periodic' checkpoint mode, in which
    the Secondary VM is not running; 'hybrid' mode will be supported later.
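
The message order in the diagram can be sketched as a small standalone
checker. This is only an illustration, not code from the patch: the names
Msg, Side, SyncPoint and count_valid_rounds() are hypothetical, and the enum
merely mirrors the COLOMessage values (with the last sync-point named
'vmstate-loaded', as in the QAPI enum).

```c
#include <stddef.h>

/* Hypothetical mirror of the COLOMessage enum, for illustration only. */
typedef enum {
    MSG_CHECKPOINT_READY,
    MSG_CHECKPOINT_REQUEST,
    MSG_CHECKPOINT_REPLY,
    MSG_VMSTATE_SEND,
    MSG_VMSTATE_SIZE,
    MSG_VMSTATE_RECEIVED,
    MSG_VMSTATE_LOADED,
    MSG__MAX
} Msg;

typedef enum { SIDE_PRIMARY, SIDE_SECONDARY } Side;

typedef struct {
    Msg msg;      /* which sync-point */
    Side sender;  /* the '@' side in the diagram */
} SyncPoint;

/* One checkpoint round, in the order given in the diagram. */
static const SyncPoint round_steps[] = {
    { MSG_CHECKPOINT_REQUEST, SIDE_PRIMARY },
    { MSG_CHECKPOINT_REPLY,   SIDE_SECONDARY },
    { MSG_VMSTATE_SEND,       SIDE_PRIMARY },
    { MSG_VMSTATE_RECEIVED,   SIDE_SECONDARY },
    { MSG_VMSTATE_LOADED,     SIDE_SECONDARY },
};
#define ROUND_LEN (sizeof(round_steps) / sizeof(round_steps[0]))

/*
 * Check a trace of sync-points: it must start with 'checkpoint-ready'
 * sent by the Secondary, followed by zero or more complete rounds.
 * Returns the number of complete rounds, or -1 on any violation.
 */
static int count_valid_rounds(const SyncPoint *trace, size_t n)
{
    size_t i;

    if (n == 0 || trace[0].msg != MSG_CHECKPOINT_READY ||
        trace[0].sender != SIDE_SECONDARY) {
        return -1;
    }
    if ((n - 1) % ROUND_LEN != 0) {
        return -1; /* trailing partial round */
    }
    for (i = 1; i < n; i++) {
        const SyncPoint *want = &round_steps[(i - 1) % ROUND_LEN];

        if (trace[i].msg != want->msg || trace[i].sender != want->sender) {
            return -1;
        }
    }
    return (int)((n - 1) / ROUND_LEN);
}
```

The checker treats 'checkpoint-ready' as a one-time message and the other
five sync-points as a repeating round, which matches how the Primary loop
calls colo_do_checkpoint_transaction() repeatedly after the initial wait.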

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
v14:
- Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
- Add Reviewed-by tag
v13:
- Refactor colo command related helper functions, use 'Error **errp' parameter
  instead of return value to indicate success or failure.
- Fix some other comments from Markus.

v12:
- Rename colo_ctl_put() to colo_put_cmd()
- Rename colo_ctl_get() to colo_get_check_cmd() and drop
  the third parameter
- Rename colo_ctl_get_cmd() to colo_get_cmd()
- Remove useless 'invalid' member for COLOcommand enum.
v11:
- Add missing 'checkpoint-ready' communication in comment.
- Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
- Fix trace for colo_ctl_get() to trace command and value both
v10:
- Rename enum COLOCmd to COLOCommand (Eric's suggestion).
- Remove unused 'ram-steal'
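
For reference, colo_put_cmd() and colo_get_cmd() in this patch send each
message as a big-endian 32-bit value (qemu_put_be32()/qemu_get_be32()) and
reject values at or above COLO_MESSAGE__MAX. A minimal standalone sketch of
that framing follows. It does not use QEMUFile: put_be32(), get_be32(),
decode_msg() and MSG_MAX are hypothetical stand-ins, and MSG_MAX is assumed
to be 7 from the seven enum members in the schema.

```c
#include <stdint.h>

/* Assumed stand-in for COLO_MESSAGE__MAX (seven enum members). */
#define MSG_MAX 7u

/* Store v into buf in big-endian (network) byte order. */
static void put_be32(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Read a big-endian 32-bit value from buf. */
static uint32_t get_be32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/*
 * Decode one message from the wire and range-check it, as the receive
 * helper does before indexing the name lookup table.
 * Returns 0 and stores the value in *out on success, -1 if out of range.
 */
static int decode_msg(const uint8_t buf[4], uint32_t *out)
{
    uint32_t v = get_be32(buf);

    if (v >= MSG_MAX) {
        return -1;
    }
    *out = v;
    return 0;
}
```

The range check has to come before any table lookup on the received value,
since the peer controls the four bytes read from the stream.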
---
 migration/colo.c | 201 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json |  25 +++++++
 trace-events     |   2 +
 3 files changed, 226 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 43e9890..c0ff088 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include <unistd.h>
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
@@ -34,22 +35,147 @@ bool migration_incoming_in_colo_state(void)
     return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static void colo_put_cmd(QEMUFile *f, COLOMessage cmd,
+                         Error **errp)
+{
+    int ret;
+
+    if (cmd >= COLO_MESSAGE__MAX) {
+        error_setg(errp, "%s: Invalid cmd", __func__);
+        return;
+    }
+    qemu_put_be32(f, cmd);
+    qemu_fflush(f);
+
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Can't put COLO command");
+    }
+    trace_colo_put_cmd(COLOMessage_lookup[cmd]);
+}
+
+static COLOMessage colo_get_cmd(QEMUFile *f, Error **errp)
+{
+    COLOMessage cmd;
+    int ret;
+
+    cmd = qemu_get_be32(f);
+    ret = qemu_file_get_error(f);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Can't get COLO command");
+        return cmd;
+    }
+    if (cmd >= COLO_MESSAGE__MAX) {
+        error_setg(errp, "%s: Invalid cmd", __func__);
+        return cmd;
+    }
+    trace_colo_get_cmd(COLOMessage_lookup[cmd]);
+    return cmd;
+}
+
+static void colo_get_check_cmd(QEMUFile *f, COLOMessage expect_cmd,
+                               Error **errp)
+{
+    COLOMessage cmd;
+    Error *local_err = NULL;
+
+    cmd = colo_get_cmd(f, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    if (cmd != expect_cmd) {
+        error_setg(errp, "Unexpected COLO command %d, expected %d",
+                          cmd, expect_cmd);
+    }
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s)
+{
+    Error *local_err = NULL;
+
+    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
+                 &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    colo_get_check_cmd(s->rp_state.from_dst_file,
+                       COLO_MESSAGE_CHECKPOINT_REPLY, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    /* TODO: suspend and save vm state to colo buffer */
+
+    colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    /* TODO: send vmstate to Secondary */
+
+    colo_get_check_cmd(s->rp_state.from_dst_file,
+                       COLO_MESSAGE_VMSTATE_RECEIVED, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    colo_get_check_cmd(s->rp_state.from_dst_file,
+                       COLO_MESSAGE_VMSTATE_LOADED, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    /* TODO: resume Primary */
+
+    return 0;
+out:
+    if (local_err) {
+        error_report_err(local_err);
+    }
+    return -EINVAL;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
+    Error *local_err = NULL;
+    int ret;
+
     s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
     if (!s->rp_state.from_dst_file) {
         error_report("Open QEMUFile from_dst_file failed");
         goto out;
     }
 
+    /*
+     * Wait for the Secondary to finish loading VM state and enter COLO
+     * restore.
+     */
+    colo_get_check_cmd(s->rp_state.from_dst_file,
+                       COLO_MESSAGE_CHECKPOINT_READY, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
     qemu_mutex_lock_iothread();
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
 
-    /*TODO: COLO checkpoint savevm loop*/
+    while (s->state == MIGRATION_STATUS_COLO) {
+        /* start a colo checkpoint */
+        ret = colo_do_checkpoint_transaction(s);
+        if (ret < 0) {
+            goto out;
+        }
+    }
 
 out:
+    /* Report any still-unreported error after leaving the loop */
+    if (local_err) {
+        error_report_err(local_err);
+    }
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
@@ -67,9 +193,33 @@ void migrate_start_colo_process(MigrationState *s)
     qemu_mutex_lock_iothread();
 }
 
+static void colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request,
+                                 Error **errp)
+{
+    COLOMessage cmd;
+    Error *local_err = NULL;
+
+    cmd = colo_get_cmd(f, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    switch (cmd) {
+    case COLO_MESSAGE_CHECKPOINT_REQUEST:
+        *checkpoint_request = 1;
+        break;
+    default:
+        *checkpoint_request = 0;
+        error_setg(errp, "Got unknown COLO command: %d", cmd);
+        break;
+    }
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
+    Error *local_err = NULL;
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -85,9 +235,56 @@ void *colo_process_incoming_thread(void *opaque)
     */
     qemu_file_set_blocking(mis->from_src_file, true);
 
-    /* TODO: COLO checkpoint restore loop */
+    colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
+                 &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    while (mis->state == MIGRATION_STATUS_COLO) {
+        int request;
+
+        colo_wait_handle_cmd(mis->from_src_file, &request, &local_err);
+        if (local_err) {
+            goto out;
+        }
+        assert(request);
+        /* FIXME: This is unnecessary for periodic checkpoint mode */
+        colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
+                     &local_err);
+        if (local_err) {
+            goto out;
+        }
+
+        colo_get_check_cmd(mis->from_src_file,
+                           COLO_MESSAGE_VMSTATE_SEND, &local_err);
+        if (local_err) {
+            goto out;
+        }
+
+        /* TODO: read migration data into colo buffer */
+
+        colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
+                     &local_err);
+        if (local_err) {
+            goto out;
+        }
+
+        /* TODO: load vm state */
+
+        colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
+                     &local_err);
+        if (local_err) {
+            goto out;
+        }
+    }
 
 out:
+    /* Report any still-unreported error after leaving the loop */
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
diff --git a/qapi-schema.json b/qapi-schema.json
index 26a1d37..29afbb9 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -718,6 +718,31 @@
 { 'command': 'migrate-start-postcopy' }
 
 ##
+# @COLOMessage
+#
+# The message transmission between PVM and SVM
+#
+# @checkpoint-ready: SVM is ready for checkpointing
+#
+# @checkpoint-request: PVM tells SVM to prepare for new checkpointing
+#
+# @checkpoint-reply: SVM gets PVM's checkpoint request
+#
+# @vmstate-send: VM's state will be sent by PVM.
+#
+# @vmstate-size: The total size of VMstate.
+#
+# @vmstate-received: VM's state has been received by SVM.
+#
+# @vmstate-loaded: VM's state has been loaded by SVM.
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOMessage',
+  'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
+            'vmstate-send', 'vmstate-size', 'vmstate-received',
+            'vmstate-loaded' ] }
+
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/trace-events b/trace-events
index 53714db..97807cd 100644
--- a/trace-events
+++ b/trace-events
@@ -1605,6 +1605,8 @@ postcopy_ram_incoming_cleanup_join(void) ""
 
 # migration/colo.c
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
+colo_put_cmd(const char *msg) "Send '%s' cmd"
+colo_get_cmd(const char *msg) "Receive '%s' cmd"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1
