From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45779) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Zvgxb-0000vh-Dx for qemu-devel@nongnu.org; Mon, 09 Nov 2015 02:33:53 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZvgxY-0004Dz-4x for qemu-devel@nongnu.org; Mon, 09 Nov 2015 02:33:51 -0500 Received: from szxga01-in.huawei.com ([58.251.152.64]:24882) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZvgxW-0004Cy-Oa for qemu-devel@nongnu.org; Mon, 09 Nov 2015 02:33:48 -0500 References: <1446551816-15768-1-git-send-email-zhang.zhanghailiang@huawei.com> <1446551816-15768-10-git-send-email-zhang.zhanghailiang@huawei.com> <20151106182632.GH4267@work-vm> <56404270.90606@huawei.com> From: zhanghailiang Message-ID: <56404C39.7080508@huawei.com> Date: Mon, 9 Nov 2015 15:33:13 +0800 MIME-Version: 1.0 In-Reply-To: <56404270.90606@huawei.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH COLO-Frame v10 09/38] COLO: Implement colo checkpoint protocol List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com On 2015/11/9 14:51, zhanghailiang wrote: > On 2015/11/7 2:26, Dr. David Alan Gilbert wrote: >> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote: >>> We need communications protocol of user-defined to control the checkpoint >>> process. >>> >>> The new checkpoint request is started by Primary VM, and the interactive process >>> like below: >>> Checkpoint synchronizing points, >>> >>> Primary Secondary >>> 'checkpoint-request' @ -----------------------------> >>> Suspend (In hybrid mode) >>> 'checkpoint-reply' <------------------------------ @ >>> Suspend&Save state >> >> Why is this initial pair necessary? Can't you just start with the vmstate-send >> and save the extra request pair/round trip? On the 2nd checkpoint we know >> the SVM already received the previous checkpoint because we got it's first >> vmstate-load. >> > Er, i have made some mistake, before 'checkpoint-request' command, we have a 'checkpoint-ready' communication, which is sent by SVM to PVM, to tell PVM that SVM is ready for checkpoint. We do the initial work before send 'checkpoint-ready' in Secondary. So, yes, you are right, this 'checkpoint-reply' is unnecessary for simple checkpoint mode. I will remove it or maybe just add more comment about this, and add 'checkpoint-ready' in the comment. > Yes, we can certainly drop this handshaking in simple checkpoint mode. > But we still need to do some initial work (preparing work) in simple checkpoint mode. > And i'm not sure if this initial work is time-wasting or not. We choose to do this preparing work > before send the 'checkpoint-reply' to reducing VM's STOP time as possible as we can. > >> I guess in full-COLO (rather than simple checkpoint) you can get the secondary to >> do some of it's stopping/cleanup after it sends the checkpoint-reply > > Actually, we do it before it sends 'checkpoint-reply' :) > >> but before vmstate-send, so you can hide some of the time. >> > >> (Perhaps add a comment to explain) >> > > OK, I will add more comment about this. > >>> 'vmstate-send' @ -----------------------------> >>> Send state Receive state >>> 'vmstate-received' <------------------------------ @ >>> Release packets Load state >>> 'vmstate-load' <------------------------------ @ >>> Resume Resume (In hybrid mode) >>> >>> Start Comparing (In hybrid mode) >>> NOTE: >>> 1) '@' who sends the message >>> 2) Every sync-point is synchronized by two sides with only >>> one handshake(single direction) for low-latency. >>> If more strict synchronization is required, a opposite direction >>> sync-point should be added. >>> 3) Since sync-points are single direction, the remote side may >>> go forward a lot when this side just receives the sync-point. >>> 4) For now, we only support 'periodic' checkpoint, for which >>> the Secondary VM is not running, later we will support 'hybrid' mode. >>> >>> Signed-off-by: zhanghailiang >>> Signed-off-by: Li Zhijian >>> Signed-off-by: Gonglei >>> Cc: Eric Blake >>> --- >>> v10: >>> - Rename enum COLOCmd to COLOCommand (Eric's suggestion). >>> - Remove unused 'ram-steal' >>> --- >>> migration/colo.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- >>> qapi-schema.json | 27 ++++++++ >>> trace-events | 2 + >>> 3 files changed, 219 insertions(+), 2 deletions(-) >>> >>> diff --git a/migration/colo.c b/migration/colo.c >>> index 4fdf3a9..2510762 100644 >>> --- a/migration/colo.c >>> +++ b/migration/colo.c >>> @@ -10,10 +10,12 @@ >>> * later. See the COPYING file in the top-level directory. >>> */ >>> >>> +#include >>> #include "sysemu/sysemu.h" >>> #include "migration/colo.h" >>> #include "trace.h" >>> #include "qemu/error-report.h" >>> +#include "qemu/sockets.h" >>> >>> bool colo_supported(void) >>> { >>> @@ -34,6 +36,103 @@ bool migration_incoming_in_colo_state(void) >>> return mis && (mis->state == MIGRATION_STATUS_COLO); >>> } >>> >>> +/* colo checkpoint control helper */ >>> +static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value) >>> +{ >>> + int ret = 0; >>> + >>> + qemu_put_be32(f, cmd); >>> + qemu_put_be64(f, value); >>> + qemu_fflush(f); >>> + >>> + ret = qemu_file_get_error(f); >>> + trace_colo_ctl_put(COLOCommand_lookup[cmd], value); >>> + >>> + return ret; >>> +} >>> + >>> +static int colo_ctl_get_cmd(QEMUFile *f, uint32_t *cmd) >>> +{ >>> + int ret = 0; >>> + >>> + *cmd = qemu_get_be32(f); >>> + ret = qemu_file_get_error(f); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + if (*cmd >= COLO_COMMAND_MAX) { >>> + error_report("Invalid colo command, get cmd:%d", *cmd); >>> + return -EINVAL; >>> + } >>> + trace_colo_ctl_get(COLOCommand_lookup[*cmd]); >>> + >>> + return 0; >>> +} >>> + >>> +static int colo_ctl_get(QEMUFile *f, uint32_t require) >>> +{ >>> + int ret; >>> + uint32_t cmd; >>> + uint64_t value; >>> + >>> + ret = colo_ctl_get_cmd(f, &cmd); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + if (cmd != require) { >>> + error_report("Unexpect colo command, expect:%d, but get cmd:%d", >>> + require, cmd); >>> + return -EINVAL; >>> + } >>> + >>> + value = qemu_get_be64(f); >>> + ret = qemu_file_get_error(f); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + >>> + return value; >>> +} >> >> Should the return type be uint64_t since you're returning value? >> But then you're also using it to return an error code; so perhaps >> it might be better to have a uint64_t *value parameter to >> return the value separately; or define the range that the value >> can actually take. >> > > Good catch. Use parameter to return 'value' is a good idea, > Will fix it in next version. > > Thanks, > zhanghailiang > >> (Also very minor typo: 'got' not 'get' in a few errors) >> > >>> +static int colo_do_checkpoint_transaction(MigrationState *s) >>> +{ >>> + int ret; >>> + >>> + ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST, 0); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> + ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_CHECKPOINT_REPLY); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> + /* TODO: suspend and save vm state to colo buffer */ >>> + >>> + ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> + /* TODO: send vmstate to Secondary */ >>> + >>> + ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_VMSTATE_RECEIVED); >>> + if (ret < 0) { >>> + goto out; >>> + }> + >>> + ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_VMSTATE_LOADED); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> + /* TODO: resume Primary */ >>> + >>> +out: >>> + return ret; >>> +} >>> + >>> static void colo_process_checkpoint(MigrationState *s) >>> { >>> int fd, ret = 0; >>> @@ -51,12 +150,27 @@ static void colo_process_checkpoint(MigrationState *s) >>> goto out; >>> } >>> >>> + /* >>> + * Wait for Secondary finish loading vm states and enter COLO >>> + * restore. >>> + */ >>> + ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_CHECKPOINT_READY); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> qemu_mutex_lock_iothread(); >>> vm_start(); >>> qemu_mutex_unlock_iothread(); >>> trace_colo_vm_state_change("stop", "run"); >>> >>> - /*TODO: COLO checkpoint savevm loop*/ >>> + while (s->state == MIGRATION_STATUS_COLO) { >>> + /* start a colo checkpoint */ >>> + ret = colo_do_checkpoint_transaction(s); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + } >>> >>> out: >>> if (ret < 0) { >>> @@ -79,6 +193,39 @@ void migrate_start_colo_process(MigrationState *s) >>> qemu_mutex_lock_iothread(); >>> } >>> >>> +/* >>> + * return: >>> + * 0: start a checkpoint >>> + * -1: some error happened, exit colo restore >>> + */ >>> +static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request) >>> +{ >>> + int ret; >>> + uint32_t cmd; >>> + uint64_t value; >>> + >>> + ret = colo_ctl_get_cmd(f, &cmd); >>> + if (ret < 0) { >>> + /* do failover ? */ >>> + return ret; >>> + } >>> + /* Fix me: this value should be 0, which is not so good, >>> + * should be used for checking ? >>> + */ >>> + value = qemu_get_be64(f); >>> + if (value != 0) { >> >> should output error message as well? >> >>> + return -EINVAL; >>> + } >>> + >>> + switch (cmd) { >>> + case COLO_COMMAND_CHECKPOINT_REQUEST: >>> + *checkpoint_request = 1; >>> + return 0; >>> + default: >>> + return -EINVAL; >>> + } >>> +} >>> + >>> void *colo_process_incoming_thread(void *opaque) >>> { >>> MigrationIncomingState *mis = opaque; >>> @@ -98,7 +245,48 @@ void *colo_process_incoming_thread(void *opaque) >>> error_report("colo incoming thread: Open QEMUFile to_src_file failed"); >>> goto out; >>> } >>> - /* TODO: COLO checkpoint restore loop */ >>> + >>> + ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_READY, 0); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> + while (mis->state == MIGRATION_STATUS_COLO) { >>> + int request = 0; >>> + int ret = colo_wait_handle_cmd(mis->from_src_file, &request); >>> + >>> + if (ret < 0) { >>> + break; >>> + } else { >>> + if (!request) { >>> + continue; >>> + } >>> + } >>> + >>> + ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_CHECKPOINT_REPLY, 0); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> + ret = colo_ctl_get(mis->from_src_file, COLO_COMMAND_VMSTATE_SEND); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> + /* TODO: read migration data into colo buffer */ >>> + >>> + ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_RECEIVED, 0); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + >>> + /* TODO: load vm state */ >>> + >>> + ret = colo_ctl_put(mis->to_src_file, COLO_COMMAND_VMSTATE_LOADED, 0); >>> + if (ret < 0) { >>> + goto out; >>> + } >>> + } >>> >>> out: >>> if (ret < 0) { >>> diff --git a/qapi-schema.json b/qapi-schema.json >>> index 22251ec..5c4fe6d 100644 >>> --- a/qapi-schema.json >>> +++ b/qapi-schema.json >>> @@ -702,6 +702,33 @@ >>> '*tls-port': 'int', '*cert-subject': 'str' } } >>> >>> ## >>> +# @COLOCommand >>> +# >>> +# The colo command >>> +# >>> +# @invalid: unknown command >>> +# >>> +# @checkpoint-ready: SVM is ready for checkpointing >>> +# >>> +# @checkpoint-request: PVM tells SVM to prepare for new checkpointing >>> +# >>> +# @checkpoint-reply: SVM gets PVM's checkpoint request >>> +# >>> +# @vmstate-send: VM's state will be sent by PVM. >>> +# >>> +# @vmstate-size: The total size of VMstate. >>> +# >>> +# @vmstate-received: VM's state has been received by SVM >>> +# >>> +# @vmstate-loaded: VM's state has been loaded by SVM >>> +# >>> +# Since: 2.5 >>> +## >>> +{ 'enum': 'COLOCommand', >>> + 'data': [ 'invalid', 'checkpoint-ready', 'checkpoint-request', >>> + 'checkpoint-reply', 'vmstate-send', 'vmstate-size', >>> + 'vmstate-received', 'vmstate-loaded' ] } >>> + >>> # @MouseInfo: >>> # >>> # Information about a mouse device. >>> diff --git a/trace-events b/trace-events >>> index 9cd6391..ee4679c 100644 >>> --- a/trace-events >>> +++ b/trace-events >>> @@ -1499,6 +1499,8 @@ rdma_start_outgoing_migration_after_rdma_source_init(void) "" >>> >>> # migration/colo.c >>> colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'" >>> +colo_ctl_put(const char *msg, uint64_t value) "Send '%s' cmd, value: %" PRIu64"" >>> +colo_ctl_get(const char *msg) "Receive '%s' cmd" >>> >>> # kvm-all.c >>> kvm_ioctl(int type, void *arg) "type 0x%x, arg %p" >>> -- >>> 1.8.3.1 >>> >>> >> -- >> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >> >> . >> >