All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] Live migration for VMs with QEMU backed local storage
@ 2017-06-23  7:42 Bruno Alvisio
  2017-06-23  8:03 ` Roger Pau Monné
  2017-06-29 11:58 ` Wei Liu
  0 siblings, 2 replies; 12+ messages in thread
From: Bruno Alvisio @ 2017-06-23  7:42 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, dave

This patch is the first attempt on adding live migration of instances with local
storage to Xen. This patch just handles very restricted case of fully
virtualized HVMs. The code uses the "drive-mirror" capability provided by QEMU.
A new "-l" option is introduced to "xl migrate" command. If provided, the local
disk should be mirrored during the migration process. If the option is set,
during the VM creation a qemu NBD server is started on the destination. After
the instance is suspended on the source, the QMP "disk-mirror" command is issued
to mirror the disk to destination. Once the mirroring job is complete, the
migration process continues as before. Finally, the NBD server is stopped after
the instance is successfully resumed on the destination node.

A major problem with this patch is that the mirroring of the disk is performed
only after the memory stream is completed and the VM is suspended on the source;
thus the instance is frozen for a long period of time. The reason this happens
is that the QEMU process (needed for the disk mirroring) is started on the
destination node only after the memory copying is completed. One possibility I
was considering to solve this issue (if it is decided that this capability
should be used): Could a "helper" QEMU process be started on the destination
node at the beginning of the migration sequence with the sole purpose of
handling the disk mirroring and kill it at the end of the migration sequence? 

From the suggestions given by Konrad Wilk and Paul Durrant the preferred
approach would be to handle the mirroring of disks by QEMU instead of directly
being handled directly by, for example, blkback. It would be very helpful for me
to have a mental map of all the scenarios that can be encountered regarding
local disk (Xen could start supporting live migration of certain types of local
disks). This are the ones I can think of:
- Fully Virtualized HVM: QEMU emulation
- blkback
- blktap / blktap2 


I have included TODOs in the code. I am sending this patch as is because I first
wanted to get an initial feedback if this is the path the should be pursued. Any
suggestions and ideas on this patch or on how to make a more generic solution
would be really appreciated.

Signed-off-by: Bruno Alvisio <bruno.alvisio@gmail.com>

---
 tools/libxl/libxl.h                  |  16 ++++-
 tools/libxl/libxl_create.c           |  87 +++++++++++++++++++++++++-
 tools/libxl/libxl_internal.h         |  16 +++++
 tools/libxl/libxl_qmp.c              | 115 ++++++++++++++++++++++++++++++++++-
 tools/ocaml/libs/xl/xenlight_stubs.c |   2 +-
 tools/xl/xl.h                        |   1 +
 tools/xl/xl_migrate.c                |  79 +++++++++++++++++++++---
 tools/xl/xl_vmcontrol.c              |   2 +-
 8 files changed, 303 insertions(+), 15 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index cf8687a..81fb2dc 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1294,6 +1294,15 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
                     xentoollog_logger *lg);
 int libxl_ctx_free(libxl_ctx *ctx /* 0 is OK */);
 
+int libxl__drive_mirror(libxl_ctx *ctx, int domid, const char* device, const char* target, const char* format) LIBXL_EXTERNAL_CALLERS_ONLY;
+
+int libxl__query_block_jobs(libxl_ctx *ctx, int domid, bool *is_ready) LIBXL_EXTERNAL_CALLERS_ONLY;
+
+int libxl__query_block(libxl_ctx *ctx, int domid, char *device_names) LIBXL_EXTERNAL_CALLERS_ONLY;
+
+int libxl__nbd_server_stop(libxl_ctx *ctx, int domid) LIBXL_EXTERNAL_CALLERS_ONLY;
+
+
 /* domain related functions */
 
 /* If the result is ERROR_ABORTED, the domain may or may not exist
@@ -1307,7 +1316,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
-                                int send_back_fd,
+                                int send_back_fd, int copy_local_disks,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1348,7 +1357,7 @@ static inline int libxl_domain_create_restore_0x040400(
     LIBXL_EXTERNAL_CALLERS_ONLY
 {
     return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
-                                       -1, params, ao_how, aop_console_how);
+                                       -1, 0, params, ao_how, aop_console_how);
 }
 
 #define libxl_domain_create_restore libxl_domain_create_restore_0x040400
@@ -1387,6 +1396,9 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
 
+#define QEMU_DRIVE_MIRROR_PORT "11000"
+#define QEMU_DRIVE_MIRROR_DEVICE "ide0-hd0"
+
 /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )]
  *   If this parameter is true, use co-operative resume. The guest
  *   must support this.
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index bffbc45..ef99f03 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -27,6 +27,40 @@
 
 #include <xen-xsm/flask/flask.h>
 
+//TODO: These functions were created to be able to call qmp commands from xl.
+//TODO: These functions should be removed since they won't be called from xl.
+int libxl__drive_mirror(libxl_ctx *ctx, int domid, const char* device, const char* target, const char* format){
+    GC_INIT(ctx);
+    int rc;
+    rc = libxl__qmp_drive_mirror(gc, domid, device, target, format);
+    GC_FREE;
+    return rc;
+}
+
+int libxl__query_block_jobs(libxl_ctx *ctx, int domid, bool *is_ready){
+    GC_INIT(ctx);
+    int rc;
+    rc = libxl__qmp_query_block_jobs(gc, domid, is_ready);
+    GC_FREE;
+    return rc;
+}
+
+int libxl__nbd_server_stop(libxl_ctx *ctx, int domid){
+    GC_INIT(ctx);
+    int rc;
+    rc = libxl__qmp_nbd_server_stop(gc, domid);
+    GC_FREE;
+    return rc;
+}
+
+int libxl__query_block(libxl_ctx *ctx, int domid, char *device_names){
+    GC_INIT(ctx);
+    int rc;
+    rc = libxl__qmp_query_block(gc, domid, device_names);
+    GC_FREE;
+    return rc;
+}
+
 int libxl__domain_create_info_setdefault(libxl__gc *gc,
                                          libxl_domain_create_info *c_info)
 {
@@ -1355,6 +1389,51 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
         else
             libxl__spawn_local_dm(egc, &dcs->sdss.dm);
 
+
+        if(dcs->restore_fd >= 0 && dcs->copy_local_disks) {
+             /*
+              * Start and add the NBD server
+              * Host is set it to "::" for now
+              * Port we hard code a port for now
+
+              * This code just handles the case when -M pc is used.
+              * (The config xen_platform_pci = 0)
+              *
+              * Current implementation performs the disk mirroring after the
+              * VM in the source has been suspended. Thus, the VM is frozen
+              * for a long period of time.
+              * Consider doing the mirroring of the drive before the memory
+              * stream is performed.
+              * Consider a solution that handles multiple types of VM configurations
+              *
+              * TODO: Current implementation only works with upstream qemu
+              * TODO: consider the case when qemu-xen-traditional is used.
+              * TODO: Check and copy only those disks which are local
+              * TODO: Assign port dynamically
+              */
+
+            fprintf(stderr, "Starting NBD Server\n");
+            ret = libxl__qmp_nbd_server_start(gc, domid, "::", QEMU_DRIVE_MIRROR_PORT);
+            if (ret) {
+                ret = ERROR_FAIL;
+                fprintf(stderr, "Failed to start NBD Server\n");
+                goto skip_nbd;
+            }else{
+                fprintf(stderr, "Started NBD Server Successfully\n");
+            }
+
+            ret = libxl__qmp_nbd_server_add(gc, domid, QEMU_DRIVE_MIRROR_DEVICE);
+
+            if (ret) {
+                ret = ERROR_FAIL;
+                fprintf(stderr, "Failed to add NBD Server\n");
+                goto skip_nbd;
+            } else {
+                fprintf(stderr, "NBD Add Successful\n");
+            }
+        }
+
+skip_nbd:
         /*
          * Handle the domain's (and the related stubdomain's) access to
          * the VGA framebuffer.
@@ -1602,6 +1681,7 @@ static void domain_create_cb(libxl__egc *egc,
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid, int restore_fd, int send_back_fd,
+                            int copy_local_disks,
                             const libxl_domain_restore_params *params,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
@@ -1617,6 +1697,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
     cdcs->dcs.send_back_fd = send_back_fd;
+    cdcs->dcs.copy_local_disks = copy_local_disks;
     if (restore_fd > -1) {
         cdcs->dcs.restore_params = *params;
         rc = libxl__fd_flags_modify_save(gc, cdcs->dcs.restore_fd,
@@ -1845,13 +1926,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, 0, NULL,
                             ao_how, aop_console_how);
 }
 
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
-                                int send_back_fd,
+                                int send_back_fd, int copy_local_disks,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1863,7 +1944,7 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
     }
 
     return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
-                            params, ao_how, aop_console_how);
+                            copy_local_disks, params, ao_how, aop_console_how);
 }
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index afe6652..938481a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1835,6 +1835,21 @@ _hidden int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid,
 /* Start replication */
 _hidden int libxl__qmp_start_replication(libxl__gc *gc, int domid,
                                          bool primary);
+
+/* Add a disk to NBD server */
+ _hidden int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid,
+                                       const char *disk);
+
+/* Mirror disk drive */
+_hidden int libxl__qmp_drive_mirror(libxl__gc *gc, int domid, const char* device,
+                                    const char* target, const char* format);
+
+/* Query block devices */
+_hidden int libxl__qmp_query_block(libxl__gc *gc, int domid, char *device_names);
+
+/* Query existing block jobs*/
+_hidden int libxl__qmp_query_block_jobs(libxl__gc *gc, int domid, bool *is_ready);
+
 /* Get replication error that occurs when the vm is running */
 _hidden int libxl__qmp_query_xen_replication_status(libxl__gc *gc, int domid);
 /* Do checkpoint */
@@ -3695,6 +3710,7 @@ struct libxl__domain_create_state {
     int restore_fd, libxc_fd;
     int restore_fdfl; /* original flags of restore_fd */
     int send_back_fd;
+    int copy_local_disks;
     libxl_domain_restore_params restore_params;
     uint32_t domid_soft_reset;
     libxl__domain_create_cb *callback;
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index eab993a..cbfcf77 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -347,7 +347,9 @@ static libxl__qmp_handler *qmp_init_handler(libxl__gc *gc, uint32_t domid)
     }
     qmp->ctx = CTX;
     qmp->domid = domid;
-    qmp->timeout = 5;
+    //TODO: Changed default timeout because drive-mirror command takes a long
+    //TODO: to return. Consider timeout to be passed as param.
+    qmp->timeout = 600;
 
     LIBXL_STAILQ_INIT(&qmp->callback_list);
 
@@ -1069,6 +1071,117 @@ int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid, const char *disk)
     return qmp_run_command(gc, domid, "nbd-server-add", args, NULL, NULL);
 }
 
+int libxl__qmp_drive_mirror(libxl__gc *gc, int domid, const char* device, const char* target, const char* format)
+{
+    libxl__json_object *args = NULL;
+    //TODO: Allow method to receive "sync", "speed", "mode", "granurality", "buf-size"
+    qmp_parameters_add_string(gc, &args, "device", device);
+    qmp_parameters_add_string(gc, &args, "target", target);
+    qmp_parameters_add_string(gc, &args, "sync", "full");
+    qmp_parameters_add_string(gc, &args, "format", format);
+    qmp_parameters_add_string(gc, &args, "mode", "existing");
+    qmp_parameters_add_integer(gc, &args, "granularity", 0);
+    qmp_parameters_add_integer(gc, &args, "buf-size", 0);
+
+    return qmp_run_command(gc, domid, "drive-mirror", args, NULL, NULL);
+}
+
+static int query_block_callback(libxl__qmp_handler *qmp,
+                               const libxl__json_object *response,
+                               void *opaque)
+{
+    const libxl__json_object *blockinfo = NULL;
+    GC_INIT(qmp->ctx);
+    int i, rc = -1;
+
+    for (i = 0; (blockinfo = libxl__json_array_get(response, i)); i++) {
+        const libxl__json_object *d;
+        const char* device_name;
+        d = libxl__json_map_get("device", blockinfo, JSON_STRING);
+        if(!d){
+            goto out;
+        }
+        device_name = libxl__json_object_get_string(d);
+    }
+
+    rc = 0;
+out:
+    GC_FREE;
+    return rc;
+}
+
+static int query_block_jobs_callback(libxl__qmp_handler *qmp,
+                               const libxl__json_object *response,
+                               void *opaque)
+{
+    const libxl__json_object *blockjobinfo = NULL;
+    GC_INIT(qmp->ctx);
+    int i, rc = -1;
+    bool empty = true;
+
+    for (i = 0; (blockjobinfo = libxl__json_array_get(response, i)); i++) {
+        empty = false;
+        const char *bjtype;
+        const char *bjdevice;
+        unsigned int bjlen;
+        unsigned int bjoffset;
+        bool bjbusy;
+        bool bjpaused;
+        const char *bjiostatus;
+        bool bjready;
+
+        const libxl__json_object *type = NULL;
+        const libxl__json_object *device = NULL;
+        const libxl__json_object *len = NULL;
+        const libxl__json_object *offset = NULL;
+        const libxl__json_object *busy = NULL;
+        const libxl__json_object *paused = NULL;
+        const libxl__json_object *io_status = NULL;
+        const libxl__json_object *ready = NULL;
+
+        type = libxl__json_map_get("type", blockjobinfo, JSON_STRING);
+        device = libxl__json_map_get("device", blockjobinfo, JSON_STRING);
+        len = libxl__json_map_get("len", blockjobinfo, JSON_INTEGER);
+        offset = libxl__json_map_get("offset", blockjobinfo, JSON_INTEGER);
+        busy = libxl__json_map_get("busy", blockjobinfo, JSON_BOOL);
+        paused = libxl__json_map_get("type", blockjobinfo, JSON_BOOL);
+        io_status = libxl__json_map_get("io-status", blockjobinfo, JSON_STRING);
+        ready = libxl__json_map_get("ready", blockjobinfo, JSON_BOOL);
+
+        bjtype = libxl__json_object_get_string(type);
+        bjdevice = libxl__json_object_get_string(device);
+        bjlen = libxl__json_object_get_integer(len);
+        bjoffset = libxl__json_object_get_integer(offset);
+        bjbusy = libxl__json_object_get_bool(len);
+        bjpaused = libxl__json_object_get_bool(paused);
+        bjiostatus = libxl__json_object_get_string(io_status);
+        bjready = libxl__json_object_get_bool(ready);
+
+        bool *is_ready = opaque;
+        *is_ready = bjready;
+    }
+
+    if(empty){
+        bool *is_ready = opaque;
+        *is_ready = true;
+    }
+
+    rc = 0;
+
+    GC_FREE;
+    return rc;
+}
+
+int libxl__qmp_query_block(libxl__gc *gc, int domid, char *device_names)
+{
+    return qmp_run_command(gc, domid, "query-block", NULL, query_block_callback, device_names);
+}
+
+int libxl__qmp_query_block_jobs(libxl__gc *gc, int domid, bool *is_ready)
+{
+    return qmp_run_command(gc, domid, "query-block-jobs", NULL, query_block_jobs_callback, is_ready);
+}
+
 int libxl__qmp_start_replication(libxl__gc *gc, int domid, bool primary)
 {
     libxl__json_object *args = NULL;
diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c b/tools/ocaml/libs/xl/xenlight_stubs.c
index 98b52b9..8791175 100644
--- a/tools/ocaml/libs/xl/xenlight_stubs.c
+++ b/tools/ocaml/libs/xl/xenlight_stubs.c
@@ -538,7 +538,7 @@ value stub_libxl_domain_create_restore(value ctx, value domain_config, value par
 
 	caml_enter_blocking_section();
 	ret = libxl_domain_create_restore(CTX, &c_dconfig, &c_domid, restore_fd,
-		-1, &c_params, ao_how, NULL);
+		-1, 0, &c_params, ao_how, NULL);
 	caml_leave_blocking_section();
 
 	free(ao_how);
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index aa95b77..dcdb80d 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -35,6 +35,7 @@ struct domain_create {
     int daemonize;
     int monitor; /* handle guest reboots etc */
     int paused;
+    int copy_local_disks;
     int dryrun;
     int quiet;
     int vnc;
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 1f0e87d..62b78ea 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -177,7 +177,8 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
 }
 
 static void migrate_domain(uint32_t domid, const char *rune, int debug,
-                           const char *override_config_file)
+                           const char *override_config_file,
+                           int copy_local_disks, const char* hostname)
 {
     pid_t child = -1;
     int rc;
@@ -186,6 +187,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
     char rc_buf;
     uint8_t *config_data;
     int config_len, flags = LIBXL_SUSPEND_LIVE;
+    char* target;
 
     save_domain_core_begin(domid, override_config_file,
                            &config_data, &config_len);
@@ -232,6 +234,47 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 
     fprintf(stderr, "migration sender: Target has acknowledged transfer.\n");
 
+
+    /*
+     * If the -l was provided, the drive-mirror job is started.
+     * TODO: Move the following code as part of the domain_suspend
+     * TODO: The port should be sent by the destination.
+    */
+    if(copy_local_disks) {
+        fprintf(stderr, "Starting mirror-drive of device %s\n", QEMU_DRIVE_MIRROR_DEVICE);
+        xasprintf(&target, "nbd:%s:%s:exportname=%s", hostname, QEMU_DRIVE_MIRROR_PORT, QEMU_DRIVE_MIRROR_DEVICE);
+        rc = libxl__drive_mirror(ctx, domid, QEMU_DRIVE_MIRROR_DEVICE, target, "raw");
+        if (!rc) {
+            fprintf(stderr, "Drive mirror command returned successfully\n");
+        }else{
+            fprintf(stderr, "Sending drive mirror command failed\n");
+            goto cont;
+        }
+
+        /*
+         * Query job status until it is ready
+         * TODO: This code is just an inefficient busy wait. QMP sends an
+         * TODO: asynchronous message when mirroring job is completed. Consider
+         * TODO: adding the capability to handle asynchronous QMP messages (already done?)
+         */
+        bool job_is_ready = false;
+        while(!job_is_ready) {
+            fprintf(stderr, "Checking for drive-mirror job");
+            rc = libxl__query_block_jobs(ctx, domid, &job_is_ready);
+            if(rc){
+                fprintf(stderr, "Checking block job failed\n");
+                goto cont;
+            }else{
+                fprintf(stderr, "Checking block job succeeded\n");
+            }
+            if(!job_is_ready){
+                fprintf(stderr, "Sleeping 5 sec\n");
+                sleep(5);
+            }
+        }
+    }
+cont:
+
     if (common_domname) {
         xasprintf(&away_domname, "%s--migratedaway", common_domname);
         rc = libxl_domain_rename(ctx, domid, common_domname, away_domname);
@@ -316,7 +359,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 }
 
 static void migrate_receive(int debug, int daemonize, int monitor,
-                            int pause_after_migration,
+                            int pause_after_migration, int copy_local_disks,
                             int send_fd, int recv_fd,
                             libxl_checkpointed_stream checkpointed,
                             char *colo_proxy_script,
@@ -343,6 +386,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.daemonize = daemonize;
     dom_info.monitor = monitor;
     dom_info.paused = 1;
+    dom_info.copy_local_disks = copy_local_disks;
     dom_info.migrate_fd = recv_fd;
     dom_info.send_back_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
@@ -423,6 +467,14 @@ static void migrate_receive(int debug, int daemonize, int monitor,
 
     fprintf(stderr, "migration target: Got permission, starting domain.\n");
 
+    fprintf(stderr, "Stopping NBD server\n");
+    rc = libxl__nbd_server_stop(ctx, domid);
+    if (rc){
+        fprintf(stderr, "Failed to stop NBD server\n");
+    }else{
+        fprintf(stderr, "Stopped NBD server successfully\n");
+    }
+
     if (migration_domname) {
         rc = libxl_domain_rename(ctx, domid, migration_domname, common_domname);
         if (rc) goto perhaps_destroy_notify_rc;
@@ -478,6 +530,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
 int main_migrate_receive(int argc, char **argv)
 {
     int debug = 0, daemonize = 1, monitor = 1, pause_after_migration = 0;
+    int copy_local_disks = 0;
     libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
     int opt;
     bool userspace_colo_proxy = false;
@@ -490,7 +543,7 @@ int main_migrate_receive(int argc, char **argv)
         COMMON_LONG_OPTS
     };
 
-    SWITCH_FOREACH_OPT(opt, "Fedrp", opts, "migrate-receive", 0) {
+    SWITCH_FOREACH_OPT(opt, "Fedrpl", opts, "migrate-receive", 0) {
     case 'F':
         daemonize = 0;
         break;
@@ -516,6 +569,9 @@ int main_migrate_receive(int argc, char **argv)
     case 'p':
         pause_after_migration = 1;
         break;
+    case 'l':
+        copy_local_disks = 1;
+        break;
     }
 
     if (argc-optind != 0) {
@@ -523,7 +579,7 @@ int main_migrate_receive(int argc, char **argv)
         return EXIT_FAILURE;
     }
     migrate_receive(debug, daemonize, monitor, pause_after_migration,
-                    STDOUT_FILENO, STDIN_FILENO,
+                    copy_local_disks, STDOUT_FILENO, STDIN_FILENO,
                     checkpointed, script, userspace_colo_proxy);
 
     return EXIT_SUCCESS;
@@ -536,14 +592,16 @@ int main_migrate(int argc, char **argv)
     const char *ssh_command = "ssh";
     char *rune = NULL;
     char *host;
+    char *hostname;
     int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
+    int copy_local_disks = 0;
     static struct option opts[] = {
         {"debug", 0, 0, 0x100},
         {"live", 0, 0, 0x200},
         COMMON_LONG_OPTS
     };
 
-    SWITCH_FOREACH_OPT(opt, "FC:s:ep", opts, "migrate", 2) {
+    SWITCH_FOREACH_OPT(opt, "FC:s:epl", opts, "migrate", 2) {
     case 'C':
         config_filename = optarg;
         break;
@@ -560,6 +618,9 @@ int main_migrate(int argc, char **argv)
     case 'p':
         pause_after_migration = 1;
         break;
+    case 'l':
+        copy_local_disks = 1;
+        break;
     case 0x100: /* --debug */
         debug = 1;
         break;
@@ -571,6 +632,9 @@ int main_migrate(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    hostname = strchr(host, '@');
+    hostname++;
+
     bool pass_tty_arg = progress_use_cr || (isatty(2) > 0);
 
     if (!ssh_command[0]) {
@@ -587,16 +651,17 @@ int main_migrate(int argc, char **argv)
         } else {
             verbose_len = (minmsglevel_default - minmsglevel) + 2;
         }
-        xasprintf(&rune, "exec %s %s xl%s%.*s migrate-receive%s%s%s",
+        xasprintf(&rune, "exec %s %s xl%s%.*s migrate-receive%s%s%s%s",
                   ssh_command, host,
                   pass_tty_arg ? " -t" : "",
                   verbose_len, verbose_buf,
                   daemonize ? "" : " -e",
                   debug ? " -d" : "",
+                  copy_local_disks ? " -l" : "",
                   pause_after_migration ? " -p" : "");
     }
 
-    migrate_domain(domid, rune, debug, config_filename);
+    migrate_domain(domid, rune, debug, config_filename, copy_local_disks, hostname);
     return EXIT_SUCCESS;
 }
 
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25..5ffbfb7 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -882,7 +882,7 @@ start:
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
-                                          send_back_fd, &params,
+                                          send_back_fd, dom_info->copy_local_disks, &params,
                                           0, autoconnect_console_how);
 
         libxl_domain_restore_params_dispose(&params);
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-23  7:42 [PATCH RFC] Live migration for VMs with QEMU backed local storage Bruno Alvisio
@ 2017-06-23  8:03 ` Roger Pau Monné
  2017-06-26 10:06   ` George Dunlap
  2017-06-29 11:58 ` Wei Liu
  1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2017-06-23  8:03 UTC (permalink / raw)
  To: Bruno Alvisio; +Cc: xen-devel, wei.liu2, ian.jackson, dave

On Fri, Jun 23, 2017 at 03:42:20AM -0400, Bruno Alvisio wrote:
> This patch is the first attempt on adding live migration of instances with local
> storage to Xen. This patch just handles very restricted case of fully
> virtualized HVMs. The code uses the "drive-mirror" capability provided by QEMU.
> A new "-l" option is introduced to "xl migrate" command. If provided, the local
> disk should be mirrored during the migration process. If the option is set,
> during the VM creation a qemu NBD server is started on the destination. After
> the instance is suspended on the source, the QMP "disk-mirror" command is issued
> to mirror the disk to destination. Once the mirroring job is complete, the
> migration process continues as before. Finally, the NBD server is stopped after
> the instance is successfully resumed on the destination node.

Since I'm not familiar with all this, can this "driver-mirror" QEMU
capability handle the migration of disk while being actively used?

> A major problem with this patch is that the mirroring of the disk is performed
> only after the memory stream is completed and the VM is suspended on the source;
> thus the instance is frozen for a long period of time. The reason this happens
> is that the QEMU process (needed for the disk mirroring) is started on the
> destination node only after the memory copying is completed. One possibility I
> was considering to solve this issue (if it is decided that this capability
> should be used): Could a "helper" QEMU process be started on the destination
> node at the beginning of the migration sequence with the sole purpose of
> handling the disk mirroring and kill it at the end of the migration sequence? 
> 
> From the suggestions given by Konrad Wilk and Paul Durrant the preferred
> approach would be to handle the mirroring of disks by QEMU instead of directly
> being handled directly by, for example, blkback. It would be very helpful for me
> to have a mental map of all the scenarios that can be encountered regarding
> local disk (Xen could start supporting live migration of certain types of local
> disks). This are the ones I can think of:
> - Fully Virtualized HVM: QEMU emulation

PV domains can also use the QEMU PV disk backend, so it should be
feasible to handle this migration for all guest types just using
QEMU.

> - blkback

TBH, I don't think such feature should be added to blkback. It's
too complex to be implemented inside of the kernel itself.

There are options already available to perform block device
duplication at the block level itself in Linux like DRDB [0] and IMHO
this is what should be used in conjunction with blkback.

Remember that at the end of day the Unix philosophy has always been to
implement simple tools that solve specific problems, and then glue
them together in order to solve more complex problems.

In that line of thought, why not simply use iSCSI or similar in order
to share the disk with all the hosts?

> - blktap / blktap2 

This is deprecated and no longer present in upstream kernels, I don't
think it's worth looking into it.

Roger.

[0] http://docs.linbit.com/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-23  8:03 ` Roger Pau Monné
@ 2017-06-26 10:06   ` George Dunlap
  2017-06-26 23:16     ` Bruno Alvisio
  0 siblings, 1 reply; 12+ messages in thread
From: George Dunlap @ 2017-06-26 10:06 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Bruno Alvisio, dave, Wei Liu, Ian Jackson, xen-devel

On Fri, Jun 23, 2017 at 9:03 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Fri, Jun 23, 2017 at 03:42:20AM -0400, Bruno Alvisio wrote:
>> This patch is the first attempt on adding live migration of instances with local
>> storage to Xen. This patch just handles very restricted case of fully
>> virtualized HVMs. The code uses the "drive-mirror" capability provided by QEMU.
>> A new "-l" option is introduced to "xl migrate" command. If provided, the local
>> disk should be mirrored during the migration process. If the option is set,
>> during the VM creation a qemu NBD server is started on the destination. After
>> the instance is suspended on the source, the QMP "disk-mirror" command is issued
>> to mirror the disk to destination. Once the mirroring job is complete, the
>> migration process continues as before. Finally, the NBD server is stopped after
>> the instance is successfully resumed on the destination node.
>
> Since I'm not familiar with all this, can this "driver-mirror" QEMU
> capability handle the migration of disk while being actively used?
>
>> A major problem with this patch is that the mirroring of the disk is performed
>> only after the memory stream is completed and the VM is suspended on the source;
>> thus the instance is frozen for a long period of time. The reason this happens
>> is that the QEMU process (needed for the disk mirroring) is started on the
>> destination node only after the memory copying is completed. One possibility I
>> was considering to solve this issue (if it is decided that this capability
>> should be used): Could a "helper" QEMU process be started on the destination
>> node at the beginning of the migration sequence with the sole purpose of
>> handling the disk mirroring and kill it at the end of the migration sequence?
>>
>> From the suggestions given by Konrad Wilk and Paul Durrant the preferred
>> approach would be to handle the mirroring of disks by QEMU instead of directly
>> being handled directly by, for example, blkback. It would be very helpful for me
>> to have a mental map of all the scenarios that can be encountered regarding
>> local disk (Xen could start supporting live migration of certain types of local
>> disks). This are the ones I can think of:
>> - Fully Virtualized HVM: QEMU emulation
>
> PV domains can also use the QEMU PV disk backend, so it should be
> feasible to handle this migration for all guest types just using
> QEMU.
>
>> - blkback
>
> TBH, I don't think such feature should be added to blkback. It's
> too complex to be implemented inside of the kernel itself.

In theory if blktap just exposed a dirty bitmap, like Xen does for the
memory, the "smarts" of copying over the dirty blocks could be done in
the toolstack.

But I think probably the best thing to do to start with would simply
say that disk migration is only available with a qdisk backend.

> There are options already available to perform block device
> duplication at the block level itself in Linux like DRDB [0] and IMHO
> this is what should be used in conjunction with blkback.
>
> Remember that at the end of day the Unix philosophy has always been to
> implement simple tools that solve specific problems, and then glue
> them together in order to solve more complex problems.
>
> In that line of thought, why not simply use iSCSI or similar in order
> to share the disk with all the hosts?

Well iSCSI can be complicated to set up, and it means your disk data
goes over a network rather than simply staying on your local disk.
Obviously if people anticipate doing large amounts of migration, then
it's worth the effort to set up DRBD or iSCSI.  But having the option
to do occasional migrates without having to do through that overhead
is still something worth having.  Given that qemu already has a disk
mirroring function, it's probably worth pursuing.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-26 10:06   ` George Dunlap
@ 2017-06-26 23:16     ` Bruno Alvisio
  0 siblings, 0 replies; 12+ messages in thread
From: Bruno Alvisio @ 2017-06-26 23:16 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, dave, Wei Liu, Ian Jackson, Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 4839 bytes --]

Thank you for the information and feedback. The scenarios to handle are:
1. QEMU emulation
2. blkback.
3. qdisk.

>From the previous e-mails, there is an agreement that no functionality (or
maybe minimal) should be added to blkback.
@Roger Pau Monné: Yes, "drive-mirror" feature handles disks that being
actively written. As George Dunlap mentioned, I was thinking of scenarios
where iSCSI or DRBD are not set up and only occasional migrations are
needed.

TODO for me: I will start looking at the qdisk back and see how can I
leverage the disk mirroring feature already provided by QEMU.

Thanks,

Bruno

On Mon, Jun 26, 2017 at 6:06 AM, George Dunlap <dunlapg@umich.edu> wrote:

> On Fri, Jun 23, 2017 at 9:03 AM, Roger Pau Monné <roger.pau@citrix.com>
> wrote:
> > On Fri, Jun 23, 2017 at 03:42:20AM -0400, Bruno Alvisio wrote:
> >> This patch is the first attempt on adding live migration of instances
> with local
> >> storage to Xen. This patch just handles very restricted case of fully
> >> virtualized HVMs. The code uses the "drive-mirror" capability provided
> by QEMU.
> >> A new "-l" option is introduced to "xl migrate" command. If provided,
> the local
> >> disk should be mirrored during the migration process. If the option is
> set,
> >> during the VM creation a qemu NBD server is started on the destination.
> After
> >> the instance is suspended on the source, the QMP "disk-mirror" command
> is issued
> >> to mirror the disk to destination. Once the mirroring job is complete,
> the
> >> migration process continues as before. Finally, the NBD server is
> stopped after
> >> the instance is successfully resumed on the destination node.
> >
> > Since I'm not familiar with all this, can this "driver-mirror" QEMU
> > capability handle the migration of disk while being actively used?
> >
> >> A major problem with this patch is that the mirroring of the disk is
> performed
> >> only after the memory stream is completed and the VM is suspended on
> the source;
> >> thus the instance is frozen for a long period of time. The reason this
> happens
> >> is that the QEMU process (needed for the disk mirroring) is started on
> the
> >> destination node only after the memory copying is completed. One
> possibility I
> >> was considering to solve this issue (if it is decided that this
> capability
> >> should be used): Could a "helper" QEMU process be started on the
> destination
> >> node at the beginning of the migration sequence with the sole purpose of
> >> handling the disk mirroring and kill it at the end of the migration
> sequence?
> >>
> >> From the suggestions given by Konrad Wilk and Paul Durrant the preferred
> >> approach would be to handle the mirroring of disks by QEMU instead of
> directly
> >> being handled directly by, for example, blkback. It would be very
> helpful for me
> >> to have a mental map of all the scenarios that can be encountered
> regarding
> >> local disk (Xen could start supporting live migration of certain types
> of local
> >> disks). This are the ones I can think of:
> >> - Fully Virtualized HVM: QEMU emulation
> >
> > PV domains can also use the QEMU PV disk backend, so it should be
> > feasible to handle this migration for all guest types just using
> > QEMU.
> >
> >> - blkback
> >
> > TBH, I don't think such feature should be added to blkback. It's
> > too complex to be implemented inside of the kernel itself.
>
> In theory if blktap just exposed a dirty bitmap, like Xen does for the
> memory, the "smarts" of copying over the dirty blocks could be done in
> the toolstack.
>
> But I think probably the best thing to do to start with would simply
> say that disk migration is only available with a qdisk backend.
>
> > There are options already available to perform block device
> > duplication at the block level itself in Linux like DRDB [0] and IMHO
> > this is what should be used in conjunction with blkback.
> >
> > Remember that at the end of day the Unix philosophy has always been to
> > implement simple tools that solve specific problems, and then glue
> > them together in order to solve more complex problems.
> >
> > In that line of thought, why not simply use iSCSI or similar in order
> > to share the disk with all the hosts?
>
> Well iSCSI can be complicated to set up, and it means your disk data
> goes over a network rather than simply staying on your local disk.
> Obviously if people anticipate doing large amounts of migration, then
> it's worth the effort to set up DRBD or iSCSI.  But having the option
> to do occasional migrates without having to do through that overhead
> is still something worth having.  Given that qemu already has a disk
> mirroring function, it's probably worth pursuing.
>
>  -George
>

[-- Attachment #1.2: Type: text/html, Size: 6145 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-23  7:42 [PATCH RFC] Live migration for VMs with QEMU backed local storage Bruno Alvisio
  2017-06-23  8:03 ` Roger Pau Monné
@ 2017-06-29 11:58 ` Wei Liu
  2017-06-29 13:33   ` Bruno Alvisio
  1 sibling, 1 reply; 12+ messages in thread
From: Wei Liu @ 2017-06-29 11:58 UTC (permalink / raw)
  To: Bruno Alvisio; +Cc: xen-devel, ian.jackson, wei.liu2, dave

On Fri, Jun 23, 2017 at 03:42:20AM -0400, Bruno Alvisio wrote:
> This patch is the first attempt on adding live migration of instances with local
> storage to Xen. This patch just handles very restricted case of fully
> virtualized HVMs. The code uses the "drive-mirror" capability provided by QEMU.
> A new "-l" option is introduced to "xl migrate" command. If provided, the local
> disk should be mirrored during the migration process. If the option is set,
> during the VM creation a qemu NBD server is started on the destination. After
> the instance is suspended on the source, the QMP "disk-mirror" command is issued
> to mirror the disk to destination. Once the mirroring job is complete, the
> migration process continues as before. Finally, the NBD server is stopped after
> the instance is successfully resumed on the destination node.
> 
> A major problem with this patch is that the mirroring of the disk is performed
> only after the memory stream is completed and the VM is suspended on the source;
> thus the instance is frozen for a long period of time. The reason this happens
> is that the QEMU process (needed for the disk mirroring) is started on the
> destination node only after the memory copying is completed. One possibility I
> was considering to solve this issue (if it is decided that this capability
> should be used): Could a "helper" QEMU process be started on the destination
> node at the beginning of the migration sequence with the sole purpose of
> handling the disk mirroring and kill it at the end of the migration sequence? 
> 

In theory we could, but I am very cautious about this. I _think_ we can
change the timing QEMU is started. It can be started earlier, but take
precaution that it shouldn't resume the guest.

In any case, start with the simple setup first.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-29 11:58 ` Wei Liu
@ 2017-06-29 13:33   ` Bruno Alvisio
  2017-06-29 13:56     ` Wei Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Bruno Alvisio @ 2017-06-29 13:33 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Jackson, dave


[-- Attachment #1.1: Type: text/plain, Size: 2461 bytes --]

Thanks Wei. Currently it is started after the memory is streamed from
source to destination (for migration) and the booting functions are
completed.I was going to ask to the list if there is a specific reason the
QEMU process needs to be started at that point.

Also, if the start point of the QEMU process is moved to an earlier part of
the domain creation process, how can I run a basic set of tests to validate
that I am not breaking any functionality and causing a regression?

Thanks,

Bruno

On Thu, Jun 29, 2017 at 7:58 AM, Wei Liu <wei.liu2@citrix.com> wrote:

> On Fri, Jun 23, 2017 at 03:42:20AM -0400, Bruno Alvisio wrote:
> > This patch is the first attempt on adding live migration of instances
> with local
> > storage to Xen. This patch just handles very restricted case of fully
> > virtualized HVMs. The code uses the "drive-mirror" capability provided
> by QEMU.
> > A new "-l" option is introduced to "xl migrate" command. If provided,
> the local
> > disk should be mirrored during the migration process. If the option is
> set,
> > during the VM creation a qemu NBD server is started on the destination.
> After
> > the instance is suspended on the source, the QMP "disk-mirror" command
> is issued
> > to mirror the disk to destination. Once the mirroring job is complete,
> the
> > migration process continues as before. Finally, the NBD server is
> stopped after
> > the instance is successfully resumed on the destination node.
> >
> > A major problem with this patch is that the mirroring of the disk is
> performed
> > only after the memory stream is completed and the VM is suspended on the
> source;
> > thus the instance is frozen for a long period of time. The reason this
> happens
> > is that the QEMU process (needed for the disk mirroring) is started on
> the
> > destination node only after the memory copying is completed. One
> possibility I
> > was considering to solve this issue (if it is decided that this
> capability
> > should be used): Could a "helper" QEMU process be started on the
> destination
> > node at the beginning of the migration sequence with the sole purpose of
> > handling the disk mirroring and kill it at the end of the migration
> sequence?
> >
>
> In theory we could, but I am very cautious about this. I _think_ we can
> change the timing QEMU is started. It can be started earlier, but take
> precaution that it shouldn't resume the guest.
>
> In any case, start with the simple setup first.
>

[-- Attachment #1.2: Type: text/html, Size: 3040 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-29 13:33   ` Bruno Alvisio
@ 2017-06-29 13:56     ` Wei Liu
  2017-06-29 14:34       ` Roger Pau Monné
  0 siblings, 1 reply; 12+ messages in thread
From: Wei Liu @ 2017-06-29 13:56 UTC (permalink / raw)
  To: Bruno Alvisio; +Cc: xen-devel, Wei Liu, Ian Jackson, dave

On Thu, Jun 29, 2017 at 09:33:07AM -0400, Bruno Alvisio wrote:
> Thanks Wei. Currently it is started after the memory is streamed from
> source to destination (for migration) and the booting functions are
> completed.I was going to ask to the list if there is a specific reason the
> QEMU process needs to be started at that point.

I _think_ it is because we don't want QEMU to touch guest memory / state
too early. Note I haven't checked the code. Do ask on the list if you
aren't sure.

> 
> Also, if the start point of the QEMU process is moved to an earlier part of
> the domain creation process, how can I run a basic set of tests to validate
> that I am not breaking any functionality and causing a regression?
> 

Doing some local migration tests as a starter.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-29 13:56     ` Wei Liu
@ 2017-06-29 14:34       ` Roger Pau Monné
  2017-06-29 16:11         ` Wei Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2017-06-29 14:34 UTC (permalink / raw)
  To: Wei Liu; +Cc: Bruno Alvisio, dave, Ian Jackson, xen-devel

On Thu, Jun 29, 2017 at 02:56:55PM +0100, Wei Liu wrote:
> On Thu, Jun 29, 2017 at 09:33:07AM -0400, Bruno Alvisio wrote:
> > Thanks Wei. Currently it is started after the memory is streamed from
> > source to destination (for migration) and the booting functions are
> > completed.I was going to ask to the list if there is a specific reason the
> > QEMU process needs to be started at that point.
> 
> I _think_ it is because we don't want QEMU to touch guest memory / state
> too early. Note I haven't checked the code. Do ask on the list if you
> aren't sure.

I would have thought that's because you need the QEMU state, and
that's not saved until the guest on the other end is paused (ie:
rather at the end of the memory copy).

Roger

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-29 14:34       ` Roger Pau Monné
@ 2017-06-29 16:11         ` Wei Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Wei Liu @ 2017-06-29 16:11 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Bruno Alvisio, dave, Wei Liu, Ian Jackson, xen-devel

On Thu, Jun 29, 2017 at 03:34:37PM +0100, Roger Pau Monné wrote:
> On Thu, Jun 29, 2017 at 02:56:55PM +0100, Wei Liu wrote:
> > On Thu, Jun 29, 2017 at 09:33:07AM -0400, Bruno Alvisio wrote:
> > > Thanks Wei. Currently it is started after the memory is streamed from
> > > source to destination (for migration) and the booting functions are
> > > completed.I was going to ask to the list if there is a specific reason the
> > > QEMU process needs to be started at that point.
> > 
> > I _think_ it is because we don't want QEMU to touch guest memory / state
> > too early. Note I haven't checked the code. Do ask on the list if you
> > aren't sure.
> 
> I would have thought that's because you need the QEMU state, and
> that's not saved until the guest on the other end is paused (ie:
> rather at the end of the memory copy).
> 

This is also a plausible cause. This can also be worked around I think.
The principle is same -- start QEMU first and load state later. But then
this begs the question how you can handle the mirroring and state
transfer concurrently.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-23 14:15 ` Konrad Rzeszutek Wilk
@ 2017-06-23 14:20   ` Ian Jackson
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Jackson @ 2017-06-23 14:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Bruno Alvisio, dave, wei.liu2, xen-devel

Konrad Rzeszutek Wilk writes ("Re: [Xen-devel] [PATCH RFC] Live migration for VMs with QEMU backed local storage"):
> On Fri, Jun 23, 2017 at 03:31:16AM -0400, Bruno Alvisio wrote:
> > disks). This are the ones I can think of:
> > - Fully Virtualized HVM: QEMU emulation
> > - blkback
> > - blktap / blktap2 
> 
> You are missing 'qdisk' which is the QEMU implemenation of blkback.

Some people use drbd for this but that has its own mirroring built-in.

Doing some kind of poor-man's mirroring with blkback, devmapper and
nbd might well be a useful approach too.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC] Live migration for VMs with QEMU backed local storage
  2017-06-23  7:31 Bruno Alvisio
@ 2017-06-23 14:15 ` Konrad Rzeszutek Wilk
  2017-06-23 14:20   ` Ian Jackson
  0 siblings, 1 reply; 12+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-06-23 14:15 UTC (permalink / raw)
  To: Bruno Alvisio; +Cc: xen-devel, wei.liu2, ian.jackson, dave

On Fri, Jun 23, 2017 at 03:31:16AM -0400, Bruno Alvisio wrote:
> This patch is the first attempt on adding live migration of instances with local
> storage to Xen. This patch just handles very restricted case of fully
> virtualized HVMs. The code uses the "drive-mirror" capability provided by QEMU.
> A new "-l" option is introduced to "xl migrate" command. If provided, the local
> disk should be mirrored during the migration process. If the option is set,
> during the VM creation a qemu NBD server is started on the destination. After
> the instance is suspended on the source, the QMP "disk-mirror" command is issued
> to mirror the disk to destination. Once the mirroring job is complete, the
> migration process continues as before. Finally, the NBD server is stopped after
> the instance is successfully resumed on the destination node.
> 
> A major problem with this patch is that the mirroring of the disk is performed
> only after the memory stream is completed and the VM is suspended on the source;
> thus the instance is frozen for a long period of time. The reason this happens
> is that the QEMU process (needed for the disk mirroring) is started on the
> destination node only after the memory copying is completed. One possibility I
> was considering to solve this issue (if it is decided that this capability
> should be used): Could a "helper" QEMU process be started on the destination
> node at the beginning of the migration sequence with the sole purpose of
> handling the disk mirroring and kill it at the end of the migration sequence? 
> 
> From the suggestions given by Konrad Wilk and Paul Durrant the preferred
> approach would be to handle the mirroring of disks by QEMU instead of directly
> being handled directly by, for example, blkback. It would be very helpful for me
> to have a mental map of all the scenarios that can be encountered regarding
> local disk (Xen could start supporting live migration of certain types of local
> disks). This are the ones I can think of:
> - Fully Virtualized HVM: QEMU emulation
> - blkback
> - blktap / blktap2 

You are missing 'qdisk' which is the QEMU implemenation of blkback.

> 
> 
> I have included TODOs in the code. I am sending this patch as is because I first
> wanted to get an initial feedback if this is the path the should be pursued. Any
> suggestions and ideas on this patch or on how to make a more generic solution
> would be really appreciated.
> 
> Signed-off-by: Bruno Alvisio <bruno.alvisio@gmail.com>
> 
> ---
>  tools/libxl/libxl.h                  |  16 ++++-
>  tools/libxl/libxl_create.c           |  87 +++++++++++++++++++++++++-
>  tools/libxl/libxl_internal.h         |  16 +++++
>  tools/libxl/libxl_qmp.c              | 115 ++++++++++++++++++++++++++++++++++-
>  tools/ocaml/libs/xl/xenlight_stubs.c |   2 +-
>  tools/xl/xl.h                        |   1 +
>  tools/xl/xl_migrate.c                |  79 +++++++++++++++++++++---
>  tools/xl/xl_vmcontrol.c              |   2 +-
>  8 files changed, 303 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index cf8687a..81fb2dc 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -1294,6 +1294,15 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
>                      xentoollog_logger *lg);
>  int libxl_ctx_free(libxl_ctx *ctx /* 0 is OK */);
>  
> +int libxl__drive_mirror(libxl_ctx *ctx, int domid, const char* device, const char* target, const char* format) LIBXL_EXTERNAL_CALLERS_ONLY;
> +
> +int libxl__query_block_jobs(libxl_ctx *ctx, int domid, bool *is_ready) LIBXL_EXTERNAL_CALLERS_ONLY;
> +
> +int libxl__query_block(libxl_ctx *ctx, int domid, char *device_names) LIBXL_EXTERNAL_CALLERS_ONLY;
> +
> +int libxl__nbd_server_stop(libxl_ctx *ctx, int domid) LIBXL_EXTERNAL_CALLERS_ONLY;
> +
> +
>  /* domain related functions */
>  
>  /* If the result is ERROR_ABORTED, the domain may or may not exist
> @@ -1307,7 +1316,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
>                              LIBXL_EXTERNAL_CALLERS_ONLY;
>  int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
>                                  uint32_t *domid, int restore_fd,
> -                                int send_back_fd,
> +                                int send_back_fd, int copy_local_disks,
>                                  const libxl_domain_restore_params *params,
>                                  const libxl_asyncop_how *ao_how,
>                                  const libxl_asyncprogress_how *aop_console_how)
> @@ -1348,7 +1357,7 @@ static inline int libxl_domain_create_restore_0x040400(
>      LIBXL_EXTERNAL_CALLERS_ONLY
>  {
>      return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
> -                                       -1, params, ao_how, aop_console_how);
> +                                       -1, 0, params, ao_how, aop_console_how);
>  }
>  
>  #define libxl_domain_create_restore libxl_domain_create_restore_0x040400
> @@ -1387,6 +1396,9 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
>  #define LIBXL_SUSPEND_DEBUG 1
>  #define LIBXL_SUSPEND_LIVE 2
>  
> +#define QEMU_DRIVE_MIRROR_PORT "11000"
> +#define QEMU_DRIVE_MIRROR_DEVICE "ide0-hd0"
> +
>  /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )]
>   *   If this parameter is true, use co-operative resume. The guest
>   *   must support this.
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index bffbc45..ef99f03 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -27,6 +27,40 @@
>  
>  #include <xen-xsm/flask/flask.h>
>  
> +//TODO: These functions were created to be able to call qmp commands from xl.
> +//TODO: These functions should be removed since they won't be called from xl.
> +int libxl__drive_mirror(libxl_ctx *ctx, int domid, const char* device, const char* target, const char* format){
> +    GC_INIT(ctx);
> +    int rc;
> +    rc = libxl__qmp_drive_mirror(gc, domid, device, target, format);
> +    GC_FREE;
> +    return rc;
> +}
> +
> +int libxl__query_block_jobs(libxl_ctx *ctx, int domid, bool *is_ready){
> +    GC_INIT(ctx);
> +    int rc;
> +    rc = libxl__qmp_query_block_jobs(gc, domid, is_ready);
> +    GC_FREE;
> +    return rc;
> +}
> +
> +int libxl__nbd_server_stop(libxl_ctx *ctx, int domid){
> +    GC_INIT(ctx);
> +    int rc;
> +    rc = libxl__qmp_nbd_server_stop(gc, domid);
> +    GC_FREE;
> +    return rc;
> +}
> +
> +int libxl__query_block(libxl_ctx *ctx, int domid, char *device_names){
> +    GC_INIT(ctx);
> +    int rc;
> +    rc = libxl__qmp_query_block(gc, domid, device_names);
> +    GC_FREE;
> +    return rc;
> +}
> +
>  int libxl__domain_create_info_setdefault(libxl__gc *gc,
>                                           libxl_domain_create_info *c_info)
>  {
> @@ -1355,6 +1389,51 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
>          else
>              libxl__spawn_local_dm(egc, &dcs->sdss.dm);
>  
> +
> +        if(dcs->restore_fd >= 0 && dcs->copy_local_disks) {
> +             /*
> +              * Start and add the NBD server
> +              * Host is set it to "::" for now
> +              * Port we hard code a port for now
> +
> +              * This code just handles the case when -M pc is used.
> +              * (The config xen_platform_pci = 0)
> +              *
> +              * Current implementation performs the disk mirroring after the
> +              * VM in the source has been suspended. Thus, the VM is frozen
> +              * for a long period of time.
> +              * Consider doing the mirroring of the drive before the memory
> +              * stream is performed.
> +              * Consider a solution that handles multiple types of VM configurations
> +              *
> +              * TODO: Current implementation only works with upstream qemu
> +              * TODO: consider the case when qemu-xen-traditional is used.
> +              * TODO: Check and copy only those disks which are local
> +              * TODO: Assign port dynamically
> +              */
> +
> +            fprintf(stderr, "Starting NBD Server\n");
> +            ret = libxl__qmp_nbd_server_start(gc, domid, "::", QEMU_DRIVE_MIRROR_PORT);
> +            if (ret) {
> +                ret = ERROR_FAIL;
> +                fprintf(stderr, "Failed to start NBD Server\n");
> +                goto skip_nbd;
> +            }else{
> +                fprintf(stderr, "Started NBD Server Successfully\n");
> +            }
> +
> +            ret = libxl__qmp_nbd_server_add(gc, domid, QEMU_DRIVE_MIRROR_DEVICE);
> +
> +            if (ret) {
> +                ret = ERROR_FAIL;
> +                fprintf(stderr, "Failed to add NBD Server\n");
> +                goto skip_nbd;
> +            } else {
> +                fprintf(stderr, "NBD Add Successful\n");
> +            }
> +        }
> +
> +skip_nbd:
>          /*
>           * Handle the domain's (and the related stubdomain's) access to
>           * the VGA framebuffer.
> @@ -1602,6 +1681,7 @@ static void domain_create_cb(libxl__egc *egc,
>  
>  static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
>                              uint32_t *domid, int restore_fd, int send_back_fd,
> +                            int copy_local_disks,
>                              const libxl_domain_restore_params *params,
>                              const libxl_asyncop_how *ao_how,
>                              const libxl_asyncprogress_how *aop_console_how)
> @@ -1617,6 +1697,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
>      libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
>      cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
>      cdcs->dcs.send_back_fd = send_back_fd;
> +    cdcs->dcs.copy_local_disks = copy_local_disks;
>      if (restore_fd > -1) {
>          cdcs->dcs.restore_params = *params;
>          rc = libxl__fd_flags_modify_save(gc, cdcs->dcs.restore_fd,
> @@ -1845,13 +1926,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
>                              const libxl_asyncprogress_how *aop_console_how)
>  {
>      unset_disk_colo_restore(d_config);
> -    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
> +    return do_domain_create(ctx, d_config, domid, -1, -1, 0, NULL,
>                              ao_how, aop_console_how);
>  }
>  
>  int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
>                                  uint32_t *domid, int restore_fd,
> -                                int send_back_fd,
> +                                int send_back_fd, int copy_local_disks,
>                                  const libxl_domain_restore_params *params,
>                                  const libxl_asyncop_how *ao_how,
>                                  const libxl_asyncprogress_how *aop_console_how)
> @@ -1863,7 +1944,7 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
>      }
>  
>      return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
> -                            params, ao_how, aop_console_how);
> +                            copy_local_disks, params, ao_how, aop_console_how);
>  }
>  
>  int libxl_domain_soft_reset(libxl_ctx *ctx,
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index afe6652..938481a 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1835,6 +1835,21 @@ _hidden int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid,
>  /* Start replication */
>  _hidden int libxl__qmp_start_replication(libxl__gc *gc, int domid,
>                                           bool primary);
> +
> +/* Add a disk to NBD server */
> + _hidden int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid,
> +                                       const char *disk);
> +
> +/* Mirror disk drive */
> +_hidden int libxl__qmp_drive_mirror(libxl__gc *gc, int domid, const char* device,
> +                                    const char* target, const char* format);
> +
> +/* Query block devices */
> +_hidden int libxl__qmp_query_block(libxl__gc *gc, int domid, char *device_names);
> +
> +/* Query existing block jobs*/
> +_hidden int libxl__qmp_query_block_jobs(libxl__gc *gc, int domid, bool *is_ready);
> +
>  /* Get replication error that occurs when the vm is running */
>  _hidden int libxl__qmp_query_xen_replication_status(libxl__gc *gc, int domid);
>  /* Do checkpoint */
> @@ -3695,6 +3710,7 @@ struct libxl__domain_create_state {
>      int restore_fd, libxc_fd;
>      int restore_fdfl; /* original flags of restore_fd */
>      int send_back_fd;
> +    int copy_local_disks;
>      libxl_domain_restore_params restore_params;
>      uint32_t domid_soft_reset;
>      libxl__domain_create_cb *callback;
> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
> index eab993a..cbfcf77 100644
> --- a/tools/libxl/libxl_qmp.c
> +++ b/tools/libxl/libxl_qmp.c
> @@ -347,7 +347,9 @@ static libxl__qmp_handler *qmp_init_handler(libxl__gc *gc, uint32_t domid)
>      }
>      qmp->ctx = CTX;
>      qmp->domid = domid;
> -    qmp->timeout = 5;
> +    //TODO: Changed default timeout because drive-mirror command takes a long
> +    //TODO: to return. Consider timeout to be passed as param.
> +    qmp->timeout = 600;
>  
>      LIBXL_STAILQ_INIT(&qmp->callback_list);
>  
> @@ -1069,6 +1071,117 @@ int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid, const char *disk)
>      return qmp_run_command(gc, domid, "nbd-server-add", args, NULL, NULL);
>  }
>  
> +int libxl__qmp_drive_mirror(libxl__gc *gc, int domid, const char* device, const char* target, const char* format)
> +{
> +    libxl__json_object *args = NULL;
> +    //TODO: Allow method to receive "sync", "speed", "mode", "granurality", "buf-size"
> +    qmp_parameters_add_string(gc, &args, "device", device);
> +    qmp_parameters_add_string(gc, &args, "target", target);
> +    qmp_parameters_add_string(gc, &args, "sync", "full");
> +    qmp_parameters_add_string(gc, &args, "format", format);
> +    qmp_parameters_add_string(gc, &args, "mode", "existing");
> +    qmp_parameters_add_integer(gc, &args, "granularity", 0);
> +    qmp_parameters_add_integer(gc, &args, "buf-size", 0);
> +
> +    return qmp_run_command(gc, domid, "drive-mirror", args, NULL, NULL);
> +}
> +
> +static int query_block_callback(libxl__qmp_handler *qmp,
> +                               const libxl__json_object *response,
> +                               void *opaque)
> +{
> +    const libxl__json_object *blockinfo = NULL;
> +    GC_INIT(qmp->ctx);
> +    int i, rc = -1;
> +
> +    for (i = 0; (blockinfo = libxl__json_array_get(response, i)); i++) {
> +        const libxl__json_object *d;
> +        const char* device_name;
> +        d = libxl__json_map_get("device", blockinfo, JSON_STRING);
> +        if(!d){
> +            goto out;
> +        }
> +        device_name = libxl__json_object_get_string(d);
> +    }
> +
> +    rc = 0;
> +out:
> +    GC_FREE;
> +    return rc;
> +}
> +
> +static int query_block_jobs_callback(libxl__qmp_handler *qmp,
> +                               const libxl__json_object *response,
> +                               void *opaque)
> +{
> +    const libxl__json_object *blockjobinfo = NULL;
> +    GC_INIT(qmp->ctx);
> +    int i, rc = -1;
> +    bool empty = true;
> +
> +    for (i = 0; (blockjobinfo = libxl__json_array_get(response, i)); i++) {
> +        empty = false;
> +        const char *bjtype;
> +        const char *bjdevice;
> +        unsigned int bjlen;
> +        unsigned int bjoffset;
> +        bool bjbusy;
> +        bool bjpaused;
> +        const char *bjiostatus;
> +        bool bjready;
> +
> +        const libxl__json_object *type = NULL;
> +        const libxl__json_object *device = NULL;
> +        const libxl__json_object *len = NULL;
> +        const libxl__json_object *offset = NULL;
> +        const libxl__json_object *busy = NULL;
> +        const libxl__json_object *paused = NULL;
> +        const libxl__json_object *io_status = NULL;
> +        const libxl__json_object *ready = NULL;
> +
> +        type = libxl__json_map_get("type", blockjobinfo, JSON_STRING);
> +        device = libxl__json_map_get("device", blockjobinfo, JSON_STRING);
> +        len = libxl__json_map_get("len", blockjobinfo, JSON_INTEGER);
> +        offset = libxl__json_map_get("offset", blockjobinfo, JSON_INTEGER);
> +        busy = libxl__json_map_get("busy", blockjobinfo, JSON_BOOL);
> +        paused = libxl__json_map_get("type", blockjobinfo, JSON_BOOL);
> +        io_status = libxl__json_map_get("io-status", blockjobinfo, JSON_STRING);
> +        ready = libxl__json_map_get("ready", blockjobinfo, JSON_BOOL);
> +
> +        bjtype = libxl__json_object_get_string(type);
> +        bjdevice = libxl__json_object_get_string(device);
> +        bjlen = libxl__json_object_get_integer(len);
> +        bjoffset = libxl__json_object_get_integer(offset);
> +        bjbusy = libxl__json_object_get_bool(len);
> +        bjpaused = libxl__json_object_get_bool(paused);
> +        bjiostatus = libxl__json_object_get_string(io_status);
> +        bjready = libxl__json_object_get_bool(ready);
> +
> +        bool *is_ready = opaque;
> +        *is_ready = bjready;
> +    }
> +
> +    if(empty){
> +        bool *is_ready = opaque;
> +        *is_ready = true;
> +    }
> +
> +    rc = 0;
> +
> +    GC_FREE;
> +    return rc;
> +}
> +
> +int libxl__qmp_query_block(libxl__gc *gc, int domid, char *device_names)
> +{
> +    return qmp_run_command(gc, domid, "query-block", NULL, query_block_callback, device_names);
> +}
> +
> +int libxl__qmp_query_block_jobs(libxl__gc *gc, int domid, bool *is_ready)
> +{
> +    return qmp_run_command(gc, domid, "query-block-jobs", NULL, query_block_jobs_callback, is_ready);
> +}
> +
>  int libxl__qmp_start_replication(libxl__gc *gc, int domid, bool primary)
>  {
>      libxl__json_object *args = NULL;
> diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c b/tools/ocaml/libs/xl/xenlight_stubs.c
> index 98b52b9..8791175 100644
> --- a/tools/ocaml/libs/xl/xenlight_stubs.c
> +++ b/tools/ocaml/libs/xl/xenlight_stubs.c
> @@ -538,7 +538,7 @@ value stub_libxl_domain_create_restore(value ctx, value domain_config, value par
>  
>  	caml_enter_blocking_section();
>  	ret = libxl_domain_create_restore(CTX, &c_dconfig, &c_domid, restore_fd,
> -		-1, &c_params, ao_how, NULL);
> +		-1, 0, &c_params, ao_how, NULL);
>  	caml_leave_blocking_section();
>  
>  	free(ao_how);
> diff --git a/tools/xl/xl.h b/tools/xl/xl.h
> index aa95b77..dcdb80d 100644
> --- a/tools/xl/xl.h
> +++ b/tools/xl/xl.h
> @@ -35,6 +35,7 @@ struct domain_create {
>      int daemonize;
>      int monitor; /* handle guest reboots etc */
>      int paused;
> +    int copy_local_disks;
>      int dryrun;
>      int quiet;
>      int vnc;
> diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
> index 1f0e87d..62b78ea 100644
> --- a/tools/xl/xl_migrate.c
> +++ b/tools/xl/xl_migrate.c
> @@ -177,7 +177,8 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
>  }
>  
>  static void migrate_domain(uint32_t domid, const char *rune, int debug,
> -                           const char *override_config_file)
> +                           const char *override_config_file,
> +                           int copy_local_disks, const char* hostname)
>  {
>      pid_t child = -1;
>      int rc;
> @@ -186,6 +187,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
>      char rc_buf;
>      uint8_t *config_data;
>      int config_len, flags = LIBXL_SUSPEND_LIVE;
> +    char* target;
>  
>      save_domain_core_begin(domid, override_config_file,
>                             &config_data, &config_len);
> @@ -232,6 +234,47 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
>  
>      fprintf(stderr, "migration sender: Target has acknowledged transfer.\n");
>  
> +
> +    /*
> +     * If the -l was provided, the drive-mirror job is started.
> +     * TODO: Move the following code as part of the domain_suspend
> +     * TODO: The port should be sent by the destination.
> +    */
> +    if(copy_local_disks) {
> +        fprintf(stderr, "Starting mirror-drive of device %s\n", QEMU_DRIVE_MIRROR_DEVICE);
> +        xasprintf(&target, "nbd:%s:%s:exportname=%s", hostname, QEMU_DRIVE_MIRROR_PORT, QEMU_DRIVE_MIRROR_DEVICE);
> +        rc = libxl__drive_mirror(ctx, domid, QEMU_DRIVE_MIRROR_DEVICE, target, "raw");
> +        if (!rc) {
> +            fprintf(stderr, "Drive mirror command returned successfully\n");
> +        }else{
> +            fprintf(stderr, "Sending drive mirror command failed\n");
> +            goto cont;
> +        }
> +
> +        /*
> +         * Query job status until it is ready
> +         * TODO: This code is just an inefficient busy wait. QMP sends an
> +         * TODO: asynchronous message when mirroring job is completed. Consider
> +         * TODO: adding the capability to handle asynchronous QMP messages (already done?)
> +         */
> +        bool job_is_ready = false;
> +        while(!job_is_ready) {
> +            fprintf(stderr, "Checking for drive-mirror job");
> +            rc = libxl__query_block_jobs(ctx, domid, &job_is_ready);
> +            if(rc){
> +                fprintf(stderr, "Checking block job failed\n");
> +                goto cont;
> +            }else{
> +                fprintf(stderr, "Checking block job succeeded\n");
> +            }
> +            if(!job_is_ready){
> +                fprintf(stderr, "Sleeping 5 sec\n");
> +                sleep(5);
> +            }
> +        }
> +    }
> +cont:
> +
>      if (common_domname) {
>          xasprintf(&away_domname, "%s--migratedaway", common_domname);
>          rc = libxl_domain_rename(ctx, domid, common_domname, away_domname);
> @@ -316,7 +359,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
>  }
>  
>  static void migrate_receive(int debug, int daemonize, int monitor,
> -                            int pause_after_migration,
> +                            int pause_after_migration, int copy_local_disks,
>                              int send_fd, int recv_fd,
>                              libxl_checkpointed_stream checkpointed,
>                              char *colo_proxy_script,
> @@ -343,6 +386,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
>      dom_info.daemonize = daemonize;
>      dom_info.monitor = monitor;
>      dom_info.paused = 1;
> +    dom_info.copy_local_disks = copy_local_disks;
>      dom_info.migrate_fd = recv_fd;
>      dom_info.send_back_fd = send_fd;
>      dom_info.migration_domname_r = &migration_domname;
> @@ -423,6 +467,14 @@ static void migrate_receive(int debug, int daemonize, int monitor,
>  
>      fprintf(stderr, "migration target: Got permission, starting domain.\n");
>  
> +    fprintf(stderr, "Stopping NBD server\n");
> +    rc = libxl__nbd_server_stop(ctx, domid);
> +    if (rc){
> +        fprintf(stderr, "Failed to stop NBD server\n");
> +    }else{
> +        fprintf(stderr, "Stopped NBD server successfully\n");
> +    }
> +
>      if (migration_domname) {
>          rc = libxl_domain_rename(ctx, domid, migration_domname, common_domname);
>          if (rc) goto perhaps_destroy_notify_rc;
> @@ -478,6 +530,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
>  int main_migrate_receive(int argc, char **argv)
>  {
>      int debug = 0, daemonize = 1, monitor = 1, pause_after_migration = 0;
> +    int copy_local_disks = 0;
>      libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
>      int opt;
>      bool userspace_colo_proxy = false;
> @@ -490,7 +543,7 @@ int main_migrate_receive(int argc, char **argv)
>          COMMON_LONG_OPTS
>      };
>  
> -    SWITCH_FOREACH_OPT(opt, "Fedrp", opts, "migrate-receive", 0) {
> +    SWITCH_FOREACH_OPT(opt, "Fedrpl", opts, "migrate-receive", 0) {
>      case 'F':
>          daemonize = 0;
>          break;
> @@ -516,6 +569,9 @@ int main_migrate_receive(int argc, char **argv)
>      case 'p':
>          pause_after_migration = 1;
>          break;
> +    case 'l':
> +        copy_local_disks = 1;
> +        break;
>      }
>  
>      if (argc-optind != 0) {
> @@ -523,7 +579,7 @@ int main_migrate_receive(int argc, char **argv)
>          return EXIT_FAILURE;
>      }
>      migrate_receive(debug, daemonize, monitor, pause_after_migration,
> -                    STDOUT_FILENO, STDIN_FILENO,
> +                    copy_local_disks, STDOUT_FILENO, STDIN_FILENO,
>                      checkpointed, script, userspace_colo_proxy);
>  
>      return EXIT_SUCCESS;
> @@ -536,14 +592,16 @@ int main_migrate(int argc, char **argv)
>      const char *ssh_command = "ssh";
>      char *rune = NULL;
>      char *host;
> +    char *hostname;
>      int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
> +    int copy_local_disks = 0;
>      static struct option opts[] = {
>          {"debug", 0, 0, 0x100},
>          {"live", 0, 0, 0x200},
>          COMMON_LONG_OPTS
>      };
>  
> -    SWITCH_FOREACH_OPT(opt, "FC:s:ep", opts, "migrate", 2) {
> +    SWITCH_FOREACH_OPT(opt, "FC:s:epl", opts, "migrate", 2) {
>      case 'C':
>          config_filename = optarg;
>          break;
> @@ -560,6 +618,9 @@ int main_migrate(int argc, char **argv)
>      case 'p':
>          pause_after_migration = 1;
>          break;
> +    case 'l':
> +        copy_local_disks = 1;
> +        break;
>      case 0x100: /* --debug */
>          debug = 1;
>          break;
> @@ -571,6 +632,9 @@ int main_migrate(int argc, char **argv)
>      domid = find_domain(argv[optind]);
>      host = argv[optind + 1];
>  
> +    hostname = strchr(host, '@');
> +    hostname++;
> +
>      bool pass_tty_arg = progress_use_cr || (isatty(2) > 0);
>  
>      if (!ssh_command[0]) {
> @@ -587,16 +651,17 @@ int main_migrate(int argc, char **argv)
>          } else {
>              verbose_len = (minmsglevel_default - minmsglevel) + 2;
>          }
> -        xasprintf(&rune, "exec %s %s xl%s%.*s migrate-receive%s%s%s",
> +        xasprintf(&rune, "exec %s %s xl%s%.*s migrate-receive%s%s%s%s",
>                    ssh_command, host,
>                    pass_tty_arg ? " -t" : "",
>                    verbose_len, verbose_buf,
>                    daemonize ? "" : " -e",
>                    debug ? " -d" : "",
> +                  copy_local_disks ? " -l" : "",
>                    pause_after_migration ? " -p" : "");
>      }
>  
> -    migrate_domain(domid, rune, debug, config_filename);
> +    migrate_domain(domid, rune, debug, config_filename, copy_local_disks, hostname);
>      return EXIT_SUCCESS;
>  }
>  
> diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
> index 89c2b25..5ffbfb7 100644
> --- a/tools/xl/xl_vmcontrol.c
> +++ b/tools/xl/xl_vmcontrol.c
> @@ -882,7 +882,7 @@ start:
>  
>          ret = libxl_domain_create_restore(ctx, &d_config,
>                                            &domid, restore_fd,
> -                                          send_back_fd, &params,
> +                                          send_back_fd, dom_info->copy_local_disks, &params,
>                                            0, autoconnect_console_how);
>  
>          libxl_domain_restore_params_dispose(&params);
> -- 
> 2.7.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH RFC] Live migration for VMs with QEMU backed local storage
@ 2017-06-23  7:31 Bruno Alvisio
  2017-06-23 14:15 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 12+ messages in thread
From: Bruno Alvisio @ 2017-06-23  7:31 UTC (permalink / raw)
  To: xen-devel; +Cc: wei.liu2, ian.jackson, dave

This patch is the first attempt on adding live migration of instances with local
storage to Xen. This patch just handles very restricted case of fully
virtualized HVMs. The code uses the "drive-mirror" capability provided by QEMU.
A new "-l" option is introduced to "xl migrate" command. If provided, the local
disk should be mirrored during the migration process. If the option is set,
during the VM creation a qemu NBD server is started on the destination. After
the instance is suspended on the source, the QMP "disk-mirror" command is issued
to mirror the disk to destination. Once the mirroring job is complete, the
migration process continues as before. Finally, the NBD server is stopped after
the instance is successfully resumed on the destination node.

A major problem with this patch is that the mirroring of the disk is performed
only after the memory stream is completed and the VM is suspended on the source;
thus the instance is frozen for a long period of time. The reason this happens
is that the QEMU process (needed for the disk mirroring) is started on the
destination node only after the memory copying is completed. One possibility I
was considering to solve this issue (if it is decided that this capability
should be used): Could a "helper" QEMU process be started on the destination
node at the beginning of the migration sequence with the sole purpose of
handling the disk mirroring and kill it at the end of the migration sequence? 

From the suggestions given by Konrad Wilk and Paul Durrant the preferred
approach would be to handle the mirroring of disks by QEMU instead of directly
being handled directly by, for example, blkback. It would be very helpful for me
to have a mental map of all the scenarios that can be encountered regarding
local disk (Xen could start supporting live migration of certain types of local
disks). This are the ones I can think of:
- Fully Virtualized HVM: QEMU emulation
- blkback
- blktap / blktap2 


I have included TODOs in the code. I am sending this patch as is because I first
wanted to get an initial feedback if this is the path the should be pursued. Any
suggestions and ideas on this patch or on how to make a more generic solution
would be really appreciated.

Signed-off-by: Bruno Alvisio <bruno.alvisio@gmail.com>

---
 tools/libxl/libxl.h                  |  16 ++++-
 tools/libxl/libxl_create.c           |  87 +++++++++++++++++++++++++-
 tools/libxl/libxl_internal.h         |  16 +++++
 tools/libxl/libxl_qmp.c              | 115 ++++++++++++++++++++++++++++++++++-
 tools/ocaml/libs/xl/xenlight_stubs.c |   2 +-
 tools/xl/xl.h                        |   1 +
 tools/xl/xl_migrate.c                |  79 +++++++++++++++++++++---
 tools/xl/xl_vmcontrol.c              |   2 +-
 8 files changed, 303 insertions(+), 15 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index cf8687a..81fb2dc 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1294,6 +1294,15 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
                     xentoollog_logger *lg);
 int libxl_ctx_free(libxl_ctx *ctx /* 0 is OK */);
 
+int libxl__drive_mirror(libxl_ctx *ctx, int domid, const char* device, const char* target, const char* format) LIBXL_EXTERNAL_CALLERS_ONLY;
+
+int libxl__query_block_jobs(libxl_ctx *ctx, int domid, bool *is_ready) LIBXL_EXTERNAL_CALLERS_ONLY;
+
+int libxl__query_block(libxl_ctx *ctx, int domid, char *device_names) LIBXL_EXTERNAL_CALLERS_ONLY;
+
+int libxl__nbd_server_stop(libxl_ctx *ctx, int domid) LIBXL_EXTERNAL_CALLERS_ONLY;
+
+
 /* domain related functions */
 
 /* If the result is ERROR_ABORTED, the domain may or may not exist
@@ -1307,7 +1316,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
-                                int send_back_fd,
+                                int send_back_fd, int copy_local_disks,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1348,7 +1357,7 @@ static inline int libxl_domain_create_restore_0x040400(
     LIBXL_EXTERNAL_CALLERS_ONLY
 {
     return libxl_domain_create_restore(ctx, d_config, domid, restore_fd,
-                                       -1, params, ao_how, aop_console_how);
+                                       -1, 0, params, ao_how, aop_console_how);
 }
 
 #define libxl_domain_create_restore libxl_domain_create_restore_0x040400
@@ -1387,6 +1396,9 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
 
+#define QEMU_DRIVE_MIRROR_PORT "11000"
+#define QEMU_DRIVE_MIRROR_DEVICE "ide0-hd0"
+
 /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )]
  *   If this parameter is true, use co-operative resume. The guest
  *   must support this.
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index bffbc45..ef99f03 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -27,6 +27,40 @@
 
 #include <xen-xsm/flask/flask.h>
 
+//TODO: These functions were created to be able to call qmp commands from xl.
+//TODO: These functions should be removed since they won't be called from xl.
+int libxl__drive_mirror(libxl_ctx *ctx, int domid, const char* device, const char* target, const char* format){
+    GC_INIT(ctx);
+    int rc;
+    rc = libxl__qmp_drive_mirror(gc, domid, device, target, format);
+    GC_FREE;
+    return rc;
+}
+
+int libxl__query_block_jobs(libxl_ctx *ctx, int domid, bool *is_ready){
+    GC_INIT(ctx);
+    int rc;
+    rc = libxl__qmp_query_block_jobs(gc, domid, is_ready);
+    GC_FREE;
+    return rc;
+}
+
+int libxl__nbd_server_stop(libxl_ctx *ctx, int domid){
+    GC_INIT(ctx);
+    int rc;
+    rc = libxl__qmp_nbd_server_stop(gc, domid);
+    GC_FREE;
+    return rc;
+}
+
+int libxl__query_block(libxl_ctx *ctx, int domid, char *device_names){
+    GC_INIT(ctx);
+    int rc;
+    rc = libxl__qmp_query_block(gc, domid, device_names);
+    GC_FREE;
+    return rc;
+}
+
 int libxl__domain_create_info_setdefault(libxl__gc *gc,
                                          libxl_domain_create_info *c_info)
 {
@@ -1355,6 +1389,51 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
         else
             libxl__spawn_local_dm(egc, &dcs->sdss.dm);
 
+
+        if(dcs->restore_fd >= 0 && dcs->copy_local_disks) {
+             /*
+              * Start and add the NBD server
+              * Host is set it to "::" for now
+              * Port we hard code a port for now
+
+              * This code just handles the case when -M pc is used.
+              * (The config xen_platform_pci = 0)
+              *
+              * Current implementation performs the disk mirroring after the
+              * VM in the source has been suspended. Thus, the VM is frozen
+              * for a long period of time.
+              * Consider doing the mirroring of the drive before the memory
+              * stream is performed.
+              * Consider a solution that handles multiple types of VM configurations
+              *
+              * TODO: Current implementation only works with upstream qemu
+              * TODO: consider the case when qemu-xen-traditional is used.
+              * TODO: Check and copy only those disks which are local
+              * TODO: Assign port dynamically
+              */
+
+            fprintf(stderr, "Starting NBD Server\n");
+            ret = libxl__qmp_nbd_server_start(gc, domid, "::", QEMU_DRIVE_MIRROR_PORT);
+            if (ret) {
+                ret = ERROR_FAIL;
+                fprintf(stderr, "Failed to start NBD Server\n");
+                goto skip_nbd;
+            }else{
+                fprintf(stderr, "Started NBD Server Successfully\n");
+            }
+
+            ret = libxl__qmp_nbd_server_add(gc, domid, QEMU_DRIVE_MIRROR_DEVICE);
+
+            if (ret) {
+                ret = ERROR_FAIL;
+                fprintf(stderr, "Failed to add NBD Server\n");
+                goto skip_nbd;
+            } else {
+                fprintf(stderr, "NBD Add Successful\n");
+            }
+        }
+
+skip_nbd:
         /*
          * Handle the domain's (and the related stubdomain's) access to
          * the VGA framebuffer.
@@ -1602,6 +1681,7 @@ static void domain_create_cb(libxl__egc *egc,
 
 static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid, int restore_fd, int send_back_fd,
+                            int copy_local_disks,
                             const libxl_domain_restore_params *params,
                             const libxl_asyncop_how *ao_how,
                             const libxl_asyncprogress_how *aop_console_how)
@@ -1617,6 +1697,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
     cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
     cdcs->dcs.send_back_fd = send_back_fd;
+    cdcs->dcs.copy_local_disks = copy_local_disks;
     if (restore_fd > -1) {
         cdcs->dcs.restore_params = *params;
         rc = libxl__fd_flags_modify_save(gc, cdcs->dcs.restore_fd,
@@ -1845,13 +1926,13 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             const libxl_asyncprogress_how *aop_console_how)
 {
     unset_disk_colo_restore(d_config);
-    return do_domain_create(ctx, d_config, domid, -1, -1, NULL,
+    return do_domain_create(ctx, d_config, domid, -1, -1, 0, NULL,
                             ao_how, aop_console_how);
 }
 
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
-                                int send_back_fd,
+                                int send_back_fd, int copy_local_disks,
                                 const libxl_domain_restore_params *params,
                                 const libxl_asyncop_how *ao_how,
                                 const libxl_asyncprogress_how *aop_console_how)
@@ -1863,7 +1944,7 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
     }
 
     return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
-                            params, ao_how, aop_console_how);
+                            copy_local_disks, params, ao_how, aop_console_how);
 }
 
 int libxl_domain_soft_reset(libxl_ctx *ctx,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index afe6652..938481a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1835,6 +1835,21 @@ _hidden int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid,
 /* Start replication */
 _hidden int libxl__qmp_start_replication(libxl__gc *gc, int domid,
                                          bool primary);
+
+/* Add a disk to NBD server */
+ _hidden int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid,
+                                       const char *disk);
+
+/* Mirror disk drive */
+_hidden int libxl__qmp_drive_mirror(libxl__gc *gc, int domid, const char* device,
+                                    const char* target, const char* format);
+
+/* Query block devices */
+_hidden int libxl__qmp_query_block(libxl__gc *gc, int domid, char *device_names);
+
+/* Query existing block jobs*/
+_hidden int libxl__qmp_query_block_jobs(libxl__gc *gc, int domid, bool *is_ready);
+
 /* Get replication error that occurs when the vm is running */
 _hidden int libxl__qmp_query_xen_replication_status(libxl__gc *gc, int domid);
 /* Do checkpoint */
@@ -3695,6 +3710,7 @@ struct libxl__domain_create_state {
     int restore_fd, libxc_fd;
     int restore_fdfl; /* original flags of restore_fd */
     int send_back_fd;
+    int copy_local_disks;
     libxl_domain_restore_params restore_params;
     uint32_t domid_soft_reset;
     libxl__domain_create_cb *callback;
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index eab993a..cbfcf77 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -347,7 +347,9 @@ static libxl__qmp_handler *qmp_init_handler(libxl__gc *gc, uint32_t domid)
     }
     qmp->ctx = CTX;
     qmp->domid = domid;
-    qmp->timeout = 5;
+    //TODO: Changed default timeout because drive-mirror command takes a long
+    //TODO: to return. Consider timeout to be passed as param.
+    qmp->timeout = 600;
 
     LIBXL_STAILQ_INIT(&qmp->callback_list);
 
@@ -1069,6 +1071,117 @@ int libxl__qmp_nbd_server_add(libxl__gc *gc, int domid, const char *disk)
     return qmp_run_command(gc, domid, "nbd-server-add", args, NULL, NULL);
 }
 
+int libxl__qmp_drive_mirror(libxl__gc *gc, int domid, const char* device, const char* target, const char* format)
+{
+    libxl__json_object *args = NULL;
+    //TODO: Allow method to receive "sync", "speed", "mode", "granurality", "buf-size"
+    qmp_parameters_add_string(gc, &args, "device", device);
+    qmp_parameters_add_string(gc, &args, "target", target);
+    qmp_parameters_add_string(gc, &args, "sync", "full");
+    qmp_parameters_add_string(gc, &args, "format", format);
+    qmp_parameters_add_string(gc, &args, "mode", "existing");
+    qmp_parameters_add_integer(gc, &args, "granularity", 0);
+    qmp_parameters_add_integer(gc, &args, "buf-size", 0);
+
+    return qmp_run_command(gc, domid, "drive-mirror", args, NULL, NULL);
+}
+
+static int query_block_callback(libxl__qmp_handler *qmp,
+                               const libxl__json_object *response,
+                               void *opaque)
+{
+    const libxl__json_object *blockinfo = NULL;
+    GC_INIT(qmp->ctx);
+    int i, rc = -1;
+
+    for (i = 0; (blockinfo = libxl__json_array_get(response, i)); i++) {
+        const libxl__json_object *d;
+        const char* device_name;
+        d = libxl__json_map_get("device", blockinfo, JSON_STRING);
+        if(!d){
+            goto out;
+        }
+        device_name = libxl__json_object_get_string(d);
+    }
+
+    rc = 0;
+out:
+    GC_FREE;
+    return rc;
+}
+
+static int query_block_jobs_callback(libxl__qmp_handler *qmp,
+                               const libxl__json_object *response,
+                               void *opaque)
+{
+    const libxl__json_object *blockjobinfo = NULL;
+    GC_INIT(qmp->ctx);
+    int i, rc = -1;
+    bool empty = true;
+
+    for (i = 0; (blockjobinfo = libxl__json_array_get(response, i)); i++) {
+        empty = false;
+        const char *bjtype;
+        const char *bjdevice;
+        unsigned int bjlen;
+        unsigned int bjoffset;
+        bool bjbusy;
+        bool bjpaused;
+        const char *bjiostatus;
+        bool bjready;
+
+        const libxl__json_object *type = NULL;
+        const libxl__json_object *device = NULL;
+        const libxl__json_object *len = NULL;
+        const libxl__json_object *offset = NULL;
+        const libxl__json_object *busy = NULL;
+        const libxl__json_object *paused = NULL;
+        const libxl__json_object *io_status = NULL;
+        const libxl__json_object *ready = NULL;
+
+        type = libxl__json_map_get("type", blockjobinfo, JSON_STRING);
+        device = libxl__json_map_get("device", blockjobinfo, JSON_STRING);
+        len = libxl__json_map_get("len", blockjobinfo, JSON_INTEGER);
+        offset = libxl__json_map_get("offset", blockjobinfo, JSON_INTEGER);
+        busy = libxl__json_map_get("busy", blockjobinfo, JSON_BOOL);
+        paused = libxl__json_map_get("type", blockjobinfo, JSON_BOOL);
+        io_status = libxl__json_map_get("io-status", blockjobinfo, JSON_STRING);
+        ready = libxl__json_map_get("ready", blockjobinfo, JSON_BOOL);
+
+        bjtype = libxl__json_object_get_string(type);
+        bjdevice = libxl__json_object_get_string(device);
+        bjlen = libxl__json_object_get_integer(len);
+        bjoffset = libxl__json_object_get_integer(offset);
+        bjbusy = libxl__json_object_get_bool(len);
+        bjpaused = libxl__json_object_get_bool(paused);
+        bjiostatus = libxl__json_object_get_string(io_status);
+        bjready = libxl__json_object_get_bool(ready);
+
+        bool *is_ready = opaque;
+        *is_ready = bjready;
+    }
+
+    if(empty){
+        bool *is_ready = opaque;
+        *is_ready = true;
+    }
+
+    rc = 0;
+
+    GC_FREE;
+    return rc;
+}
+
+int libxl__qmp_query_block(libxl__gc *gc, int domid, char *device_names)
+{
+    return qmp_run_command(gc, domid, "query-block", NULL, query_block_callback, device_names);
+}
+
+int libxl__qmp_query_block_jobs(libxl__gc *gc, int domid, bool *is_ready)
+{
+    return qmp_run_command(gc, domid, "query-block-jobs", NULL, query_block_jobs_callback, is_ready);
+}
+
 int libxl__qmp_start_replication(libxl__gc *gc, int domid, bool primary)
 {
     libxl__json_object *args = NULL;
diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c b/tools/ocaml/libs/xl/xenlight_stubs.c
index 98b52b9..8791175 100644
--- a/tools/ocaml/libs/xl/xenlight_stubs.c
+++ b/tools/ocaml/libs/xl/xenlight_stubs.c
@@ -538,7 +538,7 @@ value stub_libxl_domain_create_restore(value ctx, value domain_config, value par
 
 	caml_enter_blocking_section();
 	ret = libxl_domain_create_restore(CTX, &c_dconfig, &c_domid, restore_fd,
-		-1, &c_params, ao_how, NULL);
+		-1, 0, &c_params, ao_how, NULL);
 	caml_leave_blocking_section();
 
 	free(ao_how);
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index aa95b77..dcdb80d 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -35,6 +35,7 @@ struct domain_create {
     int daemonize;
     int monitor; /* handle guest reboots etc */
     int paused;
+    int copy_local_disks;
     int dryrun;
     int quiet;
     int vnc;
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 1f0e87d..62b78ea 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -177,7 +177,8 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
 }
 
 static void migrate_domain(uint32_t domid, const char *rune, int debug,
-                           const char *override_config_file)
+                           const char *override_config_file,
+                           int copy_local_disks, const char* hostname)
 {
     pid_t child = -1;
     int rc;
@@ -186,6 +187,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
     char rc_buf;
     uint8_t *config_data;
     int config_len, flags = LIBXL_SUSPEND_LIVE;
+    char* target;
 
     save_domain_core_begin(domid, override_config_file,
                            &config_data, &config_len);
@@ -232,6 +234,47 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 
     fprintf(stderr, "migration sender: Target has acknowledged transfer.\n");
 
+
+    /*
+     * If the -l was provided, the drive-mirror job is started.
+     * TODO: Move the following code as part of the domain_suspend
+     * TODO: The port should be sent by the destination.
+    */
+    if(copy_local_disks) {
+        fprintf(stderr, "Starting mirror-drive of device %s\n", QEMU_DRIVE_MIRROR_DEVICE);
+        xasprintf(&target, "nbd:%s:%s:exportname=%s", hostname, QEMU_DRIVE_MIRROR_PORT, QEMU_DRIVE_MIRROR_DEVICE);
+        rc = libxl__drive_mirror(ctx, domid, QEMU_DRIVE_MIRROR_DEVICE, target, "raw");
+        if (!rc) {
+            fprintf(stderr, "Drive mirror command returned successfully\n");
+        }else{
+            fprintf(stderr, "Sending drive mirror command failed\n");
+            goto cont;
+        }
+
+        /*
+         * Query job status until it is ready
+         * TODO: This code is just an inefficient busy wait. QMP sends an
+         * TODO: asynchronous message when mirroring job is completed. Consider
+         * TODO: adding the capability to handle asynchronous QMP messages (already done?)
+         */
+        bool job_is_ready = false;
+        while(!job_is_ready) {
+            fprintf(stderr, "Checking for drive-mirror job");
+            rc = libxl__query_block_jobs(ctx, domid, &job_is_ready);
+            if(rc){
+                fprintf(stderr, "Checking block job failed\n");
+                goto cont;
+            }else{
+                fprintf(stderr, "Checking block job succeeded\n");
+            }
+            if(!job_is_ready){
+                fprintf(stderr, "Sleeping 5 sec\n");
+                sleep(5);
+            }
+        }
+    }
+cont:
+
     if (common_domname) {
         xasprintf(&away_domname, "%s--migratedaway", common_domname);
         rc = libxl_domain_rename(ctx, domid, common_domname, away_domname);
@@ -316,7 +359,7 @@ static void migrate_domain(uint32_t domid, const char *rune, int debug,
 }
 
 static void migrate_receive(int debug, int daemonize, int monitor,
-                            int pause_after_migration,
+                            int pause_after_migration, int copy_local_disks,
                             int send_fd, int recv_fd,
                             libxl_checkpointed_stream checkpointed,
                             char *colo_proxy_script,
@@ -343,6 +386,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
     dom_info.daemonize = daemonize;
     dom_info.monitor = monitor;
     dom_info.paused = 1;
+    dom_info.copy_local_disks = copy_local_disks;
     dom_info.migrate_fd = recv_fd;
     dom_info.send_back_fd = send_fd;
     dom_info.migration_domname_r = &migration_domname;
@@ -423,6 +467,14 @@ static void migrate_receive(int debug, int daemonize, int monitor,
 
     fprintf(stderr, "migration target: Got permission, starting domain.\n");
 
+    fprintf(stderr, "Stopping NBD server\n");
+    rc = libxl__nbd_server_stop(ctx, domid);
+    if (rc){
+        fprintf(stderr, "Failed to stop NBD server\n");
+    }else{
+        fprintf(stderr, "Stopped NBD server successfully\n");
+    }
+
     if (migration_domname) {
         rc = libxl_domain_rename(ctx, domid, migration_domname, common_domname);
         if (rc) goto perhaps_destroy_notify_rc;
@@ -478,6 +530,7 @@ static void migrate_receive(int debug, int daemonize, int monitor,
 int main_migrate_receive(int argc, char **argv)
 {
     int debug = 0, daemonize = 1, monitor = 1, pause_after_migration = 0;
+    int copy_local_disks = 0;
     libxl_checkpointed_stream checkpointed = LIBXL_CHECKPOINTED_STREAM_NONE;
     int opt;
     bool userspace_colo_proxy = false;
@@ -490,7 +543,7 @@ int main_migrate_receive(int argc, char **argv)
         COMMON_LONG_OPTS
     };
 
-    SWITCH_FOREACH_OPT(opt, "Fedrp", opts, "migrate-receive", 0) {
+    SWITCH_FOREACH_OPT(opt, "Fedrpl", opts, "migrate-receive", 0) {
     case 'F':
         daemonize = 0;
         break;
@@ -516,6 +569,9 @@ int main_migrate_receive(int argc, char **argv)
     case 'p':
         pause_after_migration = 1;
         break;
+    case 'l':
+        copy_local_disks = 1;
+        break;
     }
 
     if (argc-optind != 0) {
@@ -523,7 +579,7 @@ int main_migrate_receive(int argc, char **argv)
         return EXIT_FAILURE;
     }
     migrate_receive(debug, daemonize, monitor, pause_after_migration,
-                    STDOUT_FILENO, STDIN_FILENO,
+                    copy_local_disks, STDOUT_FILENO, STDIN_FILENO,
                     checkpointed, script, userspace_colo_proxy);
 
     return EXIT_SUCCESS;
@@ -536,14 +592,16 @@ int main_migrate(int argc, char **argv)
     const char *ssh_command = "ssh";
     char *rune = NULL;
     char *host;
+    char *hostname;
     int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
+    int copy_local_disks = 0;
     static struct option opts[] = {
         {"debug", 0, 0, 0x100},
         {"live", 0, 0, 0x200},
         COMMON_LONG_OPTS
     };
 
-    SWITCH_FOREACH_OPT(opt, "FC:s:ep", opts, "migrate", 2) {
+    SWITCH_FOREACH_OPT(opt, "FC:s:epl", opts, "migrate", 2) {
     case 'C':
         config_filename = optarg;
         break;
@@ -560,6 +618,9 @@ int main_migrate(int argc, char **argv)
     case 'p':
         pause_after_migration = 1;
         break;
+    case 'l':
+        copy_local_disks = 1;
+        break;
     case 0x100: /* --debug */
         debug = 1;
         break;
@@ -571,6 +632,9 @@ int main_migrate(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    hostname = strchr(host, '@');
+    hostname++;
+
     bool pass_tty_arg = progress_use_cr || (isatty(2) > 0);
 
     if (!ssh_command[0]) {
@@ -587,16 +651,17 @@ int main_migrate(int argc, char **argv)
         } else {
             verbose_len = (minmsglevel_default - minmsglevel) + 2;
         }
-        xasprintf(&rune, "exec %s %s xl%s%.*s migrate-receive%s%s%s",
+        xasprintf(&rune, "exec %s %s xl%s%.*s migrate-receive%s%s%s%s",
                   ssh_command, host,
                   pass_tty_arg ? " -t" : "",
                   verbose_len, verbose_buf,
                   daemonize ? "" : " -e",
                   debug ? " -d" : "",
+                  copy_local_disks ? " -l" : "",
                   pause_after_migration ? " -p" : "");
     }
 
-    migrate_domain(domid, rune, debug, config_filename);
+    migrate_domain(domid, rune, debug, config_filename, copy_local_disks, hostname);
     return EXIT_SUCCESS;
 }
 
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 89c2b25..5ffbfb7 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -882,7 +882,7 @@ start:
 
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           &domid, restore_fd,
-                                          send_back_fd, &params,
+                                          send_back_fd, dom_info->copy_local_disks, &params,
                                           0, autoconnect_console_how);
 
         libxl_domain_restore_params_dispose(&params);
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-06-29 16:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-23  7:42 [PATCH RFC] Live migration for VMs with QEMU backed local storage Bruno Alvisio
2017-06-23  8:03 ` Roger Pau Monné
2017-06-26 10:06   ` George Dunlap
2017-06-26 23:16     ` Bruno Alvisio
2017-06-29 11:58 ` Wei Liu
2017-06-29 13:33   ` Bruno Alvisio
2017-06-29 13:56     ` Wei Liu
2017-06-29 14:34       ` Roger Pau Monné
2017-06-29 16:11         ` Wei Liu
  -- strict thread matches above, loose matches on Subject: below --
2017-06-23  7:31 Bruno Alvisio
2017-06-23 14:15 ` Konrad Rzeszutek Wilk
2017-06-23 14:20   ` Ian Jackson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.