All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0 of 2] libxl - Remus support
@ 2012-01-31  1:22 rshriram
  2012-01-31  1:22 ` [PATCH 1 of 2] libxl: Remus - suspend/postflush/commit callbacks rshriram
  2012-01-31  1:22 ` [PATCH 2 of 2] libxl: Remus - xl remus command rshriram
  0 siblings, 2 replies; 6+ messages in thread
From: rshriram @ 2012-01-31  1:22 UTC (permalink / raw)
  To: xen-devel; +Cc: brendan, ian.jackson, ian.campbell

This patch series introduces a basic framework to
incorporate Remus into the libxl toolstack. The only functionality
currently implemented is memory checkpointing.

These patches depend on "libxl: refactor suspend/resume code" patch series.

Changes since previous version:
 * Receiving end of remus is now part of migrate-receive
 * Re-use code already present for live migration (refactored
   in the above mentioned patch series)

Shriram

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1 of 2] libxl: Remus - suspend/postflush/commit callbacks
  2012-01-31  1:22 [PATCH 0 of 2] libxl - Remus support rshriram
@ 2012-01-31  1:22 ` rshriram
  2012-01-31 10:53   ` Ian Campbell
  2012-01-31  1:22 ` [PATCH 2 of 2] libxl: Remus - xl remus command rshriram
  1 sibling, 1 reply; 6+ messages in thread
From: rshriram @ 2012-01-31  1:22 UTC (permalink / raw)
  To: xen-devel; +Cc: brendan, ian.jackson, ian.campbell

# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1327972786 28800
# Node ID 0129989da2cda789f86f2bc83d9c642a0bcebe60
# Parent  ffc99e708e90eb140b0a6f2e7087d567e71e9d0f
libxl: Remus - suspend/postflush/commit callbacks

 * Add libxl callback functions for Remus checkpoint suspend, postflush
   (aka resume) and checkpoint commit callbacks.
 * suspend callback is a stub that just bounces off
   libxl__domain_suspend_common_callback - which suspends the domain and
   saves the devices model state to a file.
 * resume callback currently just resumes the domain (and the device model).
 * commit callback just writes out the saved device model state to the
   network and sleeps for the checkpoint interval.

 * Future patches will augument these callbacks with more functionalities like
   issuing network buffer plug/unplug commands, disk checkpoint commands, etc.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r ffc99e708e90 -r 0129989da2cd tools/libxl/libxl.c
--- a/tools/libxl/libxl.c	Mon Jan 30 16:59:01 2012 -0800
+++ b/tools/libxl/libxl.c	Mon Jan 30 17:19:46 2012 -0800
@@ -480,7 +480,9 @@ int libxl_domain_suspend(libxl_ctx *ctx,
     int debug = info != NULL && info->flags & XL_SUSPEND_DEBUG;
     int rc = 0;
 
-    rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug);
+    rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug,
+                                      /* No Remus */ NULL);
+
     if (!rc && type == LIBXL_DOMAIN_TYPE_HVM)
         rc = libxl__domain_save_device_model(gc, domid, fd);
     GC_FREE;
diff -r ffc99e708e90 -r 0129989da2cd tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c	Mon Jan 30 16:59:01 2012 -0800
+++ b/tools/libxl/libxl_dom.c	Mon Jan 30 17:19:46 2012 -0800
@@ -404,6 +404,8 @@ struct suspendinfo {
     int hvm;
     unsigned int flags;
     int guest_responded;
+    int save_fd; /* Migration stream fd (for Remus) */
+    int interval; /* checkpoint interval (for Remus) */
 };
 
 static int libxl__domain_suspend_common_switch_qemu_logdirty(int domid, unsigned int enable, void *data)
@@ -609,9 +611,43 @@ static int libxl__domain_suspend_common_
     return 1;
 }
 
+static int libxl__remus_domain_suspend_callback(void *data)
+{
+    /* TODO: Issue disk and network checkpoint reqs. */
+    return libxl__domain_suspend_common_callback(data);
+}
+
+static int libxl__remus_domain_resume_callback(void *data)
+{
+    struct suspendinfo *si = data;
+    libxl_ctx *ctx = libxl__gc_owner(si->gc);
+
+    /* Resumes the domain and the device model */
+    if (libxl_domain_resume(ctx, si->domid, /* Fast Suspend */1))
+        return 0;
+
+    /* TODO: Deal with disk. Start a new network output buffer */
+    return 1;
+}
+
+static int libxl__remus_domain_checkpoint_callback(void *data)
+{
+    struct suspendinfo *si = data;
+
+    /* This would go into tailbuf. */
+    if (si->hvm &&
+        libxl__domain_save_device_model(si->gc, si->domid, si->save_fd))
+        return 0;
+
+    /* TODO: Wait for disk and memory ack, release network buffer */
+    usleep(si->interval * 1000);
+    return 1;
+}
+
 int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
                                  libxl_domain_type type,
-                                 int live, int debug)
+                                 int live, int debug,
+                                 const libxl_domain_remus_info *r_info)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     int flags;
@@ -642,10 +678,20 @@ int libxl__domain_suspend_common(libxl__
         return ERROR_INVAL;
     }
 
+    memset(&si, 0, sizeof(si));
     flags = (live) ? XCFLAGS_LIVE : 0
           | (debug) ? XCFLAGS_DEBUG : 0
           | (hvm) ? XCFLAGS_HVM : 0;
 
+    if (r_info != NULL) {
+        si.interval = r_info->interval;
+        if (r_info->compression)
+            flags |= XCFLAGS_CHECKPOINT_COMPRESS;
+        si.save_fd = fd;
+    }
+    else
+        si.save_fd = -1;
+
     si.domid = domid;
     si.flags = flags;
     si.hvm = hvm;
@@ -669,7 +715,27 @@ int libxl__domain_suspend_common(libxl__
     }
 
     memset(&callbacks, 0, sizeof(callbacks));
-    callbacks.suspend = libxl__domain_suspend_common_callback;
+    if (r_info != NULL) {
+        /* save_callbacks:
+         * suspend - called after expiration of checkpoint interval,
+         *           to *suspend* the domain.
+         *
+         * postcopy - called after the domain's dirty pages have been
+         *            copied into an output buffer. We *resume* the domain
+         *            & the device model, return to the caller. Caller then
+         *            flushes the output buffer, while the domain continues to run.
+         *
+         * checkpoint - called after the memory checkpoint has been flushed out
+         *              into the network. Send the saved device state, *wait*
+         *              for checkpoint ack and *release* the network buffer (TBD).
+         *              Then *sleep* for the checkpoint interval.
+         */
+        callbacks.suspend = libxl__remus_domain_suspend_callback;
+        callbacks.postcopy = libxl__remus_domain_resume_callback;
+        callbacks.checkpoint = libxl__remus_domain_checkpoint_callback;
+    } else
+        callbacks.suspend = libxl__domain_suspend_common_callback;
+
     callbacks.switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
     callbacks.data = &si;
 
diff -r ffc99e708e90 -r 0129989da2cd tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h	Mon Jan 30 16:59:01 2012 -0800
+++ b/tools/libxl/libxl_internal.h	Mon Jan 30 17:19:46 2012 -0800
@@ -275,7 +275,8 @@ _hidden int libxl__domain_restore_common
                                          int fd);
 _hidden int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
                                          libxl_domain_type type,
-                                         int live, int debug);
+                                         int live, int debug,
+                                         const libxl_domain_remus_info *r_info);
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
diff -r ffc99e708e90 -r 0129989da2cd tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl	Mon Jan 30 16:59:01 2012 -0800
+++ b/tools/libxl/libxl_types.idl	Mon Jan 30 17:19:46 2012 -0800
@@ -397,3 +397,9 @@ libxl_sched_sedf = Struct("sched_sedf", 
     ("extratime", integer),
     ("weight", integer),
     ], dispose_fn=None)
+
+libxl_domain_remus_info = Struct("domain_remus_info",[
+    ("interval",     integer),
+    ("blackhole",    bool),
+    ("compression",  bool),
+    ])

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2 of 2] libxl: Remus - xl remus command
  2012-01-31  1:22 [PATCH 0 of 2] libxl - Remus support rshriram
  2012-01-31  1:22 ` [PATCH 1 of 2] libxl: Remus - suspend/postflush/commit callbacks rshriram
@ 2012-01-31  1:22 ` rshriram
  2012-01-31 11:00   ` Ian Campbell
  1 sibling, 1 reply; 6+ messages in thread
From: rshriram @ 2012-01-31  1:22 UTC (permalink / raw)
  To: xen-devel; +Cc: brendan, ian.jackson, ian.campbell

# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1327972788 28800
# Node ID 070f78ba4244cf8239ded3a69ccd54ac6e1bd24d
# Parent  0129989da2cda789f86f2bc83d9c642a0bcebe60
libxl: Remus - xl remus command

xl remus acts as a frontend to enable remus for a given domain.
 * At the moment, only memory checkpointing and blackhole replication is
   supported. Support for disk checkpointing and network buffering will
   be added in future.
 * Replication is done over ssh connection currently (like live migration
   with xl). Future versions will have an option to use simple tcp socket
   based replication channel (for both Remus & live migration).

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 0129989da2cd -r 070f78ba4244 tools/libxl/libxl.c
--- a/tools/libxl/libxl.c	Mon Jan 30 17:19:46 2012 -0800
+++ b/tools/libxl/libxl.c	Mon Jan 30 17:19:48 2012 -0800
@@ -471,6 +471,40 @@ libxl_vminfo * libxl_list_vm(libxl_ctx *
     return ptr;
 }
 
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+                             uint32_t domid, int fd)
+{
+    GC_INIT(ctx);
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    int rc = 0;
+
+    if (info == NULL) {
+        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
+                   "No remus_info structure supplied for domain %d", domid);
+        rc = -1;
+        goto remus_fail;
+    }
+
+    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+
+    /* Point of no return */
+    rc = libxl__domain_suspend_common(gc, domid, fd, type, /* live */ 1,
+                                      /* debug */ 0, info);
+
+    /* 
+     * With Remus, if we reach this point, it means either
+     * backup died or some network error occurred preventing us
+     * from sending checkpoints.
+     */
+
+    /* TBD: Remus cleanup - i.e. detach qdisc, release other
+     * resources.
+     */
+ remus_fail:
+    GC_FREE;
+    return rc;
+}
+
 int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
                          uint32_t domid, int fd)
 {
diff -r 0129989da2cd -r 070f78ba4244 tools/libxl/libxl.h
--- a/tools/libxl/libxl.h	Mon Jan 30 17:19:46 2012 -0800
+++ b/tools/libxl/libxl.h	Mon Jan 30 17:19:48 2012 -0800
@@ -266,6 +266,8 @@ typedef int (*libxl_console_ready)(libxl
 int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config, libxl_console_ready cb, void *priv, uint32_t *domid);
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config, libxl_console_ready cb, void *priv, uint32_t *domid, int restore_fd);
 void libxl_domain_config_dispose(libxl_domain_config *d_config);
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+                             uint32_t domid, int fd);
 int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
                           uint32_t domid, int fd);
 int libxl_domain_resume(libxl_ctx *ctx, uint32_t domid, int fast);
diff -r 0129989da2cd -r 070f78ba4244 tools/libxl/xl.h
--- a/tools/libxl/xl.h	Mon Jan 30 17:19:46 2012 -0800
+++ b/tools/libxl/xl.h	Mon Jan 30 17:19:48 2012 -0800
@@ -94,6 +94,7 @@ int main_cpupoolnumasplit(int argc, char
 int main_getenforce(int argc, char **argv);
 int main_setenforce(int argc, char **argv);
 int main_loadpolicy(int argc, char **argv);
+int main_remus(int argc, char **argv);
 
 void help(const char *command);
 
diff -r 0129989da2cd -r 070f78ba4244 tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c	Mon Jan 30 17:19:46 2012 -0800
+++ b/tools/libxl/xl_cmdimpl.c	Mon Jan 30 17:19:48 2012 -0800
@@ -2846,7 +2846,7 @@ static void core_dump_domain(const char 
  * an ssh channel.
  */
 static void migrate_receive(int debug, int daemonize, int monitor,
-                            int send_fd, int recv_fd)
+                            int send_fd, int recv_fd, int remus)
 {
     int rc, rc2;
     char rc_buf;
@@ -2881,6 +2881,41 @@ static void migrate_receive(int debug, i
         exit(-rc);
     }
 
+    if (remus) {
+        /* If we are here, it means that the sender (primary) has crashed.
+         * TODO: Split-Brain Check.
+         */
+        fprintf(stderr, "migration target: Remus Failover for domain %u\n",
+                domid);
+
+        /*
+         * If domain renaming fails, lets just continue (as we need the domain
+         * to be up & dom names may not matter much, as long as its reachable
+         * over network).
+         *
+         * If domain unpausing fails, destroy domain ? Or is it better to have
+         * a consistent copy of the domain (memory, cpu state, disk)
+         * on atleast one physical host ? Right now, lets just leave the domain
+         * as is and let the Administrator decide (or troubleshoot).
+         */
+        if (migration_domname) {
+            rc = libxl_domain_rename(ctx, domid, migration_domname,
+                                     common_domname);
+            if (rc)
+                fprintf(stderr, "migration target (Remus): "
+                        "Failed to rename domain from %s to %s:%d\n",
+                        migration_domname, common_domname, rc);
+        }
+
+        rc = libxl_domain_unpause(ctx, domid);
+        if (rc)
+            fprintf(stderr, "migration target (Remus): "
+                    "Failed to unpause domain %s (id: %u):%d\n",
+                    common_domname, domid, rc);
+
+        exit(rc ? -ERROR_FAIL: 0);
+    }
+
     fprintf(stderr, "migration target: Transfer complete,"
             " requesting permission to start domain.\n");
 
@@ -3007,10 +3042,10 @@ int main_restore(int argc, char **argv)
 
 int main_migrate_receive(int argc, char **argv)
 {
-    int debug = 0, daemonize = 1, monitor = 1;
+    int debug = 0, daemonize = 1, monitor = 1, remus = 0;
     int opt;
 
-    while ((opt = def_getopt(argc, argv, "Fed", "migrate-receive", 0)) != -1) {
+    while ((opt = def_getopt(argc, argv, "Fedr", "migrate-receive", 0)) != -1) {
         switch (opt) {
         case 0: case 2:
             return opt;
@@ -3024,6 +3059,9 @@ int main_migrate_receive(int argc, char 
         case 'd':
             debug = 1;
             break;
+        case 'r':
+            remus = 1;
+            break;
         }
     }
 
@@ -3032,7 +3070,8 @@ int main_migrate_receive(int argc, char 
         return 2;
     }
     migrate_receive(debug, daemonize, monitor,
-                    /* write to stdout */1, /* read from stdin */0);
+                    /* write to stdout */ 1, /* read from stdin */ 0,
+                    remus);
     return 0;
 }
 
@@ -5968,6 +6007,91 @@ done:
     return ret;
 }
 
+int main_remus(int argc, char **argv)
+{
+    int opt, rc, daemonize = 1;
+    const char *ssh_command = "ssh";
+    char *host = NULL, *rune = NULL, *domain = NULL;
+    libxl_domain_remus_info r_info;
+    int send_fd = -1, recv_fd = -1;
+
+    memset(&r_info, 0, sizeof(libxl_domain_remus_info));
+    /* Defaults */
+    r_info.interval = 200;
+    r_info.blackhole = 0;
+    r_info.compression = 1;
+
+    while ((opt = def_getopt(argc, argv, "bui:s:e", "remus", 2)) != -1) {
+        switch (opt) {
+        case 0: case 2:
+            return opt;
+
+        case 'i':
+	    r_info.interval = atoi(optarg);
+            break;
+        case 'b':
+            r_info.blackhole = 1;
+            break;
+        case 'u':
+	    r_info.compression = 0;
+            break;
+        case 's':
+            ssh_command = optarg;
+            break;
+        case 'e':
+            daemonize = 0;
+            break;
+        }
+    }
+
+    domain = argv[optind];
+    host = argv[optind + 1];
+
+    if (r_info.blackhole) {
+        find_domain(domain);
+        send_fd = open("/dev/null", O_RDWR, 0644);
+        if (send_fd < 0) {
+            perror("failed to open /dev/null");
+            exit(-1);
+        }
+    } else {
+
+        /*
+         * TODO: change this to plain TCP socket based channel
+         * instead of SSH. For both live-migration and Remus.
+         */
+        if (!ssh_command[0]) {
+            rune = host;
+        } else {
+            if (asprintf(&rune, "exec %s %s xl migrate-receive -r %s",
+                         ssh_command, host,
+                         daemonize ? "" : " -e") < 0)
+                return 1;
+        }
+        migrate_do_preamble(domain, rune, NULL, &send_fd, &recv_fd, NULL);
+    }
+
+    /* Point of no return */
+    rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd);
+
+    /* If we are here, it means backup has failed/domain suspend failed.
+     * Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
+    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
+            " (rc=%d)\n", rc);
+
+    if (rc == ERROR_GUEST_TIMEDOUT)
+        fprintf(stderr, "Failed to suspend domain at primary.\n");
+    else {
+        fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
+        libxl_domain_resume(ctx, domid, 1);
+    }
+
+    close(send_fd);
+    return -ERROR_FAIL;
+}
+
 /*
  * Local variables:
  * mode: C
diff -r 0129989da2cd -r 070f78ba4244 tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c	Mon Jan 30 17:19:46 2012 -0800
+++ b/tools/libxl/xl_cmdtable.c	Mon Jan 30 17:19:48 2012 -0800
@@ -412,6 +412,20 @@ struct cmd_spec cmd_table[] = {
       "Loads a new policy int the Flask Xen security module",
       "<policy file>",
     },
+    { "remus",
+      &main_remus, 0,
+      "Enable Remus HA for domain",
+      "[options] <Domain> [<host>]",
+      "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
+      "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
+      "-u                      Disable memory checkpoint compression.\n"
+      "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
+      "                        to sh. If empty, run <host> instead of \n"
+      "                        ssh <host> xl migrate-receive -r\n"
+      "-e                      Do not wait in the background (on <host>) for the death\n"
+      "                        of the domain."
+
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1 of 2] libxl: Remus - suspend/postflush/commit callbacks
  2012-01-31  1:22 ` [PATCH 1 of 2] libxl: Remus - suspend/postflush/commit callbacks rshriram
@ 2012-01-31 10:53   ` Ian Campbell
  2012-01-31 17:59     ` Shriram Rajagopalan
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Campbell @ 2012-01-31 10:53 UTC (permalink / raw)
  To: rshriram; +Cc: brendan, xen-devel, Ian Jackson

> @@ -669,7 +715,27 @@ int libxl__domain_suspend_common(libxl__
>      }
>  
>      memset(&callbacks, 0, sizeof(callbacks));
> -    callbacks.suspend = libxl__domain_suspend_common_callback;
> +    if (r_info != NULL) {
> +        /* save_callbacks:
> +         * suspend - called after expiration of checkpoint interval,
> +         *           to *suspend* the domain.
> +         *
> +         * postcopy - called after the domain's dirty pages have been
> +         *            copied into an output buffer. We *resume* the domain
> +         *            & the device model, return to the caller. Caller then
> +         *            flushes the output buffer, while the domain continues to run.
> +         *
> +         * checkpoint - called after the memory checkpoint has been flushed out
> +         *              into the network. Send the saved device state, *wait*
> +         *              for checkpoint ack and *release* the network buffer (TBD).
> +         *              Then *sleep* for the checkpoint interval.
> +         */

Please can you put this next to the definition of the struct (in
xenguest.h). Otherwise this patch looks ok to me.

Ian.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2 of 2] libxl: Remus - xl remus command
  2012-01-31  1:22 ` [PATCH 2 of 2] libxl: Remus - xl remus command rshriram
@ 2012-01-31 11:00   ` Ian Campbell
  0 siblings, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2012-01-31 11:00 UTC (permalink / raw)
  To: rshriram; +Cc: brendan, xen-devel, Ian Jackson

On Tue, 2012-01-31 at 01:22 +0000, rshriram@cs.ubc.ca wrote:
> # HG changeset patch
> # User Shriram Rajagopalan <rshriram@cs.ubc.ca>
> # Date 1327972788 28800
> # Node ID 070f78ba4244cf8239ded3a69ccd54ac6e1bd24d
> # Parent  0129989da2cda789f86f2bc83d9c642a0bcebe60
> libxl: Remus - xl remus command
> 
> xl remus acts as a frontend to enable remus for a given domain.
>  * At the moment, only memory checkpointing and blackhole replication is
>    supported. Support for disk checkpointing and network buffering will
>    be added in future.
>  * Replication is done over ssh connection currently (like live migration
>    with xl). Future versions will have an option to use simple tcp socket
>    based replication channel (for both Remus & live migration).
> 
> Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> 
> diff -r 0129989da2cd -r 070f78ba4244 tools/libxl/libxl.c
> --- a/tools/libxl/libxl.c	Mon Jan 30 17:19:46 2012 -0800
> +++ b/tools/libxl/libxl.c	Mon Jan 30 17:19:48 2012 -0800
> @@ -471,6 +471,40 @@ libxl_vminfo * libxl_list_vm(libxl_ctx *
>      return ptr;
>  }
>  
> +int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
> +                             uint32_t domid, int fd)

This new function probably belongs in patch 1/2 with the other libxl
changes.

> +{
> +    GC_INIT(ctx);
> +    libxl_domain_type type = libxl__domain_type(gc, domid);
> +    int rc = 0;
> +
> +    if (info == NULL) {
> +        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
> +                   "No remus_info structure supplied for domain %d", domid);
> +        rc = -1;

libxl functions should return one of ERROR_* not -1 one. In this case
ERROR_INVAL I suppose.

[...]

> diff -r 0129989da2cd -r 070f78ba4244 tools/libxl/xl_cmdtable.c
> --- a/tools/libxl/xl_cmdtable.c	Mon Jan 30 17:19:46 2012 -0800
> +++ b/tools/libxl/xl_cmdtable.c	Mon Jan 30 17:19:48 2012 -0800
> @@ -412,6 +412,20 @@ struct cmd_spec cmd_table[] = {
>        "Loads a new policy int the Flask Xen security module",
>        "<policy file>",
>      },
> +    { "remus",

Please also add this command to docs/man/xl.pod.1.

Otherwise this patch looks good, thanks.

Ian.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1 of 2] libxl: Remus - suspend/postflush/commit callbacks
  2012-01-31 10:53   ` Ian Campbell
@ 2012-01-31 17:59     ` Shriram Rajagopalan
  0 siblings, 0 replies; 6+ messages in thread
From: Shriram Rajagopalan @ 2012-01-31 17:59 UTC (permalink / raw)
  To: Ian Campbell; +Cc: brendan, xen-devel, Ian Jackson


[-- Attachment #1.1: Type: text/plain, Size: 1718 bytes --]

On Tue, Jan 31, 2012 at 2:53 AM, Ian Campbell <Ian.Campbell@citrix.com>wrote:

> > @@ -669,7 +715,27 @@ int libxl__domain_suspend_common(libxl__
> >      }
> >
> >      memset(&callbacks, 0, sizeof(callbacks));
> > -    callbacks.suspend = libxl__domain_suspend_common_callback;
> > +    if (r_info != NULL) {
> > +        /* save_callbacks:
> > +         * suspend - called after expiration of checkpoint interval,
> > +         *           to *suspend* the domain.
> > +         *
> > +         * postcopy - called after the domain's dirty pages have been
> > +         *            copied into an output buffer. We *resume* the
> domain
> > +         *            & the device model, return to the caller. Caller
> then
> > +         *            flushes the output buffer, while the domain
> continues to run.
> > +         *
> > +         * checkpoint - called after the memory checkpoint has been
> flushed out
> > +         *              into the network. Send the saved device state,
> *wait*
> > +         *              for checkpoint ack and *release* the network
> buffer (TBD).
> > +         *              Then *sleep* for the checkpoint interval.
> > +         */
>
> Please can you put this next to the definition of the struct (in
> xenguest.h). Otherwise this patch looks ok to me.
>
> Ian.
>
>
>
struct save_callbacks already has comments on these callbacks, it is
generic though,
as the postcopy and checkpoint callbacks could be used even without Remus.
One need
not necessarily do all the actions specified in the above comment.


But, with libxl/remus, it seemed proper to add more detailed comments to
the code
that does other things in the callback, specific to remus/checkpointing.

shriram

[-- Attachment #1.2: Type: text/html, Size: 2214 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-01-31 17:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-31  1:22 [PATCH 0 of 2] libxl - Remus support rshriram
2012-01-31  1:22 ` [PATCH 1 of 2] libxl: Remus - suspend/postflush/commit callbacks rshriram
2012-01-31 10:53   ` Ian Campbell
2012-01-31 17:59     ` Shriram Rajagopalan
2012-01-31  1:22 ` [PATCH 2 of 2] libxl: Remus - xl remus command rshriram
2012-01-31 11:00   ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.