* [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
@ 2012-06-26 17:54 Ian Jackson
  2012-06-26 17:54 ` [PATCH 01/21] libxc: xc_domain_restore, make toolstack_restore const-correct Ian Jackson
                   ` (22 more replies)
  0 siblings, 23 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:54 UTC (permalink / raw)
  To: xen-devel

This is v5 of my series to asyncify save/restore, rebased to tip and
retested.  There are minor changes to 3 patches, as discussed on-list,
marked with "*" below:

    01/21 libxc: xc_domain_restore, make toolstack_restore const-correct
    02/21 libxc: Do not segfault if (e.g.) switch_qemu_logdirty fails
    03/21 libxl: domain save: rename variables etc.
    04/21 libxl: domain restore: reshuffle, preparing for ao
  * 05/21 libxl: domain save: API changes for asynchrony
  * 06/21 libxl: domain save/restore: run in a separate process
    07/21 libxl: rename libxl_dom:save_helper to physmap_path
    08/21 libxl: provide libxl__xs_*_checked and libxl__xs_transaction_*
    09/21 libxl: wait for qemu to acknowledge logdirty command
    10/21 libxl: datacopier: provide "prefix data" facility
    11/21 libxl: prepare for asynchronous writing of qemu save file
    12/21 libxl: Make libxl__domain_save_device_model asynchronous
    13/21 libxl: Add a gc to libxl_get_cpu_topology
    14/21 libxl: Do not pass NULL as gc_opt; introduce NOGC
    15/21 libxl: Get compiler to warn about gc_opt==NULL
    16/21 xl: Handle return value from libxl_domain_suspend correctly
    17/21 libxl: do not leak dms->saved_state
    18/21 libxl: do not leak spawned middle children
    19/21 libxl: do not leak an event struct on ignored ao progress
  * 20/21 libxl: further fixups re LIBXL_DOMAIN_TYPE
  ! 21/21 libxl: DO NOT APPLY enforce prohibition on internal

All of these apart from the last have been acked and I intend to
commit those to xen-unstable.hg soon.

However, first I will invite Shriram to check that Remus is still
working.  (I can't conveniently do this with this message due to
shoddiness in git-send-email.)


* [PATCH 01/21] libxc: xc_domain_restore, make toolstack_restore const-correct
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
@ 2012-06-26 17:54 ` Ian Jackson
  2012-06-26 17:54 ` [PATCH 02/21] libxc: Do not segfault if (e.g.) switch_qemu_logdirty fails Ian Jackson
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

Update the one provider of this callback, in libxl.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v3:
 * No longer introduce function pointer typedefs into the libxc API.
---
 tools/libxc/xenguest.h  |    2 +-
 tools/libxl/libxl_dom.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 91d53f7..707e31c 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -92,7 +92,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
     /* callback to restore toolstack specific data */
-    int (*toolstack_restore)(uint32_t domid, uint8_t *buf,
+    int (*toolstack_restore)(uint32_t domid, const uint8_t *buf,
             uint32_t size, void* data);
 
     /* to be provided as the last argument to each callback function */
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index a2e6655..6d63e0e 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -469,13 +469,13 @@ static inline char *restore_helper(libxl__gc *gc, uint32_t domid,
             domid, phys_offset, node);
 }
 
-static int libxl__toolstack_restore(uint32_t domid, uint8_t *buf,
+static int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
         uint32_t size, void *data)
 {
     libxl__gc *gc = (libxl__gc *) data;
     libxl_ctx *ctx = gc->owner;
     int i, ret;
-    uint8_t *ptr = buf;
+    const uint8_t *ptr = buf;
     uint32_t count = 0, version = 0;
     struct libxl__physmap_info* pi;
     char *xs_path;
-- 
1.7.2.5


* [PATCH 02/21] libxc: Do not segfault if (e.g.) switch_qemu_logdirty fails
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
  2012-06-26 17:54 ` [PATCH 01/21] libxc: xc_domain_restore, make toolstack_restore const-correct Ian Jackson
@ 2012-06-26 17:54 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 03/21] libxl: domain save: rename variables etc Ian Jackson
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

In xc_domain_save the local variable `ob' is initialised to NULL.
There are then various startup actions.  Some of these `goto out' on
failure; for example the call to callbacks->switch_qemu_logdirty on
l.978.  However, `out' is shared by the success and error paths, so it
always attempts (l.2043) to flush the current output buffer.  If ob has
not yet been assigned a non-NULL value, this segfaults.  So make the
call to outbuf_flush conditional on ob.
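
For illustration only, the failure mode and the fix can be sketched in a
standalone form.  This is a minimal sketch, not the libxc code: struct
outbuf, demo_flush and demo_save are hypothetical stand-ins, and the
only point carried over is that a shared exit label must tolerate a
buffer pointer that was never assigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the output buffer; not the libxc struct. */
struct outbuf { size_t pos; };

static int flush_calls;

/* Stand-in for a flush routine that dereferences its buffer argument. */
static int demo_flush(struct outbuf *ob)
{
    assert(ob != NULL);     /* calling this with NULL would be the bug */
    flush_calls++;
    ob->pos = 0;
    return 0;
}

/* fail_early simulates a startup step failing before ob is assigned. */
static int demo_save(int fail_early)
{
    struct outbuf buf = { 16 };
    struct outbuf *ob = NULL;
    int rc = 1;

    if (fail_early)
        goto out;           /* ob is still NULL on this path */

    ob = &buf;
    rc = 0;

 out:
    /* Shared by the success and error paths, so guard the flush. */
    if (ob && demo_flush(ob) < 0)
        rc = 1;
    return rc;
}
```

With the guard in place, demo_save(1) returns the error code without
ever calling the flush routine, while demo_save(0) flushes exactly once.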

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/xc_domain_save.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index fcc7718..c359649 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -2040,7 +2040,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
     }
 
     /* Flush last write and discard cache for file. */
-    if ( outbuf_flush(xch, ob, io_fd) < 0 ) {
+    if ( ob && outbuf_flush(xch, ob, io_fd) < 0 ) {
         PERROR("Error when flushing output buffer");
         rc = 1;
     }
-- 
1.7.2.5


* [PATCH 03/21] libxl: domain save: rename variables etc.
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
  2012-06-26 17:54 ` [PATCH 01/21] libxc: xc_domain_restore, make toolstack_restore const-correct Ian Jackson
  2012-06-26 17:54 ` [PATCH 02/21] libxc: Do not segfault if (e.g.) switch_qemu_logdirty fails Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 04/21] libxl: domain restore: reshuffle, preparing for ao Ian Jackson
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

Preparatory work for making domain suspend asynchronous:

* Rename `struct suspendinfo' to `libxl__domain_suspend_state'
  and move it to libxl_internal.h.

* Rename variables `si' to `dss'.

* Change the stack-allocated state and callbacks from
    struct suspendinfo si;
    struct save_callbacks callbacks;
    struct restore_callbacks callbacks;
  to
    libxl__domain_suspend_state dss[1];
    struct save_callbacks callbacks[1];
    struct restore_callbacks callbacks[1];
  so that it may be referred to as a pointer variable everywhere.

* Rename the variable `flags' (in libxl__domain_suspend_state) to
  `xcflags', to help distinguish it from the other `flags' which is
  passed in from the calling application in libxl_domain_suspend_info.
  Abolish the local variable in libxl__domain_suspend_common, as it
  can use the one in the dss.

* Move the prototypes of suspend-related functions in libxl_internal.h
  to after the definition of the state struct.

* Replace several ctx variables with gc variables and
  consequently references to ctx with CTX.  Change references
  to `dss->gc' in the functional code to simply `gc'.

* Use LOG* rather than LIBXL__LOG* in a number of places.

* In libxl__domain_save_device_model use `rc' instead of `ret'.

* Introduce and use `gc' and `domid' in
  libxl__domain_suspend_common_callback.

* Wrap some long lines.

* Add an extra pair of parens for clarity in a flag test.

* Remove two pointless casts from void* to a struct*.

No functional change whatsoever.
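
As an aside on the `[1]' declarations mentioned in the list above: the
one-element array idiom can be sketched in isolation as follows.  This
is an illustrative sketch only; struct demo_state, demo_init and
demo_use are hypothetical names and not part of libxl.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical state struct for illustration; not libxl's definition. */
struct demo_state {
    int domid;
    unsigned int xcflags;
};

/* The callee takes a pointer, just as it would for heap-allocated state. */
static void demo_init(struct demo_state *st, int domid)
{
    assert(st != NULL);
    memset(st, 0, sizeof(*st));
    st->domid = domid;
}

static int demo_use(void)
{
    struct demo_state st[1];    /* storage is still on the stack */

    demo_init(st, 7);           /* array decays to a pointer: no '&' */
    st->xcflags |= 1u;          /* '->' works as with a real pointer */
    return st->domid + (int)st->xcflags;
}
```

Because the variable is used with pointer syntax everywhere, the code
that refers to it should need little change if the state later moves
off the stack, which appears to be the motivation given above.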

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v3:
 * Abolish local variables `xcflags' and `hvm' in
   libxl__domain_suspend_common; just use dss->xcflags and dss->hvm
   instead and hence do not lose some of the changes to xcflags.

Changes in v2:
 * Make callbacks into arrays (for pointerisation) too.
 * Updated to cope with new remus code.
 * Fixed typo in commit message.
---
 tools/libxl/libxl_dom.c      |  261 ++++++++++++++++++++----------------------
 tools/libxl/libxl_internal.h |   30 ++++-
 2 files changed, 151 insertions(+), 140 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 6d63e0e..4202b4b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -472,7 +472,7 @@ static inline char *restore_helper(libxl__gc *gc, uint32_t domid,
 static int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
         uint32_t size, void *data)
 {
-    libxl__gc *gc = (libxl__gc *) data;
+    libxl__gc *gc = data;
     libxl_ctx *ctx = gc->owner;
     int i, ret;
     const uint8_t *ptr = buf;
@@ -533,7 +533,7 @@ int libxl__domain_restore_common(libxl__gc *gc, uint32_t domid,
     /* read signature */
     int rc;
     int hvm, pae, superpages;
-    struct restore_callbacks callbacks;
+    struct restore_callbacks callbacks[1];
     int no_incr_generationid;
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
@@ -541,8 +541,8 @@ int libxl__domain_restore_common(libxl__gc *gc, uint32_t domid,
         superpages = 1;
         pae = libxl_defbool_val(info->u.hvm.pae);
         no_incr_generationid = !libxl_defbool_val(info->u.hvm.incr_generationid);
-        callbacks.toolstack_restore = libxl__toolstack_restore;
-        callbacks.data = gc;
+        callbacks->toolstack_restore = libxl__toolstack_restore;
+        callbacks->data = gc;
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         hvm = 0;
@@ -558,7 +558,7 @@ int libxl__domain_restore_common(libxl__gc *gc, uint32_t domid,
                            state->store_domid, state->console_port,
                            &state->console_mfn, state->console_domid,
                            hvm, pae, superpages, no_incr_generationid,
-                           &state->vm_generationid_addr, &callbacks);
+                           &state->vm_generationid_addr, callbacks);
     if ( rc ) {
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "restoring domain");
         return ERROR_FAIL;
@@ -566,33 +566,23 @@ int libxl__domain_restore_common(libxl__gc *gc, uint32_t domid,
     return 0;
 }
 
-struct suspendinfo {
-    libxl__gc *gc;
-    xc_evtchn *xce; /* event channel handle */
-    int suspend_eventchn;
-    int domid;
-    int hvm;
-    unsigned int flags;
-    int guest_responded;
-    int save_fd; /* Migration stream fd (for Remus) */
-    int interval; /* checkpoint interval (for Remus) */
-};
-
-static int libxl__domain_suspend_common_switch_qemu_logdirty(int domid, unsigned int enable, void *data)
+static int libxl__domain_suspend_common_switch_qemu_logdirty
+                               (int domid, unsigned int enable, void *data)
 {
-    struct suspendinfo *si = data;
-    libxl_ctx *ctx = libxl__gc_owner(si->gc);
+    libxl__domain_suspend_state *dss = data;
+    libxl__gc *gc = dss->gc;
     char *path;
     bool rc;
 
-    path = libxl__sprintf(si->gc, "/local/domain/0/device-model/%u/logdirty/cmd", domid);
+    path = libxl__sprintf(gc,
+                   "/local/domain/0/device-model/%u/logdirty/cmd", domid);
     if (!path)
         return 1;
 
     if (enable)
-        rc = xs_write(ctx->xsh, XBT_NULL, path, "enable", strlen("enable"));
+        rc = xs_write(CTX->xsh, XBT_NULL, path, "enable", strlen("enable"));
     else
-        rc = xs_write(ctx->xsh, XBT_NULL, path, "disable", strlen("disable"));
+        rc = xs_write(CTX->xsh, XBT_NULL, path, "disable", strlen("disable"));
 
     return rc ? 0 : 1;
 }
@@ -647,53 +637,56 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
 
 static int libxl__domain_suspend_common_callback(void *data)
 {
-    struct suspendinfo *si = data;
+    libxl__domain_suspend_state *dss = data;
+    libxl__gc *gc = dss->gc;
     unsigned long hvm_s_state = 0, hvm_pvdrv = 0;
     int ret;
     char *state = "suspend";
     int watchdog;
-    libxl_ctx *ctx = libxl__gc_owner(si->gc);
     xs_transaction_t t;
 
-    if (si->hvm) {
-        xc_get_hvm_param(ctx->xch, si->domid, HVM_PARAM_CALLBACK_IRQ, &hvm_pvdrv);
-        xc_get_hvm_param(ctx->xch, si->domid, HVM_PARAM_ACPI_S_STATE, &hvm_s_state);
+    /* Convenience aliases */
+    const uint32_t domid = dss->domid;
+
+    if (dss->hvm) {
+        xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_CALLBACK_IRQ, &hvm_pvdrv);
+        xc_get_hvm_param(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, &hvm_s_state);
     }
 
-    if ((hvm_s_state == 0) && (si->suspend_eventchn >= 0)) {
-        LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "issuing %s suspend request via event channel",
-                   si->hvm ? "PVHVM" : "PV");
-        ret = xc_evtchn_notify(si->xce, si->suspend_eventchn);
+    if ((hvm_s_state == 0) && (dss->suspend_eventchn >= 0)) {
+        LOG(DEBUG, "issuing %s suspend request via event channel",
+            dss->hvm ? "PVHVM" : "PV");
+        ret = xc_evtchn_notify(dss->xce, dss->suspend_eventchn);
         if (ret < 0) {
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "xc_evtchn_notify failed ret=%d", ret);
+            LOG(ERROR, "xc_evtchn_notify failed ret=%d", ret);
             return 0;
         }
-        ret = xc_await_suspend(ctx->xch, si->xce, si->suspend_eventchn);
+        ret = xc_await_suspend(CTX->xch, dss->xce, dss->suspend_eventchn);
         if (ret < 0) {
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "xc_await_suspend failed ret=%d", ret);
+            LOG(ERROR, "xc_await_suspend failed ret=%d", ret);
             return 0;
         }
-        si->guest_responded = 1;
+        dss->guest_responded = 1;
         goto guest_suspended;
     }
 
-    if (si->hvm && (!hvm_pvdrv || hvm_s_state)) {
-        LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Calling xc_domain_shutdown on HVM domain");
-        xc_domain_shutdown(ctx->xch, si->domid, SHUTDOWN_suspend);
+    if (dss->hvm && (!hvm_pvdrv || hvm_s_state)) {
+        LOG(DEBUG, "Calling xc_domain_shutdown on HVM domain");
+        xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend);
         /* The guest does not (need to) respond to this sort of request. */
-        si->guest_responded = 1;
+        dss->guest_responded = 1;
     } else {
-        LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "issuing %s suspend request via XenBus control node",
-                   si->hvm ? "PVHVM" : "PV");
+        LOG(DEBUG, "issuing %s suspend request via XenBus control node",
+            dss->hvm ? "PVHVM" : "PV");
 
-        libxl__domain_pvcontrol_write(si->gc, XBT_NULL, si->domid, "suspend");
+        libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend");
 
-        LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "wait for the guest to acknowledge suspend request");
+        LOG(DEBUG, "wait for the guest to acknowledge suspend request");
         watchdog = 60;
         while (!strcmp(state, "suspend") && watchdog > 0) {
             usleep(100000);
 
-            state = libxl__domain_pvcontrol_read(si->gc, XBT_NULL, si->domid);
+            state = libxl__domain_pvcontrol_read(gc, XBT_NULL, domid);
             if (!state) state = "";
 
             watchdog--;
@@ -709,17 +702,17 @@ static int libxl__domain_suspend_common_callback(void *data)
          * at the last minute.
          */
         if (!strcmp(state, "suspend")) {
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "guest didn't acknowledge suspend, cancelling request");
+            LOG(ERROR, "guest didn't acknowledge suspend, cancelling request");
         retry_transaction:
-            t = xs_transaction_start(ctx->xsh);
+            t = xs_transaction_start(CTX->xsh);
 
-            state = libxl__domain_pvcontrol_read(si->gc, t, si->domid);
+            state = libxl__domain_pvcontrol_read(gc, t, domid);
             if (!state) state = "";
 
             if (!strcmp(state, "suspend"))
-                libxl__domain_pvcontrol_write(si->gc, t, si->domid, "");
+                libxl__domain_pvcontrol_write(gc, t, domid, "");
 
-            if (!xs_transaction_end(ctx->xsh, t, 0))
+            if (!xs_transaction_end(CTX->xsh, t, 0))
                 if (errno == EAGAIN)
                     goto retry_transaction;
 
@@ -731,27 +724,29 @@ static int libxl__domain_suspend_common_callback(void *data)
          * case we lost the race while cancelling and should continue.
          */
         if (!strcmp(state, "suspend")) {
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "guest didn't acknowledge suspend, request cancelled");
+            LOG(ERROR, "guest didn't acknowledge suspend, request cancelled");
             return 0;
         }
 
-        LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "guest acknowledged suspend request");
-        si->guest_responded = 1;
+        LOG(DEBUG, "guest acknowledged suspend request");
+        dss->guest_responded = 1;
     }
 
-    LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "wait for the guest to suspend");
+    LOG(DEBUG, "wait for the guest to suspend");
     watchdog = 60;
     while (watchdog > 0) {
         xc_domaininfo_t info;
 
         usleep(100000);
-        ret = xc_domain_getinfolist(ctx->xch, si->domid, 1, &info);
-        if (ret == 1 && info.domain == si->domid && info.flags & XEN_DOMINF_shutdown) {
+        ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info);
+        if (ret == 1 && info.domain == domid &&
+            (info.flags & XEN_DOMINF_shutdown)) {
             int shutdown_reason;
 
-            shutdown_reason = (info.flags >> XEN_DOMINF_shutdownshift) & XEN_DOMINF_shutdownmask;
+            shutdown_reason = (info.flags >> XEN_DOMINF_shutdownshift)
+                & XEN_DOMINF_shutdownmask;
             if (shutdown_reason == SHUTDOWN_suspend) {
-                LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "guest has suspended");
+                LOG(DEBUG, "guest has suspended");
                 goto guest_suspended;
             }
         }
@@ -759,15 +754,14 @@ static int libxl__domain_suspend_common_callback(void *data)
         watchdog--;
     }
 
-    LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "guest did not suspend");
+    LOG(ERROR, "guest did not suspend");
     return 0;
 
  guest_suspended:
-    if (si->hvm) {
-        ret = libxl__domain_suspend_device_model(si->gc, si->domid);
+    if (dss->hvm) {
+        ret = libxl__domain_suspend_device_model(dss->gc, dss->domid);
         if (ret) {
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
-                       "libxl__domain_suspend_device_model failed ret=%d", ret);
+            LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", ret);
             return 0;
         }
     }
@@ -785,9 +779,8 @@ static inline char *save_helper(libxl__gc *gc, uint32_t domid,
 static int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         uint32_t *len, void *data)
 {
-    struct suspendinfo *si = (struct suspendinfo *) data;
-    libxl__gc *gc = (libxl__gc *) si->gc;
-    libxl_ctx *ctx = gc->owner;
+    libxl__domain_suspend_state *dss = data;
+    libxl__gc *gc = dss->gc;
     int i = 0;
     char *start_addr = NULL, *size = NULL, *phys_offset = NULL, *name = NULL;
     unsigned int num = 0;
@@ -816,21 +809,21 @@ static int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         char *xs_path;
         phys_offset = entries[i];
         if (phys_offset == NULL) {
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "phys_offset %d is NULL", i);
+            LOG(ERROR, "phys_offset %d is NULL", i);
             return -1;
         }
 
         xs_path = save_helper(gc, domid, phys_offset, "start_addr");
         start_addr = libxl__xs_read(gc, 0, xs_path);
         if (start_addr == NULL) {
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "%s is NULL", xs_path);
+            LOG(ERROR, "%s is NULL", xs_path);
             return -1;
         }
 
         xs_path = save_helper(gc, domid, phys_offset, "size");
         size = libxl__xs_read(gc, 0, xs_path);
         if (size == NULL) {
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "%s is NULL", xs_path);
+            LOG(ERROR, "%s is NULL", xs_path);
             return -1;
         }
 
@@ -866,11 +859,11 @@ static int libxl__remus_domain_suspend_callback(void *data)
 
 static int libxl__remus_domain_resume_callback(void *data)
 {
-    struct suspendinfo *si = data;
-    libxl_ctx *ctx = libxl__gc_owner(si->gc);
+    libxl__domain_suspend_state *dss = data;
+    libxl__gc *gc = dss->gc;
 
     /* Resumes the domain and the device model */
-    if (libxl_domain_resume(ctx, si->domid, /* Fast Suspend */1))
+    if (libxl_domain_resume(CTX, dss->domid, /* Fast Suspend */1))
         return 0;
 
     /* TODO: Deal with disk. Start a new network output buffer */
@@ -879,15 +872,15 @@ static int libxl__remus_domain_resume_callback(void *data)
 
 static int libxl__remus_domain_checkpoint_callback(void *data)
 {
-    struct suspendinfo *si = data;
+    libxl__domain_suspend_state *dss = data;
 
     /* This would go into tailbuf. */
-    if (si->hvm &&
-        libxl__domain_save_device_model(si->gc, si->domid, si->save_fd))
+    if (dss->hvm &&
+        libxl__domain_save_device_model(dss->gc, dss->domid, dss->save_fd))
         return 0;
 
     /* TODO: Wait for disk and memory ack, release network buffer */
-    usleep(si->interval * 1000);
+    usleep(dss->interval * 1000);
     return 1;
 }
 
@@ -896,12 +889,10 @@ int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
                                  int live, int debug,
                                  const libxl_domain_remus_info *r_info)
 {
-    libxl_ctx *ctx = libxl__gc_owner(gc);
-    int flags;
     int port;
-    struct save_callbacks callbacks;
-    struct suspendinfo si;
-    int hvm, rc = ERROR_FAIL;
+    struct save_callbacks callbacks[1];
+    libxl__domain_suspend_state dss[1];
+    int rc = ERROR_FAIL;
     unsigned long vm_generationid_addr;
 
     switch (type) {
@@ -914,82 +905,81 @@ int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
         addr = libxl__xs_read(gc, XBT_NULL, path);
 
         vm_generationid_addr = (addr) ? strtoul(addr, NULL, 0) : 0;
-        hvm = 1;
+        dss->hvm = 1;
         break;
     }
     case LIBXL_DOMAIN_TYPE_PV:
         vm_generationid_addr = 0;
-        hvm = 0;
+        dss->hvm = 0;
         break;
     default:
         return ERROR_INVAL;
     }
 
-    memset(&si, 0, sizeof(si));
-    flags = (live) ? XCFLAGS_LIVE : 0
+    dss->xcflags = (live) ? XCFLAGS_LIVE : 0
           | (debug) ? XCFLAGS_DEBUG : 0
-          | (hvm) ? XCFLAGS_HVM : 0;
+          | (dss->hvm) ? XCFLAGS_HVM : 0;
+
+    dss->domid = domid;
+    dss->gc = gc;
+    dss->suspend_eventchn = -1;
+    dss->guest_responded = 0;
 
     if (r_info != NULL) {
-        si.interval = r_info->interval;
+        dss->interval = r_info->interval;
         if (r_info->compression)
-            flags |= XCFLAGS_CHECKPOINT_COMPRESS;
-        si.save_fd = fd;
+            dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
+        dss->save_fd = fd;
     }
     else
-        si.save_fd = -1;
-
-    si.domid = domid;
-    si.flags = flags;
-    si.hvm = hvm;
-    si.gc = gc;
-    si.suspend_eventchn = -1;
-    si.guest_responded = 0;
+        dss->save_fd = -1;
 
-    si.xce = xc_evtchn_open(NULL, 0);
-    if (si.xce == NULL)
+    dss->xce = xc_evtchn_open(NULL, 0);
+    if (dss->xce == NULL)
         goto out;
     else
     {
-        port = xs_suspend_evtchn_port(si.domid);
+        port = xs_suspend_evtchn_port(dss->domid);
 
         if (port >= 0) {
-            si.suspend_eventchn = xc_suspend_evtchn_init(ctx->xch, si.xce, si.domid, port);
+            dss->suspend_eventchn =
+                xc_suspend_evtchn_init(CTX->xch, dss->xce, dss->domid, port);
 
-            if (si.suspend_eventchn < 0)
-                LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "Suspend event channel initialization failed");
+            if (dss->suspend_eventchn < 0)
+                LOG(WARN, "Suspend event channel initialization failed");
         }
     }
 
-    memset(&callbacks, 0, sizeof(callbacks));
+    memset(callbacks, 0, sizeof(*callbacks));
     if (r_info != NULL) {
-        callbacks.suspend = libxl__remus_domain_suspend_callback;
-        callbacks.postcopy = libxl__remus_domain_resume_callback;
-        callbacks.checkpoint = libxl__remus_domain_checkpoint_callback;
+        callbacks->suspend = libxl__remus_domain_suspend_callback;
+        callbacks->postcopy = libxl__remus_domain_resume_callback;
+        callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
     } else
-        callbacks.suspend = libxl__domain_suspend_common_callback;
+        callbacks->suspend = libxl__domain_suspend_common_callback;
 
-    callbacks.switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
-    callbacks.toolstack_save = libxl__toolstack_save;
-    callbacks.data = &si;
+    callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
+    callbacks->toolstack_save = libxl__toolstack_save;
+    callbacks->data = dss;
 
-    rc = xc_domain_save(ctx->xch, fd, domid, 0, 0, flags, &callbacks,
-                        hvm, vm_generationid_addr);
+    rc = xc_domain_save(CTX->xch, fd, domid, 0, 0, dss->xcflags, callbacks,
+                        dss->hvm, vm_generationid_addr);
     if ( rc ) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "saving domain: %s",
-                         si.guest_responded ?
+        LOGE(ERROR, "saving domain: %s",
+                         dss->guest_responded ?
                          "domain responded to suspend request" :
                          "domain did not respond to suspend request");
-        if ( !si.guest_responded )
+        if ( !dss->guest_responded )
             rc = ERROR_GUEST_TIMEDOUT;
         else
             rc = ERROR_FAIL;
     }
 
-    if (si.suspend_eventchn > 0)
-        xc_suspend_evtchn_release(ctx->xch, si.xce, domid, si.suspend_eventchn);
-    if (si.xce != NULL)
-        xc_evtchn_close(si.xce);
+    if (dss->suspend_eventchn > 0)
+        xc_suspend_evtchn_release(CTX->xch, dss->xce, domid,
+                                  dss->suspend_eventchn);
+    if (dss->xce != NULL)
+        xc_evtchn_close(dss->xce);
 
 out:
     return rc;
@@ -997,8 +987,7 @@ out:
 
 int libxl__domain_save_device_model(libxl__gc *gc, uint32_t domid, int fd)
 {
-    libxl_ctx *ctx = libxl__gc_owner(gc);
-    int ret, fd2 = -1, c;
+    int rc, fd2 = -1, c;
     char buf[1024];
     const char *filename = libxl__device_model_savefile(gc, domid);
     struct stat st;
@@ -1006,46 +995,46 @@ int libxl__domain_save_device_model(libxl__gc *gc, uint32_t domid, int fd)
 
     if (stat(filename, &st) < 0)
     {
-        LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "Unable to stat qemu save file\n");
-        ret = ERROR_FAIL;
+        LOG(ERROR, "Unable to stat qemu save file\n");
+        rc = ERROR_FAIL;
         goto out;
     }
 
     qemu_state_len = st.st_size;
-    LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Qemu state is %d bytes\n", qemu_state_len);
+    LOG(DEBUG, "Qemu state is %d bytes\n", qemu_state_len);
 
-    ret = libxl_write_exactly(ctx, fd, QEMU_SIGNATURE, strlen(QEMU_SIGNATURE),
+    rc = libxl_write_exactly(CTX, fd, QEMU_SIGNATURE, strlen(QEMU_SIGNATURE),
                               "saved-state file", "qemu signature");
-    if (ret)
+    if (rc)
         goto out;
 
-    ret = libxl_write_exactly(ctx, fd, &qemu_state_len, sizeof(qemu_state_len),
+    rc = libxl_write_exactly(CTX, fd, &qemu_state_len, sizeof(qemu_state_len),
                             "saved-state file", "saved-state length");
-    if (ret)
+    if (rc)
         goto out;
 
     fd2 = open(filename, O_RDONLY);
     if (fd2 < 0) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Unable to open qemu save file\n");
+        LOGE(ERROR, "Unable to open qemu save file\n");
         goto out;
     }
     while ((c = read(fd2, buf, sizeof(buf))) != 0) {
         if (c < 0) {
             if (errno == EINTR)
                 continue;
-            ret = errno;
+            rc = errno;
             goto out;
         }
-        ret = libxl_write_exactly(
-            ctx, fd, buf, c, "saved-state file", "qemu state");
-        if (ret)
+        rc = libxl_write_exactly(
+            CTX, fd, buf, c, "saved-state file", "qemu state");
+        if (rc)
             goto out;
     }
-    ret = 0;
+    rc = 0;
 out:
     if (fd2 >= 0) close(fd2);
     unlink(filename);
-    return ret;
+    return rc;
 }
 
 char *libxl__uuid2string(libxl__gc *gc, const libxl_uuid uuid)
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index fa4c08f..f22bf94 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -786,10 +786,6 @@ _hidden int libxl__domain_restore_common(libxl__gc *gc, uint32_t domid,
                                          libxl_domain_build_info *info,
                                          libxl__domain_build_state *state,
                                          int fd);
-_hidden int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
-                                         libxl_domain_type type,
-                                         int live, int debug,
-                                         const libxl_domain_remus_info *r_info);
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
@@ -1778,6 +1774,23 @@ _hidden void libxl__datacopier_kill(libxl__datacopier_state *dc);
 _hidden int libxl__datacopier_start(libxl__datacopier_state *dc);
 
 
+/*----- Domain suspend (save) state structure -----*/
+
+typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
+
+struct libxl__domain_suspend_state {
+    libxl__gc *gc;
+    xc_evtchn *xce; /* event channel handle */
+    int suspend_eventchn;
+    int domid;
+    int hvm;
+    unsigned int xcflags;
+    int guest_responded;
+    int save_fd; /* Migration stream fd (for Remus) */
+    int interval; /* checkpoint interval (for Remus) */
+};
+
+
 /*----- openpty -----*/
 
 /*
@@ -1888,6 +1901,15 @@ struct libxl__domain_create_state {
          * for the non-stubdom device model. */
 };
 
+/*----- Domain suspend (save) functions -----*/
+
+_hidden int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
+                                         libxl_domain_type type,
+                                         int live, int debug,
+                                         const libxl_domain_remus_info *r_info);
+
+
+
 /*
  * Convenience macros.
  */
-- 
1.7.2.5


* [PATCH 04/21] libxl: domain restore: reshuffle, preparing for ao
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (2 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 03/21] libxl: domain save: rename variables etc Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 05/21] libxl: domain save: API changes for asynchrony Ian Jackson
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

We are going to arrange that libxl, instead of calling
xc_domain_restore, calls a stub function which forks and execs a
helper program, so that restore can be asynchronous rather than
blocking the whole toolstack.

This stub function will be called libxl__xc_domain_restore.

However, its prospective call site is unsuitable for a function which
needs to make a callback, and is buried in two nested single-call-site
functions which are logically part of the domain creation procedure.

So we first abolish those single-call-site functions, integrate their
contents into domain creation in their proper temporal order, and
break out libxl__xc_domain_restore ready for its reimplementation.

No functional change - just the following reorganisation:

* Abolish libxl__domain_restore_common, as it had only one caller.
  Move its contents into (what was) domain_restore.

* There is a new stage function domcreate_rebuild_done containing what
  used to be the bulk of domcreate_bootloader_done, since
  domcreate_bootloader_done now simply starts the restore (or does the
  rebuild) and arranges to call the next stage.

* Move the contents of domain_restore into its correct place in the
  domain creation sequence.  We put it inside
  domcreate_bootloader_done, which now either calls
  libxl__xc_domain_restore (which reports back through
  libxl__xc_domain_restore_done, which in turn calls the new function
  domcreate_rebuild_done), or calls domcreate_rebuild_done directly.

* Various general-purpose local variables (`i' etc.) and convenience
  alias variables need to be shuffled about accordingly.

* Consequently libxl__toolstack_restore needs to gain external linkage
  as it is now in a different file to its user.

* Move the xc_domain_restore callbacks struct (struct
  restore_callbacks) from the stack into libxl__domain_create_state.

In general the moved code remains almost identical.  Two returns in
what used to be libxl__domain_restore_common have been changed to set
the return value and "goto out", and the call sites for the abolished
and new functions have been adjusted.
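
As an illustration only, the control flow described above can be
sketched in standalone C.  This is not the libxl code: every name
below (create_state, bootloader_done, start_restore, restore_done,
rebuild_done, fake_restore) is a hypothetical stand-in, and the
"restore" is a dummy that fails for one magic fd.  The point is the
shape: each stage either hands off to the next stage directly or
starts an operation which reports back through a _done callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for libxl__domain_create_state. */
struct create_state {
    int restore_fd;   /* < 0 means "build from scratch" */
    int final_rc;     /* filled in by the last stage */
    int finished;     /* set once the last stage has run */
};

static void rebuild_done(struct create_state *cs, int rc);
static void restore_done(struct create_state *cs,
                         int rc, int retval, int errnoval);

/* Dummy for the blocking library call; fd 99 is made to fail. */
static int fake_restore(int fd)
{
    if (fd == 99) { errno = EIO; return -1; }
    return 0;
}

/* Stub which "starts" the restore and reports via the callback.
 * Today it is synchronous; later it could fork a helper instead. */
static void start_restore(struct create_state *cs)
{
    int r = fake_restore(cs->restore_fd);
    restore_done(cs, 0, r, errno);
}

/* Second half of the operation: translate the result, move on. */
static void restore_done(struct create_state *cs,
                         int rc, int retval, int errnoval)
{
    if (!rc && retval) {
        fprintf(stderr, "restore failed: errno %d\n", errnoval);
        rc = -1;
    }
    rebuild_done(cs, rc);
}

/* Final stage in this sketch: record the outcome exactly once. */
static void rebuild_done(struct create_state *cs, int rc)
{
    assert(!cs->finished);
    cs->final_rc = rc;
    cs->finished = 1;
}

/* First stage: every path ends by handing off and returning. */
static void bootloader_done(struct create_state *cs, int rc)
{
    if (rc) { rebuild_done(cs, rc); return; }
    if (cs->restore_fd < 0) { rebuild_done(cs, 0); return; }
    start_restore(cs);
}
```

Note that in this style a stage must return immediately after handing
off to the next one; falling through would run the remainder of the
stage on a state that has already been completed.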

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v2:
 * Also move the restore callbacks
---
 tools/libxl/Makefile             |    1 +
 tools/libxl/libxl_create.c       |  244 +++++++++++++++++++++++--------------
 tools/libxl/libxl_dom.c          |   45 +-------
 tools/libxl/libxl_internal.h     |   19 +++-
 tools/libxl/libxl_save_callout.c |   37 ++++++
 5 files changed, 206 insertions(+), 140 deletions(-)
 create mode 100644 tools/libxl/libxl_save_callout.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index e7d5cc2..1d8b80a 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -67,6 +67,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
 			libxl_internal.o libxl_utils.o libxl_uuid.o \
 			libxl_json.o libxl_aoutils.o \
+			libxl_save_callout.o \
 			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 67cd207..9c3c671 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -21,7 +21,6 @@
 #include "libxl_arch.h"
 
 #include <xc_dom.h>
-#include <xenguest.h>
 
 void libxl_domain_config_init(libxl_domain_config *d_config)
 {
@@ -372,89 +371,6 @@ out:
     return ret;
 }
 
-static int domain_restore(libxl__gc *gc, libxl_domain_build_info *info,
-                          uint32_t domid, int fd,
-                          libxl__domain_build_state *state)
-{
-    libxl_ctx *ctx = libxl__gc_owner(gc);
-    char **vments = NULL, **localents = NULL;
-    struct timeval start_time;
-    int i, ret, esave, flags;
-
-    ret = libxl__build_pre(gc, domid, info, state);
-    if (ret)
-        goto out;
-
-    ret = libxl__domain_restore_common(gc, domid, info, state, fd);
-    if (ret)
-        goto out;
-
-    gettimeofday(&start_time, NULL);
-
-    switch (info->type) {
-    case LIBXL_DOMAIN_TYPE_HVM:
-        vments = libxl__calloc(gc, 7, sizeof(char *));
-        vments[0] = "rtc/timeoffset";
-        vments[1] = (info->u.hvm.timeoffset) ? info->u.hvm.timeoffset : "";
-        vments[2] = "image/ostype";
-        vments[3] = "hvm";
-        vments[4] = "start_time";
-        vments[5] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);
-        break;
-    case LIBXL_DOMAIN_TYPE_PV:
-        vments = libxl__calloc(gc, 11, sizeof(char *));
-        i = 0;
-        vments[i++] = "image/ostype";
-        vments[i++] = "linux";
-        vments[i++] = "image/kernel";
-        vments[i++] = (char *) state->pv_kernel.path;
-        vments[i++] = "start_time";
-        vments[i++] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);
-        if (state->pv_ramdisk.path) {
-            vments[i++] = "image/ramdisk";
-            vments[i++] = (char *) state->pv_ramdisk.path;
-        }
-        if (state->pv_cmdline) {
-            vments[i++] = "image/cmdline";
-            vments[i++] = (char *) state->pv_cmdline;
-        }
-        break;
-    default:
-        ret = ERROR_INVAL;
-        goto out;
-    }
-    ret = libxl__build_post(gc, domid, info, state, vments, localents);
-    if (ret)
-        goto out;
-
-    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
-        ret = asprintf(&state->saved_state,
-                       XC_DEVICE_MODEL_RESTORE_FILE".%d", domid);
-        ret = (ret < 0) ? ERROR_FAIL : 0;
-    }
-
-out:
-    if (info->type == LIBXL_DOMAIN_TYPE_PV) {
-        libxl__file_reference_unmap(&state->pv_kernel);
-        libxl__file_reference_unmap(&state->pv_ramdisk);
-    }
-
-    esave = errno;
-
-    flags = fcntl(fd, F_GETFL);
-    if (flags == -1) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unable to get flags on restore fd");
-    } else {
-        flags &= ~O_NONBLOCK;
-        if (fcntl(fd, F_SETFL, flags) == -1)
-            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unable to put restore fd"
-                         " back to blocking mode");
-    }
-
-    errno = esave;
-    return ret;
-}
-
 int libxl__domain_make(libxl__gc *gc, libxl_domain_create_info *info,
                        uint32_t *domid)
 {
@@ -635,10 +551,13 @@ static void domcreate_bootloader_console_available(libxl__egc *egc,
 static void domcreate_bootloader_done(libxl__egc *egc,
                                       libxl__bootloader_state *bl,
                                       int rc);
-
 static void domcreate_console_available(libxl__egc *egc,
                                         libxl__domain_create_state *dcs);
 
+static void domcreate_rebuild_done(libxl__egc *egc,
+                                   libxl__domain_create_state *dcs,
+                                   int ret);
+
 /* Our own function to clean up and call the user's callback.
  * The final call in the sequence. */
 static void domcreate_complete(libxl__egc *egc,
@@ -732,20 +651,20 @@ static void domcreate_console_available(libxl__egc *egc,
 
 static void domcreate_bootloader_done(libxl__egc *egc,
                                       libxl__bootloader_state *bl,
-                                      int ret)
+                                      int rc)
 {
     libxl__domain_create_state *dcs = CONTAINER_OF(bl, *dcs, bl);
     STATE_AO_GC(bl->ao);
-    int i;
 
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     libxl_domain_config *const d_config = dcs->guest_config;
+    libxl_domain_build_info *const info = &d_config->b_info;
     const int restore_fd = dcs->restore_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
-    libxl_ctx *const ctx = CTX;
+    struct restore_callbacks *const callbacks = &dcs->callbacks;
 
-    if (ret) goto error_out;
+    if (rc) { domcreate_rebuild_done(egc, dcs, rc); return; }
 
     /* consume bootloader outputs. state->pv_{kernel,ramdisk} have
      * been initialised by the bootloader already.
@@ -761,12 +680,153 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     dcs->dmss.dm.callback = domcreate_devmodel_started;
     dcs->dmss.callback = domcreate_devmodel_started;
 
-    if ( restore_fd >= 0 ) {
-        ret = domain_restore(gc, &d_config->b_info, domid, restore_fd, state);
+    if ( restore_fd < 0 ) {
+        rc = libxl__domain_build(gc, &d_config->b_info, domid, state);
+        domcreate_rebuild_done(egc, dcs, rc);
+        return;
+    }
+
+    /* Restore */
+
+    rc = libxl__build_pre(gc, domid, info, state);
+    if (rc)
+        goto out;
+
+    /* read signature */
+    int hvm, pae, superpages;
+    int no_incr_generationid;
+    switch (info->type) {
+    case LIBXL_DOMAIN_TYPE_HVM:
+        hvm = 1;
+        superpages = 1;
+        pae = libxl_defbool_val(info->u.hvm.pae);
+        no_incr_generationid = !libxl_defbool_val(info->u.hvm.incr_generationid);
+        callbacks->toolstack_restore = libxl__toolstack_restore;
+        callbacks->data = gc;
+        break;
+    case LIBXL_DOMAIN_TYPE_PV:
+        hvm = 0;
+        superpages = 0;
+        pae = 1;
+        no_incr_generationid = 0;
+        break;
+    default:
+        rc = ERROR_INVAL;
+        goto out;
+    }
+    libxl__xc_domain_restore(egc, dcs,
+                             hvm, pae, superpages, no_incr_generationid);
+    return;
+
+ out:
+    libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
+}
+
+void libxl__xc_domain_restore_done(libxl__egc *egc,
+                                   libxl__domain_create_state *dcs,
+                                   int ret, int retval, int errnoval)
+{
+    STATE_AO_GC(dcs->ao);
+    libxl_ctx *ctx = libxl__gc_owner(gc);
+    char **vments = NULL, **localents = NULL;
+    struct timeval start_time;
+    int i, esave, flags;
+
+    /* convenience aliases */
+    const uint32_t domid = dcs->guest_domid;
+    libxl_domain_config *const d_config = dcs->guest_config;
+    libxl_domain_build_info *const info = &d_config->b_info;
+    libxl__domain_build_state *const state = &dcs->build_state;
+    const int fd = dcs->restore_fd;
+
+    if (ret)
+        goto out;
+
+    if (retval) {
+        LOGEV(ERROR, errnoval, "restoring domain");
+        ret = ERROR_FAIL;
+        goto out;
+    }
+
+    gettimeofday(&start_time, NULL);
+
+    switch (info->type) {
+    case LIBXL_DOMAIN_TYPE_HVM:
+        vments = libxl__calloc(gc, 7, sizeof(char *));
+        vments[0] = "rtc/timeoffset";
+        vments[1] = (info->u.hvm.timeoffset) ? info->u.hvm.timeoffset : "";
+        vments[2] = "image/ostype";
+        vments[3] = "hvm";
+        vments[4] = "start_time";
+        vments[5] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);
+        break;
+    case LIBXL_DOMAIN_TYPE_PV:
+        vments = libxl__calloc(gc, 11, sizeof(char *));
+        i = 0;
+        vments[i++] = "image/ostype";
+        vments[i++] = "linux";
+        vments[i++] = "image/kernel";
+        vments[i++] = (char *) state->pv_kernel.path;
+        vments[i++] = "start_time";
+        vments[i++] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);
+        if (state->pv_ramdisk.path) {
+            vments[i++] = "image/ramdisk";
+            vments[i++] = (char *) state->pv_ramdisk.path;
+        }
+        if (state->pv_cmdline) {
+            vments[i++] = "image/cmdline";
+            vments[i++] = (char *) state->pv_cmdline;
+        }
+        break;
+    default:
+        ret = ERROR_INVAL;
+        goto out;
+    }
+    ret = libxl__build_post(gc, domid, info, state, vments, localents);
+    if (ret)
+        goto out;
+
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
+        ret = asprintf(&state->saved_state,
+                       XC_DEVICE_MODEL_RESTORE_FILE".%d", domid);
+        ret = (ret < 0) ? ERROR_FAIL : 0;
+    }
+
+out:
+    if (info->type == LIBXL_DOMAIN_TYPE_PV) {
+        libxl__file_reference_unmap(&state->pv_kernel);
+        libxl__file_reference_unmap(&state->pv_ramdisk);
+    }
+
+    esave = errno;
+
+    flags = fcntl(fd, F_GETFL);
+    if (flags == -1) {
+        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unable to get flags on restore fd");
     } else {
-        ret = libxl__domain_build(gc, &d_config->b_info, domid, state);
+        flags &= ~O_NONBLOCK;
+        if (fcntl(fd, F_SETFL, flags) == -1)
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unable to put restore fd"
+                         " back to blocking mode");
     }
 
+    errno = esave;
+    domcreate_rebuild_done(egc, dcs, ret);
+}
+
+static void domcreate_rebuild_done(libxl__egc *egc,
+                                   libxl__domain_create_state *dcs,
+                                   int ret)
+{
+    STATE_AO_GC(dcs->ao);
+    int i;
+
+    /* convenience aliases */
+    const uint32_t domid = dcs->guest_domid;
+    libxl_domain_config *const d_config = dcs->guest_config;
+    libxl__domain_build_state *const state = &dcs->build_state;
+    libxl_ctx *const ctx = CTX;
+
     if (ret) {
         LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "cannot (re-)build domain: %d", ret);
         ret = ERROR_FAIL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 4202b4b..d73b089 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -19,7 +19,6 @@
 
 #include <xenctrl.h>
 #include <xc_dom.h>
-#include <xenguest.h>
 
 #include <xen/hvm/hvm_info_table.h>
 
@@ -469,7 +468,7 @@ static inline char *restore_helper(libxl__gc *gc, uint32_t domid,
             domid, phys_offset, node);
 }
 
-static int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
+int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
         uint32_t size, void *data)
 {
     libxl__gc *gc = data;
@@ -524,48 +523,6 @@ static int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
     return 0;
 }
 
-int libxl__domain_restore_common(libxl__gc *gc, uint32_t domid,
-                                 libxl_domain_build_info *info,
-                                 libxl__domain_build_state *state,
-                                 int fd)
-{
-    libxl_ctx *ctx = libxl__gc_owner(gc);
-    /* read signature */
-    int rc;
-    int hvm, pae, superpages;
-    struct restore_callbacks callbacks[1];
-    int no_incr_generationid;
-    switch (info->type) {
-    case LIBXL_DOMAIN_TYPE_HVM:
-        hvm = 1;
-        superpages = 1;
-        pae = libxl_defbool_val(info->u.hvm.pae);
-        no_incr_generationid = !libxl_defbool_val(info->u.hvm.incr_generationid);
-        callbacks->toolstack_restore = libxl__toolstack_restore;
-        callbacks->data = gc;
-        break;
-    case LIBXL_DOMAIN_TYPE_PV:
-        hvm = 0;
-        superpages = 0;
-        pae = 1;
-        no_incr_generationid = 0;
-        break;
-    default:
-        return ERROR_INVAL;
-    }
-    rc = xc_domain_restore(ctx->xch, fd, domid,
-                           state->store_port, &state->store_mfn,
-                           state->store_domid, state->console_port,
-                           &state->console_mfn, state->console_domid,
-                           hvm, pae, superpages, no_incr_generationid,
-                           &state->vm_generationid_addr, callbacks);
-    if ( rc ) {
-        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "restoring domain");
-        return ERROR_FAIL;
-    }
-    return 0;
-}
-
 static int libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data)
 {
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f22bf94..28478ea 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -46,6 +46,7 @@
 
 #include <xenstore.h>
 #include <xenctrl.h>
+#include <xenguest.h>
 
 #include "xentoollog.h"
 
@@ -782,10 +783,8 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
                                  const char *old_name, const char *new_name,
                                  xs_transaction_t trans);
 
-_hidden int libxl__domain_restore_common(libxl__gc *gc, uint32_t domid,
-                                         libxl_domain_build_info *info,
-                                         libxl__domain_build_state *state,
-                                         int fd);
+_hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
+                                     uint32_t size, void *data);
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
@@ -1899,6 +1898,7 @@ struct libxl__domain_create_state {
     libxl__stub_dm_spawn_state dmss;
         /* If we're not doing stubdom, we use only dmss.dm,
          * for the non-stubdom device model. */
+    struct restore_callbacks callbacks;
 };
 
 /*----- Domain suspend (save) functions -----*/
@@ -1908,6 +1908,17 @@ _hidden int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
                                          int live, int debug,
                                          const libxl_domain_remus_info *r_info);
 
+/* calls libxl__xc_domain_restore_done when done */
+_hidden void libxl__xc_domain_restore(libxl__egc *egc,
+                                      libxl__domain_create_state *dcs,
+                                      int hvm, int pae, int superpages,
+                                      int no_incr_generationid);
+/* If rc==0 then retval is the return value from xc_domain_restore
+ * and errnoval is the errno value it provided.
+ * If rc!=0, retval and errnoval are undefined. */
+_hidden void libxl__xc_domain_restore_done(libxl__egc *egc,
+                                           libxl__domain_create_state *dcs,
+                                           int rc, int retval, int errnoval);
 
 
 /*
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
new file mode 100644
index 0000000..2f8db9f
--- /dev/null
+++ b/tools/libxl/libxl_save_callout.c
@@ -0,0 +1,37 @@
+/*
+ * Copyright (C) 2012      Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h"
+
+#include "libxl_internal.h"
+
+void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
+                              int hvm, int pae, int superpages,
+                              int no_incr_generationid)
+{
+    STATE_AO_GC(dcs->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dcs->guest_domid;
+    const int restore_fd = dcs->restore_fd;
+    libxl__domain_build_state *const state = &dcs->build_state;
+
+    int r = xc_domain_restore(CTX->xch, restore_fd, domid,
+                              state->store_port, &state->store_mfn,
+                              state->store_domid, state->console_port,
+                              &state->console_mfn, state->console_domid,
+                              hvm, pae, superpages, no_incr_generationid,
+                              &state->vm_generationid_addr, &dcs->callbacks);
+    libxl__xc_domain_restore_done(egc, dcs, 0, r, errno);
+}
-- 
1.7.2.5

* [PATCH 05/21] libxl: domain save: API changes for asynchrony
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (3 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 04/21] libxl: domain restore: reshuffle, preparing for ao Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 06/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

Change the internal and external APIs for domain save (suspend) to be
capable of asynchronous operation.  The implementation remains
synchronous.  The interfaces surrounding device model saving are still
synchronous.

Public API changes:

 * libxl_domain_suspend takes an ao_how.

 * libxl_domain_remus_start takes an ao_how.  If the
   libxl_domain_remus_info is NULL, we abort rather than returning an
   error.

 * The `suspend_callback' function passed to libxl_domain_suspend is
   never called by the existing implementation in libxl.  Abolish it.

 * libxl_domain_suspend takes its flags parameter as an argument.
   Thus libxl_domain_suspend_info is abolished.

 * XL_SUSPEND_* flags renamed to LIBXL_SUSPEND_*.

 * Callers in xl updated.

Internal code restructuring:

 * libxl__domain_suspend_state member types and names rationalised.

 * libxl__domain_suspend renamed from libxl__domain_suspend_common.
   (_common here actually meant "internal function").

 * libxl__domain_suspend takes a libxl__domain_suspend_state, in
   which the parameters to the operation are filled in by the caller.

 * xc_domain_save is now called via libxl__xc_domain_save which can
   itself become asynchronous.

 * Consequently, libxl__domain_suspend is split into two functions at
   the callback boundary; the second half is
   libxl__xc_domain_save_done.

 * libxl__domain_save_device_model is now called by the actual
   implementation rather than by the public wrapper.  It is already in
   its proper place in the domain save execution sequence.  So
   officially make it part of that execution sequence, renaming it to
   domain_save_device_model.

 * Effectively, rewrite the public wrapper functions
   libxl_domain_suspend and libxl_domain_remus_start.

 * Remove a needless #include <xenctrl.h>

 * libxl__domain_suspend aborts on unexpected domain types rather
   than mysteriously returning EINVAL.

 * struct save_callbacks moved from the stack to the dss.
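
For illustration only, the resulting shape of the save path can be
sketched in standalone C.  This is not the libxl code: all names
below (suspend_state, fill_state, domain_suspend, start_save,
save_done, record_result, SKETCH_SUSPEND_*) are hypothetical, and the
"save" is a dummy which fails for a negative fd.  The flag values
mirror those in the patch.  The sketch normalises the flag bits to
0/1 with `!!', which is a choice made here rather than something the
patch does.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUSPEND_DEBUG 1
#define SKETCH_SUSPEND_LIVE  2

struct suspend_state;
typedef void suspend_cb(struct suspend_state *dss, int rc);

/* Hypothetical stand-in for libxl__domain_suspend_state. */
struct suspend_state {
    /* inputs, filled in by the caller before starting */
    int domid;
    int fd;
    int live, debug;
    suspend_cb *callback;   /* called exactly once when finished */
    /* bookkeeping for this sketch only */
    int done;
    int rc;
};

static void save_done(struct suspend_state *dss, int rc, int retval);

/* Stub for the long-running call; synchronous for now, but the
 * caller only ever learns the result through save_done. */
static void start_save(struct suspend_state *dss)
{
    int retval = dss->fd < 0 ? -1 : 0;   /* dummy failure condition */
    save_done(dss, 0, retval);
}

/* First half: validate the state and start the operation. */
static void domain_suspend(struct suspend_state *dss)
{
    assert(dss->callback != NULL);
    start_save(dss);
}

/* Second half, after the callback boundary. */
static void save_done(struct suspend_state *dss, int rc, int retval)
{
    if (!rc && retval) rc = -1;
    dss->callback(dss, rc);
}

/* Example completion callback: just record the outcome. */
static void record_result(struct suspend_state *dss, int rc)
{
    dss->done = 1;
    dss->rc = rc;
}

/* What a public wrapper would do: decode flags, fill in the state. */
static void fill_state(struct suspend_state *dss,
                       int domid, int fd, int flags)
{
    dss->domid = domid;
    dss->fd = fd;
    dss->live = !!(flags & SKETCH_SUSPEND_LIVE);
    dss->debug = !!(flags & SKETCH_SUSPEND_DEBUG);
    dss->callback = record_result;
    dss->done = 0;
    dss->rc = 0;
}
```

Keeping all parameters in one state struct means the second half of
the operation needs nothing from the first half's stack frame, which
is what allows the middle step to become asynchronous later.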

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v5:
 * Renamed remus_crashed_cb to remus_failover_cb.

Changes in v4:
 * libxl_domain_suspend now handles error from libxl__domain_type
   in the correct way.  (Hunk brought forward from domain type
   fixup patch.)
 * Comment clarifies that libxl__domain_suspend calls dss->callback
   when done.

Changes in v3:
 * Remove `hvm' and `xcflags' args to libxl__xc_domain_save.  Instead,
   just use the values from the dss.

Changes in v2:
 * Move save_callbacks too.
 * Merge with Remus changes.
 * Improvements to commit message.
 * Do not rename libxl_domain_suspend any more.
---
 tools/libxl/libxl.c              |   94 +++++++++++++++++++++---------
 tools/libxl/libxl.h              |   22 ++++----
 tools/libxl/libxl_dom.c          |  121 +++++++++++++++++++++++++++-----------
 tools/libxl/libxl_internal.h     |   45 ++++++++++++---
 tools/libxl/libxl_save_callout.c |   11 ++++
 tools/libxl/xl_cmdimpl.c         |    9 +--
 6 files changed, 214 insertions(+), 88 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 6215923..6ec7471 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -648,32 +648,51 @@ libxl_vminfo * libxl_list_vm(libxl_ctx *ctx, int *nb_vm)
     return ptr;
 }
 
+static void remus_failover_cb(libxl__egc *egc,
+                              libxl__domain_suspend_state *dss, int rc);
+
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
-                             uint32_t domid, int send_fd, int recv_fd)
+                             uint32_t domid, int send_fd, int recv_fd,
+                             const libxl_asyncop_how *ao_how)
 {
-    GC_INIT(ctx);
-    libxl_domain_type type = libxl__domain_type(gc, domid);
-    int rc = 0;
+    AO_CREATE(ctx, domid, ao_how);
+    libxl__domain_suspend_state *dss;
+    int rc;
 
+    libxl_domain_type type = libxl__domain_type(gc, domid);
     if (type == LIBXL_DOMAIN_TYPE_INVALID) {
         rc = ERROR_FAIL;
-        goto remus_fail;
+        goto out;
     }
 
-    if (info == NULL) {
-        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
-                   "No remus_info structure supplied for domain %d", domid);
-        rc = ERROR_INVAL;
-        goto remus_fail;
-    }
+    GCNEW(dss);
+    dss->ao = ao;
+    dss->callback = remus_failover_cb;
+    dss->domid = domid;
+    dss->fd = send_fd;
+    /* TODO do something with recv_fd */
+    dss->type = type;
+    dss->live = 1;
+    dss->debug = 0;
+    dss->remus = info;
+
+    assert(info);
 
     /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
 
     /* Point of no return */
-    rc = libxl__domain_suspend_common(gc, domid, send_fd, type, /* live */ 1,
-                                      /* debug */ 0, info);
+    libxl__domain_suspend(egc, dss);
+    return AO_INPROGRESS;
+
+ out:
+    return AO_ABORT(rc);
+}
 
+static void remus_failover_cb(libxl__egc *egc,
+                              libxl__domain_suspend_state *dss, int rc)
+{
+    STATE_AO_GC(dss->ao);
     /*
      * With Remus, if we reach this point, it means either
      * backup died or some network error occurred preventing us
@@ -683,27 +702,46 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
     /* TBD: Remus cleanup - i.e. detach qdisc, release other
      * resources.
      */
- remus_fail:
-    GC_FREE;
-    return rc;
+    libxl__ao_complete(egc, ao, rc);
 }
 
-int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
-                         uint32_t domid, int fd)
+static void domain_suspend_cb(libxl__egc *egc,
+                              libxl__domain_suspend_state *dss, int rc)
 {
-    GC_INIT(ctx);
+    STATE_AO_GC(dss->ao);
+    libxl__ao_complete(egc,ao,rc);
+
+}
+
+int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
+                         const libxl_asyncop_how *ao_how)
+{
+    AO_CREATE(ctx, domid, ao_how);
+    int rc;
+
     libxl_domain_type type = libxl__domain_type(gc, domid);
-    int live = info != NULL && info->flags & XL_SUSPEND_LIVE;
-    int debug = info != NULL && info->flags & XL_SUSPEND_DEBUG;
-    int rc = 0;
+    if (type == LIBXL_DOMAIN_TYPE_INVALID) {
+        rc = ERROR_FAIL;
+        goto out_err;
+    }
 
-    rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug,
-                                      /* No Remus */ NULL);
+    libxl__domain_suspend_state *dss;
+    GCNEW(dss);
 
-    if (!rc && type == LIBXL_DOMAIN_TYPE_HVM)
-        rc = libxl__domain_save_device_model(gc, domid, fd);
-    GC_FREE;
-    return rc;
+    dss->ao = ao;
+    dss->callback = domain_suspend_cb;
+
+    dss->domid = domid;
+    dss->fd = fd;
+    dss->type = type;
+    dss->live = flags & LIBXL_SUSPEND_LIVE;
+    dss->debug = flags & LIBXL_SUSPEND_DEBUG;
+
+    libxl__domain_suspend(egc, dss);
+    return AO_INPROGRESS;
+
+ out_err:
+    return AO_ABORT(rc);
 }
 
 int libxl_domain_pause(libxl_ctx *ctx, uint32_t domid)
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 05f0e01..10d7115 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -347,13 +347,6 @@ typedef struct libxl__ctx libxl_ctx;
 
 const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx);
 
-typedef struct {
-#define XL_SUSPEND_DEBUG 1
-#define XL_SUSPEND_LIVE 2
-    int flags;
-    int (*suspend_callback)(void *, int);
-} libxl_domain_suspend_info;
-
 enum {
     ERROR_NONSPECIFIC = -1,
     ERROR_VERSION = -2,
@@ -514,16 +507,23 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
 
 void libxl_domain_config_init(libxl_domain_config *d_config);
 void libxl_domain_config_dispose(libxl_domain_config *d_config);
-int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
-                             uint32_t domid, int send_fd, int recv_fd);
-int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
-                          uint32_t domid, int fd);
+
+int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
+                         int flags, /* LIBXL_SUSPEND_* */
+                         const libxl_asyncop_how *ao_how);
+#define LIBXL_SUSPEND_DEBUG 1
+#define LIBXL_SUSPEND_LIVE 2
 
 /* @param suspend_cancel [from xenctrl.h:xc_domain_resume( @param fast )]
  *   If this parameter is true, use co-operative resume. The guest
  *   must support this.
  */
 int libxl_domain_resume(libxl_ctx *ctx, uint32_t domid, int suspend_cancel);
+
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+                             uint32_t domid, int send_fd, int recv_fd,
+                             const libxl_asyncop_how *ao_how);
+
 int libxl_domain_shutdown(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_reboot(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_destroy(libxl_ctx *ctx, uint32_t domid);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d73b089..c44dec0 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -17,13 +17,11 @@
 
 #include <glob.h>
 
-#include <xenctrl.h>
-#include <xc_dom.h>
+#include "libxl_internal.h"
 
+#include <xc_dom.h>
 #include <xen/hvm/hvm_info_table.h>
 
-#include "libxl_internal.h"
-
 libxl_domain_type libxl__domain_type(libxl__gc *gc, uint32_t domid)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
@@ -523,11 +521,18 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
     return 0;
 }
 
-static int libxl__domain_suspend_common_switch_qemu_logdirty
+/*==================== Domain suspend (save) ====================*/
+
+static void domain_suspend_done(libxl__egc *egc,
+                        libxl__domain_suspend_state *dss, int rc);
+
+/*----- callbacks, called by xc_domain_save -----*/
+
+int libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data)
 {
     libxl__domain_suspend_state *dss = data;
-    libxl__gc *gc = dss->gc;
+    STATE_AO_GC(dss->ao);
     char *path;
     bool rc;
 
@@ -592,10 +597,10 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
     return 0;
 }
 
-static int libxl__domain_suspend_common_callback(void *data)
+int libxl__domain_suspend_common_callback(void *data)
 {
     libxl__domain_suspend_state *dss = data;
-    libxl__gc *gc = dss->gc;
+    STATE_AO_GC(dss->ao);
     unsigned long hvm_s_state = 0, hvm_pvdrv = 0;
     int ret;
     char *state = "suspend";
@@ -716,7 +721,7 @@ static int libxl__domain_suspend_common_callback(void *data)
 
  guest_suspended:
     if (dss->hvm) {
-        ret = libxl__domain_suspend_device_model(dss->gc, dss->domid);
+        ret = libxl__domain_suspend_device_model(gc, dss->domid);
         if (ret) {
             LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", ret);
             return 0;
@@ -733,11 +738,11 @@ static inline char *save_helper(libxl__gc *gc, uint32_t domid,
             domid, phys_offset, node);
 }
 
-static int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
+int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         uint32_t *len, void *data)
 {
     libxl__domain_suspend_state *dss = data;
-    libxl__gc *gc = dss->gc;
+    STATE_AO_GC(dss->ao);
     int i = 0;
     char *start_addr = NULL, *size = NULL, *phys_offset = NULL, *name = NULL;
     unsigned int num = 0;
@@ -808,6 +813,8 @@ static int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
     return 0;
 }
 
+/*----- remus callbacks -----*/
+
 static int libxl__remus_domain_suspend_callback(void *data)
 {
     /* TODO: Issue disk and network checkpoint reqs. */
@@ -817,7 +824,7 @@ static int libxl__remus_domain_suspend_callback(void *data)
 static int libxl__remus_domain_resume_callback(void *data)
 {
     libxl__domain_suspend_state *dss = data;
-    libxl__gc *gc = dss->gc;
+    STATE_AO_GC(dss->ao);
 
     /* Resumes the domain and the device model */
     if (libxl_domain_resume(CTX, dss->domid, /* Fast Suspend */1))
@@ -830,10 +837,11 @@ static int libxl__remus_domain_resume_callback(void *data)
 static int libxl__remus_domain_checkpoint_callback(void *data)
 {
     libxl__domain_suspend_state *dss = data;
+    STATE_AO_GC(dss->ao);
 
     /* This would go into tailbuf. */
     if (dss->hvm &&
-        libxl__domain_save_device_model(dss->gc, dss->domid, dss->save_fd))
+        libxl__domain_save_device_model(gc, dss->domid, dss->fd))
         return 0;
 
     /* TODO: Wait for disk and memory ack, release network buffer */
@@ -841,17 +849,23 @@ static int libxl__remus_domain_checkpoint_callback(void *data)
     return 1;
 }
 
-int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
-                                 libxl_domain_type type,
-                                 int live, int debug,
-                                 const libxl_domain_remus_info *r_info)
+/*----- main code for suspending, in order of execution -----*/
+
+void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
 {
+    STATE_AO_GC(dss->ao);
     int port;
-    struct save_callbacks callbacks[1];
-    libxl__domain_suspend_state dss[1];
     int rc = ERROR_FAIL;
     unsigned long vm_generationid_addr;
 
+    /* Convenience aliases */
+    const uint32_t domid = dss->domid;
+    const libxl_domain_type type = dss->type;
+    const int live = dss->live;
+    const int debug = dss->debug;
+    const libxl_domain_remus_info *const r_info = dss->remus;
+    struct save_callbacks *const callbacks = &dss->callbacks;
+
     switch (type) {
     case LIBXL_DOMAIN_TYPE_HVM: {
         char *path;
@@ -870,15 +884,13 @@ int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
         dss->hvm = 0;
         break;
     default:
-        return ERROR_INVAL;
+        abort();
     }
 
     dss->xcflags = (live) ? XCFLAGS_LIVE : 0
           | (debug) ? XCFLAGS_DEBUG : 0
           | (dss->hvm) ? XCFLAGS_HVM : 0;
 
-    dss->domid = domid;
-    dss->gc = gc;
     dss->suspend_eventchn = -1;
     dss->guest_responded = 0;
 
@@ -886,10 +898,7 @@ int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
         dss->interval = r_info->interval;
         if (r_info->compression)
             dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
-        dss->save_fd = fd;
     }
-    else
-        dss->save_fd = -1;
 
     dss->xce = xc_evtchn_open(NULL, 0);
     if (dss->xce == NULL)
@@ -919,10 +928,28 @@ int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
     callbacks->toolstack_save = libxl__toolstack_save;
     callbacks->data = dss;
 
-    rc = xc_domain_save(CTX->xch, fd, domid, 0, 0, dss->xcflags, callbacks,
-                        dss->hvm, vm_generationid_addr);
-    if ( rc ) {
-        LOGE(ERROR, "saving domain: %s",
+    libxl__xc_domain_save(egc, dss, vm_generationid_addr);
+    return;
+
+ out:
+    domain_suspend_done(egc, dss, rc);
+}
+
+void libxl__xc_domain_save_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss,
+                                int rc, int retval, int errnoval)
+{
+    STATE_AO_GC(dss->ao);
+
+    /* Convenience aliases */
+    const libxl_domain_type type = dss->type;
+    const uint32_t domid = dss->domid;
+
+    if (rc)
+        goto out;
+
+    if (retval) {
+        LOGEV(ERROR, errnoval, "saving domain: %s",
                          dss->guest_responded ?
                          "domain responded to suspend request" :
                          "domain did not respond to suspend request");
@@ -930,16 +957,21 @@ int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
             rc = ERROR_GUEST_TIMEDOUT;
         else
             rc = ERROR_FAIL;
+        goto out;
     }
 
-    if (dss->suspend_eventchn > 0)
-        xc_suspend_evtchn_release(CTX->xch, dss->xce, domid,
-                                  dss->suspend_eventchn);
-    if (dss->xce != NULL)
-        xc_evtchn_close(dss->xce);
+    if (type == LIBXL_DOMAIN_TYPE_HVM) {
+        rc = libxl__domain_suspend_device_model(gc, domid);
+        if (rc) goto out;
+        
+        rc = libxl__domain_save_device_model(gc, domid, dss->fd);
+        if (rc) goto out;
+    }
+
+    rc = 0;
 
 out:
-    return rc;
+    domain_suspend_done(egc, dss, rc);
 }
 
 int libxl__domain_save_device_model(libxl__gc *gc, uint32_t domid, int fd)
@@ -994,6 +1026,25 @@ out:
     return rc;
 }
 
+static void domain_suspend_done(libxl__egc *egc,
+                        libxl__domain_suspend_state *dss, int rc)
+{
+    STATE_AO_GC(dss->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dss->domid;
+
+    if (dss->suspend_eventchn > 0)
+        xc_suspend_evtchn_release(CTX->xch, dss->xce, domid,
+                                  dss->suspend_eventchn);
+    if (dss->xce != NULL)
+        xc_evtchn_close(dss->xce);
+
+    dss->callback(egc, dss, rc);
+}
+
+/*==================== Miscellaneous ====================*/
+
 char *libxl__uuid2string(libxl__gc *gc, const libxl_uuid uuid)
 {
     char *s = libxl__sprintf(gc, LIBXL_UUID_FMT, LIBXL_UUID_BYTES(uuid));
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 28478ea..7cf1b04 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1777,16 +1777,28 @@ _hidden int libxl__datacopier_start(libxl__datacopier_state *dc);
 
 typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
 
+typedef void libxl__domain_suspend_cb(libxl__egc*,
+                                      libxl__domain_suspend_state*, int rc);
+
 struct libxl__domain_suspend_state {
-    libxl__gc *gc;
+    /* set by caller of libxl__domain_suspend */
+    libxl__ao *ao;
+    libxl__domain_suspend_cb *callback;
+
+    uint32_t domid;
+    int fd;
+    libxl_domain_type type;
+    int live;
+    int debug;
+    const libxl_domain_remus_info *remus;
+    /* private */
     xc_evtchn *xce; /* event channel handle */
     int suspend_eventchn;
-    int domid;
     int hvm;
-    unsigned int xcflags;
+    int xcflags;
     int guest_responded;
-    int save_fd; /* Migration stream fd (for Remus) */
     int interval; /* checkpoint interval (for Remus) */
+    struct save_callbacks callbacks;
 };
 
 
@@ -1903,10 +1915,27 @@ struct libxl__domain_create_state {
 
 /*----- Domain suspend (save) functions -----*/
 
-_hidden int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
-                                         libxl_domain_type type,
-                                         int live, int debug,
-                                         const libxl_domain_remus_info *r_info);
+/* calls dss->callback when done */
+_hidden void libxl__domain_suspend(libxl__egc *egc,
+                                   libxl__domain_suspend_state *dss);
+
+
+/* calls libxl__xc_domain_save_done when done */
+_hidden void libxl__xc_domain_save(libxl__egc*, libxl__domain_suspend_state*,
+                                   unsigned long vm_generationid_addr);
+/* If rc==0 then retval is the return value from xc_domain_save
+ * and errnoval is the errno value it provided.
+ * If rc!=0, retval and errnoval are undefined. */
+_hidden void libxl__xc_domain_save_done(libxl__egc*,
+                                        libxl__domain_suspend_state*,
+                                        int rc, int retval, int errnoval);
+
+_hidden int libxl__domain_suspend_common_callback(void *data);
+_hidden int libxl__domain_suspend_common_switch_qemu_logdirty
+                               (int domid, unsigned int enable, void *data);
+_hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
+        uint32_t *len, void *data);
+
 
 /* calls libxl__xc_domain_restore_done when done */
 _hidden void libxl__xc_domain_restore(libxl__egc *egc,
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 2f8db9f..1b481ab 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -35,3 +35,14 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
                               &state->vm_generationid_addr, &dcs->callbacks);
     libxl__xc_domain_restore_done(egc, dcs, 0, r, errno);
 }
+
+void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss,
+                           unsigned long vm_generationid_addr)
+{
+    STATE_AO_GC(dss->ao);
+    int r;
+
+    r = xc_domain_save(CTX->xch, dss->fd, dss->domid, 0, 0, dss->xcflags,
+                       &dss->callbacks, dss->hvm, vm_generationid_addr);
+    libxl__xc_domain_save_done(egc, dss, 0, r, errno);
+}
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index afa0af6..4aea1c7 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2817,7 +2817,7 @@ static int save_domain(const char *p, const char *filename, int checkpoint,
 
     save_domain_core_writeconfig(fd, filename, config_data, config_len);
 
-    CHK_ERRNO(libxl_domain_suspend(ctx, NULL, domid, fd));
+    CHK_ERRNO(libxl_domain_suspend(ctx, domid, fd, 0, NULL));
     close(fd);
 
     if (checkpoint)
@@ -2979,7 +2979,6 @@ static void migrate_domain(const char *domain_spec, const char *rune,
     pid_t child = -1;
     int rc;
     int send_fd = -1, recv_fd = -1;
-    libxl_domain_suspend_info suspinfo;
     char *away_domname;
     char rc_buf;
     uint8_t *config_data;
@@ -3001,9 +3000,7 @@ static void migrate_domain(const char *domain_spec, const char *rune,
 
     xtl_stdiostream_adjust_flags(logger, XTL_STDIOSTREAM_HIDE_PROGRESS, 0);
 
-    memset(&suspinfo, 0, sizeof(suspinfo));
-    suspinfo.flags |= XL_SUSPEND_LIVE;
-    rc = libxl_domain_suspend(ctx, &suspinfo, domid, send_fd);
+    rc = libxl_domain_suspend(ctx, domid, send_fd, LIBXL_SUSPEND_LIVE, NULL);
     if (rc) {
         fprintf(stderr, "migration sender: libxl_domain_suspend failed"
                 " (rc=%d)\n", rc);
@@ -6575,7 +6572,7 @@ int main_remus(int argc, char **argv)
     }
 
     /* Point of no return */
-    rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd);
+    rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
     /* If we are here, it means backup has failed/domain suspend failed.
      * Try to resume the domain and exit gracefully.
-- 
1.7.2.5


* [PATCH 06/21] libxl: domain save/restore: run in a separate process
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (4 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 05/21] libxl: domain save: API changes for asynchrony Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 07/21] libxl: rename libxl_dom:save_helper to physmap_path Ian Jackson
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

libxenctrl expects to be able to simply run the save or restore
operation synchronously.  This won't work well in a process which is
trying to handle multiple domains.

The options are:

 - Block the whole process (eg, the whole of libvirt) while
   migration completes (or until it fails).

 - Create a thread to run xc_domain_save and xc_domain_restore on.
   This is quite unpalatable.  Multithreaded programming is error
   prone enough without generating threads in libraries, particularly
   if the thread does some very complex operation.

 - Fork and run the operation in the child without execing.  This is
   no good because we would need to negotiate with the caller about
   fds we would inherit (and we might be a very large process).

 - Fork and exec a helper.

Of these options, the last is the most palatable.
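
The fork-and-exec option can be sketched roughly as follows.  This is
only an illustration of the general technique, not the code in this
patch: run_helper_sketch is a hypothetical name, and the real
run_helper also deals with the event loop, a stdin pipe and fd
preservation, all of which are omitted here.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: fork, exec argv[0] with the child's stdout
 * connected to a pipe, copy up to bufsz-1 bytes of its output into buf
 * (NUL-terminated), and return the child's exit status.
 * Returns -1 on failure or if the child did not exit normally. */
static int run_helper_sketch(char *const argv[], char *buf, size_t bufsz)
{
    int pipefd[2];
    size_t used = 0;
    int status;
    pid_t pid;

    assert(bufsz > 0);
    if (pipe(pipefd) < 0) return -1;

    pid = fork();
    if (pid < 0) { close(pipefd[0]); close(pipefd[1]); return -1; }

    if (pid == 0) {
        /* Child: make the pipe's write end our stdout, then exec. */
        dup2(pipefd[1], STDOUT_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: read the helper's output until EOF, error or full buffer. */
    close(pipefd[1]);
    for (;;) {
        ssize_t r = read(pipefd[0], buf + used, bufsz - 1 - used);
        if (r <= 0) break;
        used += (size_t)r;
    }
    buf[used] = '\0';
    close(pipefd[0]);

    /* Collect the exit status, as the parent must do to avoid a zombie. */
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child execs immediately, the size of the parent process
and the fds it happens to hold matter much less than they would for a
fork without exec.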

Consequently:

 * A new helper program libxl-save-helper (which does both save and
   restore).  It will be installed in /usr/lib/xen/bin.  It does not
   link against libxl, only libxc, and its error handling does not
   need to be very advanced.  It does, however, plumb the logging
   interface through into the callback stream.

 * A small ad-hoc protocol between the helper and libxl which allows
   log messages and the libxc callbacks to be passed up and down.
   Protocol doc comment is in libxl_save_helper.c.

 * To avoid a lot of tedium the marshalling boilerplate (stubs for the
   helper and the callback decoder for libxl) is generated with a
   small perl script.

 * Implement new functionality to spawn the helper, monitor its
   output, provide responses, and check on its exit status.

 * The functions libxl__xc_domain_restore_done and
   libxl__xc_domain_save_done now turn out to want to be called in
   the same place.  So make their state argument a void* so that the
   two functions are type compatible.
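
The void* state argument in the last point can be illustrated with a
small standalone sketch.  All names here (completion_cb, helper_state,
save_done and so on) are hypothetical stand-ins rather than the libxl
types; the point is only that one function pointer type can then hold
either completion function.

```c
#include <assert.h>

/* One callback type for both operations: because the state is passed
 * as void*, the save and restore completion functions have the same
 * signature and can be stored in the same pointer. */
typedef void completion_cb(void *caller_state,
                           int rc, int retval, int errnoval);

struct save_state    { int domid; int result; };
struct restore_state { int domid; int result; int saved_errno; };

static void save_done(void *dss_void, int rc, int retval, int errnoval)
{
    struct save_state *dss = dss_void; /* recover the real type */
    (void)errnoval;
    dss->result = rc ? rc : retval;
}

static void restore_done(void *dcs_void, int rc, int retval, int errnoval)
{
    struct restore_state *dcs = dcs_void;
    dcs->result = rc ? rc : retval;
    dcs->saved_errno = errnoval;
}

/* Stand-in for the shared helper state: it does not know which
 * operation it is serving, only whom to call when it finishes. */
struct helper_state {
    completion_cb *completion_callback;
    void *caller_state;
};

static void helper_finished(struct helper_state *shs,
                            int rc, int retval, int errnoval)
{
    shs->completion_callback(shs->caller_state, rc, retval, errnoval);
}
```

The cost of this choice is that the compiler can no longer check that
the state pointer has the right type, so each completion function
converts it back on entry.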

The domain save path still writes the qemu savefile synchronously.
This will need to be fixed in a subsequent patch.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v5:
 * assert that preserve_fds are >2.

Changes in v4:
 * Migration stream fd is handled specially by the run_helper
   function, rather than simply being a numarg.  Specifically:
     - dup it to a safe fd number if necessary.
     - clear cloexec flag on fd before execing helper
 * Toolstack data fd argument to run_helper replaced with
   generic preserve_fds array, whose entries get cloexec cleared.
 * libxl__xc_domain_save uses supplied callback function pointer,
   rather than calling libxl__toolstack_save directly;
   toolstack data save callback is only supplied to libxc if
   in-libxl caller supplied a callback.
 * libxl-save-helper is not needlessly linked against libxl.
 * Code which prepares pipes for helper clarified.
 * Deal properly with, and log properly, POLLPRI/POLLERR on
   pipe to save helper.
 * Spelling fix in perl script comment.
 * In message generator, use better names for the ends of serial
   conditional here documents.
 * Makefile does $(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR)
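
For illustration, the two fd manipulations mentioned for v4 (moving
the stream fd away from 0, 1 and 2, and clearing cloexec) can be done
with plain fcntl calls.  This is a minimal sketch under the assumption
that nothing else needs tracking; make_fd_safe and clear_cloexec are
hypothetical names, and the patch's run_helper may well do this
differently.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical: return an fd > 2 referring to the same open file as
 * fd.  If fd is already > 2 it is returned unchanged; otherwise it is
 * duplicated to the lowest free fd >= 3.  Returns -1 on error. */
static int make_fd_safe(int fd)
{
    if (fd > 2) return fd;
    return fcntl(fd, F_DUPFD, 3);
}

/* Hypothetical: clear the close-on-exec flag so that fd survives
 * exec into the helper.  Returns 0 on success, -1 on error. */
static int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}
```

F_DUPFD is used rather than dup so that the new descriptor is
guaranteed not to land on 0, 1 or 2, which the helper needs for its
own stdin, stdout and stderr.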

Changes in v3:
 * Suppress errno value in debug message when helper reports successful
   completion.
 * Significant consequential changes to cope with changes to
   earlier patches in the series.

Changes in v2:
 * Helper path can be overridden by an environment variable for testing.
 * Add a couple of debug logging messages re toolstack data.
 * Fixes from testing.
 * Helper protocol message lengths (and numbers) are 16-bit which
   more clearly avoids piling lots of junk on the stack.
 * Merged with remus changes.
 * Callback implementations in libxl now called via pointers
   so remus can have its own callbacks.
 * Better namespace prefixes on autogenerated names etc.
 * Autogenerator can generate debugging printfs too.
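
As a rough illustration of why a 16-bit length keeps buffers bounded,
here is a generic length-prefixed framing sketch.  It is NOT the
helper protocol itself (that is documented in libxl_save_helper.c);
frame_encode and frame_decode are hypothetical names, and the use of
host byte order is an assumption based on both ends running on the
same machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing: a 2-byte length in host byte order followed
 * by that many payload bytes.  Returns the number of bytes written to
 * out, or 0 if out is too small. */
static size_t frame_encode(uint8_t *out, size_t outsz,
                           const void *payload, uint16_t len)
{
    if (outsz < sizeof(len) + (size_t)len) return 0;
    memcpy(out, &len, sizeof(len));
    if (len) memcpy(out + sizeof(len), payload, len);
    return sizeof(len) + (size_t)len;
}

/* Returns the payload length and points *payload into in, or returns
 * -1 if in does not yet hold a complete frame. */
static long frame_decode(const uint8_t *in, size_t insz,
                         const uint8_t **payload)
{
    uint16_t len;
    if (insz < sizeof(len)) return -1;
    memcpy(&len, in, sizeof(len));
    if (insz - sizeof(len) < len) return -1;
    *payload = in + sizeof(len);
    return (long)len;
}
```

With a uint16_t length no single message can exceed 65535 bytes, so a
receiver can size its buffer once without trusting the peer.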
---
 .gitignore                         |    1 +
 .hgignore                          |    2 +
 tools/libxl/Makefile               |   22 ++-
 tools/libxl/libxl_create.c         |   22 ++-
 tools/libxl/libxl_dom.c            |   36 ++--
 tools/libxl/libxl_internal.h       |   56 +++++-
 tools/libxl/libxl_save_callout.c   |  368 ++++++++++++++++++++++++++++++++-
 tools/libxl/libxl_save_helper.c    |  281 +++++++++++++++++++++++++
 tools/libxl/libxl_save_msgs_gen.pl |  397 ++++++++++++++++++++++++++++++++++++
 9 files changed, 1147 insertions(+), 38 deletions(-)
 create mode 100644 tools/libxl/libxl_save_helper.c
 create mode 100755 tools/libxl/libxl_save_msgs_gen.pl

diff --git a/.gitignore b/.gitignore
index 7770e54..3451e52 100644
--- a/.gitignore
+++ b/.gitignore
@@ -353,6 +353,7 @@ tools/libxl/_*.[ch]
 tools/libxl/testidl
 tools/libxl/testidl.c
 tools/libxl/*.pyc
+tools/libxl/libxl-save-helper
 tools/blktap2/control/tap-ctl
 tools/firmware/etherboot/eb-roms.h
 tools/firmware/etherboot/gpxe-git-snapshot.tar.gz
diff --git a/.hgignore b/.hgignore
index 27d8f79..05304ea 100644
--- a/.hgignore
+++ b/.hgignore
@@ -180,9 +180,11 @@
 ^tools/libxl/_.*\.c$
 ^tools/libxl/libxlu_cfg_y\.output$
 ^tools/libxl/xl$
+^tools/libxl/libxl-save-helper$
 ^tools/libxl/testidl$
 ^tools/libxl/testidl\.c$
 ^tools/libxl/tmp\..*$
+^tools/libxl/.*\.new$
 ^tools/libvchan/vchan-node[12]$
 ^tools/libaio/src/.*\.ol$
 ^tools/libaio/src/.*\.os$
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 1d8b80a..ddc2624 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -67,25 +67,30 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
 			libxl_internal.o libxl_utils.o libxl_uuid.o \
 			libxl_json.o libxl_aoutils.o \
-			libxl_save_callout.o \
+			libxl_save_callout.o _libxl_save_msgs_callout.o \
 			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
 $(LIBXL_OBJS): CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenguest) $(CFLAGS_libxenstore) $(CFLAGS_libblktapctl) -include $(XEN_ROOT)/tools/config.h
 
-AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h
+AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h \
+	_libxl_save_msgs_callout.h _libxl_save_msgs_helper.h
 AUTOSRCS= libxlu_cfg_y.c libxlu_cfg_l.c
+AUTOSRCS += _libxl_save_msgs_callout.c _libxl_save_msgs_helper.c
 LIBXLU_OBJS = libxlu_cfg_y.o libxlu_cfg_l.o libxlu_cfg.o \
 	libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o
 $(LIBXLU_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
 
-CLIENTS = xl testidl
+CLIENTS = xl testidl libxl-save-helper
 
 XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxenlight)
 $(XL_OBJS): CFLAGS += -include $(XEN_ROOT)/tools/config.h # libxl_json.h needs it.
 
+SAVE_HELPER_OBJS = libxl_save_helper.o _libxl_save_msgs_helper.o
+$(SAVE_HELPER_OBJS): CFLAGS += $(CFLAGS_libxenctrl)
+
 testidl.o: CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenlight)
 testidl.c: libxl_types.idl gentest.py libxl.h $(AUTOINCS)
 	$(PYTHON) gentest.py libxl_types.idl testidl.c.new
@@ -117,6 +122,12 @@ _libxl_list.h: $(XEN_INCLUDE)/xen-external/bsd-sys-queue-h-seddery $(XEN_INCLUDE
 	perl $^ --prefix=libxl >$@.new
 	$(call move-if-changed,$@.new,$@)
 
+_libxl_save_msgs_helper.c _libxl_save_msgs_callout.c \
+_libxl_save_msgs_helper.h _libxl_save_msgs_callout.h: \
+		libxl_save_msgs_gen.pl
+	$(PERL) -w $< $@ >$@.new
+	$(call move-if-changed,$@.new,$@)
+
 libxl.h: _libxl_types.h
 libxl_json.h: _libxl_types_json.h
 libxl_internal.h: _libxl_types_internal.h _paths.h
@@ -159,6 +170,9 @@ libxlutil.a: $(LIBXLU_OBJS)
 xl: $(XL_OBJS) libxlutil.so libxenlight.so
 	$(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl $(APPEND_LDFLAGS)
 
+libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so
+	$(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
+
 testidl: testidl.o libxlutil.so libxenlight.so
 	$(CC) $(LDFLAGS) -o $@ testidl.o libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
@@ -169,7 +183,9 @@ install: all
 	$(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)
 	$(INSTALL_DIR) $(DESTDIR)$(BASH_COMPLETION_DIR)
 	$(INSTALL_DIR) $(DESTDIR)$(XEN_RUN_DIR)
+	$(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR)
 	$(INSTALL_PROG) xl $(DESTDIR)$(SBINDIR)
+	$(INSTALL_PROG) libxl-save-helper $(DESTDIR)$(PRIVATE_BINDIR)
 	$(INSTALL_PROG) libxenlight.so.$(MAJOR).$(MINOR) $(DESTDIR)$(LIBDIR)
 	ln -sf libxenlight.so.$(MAJOR).$(MINOR) $(DESTDIR)$(LIBDIR)/libxenlight.so.$(MAJOR)
 	ln -sf libxenlight.so.$(MAJOR) $(DESTDIR)$(LIBDIR)/libxenlight.so
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 9c3c671..7b92539 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -662,7 +662,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl_domain_build_info *const info = &d_config->b_info;
     const int restore_fd = dcs->restore_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
-    struct restore_callbacks *const callbacks = &dcs->callbacks;
+    libxl__srm_restore_autogen_callbacks *const callbacks =
+        &dcs->shs.callbacks.restore.a;
 
     if (rc) domcreate_rebuild_done(egc, dcs, rc);
 
@@ -702,7 +703,6 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         pae = libxl_defbool_val(info->u.hvm.pae);
         no_incr_generationid = !libxl_defbool_val(info->u.hvm.incr_generationid);
         callbacks->toolstack_restore = libxl__toolstack_restore;
-        callbacks->data = gc;
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         hvm = 0;
@@ -722,10 +722,24 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
 }
 
-void libxl__xc_domain_restore_done(libxl__egc *egc,
-                                   libxl__domain_create_state *dcs,
+void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
+          unsigned long console_mfn, unsigned long genidad, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    STATE_AO_GC(dcs->ao);
+    libxl__domain_build_state *const state = &dcs->build_state;
+
+    state->store_mfn =            store_mfn;
+    state->console_mfn =          console_mfn;
+    state->vm_generationid_addr = genidad;
+    shs->need_results =           0;
+}
+
+void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
                                    int ret, int retval, int errnoval)
 {
+    libxl__domain_create_state *dcs = dcs_void;
     STATE_AO_GC(dcs->ao);
     libxl_ctx *ctx = libxl__gc_owner(gc);
     char **vments = NULL, **localents = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index c44dec0..b52d29a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -467,16 +467,20 @@ static inline char *restore_helper(libxl__gc *gc, uint32_t domid,
 }
 
 int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
-        uint32_t size, void *data)
+                             uint32_t size, void *user)
 {
-    libxl__gc *gc = data;
-    libxl_ctx *ctx = gc->owner;
+    libxl__save_helper_state *shs = user;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    STATE_AO_GC(dcs->ao);
+    libxl_ctx *ctx = CTX;
     int i, ret;
     const uint8_t *ptr = buf;
     uint32_t count = 0, version = 0;
     struct libxl__physmap_info* pi;
     char *xs_path;
 
+    LOG(DEBUG,"domain=%"PRIu32" toolstack data size=%"PRIu32, domid, size);
+
     if (size < sizeof(version) + sizeof(count)) {
         LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "wrong size");
         return -1;
@@ -529,9 +533,10 @@ static void domain_suspend_done(libxl__egc *egc,
 /*----- callbacks, called by xc_domain_save -----*/
 
 int libxl__domain_suspend_common_switch_qemu_logdirty
-                               (int domid, unsigned int enable, void *data)
+                               (int domid, unsigned enable, void *user)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = user;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
     char *path;
     bool rc;
@@ -597,9 +602,10 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
     return 0;
 }
 
-int libxl__domain_suspend_common_callback(void *data)
+int libxl__domain_suspend_common_callback(void *user)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = user;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
     unsigned long hvm_s_state = 0, hvm_pvdrv = 0;
     int ret;
@@ -739,9 +745,9 @@ static inline char *save_helper(libxl__gc *gc, uint32_t domid,
 }
 
 int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
-        uint32_t *len, void *data)
+        uint32_t *len, void *dss_void)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__domain_suspend_state *dss = dss_void;
     STATE_AO_GC(dss->ao);
     int i = 0;
     char *start_addr = NULL, *size = NULL, *phys_offset = NULL, *name = NULL;
@@ -810,6 +816,8 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         ptr += sizeof(struct libxl__physmap_info) + namelen;
     }
 
+    LOG(DEBUG,"domain=%"PRIu32" toolstack data size=%"PRIu32, domid, *len);
+
     return 0;
 }
 
@@ -864,7 +872,8 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     const int live = dss->live;
     const int debug = dss->debug;
     const libxl_domain_remus_info *const r_info = dss->remus;
-    struct save_callbacks *const callbacks = &dss->callbacks;
+    libxl__srm_save_autogen_callbacks *const callbacks =
+        &dss->shs.callbacks.save.a;
 
     switch (type) {
     case LIBXL_DOMAIN_TYPE_HVM: {
@@ -925,8 +934,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
         callbacks->suspend = libxl__domain_suspend_common_callback;
 
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
-    callbacks->toolstack_save = libxl__toolstack_save;
-    callbacks->data = dss;
+    dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save;
 
     libxl__xc_domain_save(egc, dss, vm_generationid_addr);
     return;
@@ -935,10 +943,10 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     domain_suspend_done(egc, dss, rc);
 }
 
-void libxl__xc_domain_save_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss,
+void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
                                 int rc, int retval, int errnoval)
 {
+    libxl__domain_suspend_state *dss = dss_void;
     STATE_AO_GC(dss->ao);
 
     /* Convenience aliases */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7cf1b04..1a7b526 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -54,6 +54,7 @@
 
 #include "libxl.h"
 #include "_paths.h"
+#include "_libxl_save_msgs_callout.h"
 
 #if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1)
 #define _hidden __attribute__((visibility("hidden")))
@@ -1773,6 +1774,51 @@ _hidden void libxl__datacopier_kill(libxl__datacopier_state *dc);
 _hidden int libxl__datacopier_start(libxl__datacopier_state *dc);
 
 
+/*----- Save/restore helper (used by creation and suspend) -----*/
+
+typedef struct libxl__srm_save_callbacks {
+    libxl__srm_save_autogen_callbacks a;
+    int (*toolstack_save)(uint32_t domid, uint8_t **buf,
+                          uint32_t *len, void *data);
+} libxl__srm_save_callbacks;
+
+typedef struct libxl__srm_restore_callbacks {
+    libxl__srm_restore_autogen_callbacks a;
+} libxl__srm_restore_callbacks;
+
+/* a pointer to this struct is also passed as "user" to the
+ * save callout helper callback functions */
+typedef struct libxl__save_helper_state {
+    /* public, caller of run_helper initialises */
+    libxl__ao *ao;
+    uint32_t domid;
+    union {
+        libxl__srm_save_callbacks save;
+        libxl__srm_restore_callbacks restore;
+    } callbacks;
+    int (*recv_callback)(const unsigned char *msg, uint32_t len, void *user);
+    void (*completion_callback)(libxl__egc *egc, void *caller_state,
+                                int rc, int retval, int errnoval);
+    void *caller_state;
+    int need_results; /* set to 0 or 1 by caller of run_helper;
+                       * if set to 1 then the ultimate caller's
+                       * results function must set it to 0 */
+    /* private */
+    int rc;
+    int completed; /* retval/errnoval valid iff completed */
+    int retval, errnoval; /* from xc_domain_save / xc_domain_restore */
+    libxl__carefd *pipes[2]; /* 0 = helper's stdin, 1 = helper's stdout */
+    libxl__ev_fd readable;
+    libxl__ev_child child;
+    const char *stdin_what, *stdout_what;
+    FILE *toolstack_data_file;
+
+    libxl__egc *egc; /* valid only for duration of each event callback;
+                      * is here in this struct for the benefit of the
+                      * marshalling and xc callback functions */
+} libxl__save_helper_state;
+
+
 /*----- Domain suspend (save) state structure -----*/
 
 typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
@@ -1798,7 +1844,7 @@ struct libxl__domain_suspend_state {
     int xcflags;
     int guest_responded;
     int interval; /* checkpoint interval (for Remus) */
-    struct save_callbacks callbacks;
+    libxl__save_helper_state shs;
 };
 
 
@@ -1910,7 +1956,7 @@ struct libxl__domain_create_state {
     libxl__stub_dm_spawn_state dmss;
         /* If we're not doing stubdom, we use only dmss.dm,
          * for the non-stubdom device model. */
-    struct restore_callbacks callbacks;
+    libxl__save_helper_state shs;
 };
 
 /*----- Domain suspend (save) functions -----*/
@@ -1926,8 +1972,7 @@ _hidden void libxl__xc_domain_save(libxl__egc*, libxl__domain_suspend_state*,
 /* If rc==0 then retval is the return value from xc_domain_save
  * and errnoval is the errno value it provided.
  * If rc!=0, retval and errnoval are undefined. */
-_hidden void libxl__xc_domain_save_done(libxl__egc*,
-                                        libxl__domain_suspend_state*,
+_hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void,
                                         int rc, int retval, int errnoval);
 
 _hidden int libxl__domain_suspend_common_callback(void *data);
@@ -1945,8 +1990,7 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc,
 /* If rc==0 then retval is the return value from xc_domain_save
  * and errnoval is the errno value it provided.
  * If rc!=0, retval and errnoval are undefined. */
-_hidden void libxl__xc_domain_restore_done(libxl__egc *egc,
-                                           libxl__domain_create_state *dcs,
+_hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
                                            int rc, int retval, int errnoval);
 
 
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 1b481ab..19fff1b 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -16,6 +16,30 @@
 
 #include "libxl_internal.h"
 
+/* stream_fd is as from the caller (eventually, the application).
+ * It may be 0, 1 or 2, in which case we need to dup it elsewhere.
+ * The actual fd value is not included in the supplied argnums; rather
+ * it will be automatically supplied by run_helper as the 2nd argument.
+ *
+ * preserve_fds are fds that the caller is intending to pass to the
+ * helper so which need cloexec clearing.  They may not be 0, 1 or 2.
+ * An entry may be -1 in which case it will be ignored.
+ */
+static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
+                       const char *mode_arg,
+                       int stream_fd,
+                       const int *preserve_fds, int num_preserve_fds,
+                       const unsigned long *argnums, int num_argnums);
+
+static void helper_failed(libxl__egc*, libxl__save_helper_state *shs, int rc);
+static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
+                                   int fd, short events, short revents);
+static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
+                          pid_t pid, int status);
+static void helper_done(libxl__egc *egc, libxl__save_helper_state *shs);
+
+/*----- entrypoints -----*/
+
 void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
                               int hvm, int pae, int superpages,
                               int no_incr_generationid)
@@ -27,22 +51,344 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     const int restore_fd = dcs->restore_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
 
-    int r = xc_domain_restore(CTX->xch, restore_fd, domid,
-                              state->store_port, &state->store_mfn,
-                              state->store_domid, state->console_port,
-                              &state->console_mfn, state->console_domid,
-                              hvm, pae, superpages, no_incr_generationid,
-                              &state->vm_generationid_addr, &dcs->callbacks);
-    libxl__xc_domain_restore_done(egc, dcs, 0, r, errno);
+    unsigned cbflags = libxl__srm_callout_enumcallbacks_restore
+        (&dcs->shs.callbacks.restore.a);
+
+    const unsigned long argnums[] = {
+        domid,
+        state->store_port,
+        state->store_domid, state->console_port,
+        state->console_domid,
+        hvm, pae, superpages, no_incr_generationid,
+        cbflags,
+    };
+
+    dcs->shs.ao = ao;
+    dcs->shs.domid = domid;
+    dcs->shs.recv_callback = libxl__srm_callout_received_restore;
+    dcs->shs.completion_callback = libxl__xc_domain_restore_done;
+    dcs->shs.caller_state = dcs;
+    dcs->shs.need_results = 1;
+    dcs->shs.toolstack_data_file = 0;
+
+    run_helper(egc, &dcs->shs, "--restore-domain", restore_fd, 0,0,
+               argnums, ARRAY_SIZE(argnums));
 }
 
 void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss,
                            unsigned long vm_generationid_addr)
 {
     STATE_AO_GC(dss->ao);
-    int r;
+    int r, rc, toolstack_data_fd = -1;
+    uint32_t toolstack_data_len = 0;
+
+    /* Resources we need to free */
+    uint8_t *toolstack_data_buf = 0;
+
+    unsigned cbflags = libxl__srm_callout_enumcallbacks_save
+        (&dss->shs.callbacks.save.a);
+
+    if (dss->shs.callbacks.save.toolstack_save) {
+        r = dss->shs.callbacks.save.toolstack_save
+            (dss->domid, &toolstack_data_buf, &toolstack_data_len, dss);
+        if (r) { rc = ERROR_FAIL; goto out; }
+
+        dss->shs.toolstack_data_file = tmpfile();
+        if (!dss->shs.toolstack_data_file) {
+            LOGE(ERROR, "cannot create toolstack data tmpfile");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        toolstack_data_fd = fileno(dss->shs.toolstack_data_file);
+
+        r = libxl_write_exactly(CTX, toolstack_data_fd,
+                                toolstack_data_buf, toolstack_data_len,
+                                "toolstack data tmpfile", 0);
+        if (r) { rc = ERROR_FAIL; goto out; }
+
+        r = lseek(toolstack_data_fd, 0, SEEK_SET);
+        if (r) {
+            LOGE(ERROR, "rewind toolstack data tmpfile");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
+    const unsigned long argnums[] = {
+        dss->domid, 0, 0, dss->xcflags, dss->hvm, vm_generationid_addr,
+        toolstack_data_fd, toolstack_data_len,
+        cbflags,
+    };
+
+    dss->shs.ao = ao;
+    dss->shs.domid = dss->domid;
+    dss->shs.recv_callback = libxl__srm_callout_received_save;
+    dss->shs.completion_callback = libxl__xc_domain_save_done;
+    dss->shs.caller_state = dss;
+    dss->shs.need_results = 0;
+
+    free(toolstack_data_buf);
+
+    run_helper(egc, &dss->shs, "--save-domain", dss->fd,
+               &toolstack_data_fd, 1,
+               argnums, ARRAY_SIZE(argnums));
+    return;
+
+ out:
+    free(toolstack_data_buf);
+    if (dss->shs.toolstack_data_file) fclose(dss->shs.toolstack_data_file);
+
+    libxl__xc_domain_save_done(egc, dss, rc, 0, 0);
+}
+
+
+/*----- helper execution -----*/
+
+static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
+                       const char *mode_arg, int stream_fd,
+                       const int *preserve_fds, int num_preserve_fds,
+                       const unsigned long *argnums, int num_argnums)
+{
+    STATE_AO_GC(shs->ao);
+    const char *args[4 + num_argnums];
+    const char **arg = args;
+    int i, rc;
+
+    /* Resources we must free */
+    libxl__carefd *childs_pipes[2] = { 0,0 };
+
+    /* Convenience aliases */
+    const uint32_t domid = shs->domid;
+
+    shs->rc = 0;
+    shs->completed = 0;
+    shs->pipes[0] = shs->pipes[1] = 0;
+    libxl__ev_fd_init(&shs->readable);
+    libxl__ev_child_init(&shs->child);
+
+    shs->stdin_what = GCSPRINTF("domain %"PRIu32" save/restore helper"
+                                " stdin pipe", domid);
+    shs->stdout_what = GCSPRINTF("domain %"PRIu32" save/restore helper"
+                                 " stdout pipe", domid);
+
+    *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC "/" "libxl-save-helper";
+    *arg++ = mode_arg;
+    const char **stream_fd_arg = arg++;
+    for (i=0; i<num_argnums; i++)
+        *arg++ = GCSPRINTF("%lu", argnums[i]);
+    *arg++ = 0;
+    assert(arg == args + ARRAY_SIZE(args));
+
+    libxl__carefd_begin();
+    int childfd;
+    for (childfd=0; childfd<2; childfd++) {
+        /* Setting up the pipe for the child's fd childfd */
+        int fds[2];
+        if (libxl_pipe(CTX,fds)) { libxl__carefd_unlock(); rc = ERROR_FAIL; goto out; }
+        int childs_end = childfd==0 ? 0 /*read*/  : 1 /*write*/;
+        int our_end    = childfd==0 ? 1 /*write*/ : 0 /*read*/;
+        childs_pipes[childfd] = libxl__carefd_record(CTX, fds[childs_end]);
+        shs->pipes[childfd] =   libxl__carefd_record(CTX, fds[our_end]);
+    }
+    libxl__carefd_unlock();
+
+    pid_t pid = libxl__ev_child_fork(gc, &shs->child, helper_exited);
+    if (!pid) {
+        if (stream_fd <= 2) {
+            stream_fd = dup(stream_fd);
+            if (stream_fd < 0) {
+                LOGE(ERROR,"dup migration stream fd");
+                exit(-1);
+            }
+        }
+        libxl_fd_set_cloexec(CTX, stream_fd, 0);
+        *stream_fd_arg = GCSPRINTF("%d", stream_fd);
+
+        for (i=0; i<num_preserve_fds; i++)
+            if (preserve_fds[i] >= 0) {
+                assert(preserve_fds[i] > 2);
+                libxl_fd_set_cloexec(CTX, preserve_fds[i], 0);
+            }
+
+        libxl__exec(gc,
+                    libxl__carefd_fd(childs_pipes[0]),
+                    libxl__carefd_fd(childs_pipes[1]),
+                    -1,
+                    args[0], (char**)args, 0);
+    }
+
+    libxl__carefd_close(childs_pipes[0]);
+    libxl__carefd_close(childs_pipes[1]);
+
+    rc = libxl__ev_fd_register(gc, &shs->readable, helper_stdout_readable,
+                               libxl__carefd_fd(shs->pipes[1]), POLLIN|POLLPRI);
+    if (rc) goto out;
+    return;
+
+ out:
+    libxl__carefd_close(childs_pipes[0]);
+    libxl__carefd_close(childs_pipes[1]);
+    helper_failed(egc, shs, rc);
+}
+
+static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
+                          int rc)
+{
+    STATE_AO_GC(shs->ao);
+
+    if (!shs->rc)
+        shs->rc = rc;
+
+    libxl__ev_fd_deregister(gc, &shs->readable);
+
+    if (!libxl__ev_child_inuse(&shs->child)) {
+        helper_done(egc, shs);
+        return;
+    }
+
+    int r = kill(shs->child.pid, SIGKILL);
+    if (r) LOGE(WARN, "failed to kill save/restore helper [%lu]",
+                (unsigned long)shs->child.pid);
+}
+
+static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
+                                   int fd, short events, short revents)
+{
+    libxl__save_helper_state *shs = CONTAINER_OF(ev, *shs, readable);
+    STATE_AO_GC(shs->ao);
+    int rc, errnoval;
+
+    if (revents & (POLLERR|POLLPRI)) {
+        LOG(ERROR, "%s signaled POLLERR|POLLPRI (%#x)",
+            shs->stdout_what, revents);
+        rc = ERROR_FAIL;
+ out:
+        /* this is here because otherwise we bypass the decl of msg[] */
+        helper_failed(egc, shs, rc);
+        return;
+    }
+
+    uint16_t msglen;
+    errnoval = libxl_read_exactly(CTX, fd, &msglen, sizeof(msglen),
+                                  shs->stdout_what, "ipc msg header");
+    if (errnoval) { rc = ERROR_FAIL; goto out; }
+
+    unsigned char msg[msglen];
+    errnoval = libxl_read_exactly(CTX, fd, msg, msglen,
+                                  shs->stdout_what, "ipc msg body");
+    if (errnoval) { rc = ERROR_FAIL; goto out; }
+
+    shs->egc = egc;
+    shs->recv_callback(msg, msglen, shs);
+    shs->egc = 0;
+    return;
+}
+
+static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
+                          pid_t pid, int status)
+{
+    libxl__save_helper_state *shs = CONTAINER_OF(ch, *shs, child);
+    STATE_AO_GC(shs->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = shs->domid;
+
+    const char *what =
+        GCSPRINTF("domain %"PRIu32" save/restore helper", domid);
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, XTL_ERROR, what, pid, status);
+        shs->rc = ERROR_FAIL;
+    }
+
+    if (shs->need_results) {
+        if (!shs->rc)
+            LOG(ERROR,"%s exited without providing results",what);
+        shs->rc = ERROR_FAIL;
+    }
+
+    if (!shs->completed) {
+        if (!shs->rc)
+            LOG(ERROR,"%s exited without signaling completion",what);
+        shs->rc = ERROR_FAIL;
+    }
+
+    helper_done(egc, shs);
+    return;
+}
+
+static void helper_done(libxl__egc *egc, libxl__save_helper_state *shs)
+{
+    STATE_AO_GC(shs->ao);
+
+    libxl__ev_fd_deregister(gc, &shs->readable);
+    libxl__carefd_close(shs->pipes[0]);  shs->pipes[0] = 0;
+    libxl__carefd_close(shs->pipes[1]);  shs->pipes[1] = 0;
+    assert(!libxl__ev_child_inuse(&shs->child));
+    if (shs->toolstack_data_file) fclose(shs->toolstack_data_file);
+
+    shs->egc = egc;
+    shs->completion_callback(egc, shs->caller_state,
+                             shs->rc, shs->retval, shs->errnoval);
+    shs->egc = 0;
+}
+
+/*----- generic helpers for the autogenerated code -----*/
+
+const libxl__srm_save_autogen_callbacks*
+libxl__srm_callout_get_callbacks_save(void *user)
+{
+    libxl__save_helper_state *shs = user;
+    return &shs->callbacks.save.a;
+}
+
+const libxl__srm_restore_autogen_callbacks*
+libxl__srm_callout_get_callbacks_restore(void *user)
+{
+    libxl__save_helper_state *shs = user;
+    return &shs->callbacks.restore.a;
+}
+
+void libxl__srm_callout_sendreply(int r, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__egc *egc = shs->egc;
+    STATE_AO_GC(shs->ao);
+    int errnoval;
+
+    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
+                                   &r, sizeof(r), shs->stdin_what,
+                                   "callback return value");
+    if (errnoval)
+        helper_failed(egc, shs, ERROR_FAIL);
+}
+
+void libxl__srm_callout_callback_log(uint32_t level, uint32_t errnoval,
+                  const char *context, const char *formatted, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    STATE_AO_GC(shs->ao);
+    xtl_log(CTX->lg, level, errnoval, context, "%s", formatted);
+}
+
+void libxl__srm_callout_callback_progress(const char *context,
+                   const char *doing_what, unsigned long done,
+                   unsigned long total, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    STATE_AO_GC(shs->ao);
+    xtl_progress(CTX->lg, context, doing_what, done, total);
+}
+
+int libxl__srm_callout_callback_complete(int retval, int errnoval,
+                                         void *user)
+{
+    libxl__save_helper_state *shs = user;
+    STATE_AO_GC(shs->ao);
 
-    r = xc_domain_save(CTX->xch, dss->fd, dss->domid, 0, 0, dss->xcflags,
-                       &dss->callbacks, dss->hvm, vm_generationid_addr);
-    libxl__xc_domain_save_done(egc, dss, 0, r, errno);
+    shs->completed = 1;
+    shs->retval = retval;
+    shs->errnoval = errnoval;
+    libxl__ev_fd_deregister(gc, &shs->readable);
+    return 0;
 }
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
new file mode 100644
index 0000000..3bdfa28
--- /dev/null
+++ b/tools/libxl/libxl_save_helper.c
@@ -0,0 +1,281 @@
+/*
+ * Copyright (C) 2012      Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+/*
+ * The libxl-save-helper utility speaks a protocol to its caller for
+ * the callbacks.  The protocol is as follows.
+ *
+ * The helper talks on stdin and stdout, in binary in machine
+ * endianness.  The helper speaks first, and only when it has a
+ * callback to make.  It writes a 16-bit number being the message
+ * length, and then the message body.
+ *
+ * Each message starts with a 16-bit number indicating which of the
+ * messages it is, and then some arguments in a binary marshalled form.
+ * If the callback does not need a reply (it returns void), the helper
+ * just continues.  Otherwise the helper waits for its caller to send a
+ * single int which is to be the return value from the callback.
+ *
+ * Where feasible the stubs and callbacks have prototypes identical to
+ * those required by xc_domain_save and xc_domain_restore, so that the
+ * autogenerated functions can be used/provided directly.
+ *
+ * The actual messages are in the array @msgs in libxl_save_msgs_gen.pl
+ */
+
+#include "libxl_osdeps.h"
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <assert.h>
+#include <inttypes.h>
+
+#include "libxl.h"
+
+#include "xenctrl.h"
+#include "xenguest.h"
+#include "_libxl_save_msgs_helper.h"
+
+/*----- globals -----*/
+
+static const char *program = "libxl-save-helper";
+static xentoollog_logger *logger;
+static xc_interface *xch;
+
+/*----- error handling -----*/
+
+static void fail(int errnoval, const char *fmt, ...)
+    __attribute__((noreturn,format(printf,2,3)));
+static void fail(int errnoval, const char *fmt, ...)
+{
+    va_list al;
+    va_start(al,fmt);
+    xtl_logv(logger,XTL_ERROR,errnoval,program,fmt,al);
+    exit(-1);
+}
+
+static int read_exactly(int fd, void *buf, size_t len)
+/* returns 1 if ok; 0 on eof, even midway through; <0 (errno set) on error */
+{
+    while (len) {
+        ssize_t r = read(fd, buf, len);
+        if (r<=0) return r;
+        assert(r <= len);
+        len -= r;
+        buf = (char*)buf + r;
+    }
+    return 1;
+}
+
+static void *xmalloc(size_t sz)
+{
+    if (!sz) return 0;
+    void *r = malloc(sz);
+    if (!r) { perror("memory allocation failed"); exit(-1); }
+    return r;
+}
+
+/*----- logger -----*/
+
+typedef struct {
+    xentoollog_logger vtable;
+} xentoollog_logger_tellparent;
+
+static void tellparent_vmessage(xentoollog_logger *logger_in,
+                                xentoollog_level level,
+                                int errnoval,
+                                const char *context,
+                                const char *format,
+                                va_list al)
+{
+    char *formatted;
+    int r = vasprintf(&formatted, format, al);
+    if (r < 0) { perror("memory allocation failed during logging"); exit(-1); }
+    helper_stub_log(level, errnoval, context, formatted, 0);
+    free(formatted);
+}
+
+static void tellparent_progress(struct xentoollog_logger *logger_in,
+                                const char *context,
+                                const char *doing_what, int percent,
+                                unsigned long done, unsigned long total)
+{
+    helper_stub_progress(context, doing_what, done, total, 0);
+}
+
+static void tellparent_destroy(struct xentoollog_logger *logger_in)
+{
+    abort();
+}
+
+static xentoollog_logger_tellparent *createlogger_tellparent(void)
+{
+    xentoollog_logger_tellparent newlogger;
+    return XTL_NEW_LOGGER(tellparent, newlogger);
+}
+
+/*----- helper functions called by autogenerated stubs -----*/
+
+unsigned char * helper_allocbuf(int len, void *user)
+{
+    return xmalloc(len);
+}
+
+static void transmit(const unsigned char *msg, int len, void *user)
+{
+    while (len) {
+        int r = write(1, msg, len);
+        if (r<0) { perror("write"); exit(-1); }
+        assert(r >= 0);
+        assert(r <= len);
+        len -= r;
+        msg += r;
+    }
+}
+
+void helper_transmitmsg(unsigned char *msg_freed, int len_in, void *user)
+{
+    assert(len_in < 64*1024);
+    uint16_t len = len_in;
+    transmit((const void*)&len, sizeof(len), user);
+    transmit(msg_freed, len, user);
+    free(msg_freed);
+}
+
+int helper_getreply(void *user)
+{
+    int v;
+    int r = read_exactly(0, &v, sizeof(v));
+    if (r<=0) exit(-2);
+    return v;
+}
+
+/*----- other callbacks -----*/
+
+static int toolstack_save_fd;
+static uint32_t toolstack_save_len;
+
+static int toolstack_save_cb(uint32_t domid, uint8_t **buf,
+                             uint32_t *len, void *data)
+{
+    assert(toolstack_save_fd > 0);
+
+    *buf = xmalloc(toolstack_save_len);
+    int r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
+    if (r<0) fail(errno,"read toolstack data");
+    if (r==0) fail(0,"read toolstack data eof");
+
+    toolstack_save_fd = -1;
+    *len = toolstack_save_len;
+    return 0;
+}
+
+static void startup(const char *op) {
+    logger = (xentoollog_logger*)createlogger_tellparent();
+    if (!logger) {
+        fprintf(stderr, "%s: cannot initialise logger\n", program);
+        exit(-1);
+    }
+
+    xtl_log(logger,XTL_DEBUG,0,program,"starting %s",op);
+
+    xch = xc_interface_open(logger,logger,0);
+    if (!xch) fail(errno,"xc_interface_open failed");
+}
+
+static void complete(int retval) {
+    int errnoval = retval ? errno : 0; /* suppress irrelevant errnos */
+    xtl_log(logger,XTL_DEBUG,errnoval,program,"complete r=%d",retval);
+    helper_stub_complete(retval,errnoval,0);
+    exit(0);
+}
+
+static struct save_callbacks helper_save_callbacks;
+static struct restore_callbacks helper_restore_callbacks;
+
+int main(int argc, char **argv)
+{
+    int r;
+
+#define NEXTARG (++argv, assert(*argv), *argv)
+
+    const char *mode = *++argv;
+    assert(mode);
+
+    if (!strcmp(mode,"--save-domain")) {
+
+        int io_fd =                atoi(NEXTARG);
+        uint32_t dom =             strtoul(NEXTARG,0,10);
+        uint32_t max_iters =       strtoul(NEXTARG,0,10);
+        uint32_t max_factor =      strtoul(NEXTARG,0,10);
+        uint32_t flags =           strtoul(NEXTARG,0,10);
+        int hvm =                  atoi(NEXTARG);
+        unsigned long genidad =    strtoul(NEXTARG,0,10);
+        toolstack_save_fd  =       atoi(NEXTARG);
+        toolstack_save_len =       strtoul(NEXTARG,0,10);
+        unsigned cbflags =         strtoul(NEXTARG,0,10);
+        assert(!*++argv);
+
+        if (toolstack_save_fd >= 0)
+            helper_save_callbacks.toolstack_save = toolstack_save_cb;
+
+        helper_setcallbacks_save(&helper_save_callbacks, cbflags);
+
+        startup("save");
+        r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
+                           &helper_save_callbacks, hvm, genidad);
+        complete(r);
+
+    } else if (!strcmp(mode,"--restore-domain")) {
+
+        int io_fd =                atoi(NEXTARG);
+        uint32_t dom =             strtoul(NEXTARG,0,10);
+        unsigned store_evtchn =    strtoul(NEXTARG,0,10);
+        domid_t store_domid =      strtoul(NEXTARG,0,10);
+        unsigned console_evtchn =  strtoul(NEXTARG,0,10);
+        domid_t console_domid =    strtoul(NEXTARG,0,10);
+        unsigned int hvm =         strtoul(NEXTARG,0,10);
+        unsigned int pae =         strtoul(NEXTARG,0,10);
+        int superpages =           strtoul(NEXTARG,0,10);
+        int no_incr_genidad =      strtoul(NEXTARG,0,10);
+        unsigned cbflags =         strtoul(NEXTARG,0,10);
+        assert(!*++argv);
+
+        helper_setcallbacks_restore(&helper_restore_callbacks, cbflags);
+
+        unsigned long store_mfn = 0;
+        unsigned long console_mfn = 0;
+        unsigned long genidad = 0;
+
+        startup("restore");
+        r = xc_domain_restore(xch, io_fd, dom, store_evtchn, &store_mfn,
+                              store_domid, console_evtchn, &console_mfn,
+                              console_domid, hvm, pae, superpages,
+                              no_incr_genidad, &genidad,
+                              &helper_restore_callbacks);
+        helper_stub_restore_results(store_mfn,console_mfn,genidad,0);
+        complete(r);
+
+    } else {
+        assert(!"unexpected mode argument");
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
new file mode 100755
index 0000000..c45986e
--- /dev/null
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -0,0 +1,397 @@
+#!/usr/bin/perl -w
+
+use warnings;
+use strict;
+use POSIX;
+
+our $debug = 0; # produce copious debugging output at run-time?
+
+our @msgs = (
+    # flags:
+    #   s  - applicable to save
+    #   r  - applicable to restore
+    #   c  - function pointer in callbacks struct rather than fixed function
+    #   x  - function pointer is in struct {save,restore}_callbacks
+    #         and its null-ness needs to be passed through to the helper's xc
+    #   W  - needs a return value; callback is synchronous
+    [  1, 'sr',     "log",                   [qw(uint32_t level
+                                                 uint32_t errnoval
+                                                 STRING context
+                                                 STRING formatted)] ],
+    [  2, 'sr',     "progress",              [qw(STRING context
+                                                 STRING doing_what),
+                                                'unsigned long', 'done',
+                                                'unsigned long', 'total'] ],
+    [  3, 'scxW',   "suspend", [] ],
+    [  4, 'scxW',   "postcopy", [] ],
+    [  5, 'scxW',   "checkpoint", [] ],
+    [  6, 'scxW',   "switch_qemu_logdirty",  [qw(int domid
+                                              unsigned enable)] ],
+    #                toolstack_save          done entirely `by hand'
+    [  7, 'rcxW',   "toolstack_restore",     [qw(uint32_t domid
+                                                BLOCK tsdata)] ],
+    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+                                              'unsigned long', 'console_mfn',
+                                              'unsigned long', 'genidad'] ],
+    [  9, 'srW',    "complete",              [qw(int retval
+                                                 int errnoval)] ],
+);
+
+#----------------------------------------
+
+our %cbs;
+our %func;
+our %func_ah;
+our @outfuncs;
+our %out_decls;
+our %out_body;
+our %msgnum_used;
+
+die unless @ARGV==1;
+die if $ARGV[0] =~ m/^-/;
+
+our ($intendedout) = @ARGV;
+
+$intendedout =~ m/([a-z]+)\.([ch])$/ or die;
+my ($want_ah, $ch) = ($1, $2);
+
+my $declprefix = '';
+
+foreach my $ah (qw(callout helper)) {
+    $out_body{$ah} .=
+        <<END_BOTH.($ah eq 'callout' ? <<END_CALLOUT : <<END_HELPER);
+#include "libxl_osdeps.h"
+
+#include <assert.h>
+#include <string.h>
+#include <stdint.h>
+#include <limits.h>
+END_BOTH
+
+#include "libxl_internal.h"
+
+END_CALLOUT
+
+#include "_libxl_save_msgs_${ah}.h"
+#include <xenctrl.h>
+#include <xenguest.h>
+
+END_HELPER
+}
+
+die $want_ah unless defined $out_body{$want_ah};
+
+sub f_decl ($$$$) {
+    my ($name, $ah, $c_rtype, $c_decl) = @_;
+    $out_decls{$name} = "${declprefix}$c_rtype $name$c_decl;\n";
+    $func{$name} = "$c_rtype $name$c_decl\n{\n" . ($func{$name} || '');
+    $func_ah{$name} = $ah;
+}
+
+sub f_more ($$) {
+    my ($name, $addbody) = @_;
+    $func{$name} ||= '';
+    $func{$name} .= $addbody;
+    push @outfuncs, $name;
+}
+
+our $libxl = "libxl__srm";
+our $callback = "${libxl}_callout_callback";
+our $receiveds = "${libxl}_callout_received";
+our $sendreply = "${libxl}_callout_sendreply";
+our $getcallbacks = "${libxl}_callout_get_callbacks";
+our $enumcallbacks = "${libxl}_callout_enumcallbacks";
+sub cbtype ($) { "${libxl}_".$_[0]."_autogen_callbacks"; };
+
+f_decl($sendreply, 'callout', 'void', "(int r, void *user)");
+
+our $helper = "helper";
+our $encode = "${helper}_stub";
+our $allocbuf = "${helper}_allocbuf";
+our $transmit = "${helper}_transmitmsg";
+our $getreply = "${helper}_getreply";
+our $setcallbacks = "${helper}_setcallbacks";
+
+f_decl($allocbuf, 'helper', 'unsigned char *', '(int len, void *user)');
+f_decl($transmit, 'helper', 'void',
+       '(unsigned char *msg_freed, int len, void *user)');
+f_decl($getreply, 'helper', 'int', '(void *user)');
+
+sub typeid ($) { my ($t) = @_; $t =~ s/\W/_/g; return $t; };
+
+$out_body{'callout'} .= <<END;
+static int bytes_get(const unsigned char **msg,
+		     const unsigned char *const endmsg,
+		     void *result, int rlen)
+{
+    if (endmsg - *msg < rlen) return 0;
+    memcpy(result,*msg,rlen);
+    *msg += rlen;
+    return 1;
+}
+
+END
+$out_body{'helper'} .= <<END;
+static void bytes_put(unsigned char *const buf, int *len,
+		      const void *value, int vlen)
+{
+    assert(vlen < INT_MAX/2 - *len);
+    if (buf)
+	memcpy(buf + *len, value, vlen);
+    *len += vlen;
+}
+
+END
+
+foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long') {
+    my $typeid = typeid($simpletype);
+    $out_body{'callout'} .= <<END;
+static int ${typeid}_get(const unsigned char **msg,
+                        const unsigned char *const endmsg,
+                        $simpletype *result)
+{
+    return bytes_get(msg, endmsg, result, sizeof(*result));
+}
+
+END
+    $out_body{'helper'} .= <<END;
+static void ${typeid}_put(unsigned char *const buf, int *len,
+			 const $simpletype value)
+{
+    bytes_put(buf, len, &value, sizeof(value));
+}
+
+END
+}
+
+$out_body{'callout'} .= <<END;
+static int BLOCK_get(const unsigned char **msg,
+                      const unsigned char *const endmsg,
+                      const uint8_t **result, uint32_t *result_size)
+{
+    if (!uint32_t_get(msg,endmsg,result_size)) return 0;
+    if (endmsg - *msg < *result_size) return 0;
+    *result = (const void*)*msg;
+    *msg += *result_size;
+    return 1;
+}
+
+static int STRING_get(const unsigned char **msg,
+                      const unsigned char *const endmsg,
+                      const char **result)
+{
+    const uint8_t *data;
+    uint32_t datalen;
+    if (!BLOCK_get(msg,endmsg,&data,&datalen)) return 0;
+    if (datalen == 0) return 0;
+    if (data[datalen-1] != '\\0') return 0;
+    *result = (const void*)data;
+    return 1;
+}
+
+END
+$out_body{'helper'} .= <<END;
+static void BLOCK_put(unsigned char *const buf,
+                      int *len,
+		      const uint8_t *bytes, uint32_t size)
+{
+    uint32_t_put(buf, len, size);
+    bytes_put(buf, len, bytes, size);
+}
+
+static void STRING_put(unsigned char *const buf,
+		       int *len,
+		       const char *string)
+{
+    size_t slen = strlen(string);
+    assert(slen < INT_MAX / 4);
+    assert(slen < (uint32_t)0x40000000);
+    BLOCK_put(buf, len, (const void*)string, slen+1);
+}
+
+END
+
+foreach my $sr (qw(save restore)) {
+    f_decl("${getcallbacks}_${sr}", 'callout',
+           "const ".cbtype($sr)." *",
+           "(void *data)");
+
+    f_decl("${receiveds}_${sr}", 'callout', 'int',
+	   "(const unsigned char *msg, uint32_t len, void *user)");
+
+    f_decl("${enumcallbacks}_${sr}", 'callout', 'unsigned',
+           "(const ".cbtype($sr)." *cbs)");
+    f_more("${enumcallbacks}_${sr}", "    unsigned cbflags = 0;\n");
+
+    f_decl("${setcallbacks}_${sr}", 'helper', 'void',
+           "(struct ${sr}_callbacks *cbs, unsigned cbflags)");
+
+    f_more("${receiveds}_${sr}",
+           <<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS);
+    const unsigned char *const endmsg = msg + len;
+    uint16_t mtype;
+    if (!uint16_t_get(&msg,endmsg,&mtype)) return 0;
+END_ALWAYS
+    fprintf(stderr,"libxl callout receiver: got len=%u mtype=%u\\n",len,mtype);
+END_DEBUG
+    switch (mtype) {
+
+END_ALWAYS
+
+    $cbs{$sr} = "typedef struct ".cbtype($sr)." {\n";
+}
+
+foreach my $msginfo (@msgs) {
+    my ($msgnum, $flags, $name, $args) = @$msginfo;
+    die if $msgnum_used{$msgnum}++;
+
+    my $f_more_sr = sub {
+        my ($contents_spec, $fnamebase) = @_;
+        $fnamebase ||= "${receiveds}";
+        foreach my $sr (qw(save restore)) {
+            $sr =~ m/^./;
+            next unless $flags =~ m/$&/;
+            my $contents = (!ref $contents_spec) ? $contents_spec :
+                $contents_spec->($sr);
+            f_more("${fnamebase}_${sr}", $contents);
+        }
+    };
+
+    $f_more_sr->("    case $msgnum: { /* $name */\n");
+    if ($flags =~ m/W/) {
+        $f_more_sr->("        int r;\n");
+    }
+
+    my $c_rtype_helper = $flags =~ m/W/ ? 'int' : 'void';
+    my $c_rtype_callout = $flags =~ m/W/ ? 'int' : 'void';
+    my $c_decl = '(';
+    my $c_callback_args = '';
+
+    f_more("${encode}_$name",
+           <<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS);
+    unsigned char *buf = 0;
+    int len = 0, allocd = 0;
+
+END_ALWAYS
+    fprintf(stderr,"libxl-save-helper: encoding $name\\n");
+END_DEBUG
+    for (;;) {
+        uint16_t_put(buf, &len, $msgnum /* $name */);
+END_ALWAYS
+
+    my @args = @$args;
+    my $c_recv = '';
+    my ($argtype, $arg);
+    while (($argtype, $arg, @args) = @args) {
+	my $typeid = typeid($argtype);
+        my $c_args = "$arg";
+        my $c_get_args = "&$arg";
+	if ($argtype eq 'STRING') {
+	    $c_decl .= "const char *$arg, ";
+	    $f_more_sr->("        const char *$arg;\n");
+        } elsif ($argtype eq 'BLOCK') {
+            $c_decl .= "const uint8_t *$arg, uint32_t ${arg}_size, ";
+            $c_args .= ", ${arg}_size";
+            $c_get_args .= ",&${arg}_size";
+	    $f_more_sr->("        const uint8_t *$arg;\n".
+                         "        uint32_t ${arg}_size;\n");
+	} else {
+	    $c_decl .= "$argtype $arg, ";
+	    $f_more_sr->("        $argtype $arg;\n");
+	}
+	$c_callback_args .= "$c_args, ";
+	$c_recv.=
+            "        if (!${typeid}_get(&msg,endmsg,$c_get_args)) return 0;\n";
+        f_more("${encode}_$name", "	${typeid}_put(buf, &len, $c_args);\n");
+    }
+    $f_more_sr->($c_recv);
+    $c_decl .= "void *user)";
+    $c_callback_args .= "user";
+
+    $f_more_sr->("        if (msg != endmsg) return 0;\n");
+
+    my $c_callback;
+    if ($flags !~ m/c/) {
+        $c_callback = "${callback}_$name";
+    } else {
+        $f_more_sr->(sub {
+            my ($sr) = @_;
+            $cbs{$sr} .= "    $c_rtype_callout (*${name})$c_decl;\n";
+            return
+          "        const ".cbtype($sr)." *const cbs =\n".
+            "            ${getcallbacks}_${sr}(user);\n";
+                       });
+        $c_callback = "cbs->${name}";
+    }
+    my $c_make_callback = "$c_callback($c_callback_args)";
+    if ($flags !~ m/W/) {
+	$f_more_sr->("        $c_make_callback;\n");
+    } else {
+        $f_more_sr->("        r = $c_make_callback;\n".
+                     "        $sendreply(r, user);\n");
+	f_decl($sendreply, 'callout', 'void', '(int r, void *user)');
+    }
+    if ($flags =~ m/x/) {
+        my $c_v = "(1u<<$msgnum)";
+        my $c_cb = "cbs->$name";
+        $f_more_sr->("    if ($c_cb) cbflags |= $c_v;\n", $enumcallbacks);
+        $f_more_sr->("    $c_cb = (cbflags & $c_v) ? ${encode}_${name} : 0;\n",
+                     $setcallbacks);
+    }
+    $f_more_sr->("        return 1;\n    }\n\n");
+    f_decl("${callback}_$name", 'callout', $c_rtype_callout, $c_decl);
+    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_decl);
+    f_more("${encode}_$name",
+"        if (buf) break;
+        buf = ${helper}_allocbuf(len, user);
+        assert(buf);
+        allocd = len;
+        len = 0;
+    }
+    assert(len == allocd);
+    ${transmit}(buf, len, user);
+");
+    if ($flags =~ m/W/) {
+	f_more("${encode}_$name",
+               (<<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS));
+    int r = ${helper}_getreply(user);
+END_ALWAYS
+    fprintf(stderr,"libxl-save-helper: $name got reply %d\\n",r);
+END_DEBUG
+    return r;
+END_ALWAYS
+    }
+}
+
+print "/* AUTOGENERATED by $0 DO NOT EDIT */\n\n" or die $!;
+
+foreach my $sr (qw(save restore)) {
+    f_more("${enumcallbacks}_${sr}",
+           "    return cbflags;\n");
+    f_more("${receiveds}_${sr}",
+           "    default:\n".
+           "        return 0;\n".
+           "    }");
+    $cbs{$sr} .= "} ".cbtype($sr).";\n\n";
+    if ($ch eq 'h') {
+        print $cbs{$sr} or die $!;
+        print "struct ${sr}_callbacks;\n";
+    }
+}
+
+if ($ch eq 'c') {
+    foreach my $name (@outfuncs) {
+        next unless defined $func{$name};
+        $func{$name} .= "}\n\n";
+        $out_body{$func_ah{$name}} .= $func{$name};
+        delete $func{$name};
+    }
+    print $out_body{$want_ah} or die $!;
+} else {
+    foreach my $name (sort keys %out_decls) {
+        next unless $func_ah{$name} eq $want_ah;
+        print $out_decls{$name} or die $!;
+    }
+}
+
+close STDOUT or die $!;
-- 
1.7.2.5
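
Note on the encoders this generator emits: each one runs the same
sequence of _put calls twice, first with buf null to measure the
message and then again to fill the allocated buffer.  A minimal
standalone sketch of that pattern follows.  The put helpers, the use of
malloc, and the name encode_example are assumptions made for
illustration; they are not the generated code or the real helper
functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical put helper: writes only when buf is non-null,
 * but always advances *len, so a null buf measures the message. */
static void bytes_put(unsigned char *buf, int *len, const void *data, int n)
{
    if (buf) memcpy(buf + *len, data, (size_t)n);
    *len += n;
}

static void uint16_put(unsigned char *buf, int *len, uint16_t v)
{
    bytes_put(buf, len, &v, (int)sizeof(v));
}

static void uint32_put(unsigned char *buf, int *len, uint32_t v)
{
    bytes_put(buf, len, &v, (int)sizeof(v));
}

/* Encodes (msgnum, domid, enable) into a freshly malloc'd buffer.
 * Returns the buffer and stores its length in *len_out; caller frees. */
static unsigned char *encode_example(uint16_t msgnum, uint32_t domid,
                                     uint32_t enable, int *len_out)
{
    unsigned char *buf = 0;
    int len = 0, allocd = 0;

    for (;;) {
        uint16_put(buf, &len, msgnum);
        uint32_put(buf, &len, domid);
        uint32_put(buf, &len, enable);
        if (buf) break;            /* second pass done: buffer is filled */
        buf = malloc((size_t)len); /* first pass done: len is the size */
        assert(buf);
        allocd = len;
        len = 0;
    }
    assert(len == allocd);         /* both passes must agree on the size */
    *len_out = len;
    return buf;
}
```

Writing the put calls once and looping over them keeps the size
calculation and the serialisation from drifting apart, which matters
when the body is generated from a table.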


* [PATCH 07/21] libxl: rename libxl_dom:save_helper to physmap_path
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (5 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 06/21] libxl: domain save/restore: run in a separate process Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 08/21] libxl: provide libxl__xs_*_checked and libxl__xs_transaction_* Ian Jackson
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

"save_helper" isn't very descriptive.  Also it is now confusing
because it reads like it might refer to the libxl-save-helper
executable which runs xc_domain_save and xc_domain_restore.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/libxl_dom.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index b52d29a..ba58c45 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -736,7 +736,7 @@ int libxl__domain_suspend_common_callback(void *user)
     return 1;
 }
 
-static inline char *save_helper(libxl__gc *gc, uint32_t domid,
+static inline char *physmap_path(libxl__gc *gc, uint32_t domid,
         char *phys_offset, char *node)
 {
     return libxl__sprintf(gc,
@@ -781,21 +781,21 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
             return -1;
         }
 
-        xs_path = save_helper(gc, domid, phys_offset, "start_addr");
+        xs_path = physmap_path(gc, domid, phys_offset, "start_addr");
         start_addr = libxl__xs_read(gc, 0, xs_path);
         if (start_addr == NULL) {
             LOG(ERROR, "%s is NULL", xs_path);
             return -1;
         }
 
-        xs_path = save_helper(gc, domid, phys_offset, "size");
+        xs_path = physmap_path(gc, domid, phys_offset, "size");
         size = libxl__xs_read(gc, 0, xs_path);
         if (size == NULL) {
             LOG(ERROR, "%s is NULL", xs_path);
             return -1;
         }
 
-        xs_path = save_helper(gc, domid, phys_offset, "name");
+        xs_path = physmap_path(gc, domid, phys_offset, "name");
         name = libxl__xs_read(gc, 0, xs_path);
         if (name == NULL)
             namelen = 0;
-- 
1.7.2.5


* [PATCH 08/21] libxl: provide libxl__xs_*_checked and libxl__xs_transaction_*
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (6 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 07/21] libxl: rename libxl_dom:save_helper to physmap_path Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 09/21] libxl: wait for qemu to acknowledge logdirty command Ian Jackson
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

These useful utility functions make dealing with xenstore a little
less painful.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v4:
 * Fixed typo `commited' in comment.

Changes in v3:
 * Fixed typo `transacton' in log messages.
---
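
For reviewers, a standalone sketch of the retry loop the transaction
helpers are meant to support.  Everything named demo_* is a mock
invented for this note, not libxl code; the mocks follow the contract in
the header comment below (commit returns <0, +1 or 0, and on conflict
the transaction is destroyed, which this sketch takes to mean *t is
reset to 0).

```c
#include <assert.h>

typedef unsigned int demo_txn;       /* stand-in for xs_transaction_t */

static int demo_conflicts_left;      /* how many commits will still conflict */
static int demo_starts;              /* number of transactions started */

static int demo_txn_start(demo_txn *t)
{
    assert(!*t);                     /* caller must pass in *t == 0 */
    *t = (demo_txn)++demo_starts;
    return 0;
}

/* <0 error, +1 conflict (retry), 0 committed; *t is 0 afterwards */
static int demo_txn_commit(demo_txn *t)
{
    assert(*t);
    *t = 0;
    if (demo_conflicts_left > 0) {
        demo_conflicts_left--;
        return +1;
    }
    return 0;
}

static void demo_txn_abort(demo_txn *t)
{
    if (!*t) return;                 /* nothing to clean up */
    *t = 0;
}

/* Runs one update, retrying on conflicts.  Returns 0 or a negative error. */
static int demo_update(int fail_in_body)
{
    demo_txn t = 0;
    int rc;

    for (;;) {
        rc = demo_txn_start(&t);
        if (rc) goto out;

        if (fail_in_body) { rc = -1; goto out; }  /* e.g. a failed read */

        rc = demo_txn_commit(&t);
        if (!rc) break;              /* committed */
        if (rc < 0) goto out;        /* hard failure */
        /* rc == +1: conflict, go round again */
    }
    rc = 0;

 out:
    demo_txn_abort(&t);              /* no-op unless a transaction is open */
    return rc;
}
```

Because every helper leaves *t nonzero only while cleanup is needed, a
single unconditional abort at the exit label covers all error paths.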
 tools/libxl/libxl_internal.h |   38 +++++++++++++++++++++
 tools/libxl/libxl_xshelp.c   |   76 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 1a7b526..b108d00 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -498,6 +498,44 @@ _hidden bool libxl__xs_mkdir(libxl__gc *gc, xs_transaction_t t,
 
 _hidden char *libxl__xs_libxl_path(libxl__gc *gc, uint32_t domid);
 
+
+/*----- "checked" xenstore access functions -----*/
+/* Each of these functions will check that it succeeded; if it
+ * fails it logs and returns ERROR_FAIL.
+ */
+
+/* On success, *result_out came from the gc.
+ * On error, *result_out is undefined.
+ * ENOENT counts as success but sets *result_out=0
+ */
+int libxl__xs_read_checked(libxl__gc *gc, xs_transaction_t t,
+                           const char *path, const char **result_out);
+
+/* Does not include a trailing null.
+ * May usefully be combined with GCSPRINTF if the format string
+ * behaviour of libxl__xs_write is desirable. */
+int libxl__xs_write_checked(libxl__gc *gc, xs_transaction_t t,
+                            const char *path, const char *string);
+
+/* ENOENT is not an error (even if the parent directories don't exist) */
+int libxl__xs_rm_checked(libxl__gc *gc, xs_transaction_t t, const char *path);
+
+/* Transaction functions, best used together.
+ * The caller should initialise *t to 0 (XBT_NULL) before calling start.
+ * Each function leaves *t!=0 iff the transaction needs cleaning up.
+ *
+ * libxl__xs_transaction_commit returns:
+ *   <0  failure - a libxl error code
+ *   +1  commit conflict; transaction has been destroyed and caller
+ *        must go round again (call _start again and retry)
+ *    0  committed successfully
+ */
+int libxl__xs_transaction_start(libxl__gc *gc, xs_transaction_t *t);
+int libxl__xs_transaction_commit(libxl__gc *gc, xs_transaction_t *t);
+void libxl__xs_transaction_abort(libxl__gc *gc, xs_transaction_t *t);
+
+
+
 /*
  * This is a recursive delete, from top to bottom. What this function does
  * is remove empty folders that contained the deleted entry.
diff --git a/tools/libxl/libxl_xshelp.c b/tools/libxl/libxl_xshelp.c
index c5b5364..993f527 100644
--- a/tools/libxl/libxl_xshelp.c
+++ b/tools/libxl/libxl_xshelp.c
@@ -135,6 +135,82 @@ char *libxl__xs_libxl_path(libxl__gc *gc, uint32_t domid)
     return s;
 }
 
+int libxl__xs_read_checked(libxl__gc *gc, xs_transaction_t t,
+                           const char *path, const char **result_out)
+{
+    char *result = libxl__xs_read(gc, t, path);
+    if (!result) {
+        if (errno != ENOENT) {
+            LOGE(ERROR, "xenstore read failed: `%s'", path);
+            return ERROR_FAIL;
+        }
+    }
+    *result_out = result;
+    return 0;
+}
+
+int libxl__xs_write_checked(libxl__gc *gc, xs_transaction_t t,
+                            const char *path, const char *string)
+{
+    size_t length = strlen(string);
+    if (!xs_write(CTX->xsh, t, path, string, length)) {
+        LOGE(ERROR, "xenstore write failed: `%s' = `%s'", path, string);
+        return ERROR_FAIL;
+    }
+    return 0;
+}
+
+int libxl__xs_rm_checked(libxl__gc *gc, xs_transaction_t t, const char *path)
+{
+    if (!xs_rm(CTX->xsh, t, path)) {
+        if (errno == ENOENT)
+            return 0;
+
+        LOGE(ERROR, "xenstore rm failed: `%s'", path);
+        return ERROR_FAIL;
+    }
+    return 0;
+}
+
+int libxl__xs_transaction_start(libxl__gc *gc, xs_transaction_t *t)
+{
+    assert(!*t);
+    *t = xs_transaction_start(CTX->xsh);
+    if (!*t) {
+        LOGE(ERROR, "could not create xenstore transaction");
+        return ERROR_FAIL;
+    }
+    return 0;
+}
+
+int libxl__xs_transaction_commit(libxl__gc *gc, xs_transaction_t *t)
+{
+    assert(*t);
+
+    if (!xs_transaction_end(CTX->xsh, *t, 0)) {
+        *t = 0;
+        if (errno == EAGAIN)
+            return +1;
+
+        LOGE(ERROR, "could not commit xenstore transaction");
+        return ERROR_FAIL;
+    }
+
+    *t = 0;
+    return 0;
+}
+
+void libxl__xs_transaction_abort(libxl__gc *gc, xs_transaction_t *t)
+{
+    if (!*t)
+        return;
+
+    if (!xs_transaction_end(CTX->xsh, *t, 1))
+        LOGE(ERROR, "could not abort xenstore transaction");
+
+    *t = 0;
+}
+
 int libxl__xs_path_cleanup(libxl__gc *gc, xs_transaction_t t, char *user_path)
 {
     unsigned int nb = 0;
-- 
1.7.2.5


* [PATCH 09/21] libxl: wait for qemu to acknowledge logdirty command
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (7 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 08/21] libxl: provide libxl__xs_*_checked and libxl__xs_transaction_* Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 10/21] libxl: datacopier: provide "prefix data" facility Ian Jackson
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

The current migration code in libxl instructs qemu to start or stop
logdirty, but it does not wait for an acknowledgement from qemu before
continuing.  This might lead to memory corruption (!)

Fix this by waiting for qemu to acknowledge the command.

Unfortunately the necessary ao arrangements for waiting for this
command are unique because qemu has a special protocol for this
particular operation.

Also, this change means that the switch_qemu_logdirty callback
implementation in libxl can no longer synchronously produce its return
value, as it now needs to wait for xenstore.  So we tell the
marshalling code generator that it is a message whose reply is sent
asynchronously (the new `A' flag).  This turns the callback function
called by the marshaller into one which returns void; the callback
function arranges to explicitly send the reply to the helper later,
when the xs watch triggers and the appropriate value is read from
xenstore.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v4:
 * Correct the sense of the return value from switch_qemu_logdirty.
 * Fix a somewhat mendacious log message.
---
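
A standalone sketch of the deferred-reply shape described above: the
callback records the request and returns void, and the reply is sent
exactly once from whichever event arrives first (watch or timeout).
All demo_* names are invented for this note; there is no xenstore or
libxl event machinery here, only the control flow.

```c
#include <assert.h>
#include <string.h>

struct demo_switch {
    const char *cmd;     /* command we sent, e.g. "enable" */
    int waiting;         /* nonzero while a reply is still owed */
    int reply;           /* value eventually sent back */
};

/* Sends the deferred reply exactly once and stops waiting. */
static void demo_done(struct demo_switch *s, int broke)
{
    assert(s->waiting);
    s->waiting = 0;
    s->reply = broke;
}

/* The callback itself: records the request and returns without a result. */
static void demo_start(struct demo_switch *s, int enable)
{
    s->cmd = enable ? "enable" : "disable";
    s->waiting = 1;
    s->reply = 0;
}

/* Watch event: ret is the acknowledgement read back, or NULL if absent. */
static void demo_watch_fired(struct demo_switch *s, const char *ret)
{
    if (!s->waiting) return;
    if (!ret) return;                        /* nothing yet: keep waiting */
    demo_done(s, strcmp(ret, s->cmd) ? -1 : 0);
}

/* Timer event: the device model never answered. */
static void demo_timed_out(struct demo_switch *s)
{
    if (s->waiting) demo_done(s, -1);
}
```

The point of routing both events through one done function is that the
watch and the timer are torn down together, so a late event cannot send
a second reply.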
 tools/libxl/libxl_dom.c            |  177 +++++++++++++++++++++++++++++++++---
 tools/libxl/libxl_internal.h       |   18 ++++-
 tools/libxl/libxl_save_callout.c   |    8 ++
 tools/libxl/libxl_save_msgs_gen.pl |    7 +-
 4 files changed, 194 insertions(+), 16 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index ba58c45..4a588fb 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -530,30 +530,181 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc);
 
-/*----- callbacks, called by xc_domain_save -----*/
+/*----- complicated callback, called by xc_domain_save -----*/
+
+/*
+ * We implement the other end of the protocol for controlling qemu-dm's
+ * logdirty.  There is no documentation for this protocol, but our
+ * counterparty's implementation is in
+ * qemu-xen-traditional.git:xenstore.c in the function
+ * xenstore_process_logdirty_event
+ */
+
+static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
+                                    const struct timeval *requested_abs);
+static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
+                            const char *watch_path, const char *event_path);
+static void switch_logdirty_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss, int broke);
 
-int libxl__domain_suspend_common_switch_qemu_logdirty
+static void logdirty_init(libxl__logdirty_switch *lds)
+{
+    lds->cmd_path = 0;
+    libxl__ev_xswatch_init(&lds->watch);
+    libxl__ev_time_init(&lds->timeout);
+}
+
+void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned enable, void *user)
 {
     libxl__save_helper_state *shs = user;
+    libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    libxl__logdirty_switch *lds = &dss->logdirty;
     STATE_AO_GC(dss->ao);
-    char *path;
-    bool rc;
+    int rc;
+    xs_transaction_t t = 0;
+    const char *got;
 
-    path = libxl__sprintf(gc,
+    if (!lds->cmd_path) {
+        lds->cmd_path = GCSPRINTF(
                    "/local/domain/0/device-model/%u/logdirty/cmd", domid);
-    if (!path)
-        return 1;
+        lds->ret_path = GCSPRINTF(
+                   "/local/domain/0/device-model/%u/logdirty/ret", domid);
+    }
+    lds->cmd = enable ? "enable" : "disable";
 
-    if (enable)
-        rc = xs_write(CTX->xsh, XBT_NULL, path, "enable", strlen("enable"));
-    else
-        rc = xs_write(CTX->xsh, XBT_NULL, path, "disable", strlen("disable"));
+    rc = libxl__ev_xswatch_register(gc, &lds->watch,
+                                switch_logdirty_xswatch, lds->ret_path);
+    if (rc) goto out;
+
+    rc = libxl__ev_time_register_rel(gc, &lds->timeout,
+                                switch_logdirty_timeout, 10*1000);
+    if (rc) goto out;
+
+    for (;;) {
+        rc = libxl__xs_transaction_start(gc, &t);
+        if (rc) goto out;
+
+        rc = libxl__xs_read_checked(gc, t, lds->cmd_path, &got);
+        if (rc) goto out;
+
+        if (got) {
+            const char *got_ret;
+            rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got_ret);
+            if (rc) goto out;
 
-    return rc ? 0 : 1;
+            if (!got_ret || strcmp(got, got_ret)) {
+                LOG(ERROR,"controlling logdirty: qemu was already sent"
+                    " command `%s' (xenstore path `%s') but result is `%s'",
+                    got, lds->cmd_path, got_ret ? got_ret : "<none>");
+                rc = ERROR_FAIL;
+                goto out;
+            }
+            rc = libxl__xs_rm_checked(gc, t, lds->cmd_path);
+            if (rc) goto out;
+        }
+
+        rc = libxl__xs_rm_checked(gc, t, lds->ret_path);
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, lds->cmd_path, lds->cmd);
+        if (rc) goto out;
+
+        rc = libxl__xs_transaction_commit(gc, &t);
+        if (!rc) break;
+        if (rc<0) goto out;
+    }
+
+    /* OK, wait for some callback */
+    return;
+
+ out:
+    LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
+    switch_logdirty_done(egc,dss,-1);
+}
+
+static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
+                                    const struct timeval *requested_abs)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
+    STATE_AO_GC(dss->ao);
+    LOG(ERROR,"logdirty switch: wait for device model timed out");
+    switch_logdirty_done(egc,dss,-1);
 }
 
+static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
+                            const char *watch_path, const char *event_path)
+{
+    libxl__domain_suspend_state *dss =
+        CONTAINER_OF(watch, *dss, logdirty.watch);
+    libxl__logdirty_switch *lds = &dss->logdirty;
+    STATE_AO_GC(dss->ao);
+    const char *got;
+    xs_transaction_t t = 0;
+    int rc;
+
+    for (;;) {
+        rc = libxl__xs_transaction_start(gc, &t);
+        if (rc) goto out;
+
+        rc = libxl__xs_read_checked(gc, t, lds->ret_path, &got);
+        if (rc) goto out;
+
+        if (!got) {
+            rc = +1;
+            goto out;
+        }
+
+        if (strcmp(got, lds->cmd)) {
+            LOG(ERROR,"logdirty switch: sent command `%s' but got reply `%s'"
+                " (xenstore paths `%s' / `%s')", lds->cmd, got,
+                lds->cmd_path, lds->ret_path);
+            rc = ERROR_FAIL;
+            goto out;
+        }
+
+        rc = libxl__xs_rm_checked(gc, t, lds->cmd_path);
+        if (rc) goto out;
+
+        rc = libxl__xs_rm_checked(gc, t, lds->ret_path);
+        if (rc) goto out;
+
+        rc = libxl__xs_transaction_commit(gc, &t); 
+        if (!rc) break;
+        if (rc<0) goto out;
+    }
+
+ out:
+    /* rc < 0: error
+     * rc == 0: ok, we are done
+     * rc == +1: need to keep waiting
+     */
+    libxl__xs_transaction_abort(gc, &t);
+
+    if (!rc) {
+        switch_logdirty_done(egc,dss,0);
+    } else if (rc < 0) {
+        LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
+        switch_logdirty_done(egc,dss,-1);
+    }
+}
+
+static void switch_logdirty_done(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss,
+                                 int broke)
+{
+    STATE_AO_GC(dss->ao);
+    libxl__logdirty_switch *lds = &dss->logdirty;
+
+    libxl__ev_xswatch_deregister(gc, &lds->watch);
+    libxl__ev_time_deregister(gc, &lds->timeout);
+
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, broke);
+}
+
+/*----- callbacks, called by xc_domain_save -----*/
+
 int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
@@ -875,6 +1026,8 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     libxl__srm_save_autogen_callbacks *const callbacks =
         &dss->shs.callbacks.save.a;
 
+    logdirty_init(&dss->logdirty);
+
     switch (type) {
     case LIBXL_DOMAIN_TYPE_HVM: {
         char *path;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b108d00..05bed01 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1864,6 +1864,14 @@ typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
 typedef void libxl__domain_suspend_cb(libxl__egc*,
                                       libxl__domain_suspend_state*, int rc);
 
+typedef struct libxl__logdirty_switch {
+    const char *cmd;
+    const char *cmd_path;
+    const char *ret_path;
+    libxl__ev_xswatch watch;
+    libxl__ev_time timeout;
+} libxl__logdirty_switch;
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
@@ -1883,6 +1891,7 @@ struct libxl__domain_suspend_state {
     int guest_responded;
     int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
+    libxl__logdirty_switch logdirty;
 };
 
 
@@ -2013,8 +2022,15 @@ _hidden void libxl__xc_domain_save(libxl__egc*, libxl__domain_suspend_state*,
 _hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void,
                                         int rc, int retval, int errnoval);
 
+/* Used by asynchronous callbacks: ie ones which xc regards as
+ * returning a value, but which we want to handle asynchronously.
+ * Such functions' actual callback functions return void in libxl.
+ * When they are ready to indicate completion, they call this. */
+void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
+                           libxl__save_helper_state *shs, int return_value);
+
 _hidden int libxl__domain_suspend_common_callback(void *data);
-_hidden int libxl__domain_suspend_common_switch_qemu_logdirty
+_hidden void libxl__domain_suspend_common_switch_qemu_logdirty
                                (int domid, unsigned int enable, void *data);
 _hidden int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         uint32_t *len, void *data);
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 19fff1b..6332beb 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -142,6 +142,14 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss,
 }
 
 
+void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc,
+                           libxl__save_helper_state *shs, int return_value)
+{
+    shs->egc = egc;
+    libxl__srm_callout_sendreply(return_value, shs);
+    shs->egc = 0;
+}
+
 /*----- helper execution -----*/
 
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index c45986e..a9ac808 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -14,6 +14,7 @@ our @msgs = (
     #   x  - function pointer is in struct {save,restore}_callbacks
     #         and its null-ness needs to be passed through to the helper's xc
     #   W  - needs a return value; callback is synchronous
+    #   A  - needs a return value; callback is asynchronous
     [  1, 'sr',     "log",                   [qw(uint32_t level
                                                  uint32_t errnoval
                                                  STRING context
@@ -25,7 +26,7 @@ our @msgs = (
     [  3, 'scxW',   "suspend", [] ],         
     [  4, 'scxW',   "postcopy", [] ],        
     [  5, 'scxW',   "checkpoint", [] ],      
-    [  6, 'scxW',   "switch_qemu_logdirty",  [qw(int domid
+    [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
     #                toolstack_save          done entirely `by hand'
     [  7, 'rcxW',   "toolstack_restore",     [qw(uint32_t domid
@@ -262,7 +263,7 @@ foreach my $msginfo (@msgs) {
         $f_more_sr->("        int r;\n");
     }
 
-    my $c_rtype_helper = $flags =~ m/W/ ? 'int' : 'void';
+    my $c_rtype_helper = $flags =~ m/[WA]/ ? 'int' : 'void';
     my $c_rtype_callout = $flags =~ m/W/ ? 'int' : 'void';
     my $c_decl = '(';
     my $c_callback_args = '';
@@ -351,7 +352,7 @@ END_ALWAYS
     assert(len == allocd);
     ${transmit}(buf, len, user);
 ");
-    if ($flags =~ m/W/) {
+    if ($flags =~ m/[WA]/) {
 	f_more("${encode}_$name",
                (<<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS));
     int r = ${helper}_getreply(user);
-- 
1.7.2.5


* [PATCH 10/21] libxl: datacopier: provide "prefix data" facility
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (8 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 09/21] libxl: wait for qemu to acknowledge logdirty command Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 11/21] libxl: prepare for asynchronous writing of qemu save file Ian Jackson
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

This will be used to write the qemu data banner to the save/migration
stream.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v3:
 * Clarified and added comments explaining the `immediately'
   constraint and the lack of a reentrancy/threading hazard.
 * Fixed subject line typo.
---
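
A standalone sketch of the idea, for readers unfamiliar with the
datacopier: literal bytes are copied into a buffer and appended to the
output queue before any data read from the input fd can be queued, so
they come out first.  The demo_* types below are assumptions made for
this note (a hand-rolled singly linked queue rather than
LIBXL_TAILQ, malloc rather than libxl__zalloc).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct demo_buf {
    struct demo_buf *next;
    size_t used;
    char data[];                 /* flexible array member, sized per buffer */
};

struct demo_copier {
    struct demo_buf *head, **tailp;
    size_t used, maxsz;          /* bytes queued, and the cap on them */
};

static void demo_copier_init(struct demo_copier *dc, size_t maxsz)
{
    dc->head = 0;
    dc->tailp = &dc->head;
    dc->used = 0;
    dc->maxsz = maxsz;
}

/* Copies len bytes and appends them to the output queue. */
static void demo_prefixdata(struct demo_copier *dc, const void *data, size_t len)
{
    struct demo_buf *buf;

    assert(len < dc->maxsz - dc->used);   /* exceeding the cap is a bug */

    buf = malloc(sizeof(*buf) + len);
    assert(buf);
    buf->next = 0;
    buf->used = len;
    memcpy(buf->data, data, len);

    dc->used += len;
    *dc->tailp = buf;
    dc->tailp = &buf->next;
}

/* Concatenates the queue into out (capacity cap); returns bytes written. */
static size_t demo_drain(struct demo_copier *dc, char *out, size_t cap)
{
    size_t n = 0;
    while (dc->head) {
        struct demo_buf *buf = dc->head;
        assert(n + buf->used <= cap);
        memcpy(out + n, buf->data, buf->used);
        n += buf->used;
        dc->head = buf->next;
        free(buf);
    }
    dc->tailp = &dc->head;
    dc->used = 0;
    return n;
}
```

Calling the prefix function more than once simply queues the pieces in
call order, which matches the "may be called multiple times" note in
the header comment.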
 tools/libxl/libxl_aoutils.c  |   22 ++++++++++++++++++++++
 tools/libxl/libxl_internal.h |    6 ++++++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index ee0df57..7f8d6d3 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -74,6 +74,28 @@ static void datacopier_check_state(libxl__egc *egc, libxl__datacopier_state *dc)
     }
 }
 
+void libxl__datacopier_prefixdata(libxl__egc *egc, libxl__datacopier_state *dc,
+                                  const void *data, size_t len)
+{
+    libxl__datacopier_buf *buf;
+    /*
+     * It is safe for this to be called immediately after _start, as
+     * is documented in the public comment.  _start's caller must have
+     * the ctx locked, so other threads don't get to mess with the
+     * contents, and the fd events cannot happen reentrantly.  So we
+     * are guaranteed to beat the first data from the read fd.
+     */
+
+    assert(len < dc->maxsz - dc->used);
+
+    buf = libxl__zalloc(0, sizeof(*buf) - sizeof(buf->buf) + len);
+    buf->used = len;
+    memcpy(buf->buf, data, len);
+
+    dc->used += len;
+    LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry);
+}
+
 static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
                                 int fd, short events, short revents) {
     libxl__datacopier_state *dc = CONTAINER_OF(ev, *dc, toread);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 05bed01..8b582e4 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1811,6 +1811,12 @@ _hidden void libxl__datacopier_init(libxl__datacopier_state *dc);
 _hidden void libxl__datacopier_kill(libxl__datacopier_state *dc);
 _hidden int libxl__datacopier_start(libxl__datacopier_state *dc);
 
+/* Inserts literal data into the output stream.  The data is copied.
+ * May safely be used only immediately after libxl__datacopier_start
+ * (before the ctx is unlocked).  But may be called multiple times.
+ * NB exceeding maxsz will fail an assertion! */
+_hidden void libxl__datacopier_prefixdata(libxl__egc*, libxl__datacopier_state*,
+                                          const void *data, size_t len);
 
 /*----- Save/restore helper (used by creation and suspend) -----*/
 
-- 
1.7.2.5


* [PATCH 11/21] libxl: prepare for asynchronous writing of qemu save file
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (9 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 10/21] libxl: datacopier: provide "prefix data" facility Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 12/21] libxl: Make libxl__domain_save_device_model asynchronous Ian Jackson
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

* Combine the various calls to libxl__device_model_savefile into one
  at the start of libxl__domain_suspend, storing the result in the
  dss.  Consequently a few functions take a dss instead of some or all
  of their other arguments.

* Make libxl__domain_save_device_model's API into an asynchronous
  style which takes a callback.  The function is, however, still
  synchronous; it will be made actually async in the next patch.

* Consequently make libxl__remus_domain_checkpoint_callback into an
  asynchronous callback.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
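
A standalone sketch of the intermediate state this patch creates: the
function has a callback-style signature and reports its result only
through the callback, but the work inside still runs synchronously.
The demo_* names are invented for this note and do not correspond to
libxl types.

```c
#include <assert.h>

struct demo_state;
typedef void demo_done_cb(struct demo_state *st, int rc);

struct demo_state {
    int is_hvm;
    demo_done_cb *callback;      /* stored so later stages can find it */
    int finished, rc;
};

/* Callback-style API; the body is still synchronous in this sketch. */
static void demo_save_dm(struct demo_state *st, demo_done_cb *cb)
{
    int rc = 0;                  /* pretend the synchronous work succeeded */
    st->callback = cb;
    st->callback(st, rc);        /* result is delivered via the callback */
}

static void demo_finished(struct demo_state *st, int rc)
{
    st->finished = 1;
    st->rc = rc;
}

/* Caller: once it hands off to the callback-style function it returns
 * and must not touch the completion path itself. */
static void demo_save_done(struct demo_state *st)
{
    if (st->is_hvm) {
        demo_save_dm(st, demo_finished);
        return;                  /* the callback owns completion from here */
    }
    demo_finished(st, 0);
}
```

Converting the callers first means the later switch to a truly
asynchronous body should not need to change any call site.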
 tools/libxl/libxl_dom.c            |   54 ++++++++++++++++++++++++++----------
 tools/libxl/libxl_internal.h       |   18 ++++++++++--
 tools/libxl/libxl_save_msgs_gen.pl |    2 +-
 3 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 4a588fb..439b4da 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -705,11 +705,13 @@ static void switch_logdirty_done(libxl__egc *egc,
 
 /*----- callbacks, called by xc_domain_save -----*/
 
-int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid)
+int libxl__domain_suspend_device_model(libxl__gc *gc,
+                                       libxl__domain_suspend_state *dss)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     int ret = 0;
-    const char *filename = libxl__device_model_savefile(gc, domid);
+    uint32_t const domid = dss->domid;
+    const char *const filename = dss->dm_savefile;
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
@@ -878,7 +880,7 @@ int libxl__domain_suspend_common_callback(void *user)
 
  guest_suspended:
     if (dss->hvm) {
-        ret = libxl__domain_suspend_device_model(gc, dss->domid);
+        ret = libxl__domain_suspend_device_model(gc, dss);
         if (ret) {
             LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", ret);
             return 0;
@@ -993,19 +995,32 @@ static int libxl__remus_domain_resume_callback(void *data)
     return 1;
 }
 
-static int libxl__remus_domain_checkpoint_callback(void *data)
+/*----- remus asynchronous checkpoint callback -----*/
+
+static void remus_checkpoint_dm_saved(libxl__egc *egc,
+                                      libxl__domain_suspend_state *dss, int rc);
+
+static void libxl__remus_domain_checkpoint_callback(void *data)
 {
     libxl__domain_suspend_state *dss = data;
+    libxl__egc *egc = dss->shs.egc;
     STATE_AO_GC(dss->ao);
 
     /* This would go into tailbuf. */
-    if (dss->hvm &&
-        libxl__domain_save_device_model(gc, dss->domid, dss->fd))
-        return 0;
+    if (dss->hvm) {
+        libxl__domain_save_device_model(egc, dss, remus_checkpoint_dm_saved);
+    } else {
+        remus_checkpoint_dm_saved(egc, dss, 0);
+    }
+}
 
+static void remus_checkpoint_dm_saved(libxl__egc *egc,
+                                      libxl__domain_suspend_state *dss, int rc)
+{
     /* TODO: Wait for disk and memory ack, release network buffer */
+    /* TODO: make this asynchronous */
     usleep(dss->interval * 1000);
-    return 1;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
 /*----- main code for suspending, in order of execution -----*/
@@ -1055,6 +1070,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
 
     dss->suspend_eventchn = -1;
     dss->guest_responded = 0;
+    dss->dm_savefile = libxl__device_model_savefile(gc, domid);
 
     if (r_info != NULL) {
         dss->interval = r_info->interval;
@@ -1104,7 +1120,6 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
 
     /* Convenience aliases */
     const libxl_domain_type type = dss->type;
-    const uint32_t domid = dss->domid;
 
     if (rc)
         goto out;
@@ -1122,11 +1137,11 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
     }
 
     if (type == LIBXL_DOMAIN_TYPE_HVM) {
-        rc = libxl__domain_suspend_device_model(gc, domid);
+        rc = libxl__domain_suspend_device_model(gc, dss);
         if (rc) goto out;
         
-        rc = libxl__domain_save_device_model(gc, domid, dss->fd);
-        if (rc) goto out;
+        libxl__domain_save_device_model(egc, dss, domain_suspend_done);
+        return;
     }
 
     rc = 0;
@@ -1135,14 +1150,22 @@ out:
     domain_suspend_done(egc, dss, rc);
 }
 
-int libxl__domain_save_device_model(libxl__gc *gc, uint32_t domid, int fd)
+void libxl__domain_save_device_model(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss,
+                                     libxl__save_device_model_cb *callback)
 {
+    STATE_AO_GC(dss->ao);
     int rc, fd2 = -1, c;
     char buf[1024];
-    const char *filename = libxl__device_model_savefile(gc, domid);
     struct stat st;
     uint32_t qemu_state_len;
 
+    dss->save_dm_callback = callback;
+
+    /* Convenience aliases */
+    const char *const filename = dss->dm_savefile;
+    const int fd = dss->fd;
+
     if (stat(filename, &st) < 0)
     {
         LOG(ERROR, "Unable to stat qemu save file\n");
@@ -1184,7 +1207,8 @@ int libxl__domain_save_device_model(libxl__gc *gc, uint32_t domid, int fd)
 out:
     if (fd2 >= 0) close(fd2);
     unlink(filename);
-    return rc;
+
+    dss->save_dm_callback(egc, dss, rc);
 }
 
 static void domain_suspend_done(libxl__egc *egc,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8b582e4..e95892a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -824,10 +824,8 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid,
 
 _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
                                      uint32_t size, void *data);
-_hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
-_hidden int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
-_hidden int libxl__domain_save_device_model(libxl__gc *gc, uint32_t domid, int fd);
+
 _hidden void libxl__userdata_destroyall(libxl__gc *gc, uint32_t domid);
 
 _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
@@ -1869,6 +1867,8 @@ typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
 
 typedef void libxl__domain_suspend_cb(libxl__egc*,
                                       libxl__domain_suspend_state*, int rc);
+typedef void libxl__save_device_model_cb(libxl__egc*,
+                                         libxl__domain_suspend_state*, int rc);
 
 typedef struct libxl__logdirty_switch {
     const char *cmd;
@@ -1895,9 +1895,12 @@ struct libxl__domain_suspend_state {
     int hvm;
     int xcflags;
     int guest_responded;
+    const char *dm_savefile;
     int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
+    /* private for libxl__domain_save_device_model */
+    libxl__save_device_model_cb *save_dm_callback;
 };
 
 
@@ -2053,6 +2056,15 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc,
 _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
                                            int rc, int retval, int errnoval);
 
+/* Each time the dm needs to be saved, we must call suspend and then save */
+_hidden int libxl__domain_suspend_device_model(libxl__gc *gc,
+                                           libxl__domain_suspend_state *dss);
+_hidden void libxl__domain_save_device_model(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss,
+                                     libxl__save_device_model_cb *callback);
+
+_hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
+
 
 /*
  * Convenience macros.
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
index a9ac808..ee126c7 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -25,7 +25,7 @@ our @msgs = (
                                                 'unsigned long', 'total'] ],
     [  3, 'scxW',   "suspend", [] ],         
     [  4, 'scxW',   "postcopy", [] ],        
-    [  5, 'scxW',   "checkpoint", [] ],      
+    [  5, 'scxA',   "checkpoint", [] ],      
     [  6, 'scxA',   "switch_qemu_logdirty",  [qw(int domid
                                               unsigned enable)] ],
     #                toolstack_save          done entirely `by hand'
-- 
1.7.2.5


* [PATCH 12/21] libxl: Make libxl__domain_save_device_model asynchronous
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (10 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 11/21] libxl: prepare for asynchronous writing of qemu save file Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 13/21] libxl: Add a gc to libxl_get_cpu_topology Ian Jackson
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in series v3:
 * Improve one more debugging message.
---
 tools/libxl/libxl_dom.c      |  100 +++++++++++++++++++++++++++---------------
 tools/libxl/libxl_internal.h |    1 +
 2 files changed, 66 insertions(+), 35 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 439b4da..abc5932 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1150,15 +1150,17 @@ out:
     domain_suspend_done(egc, dss, rc);
 }
 
+static void save_device_model_datacopier_done(libxl__egc *egc,
+     libxl__datacopier_state *dc, int onwrite, int errnoval);
+
 void libxl__domain_save_device_model(libxl__egc *egc,
                                      libxl__domain_suspend_state *dss,
                                      libxl__save_device_model_cb *callback)
 {
     STATE_AO_GC(dss->ao);
-    int rc, fd2 = -1, c;
-    char buf[1024];
     struct stat st;
     uint32_t qemu_state_len;
+    int rc;
 
     dss->save_dm_callback = callback;
 
@@ -1166,49 +1168,77 @@ void libxl__domain_save_device_model(libxl__egc *egc,
     const char *const filename = dss->dm_savefile;
     const int fd = dss->fd;
 
-    if (stat(filename, &st) < 0)
+    libxl__datacopier_state *dc = &dss->save_dm_datacopier;
+    memset(dc, 0, sizeof(*dc));
+    dc->readwhat = GCSPRINTF("qemu save file %s", filename);
+    dc->ao = ao;
+    dc->readfd = -1;
+    dc->writefd = fd;
+    dc->maxsz = INT_MAX;
+    dc->copywhat = GCSPRINTF("qemu save file for domain %"PRIu32, dss->domid);
+    dc->writewhat = "save/migration stream";
+    dc->callback = save_device_model_datacopier_done;
+
+    dc->readfd = open(filename, O_RDONLY);
+    if (dc->readfd < 0) {
+        LOGE(ERROR, "unable to open %s", dc->readwhat);
+        goto out;
+    }
+
+    if (fstat(dc->readfd, &st))
     {
-        LOG(ERROR, "Unable to stat qemu save file\n");
-        rc = ERROR_FAIL;
+        LOGE(ERROR, "unable to fstat %s", dc->readwhat);
+        goto out;
+    }
+
+    if (!S_ISREG(st.st_mode)) {
+        LOG(ERROR, "%s is not a plain file!", dc->readwhat);
         goto out;
     }
 
     qemu_state_len = st.st_size;
-    LOG(DEBUG, "Qemu state is %d bytes\n", qemu_state_len);
+    LOG(DEBUG, "%s is %d bytes", dc->readwhat, qemu_state_len);
 
-    rc = libxl_write_exactly(CTX, fd, QEMU_SIGNATURE, strlen(QEMU_SIGNATURE),
-                              "saved-state file", "qemu signature");
-    if (rc)
-        goto out;
+    rc = libxl__datacopier_start(dc);
+    if (rc) goto out;
 
-    rc = libxl_write_exactly(CTX, fd, &qemu_state_len, sizeof(qemu_state_len),
-                            "saved-state file", "saved-state length");
-    if (rc)
-        goto out;
+    libxl__datacopier_prefixdata(egc, dc,
+                                 QEMU_SIGNATURE, strlen(QEMU_SIGNATURE));
 
-    fd2 = open(filename, O_RDONLY);
-    if (fd2 < 0) {
-        LOGE(ERROR, "Unable to open qemu save file\n");
-        goto out;
-    }
-    while ((c = read(fd2, buf, sizeof(buf))) != 0) {
-        if (c < 0) {
-            if (errno == EINTR)
-                continue;
-            rc = errno;
-            goto out;
-        }
-        rc = libxl_write_exactly(
-            CTX, fd, buf, c, "saved-state file", "qemu state");
-        if (rc)
-            goto out;
+    libxl__datacopier_prefixdata(egc, dc,
+                                 &qemu_state_len, sizeof(qemu_state_len));
+    return;
+
+ out:
+    save_device_model_datacopier_done(egc, dc, -1, 0);
+}
+
+static void save_device_model_datacopier_done(libxl__egc *egc,
+     libxl__datacopier_state *dc, int onwrite, int errnoval)
+{
+    libxl__domain_suspend_state *dss =
+        CONTAINER_OF(dc, *dss, save_dm_datacopier);
+    STATE_AO_GC(dss->ao);
+
+    /* Convenience aliases */
+    const char *const filename = dss->dm_savefile;
+    int our_rc = 0;
+    int rc;
+
+    libxl__datacopier_kill(dc);
+
+    if (onwrite || errnoval)
+        our_rc = ERROR_FAIL;
+
+    if (dc->readfd >= 0) {
+        close(dc->readfd);
+        dc->readfd = -1;
     }
-    rc = 0;
-out:
-    if (fd2 >= 0) close(fd2);
-    unlink(filename);
 
-    dss->save_dm_callback(egc, dss, rc);
+    rc = libxl__remove_file(gc, filename);
+    if (!our_rc) our_rc = rc;
+
+    dss->save_dm_callback(egc, dss, our_rc);
 }
 
 static void domain_suspend_done(libxl__egc *egc,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e95892a..c9b4189 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1901,6 +1901,7 @@ struct libxl__domain_suspend_state {
     libxl__logdirty_switch logdirty;
     /* private for libxl__domain_save_device_model */
     libxl__save_device_model_cb *save_dm_callback;
+    libxl__datacopier_state save_dm_datacopier;
 };
 
 
-- 
1.7.2.5


* [PATCH 13/21] libxl: Add a gc to libxl_get_cpu_topology
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (11 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 12/21] libxl: Make libxl__domain_save_device_model asynchronous Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 14/21] libxl: Do not pass NULL as gc_opt; introduce NOGC Ian Jackson
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

In the next patch we are going to change the definition of NOGC to
require a local variable libxl__gc *gc.

libxl_get_cpu_topology doesn't have one but does use NOGC.
Fix this by:
 - introducing an `out' label
 - replacing the only call to `return' with a suitable assignment
   to ret and a `goto out'.
 - adding uses of GC_INIT and GC_FREE.
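
The shape of that change can be sketched outside libxl as follows.  This
is only an illustration under stated assumptions: toy_gc, toy_gc_zalloc,
toy_gc_free and toy_get_topology are hypothetical stand-ins, not libxl
functions, and the "hypercall" is faked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for libxl__gc: remembers pointers to free later. */
typedef struct {
    void *ptrs[8];
    int used;
} toy_gc;

static int toy_free_calls; /* how many times cleanup ran (illustration only) */

static void *toy_gc_zalloc(toy_gc *gc, size_t bytes)
{
    void *p;
    assert(gc->used < 8);
    p = calloc(1, bytes);
    if (!p) abort();
    gc->ptrs[gc->used++] = p;
    return p;
}

static void toy_gc_free(toy_gc *gc)
{
    int i;
    for (i = 0; i < gc->used; i++)
        free(gc->ptrs[i]);
    gc->used = 0;
    toy_free_calls++;
}

/* Returns a caller-owned array of max_cpus entries, or NULL on failure. */
static int *toy_get_topology(int max_cpus, int *nr)
{
    toy_gc gc = { .used = 0 };   /* plays the role of GC_INIT */
    int *scratch, *ret = NULL;
    int i;

    if (max_cpus == 0) {
        ret = NULL;              /* not `return NULL': that would skip cleanup */
        goto out;
    }

    scratch = toy_gc_zalloc(&gc, max_cpus * sizeof(*scratch));
    for (i = 0; i < max_cpus; i++)
        scratch[i] = i;          /* pretend this came from a hypercall */

    ret = malloc(max_cpus * sizeof(*ret));
    if (!ret) goto out;
    for (i = 0; i < max_cpus; i++)
        ret[i] = scratch[i];
    *nr = max_cpus;

 out:
    toy_gc_free(&gc);            /* plays the role of GC_FREE; runs on every path */
    return ret;
}
```

With a single exit, the early-failure path and the success path both pass
through the cleanup, which is what lets the function own a gc at all.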

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/libxl.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 6ec7471..a259d65 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -3201,6 +3201,7 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
 
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nr)
 {
+    GC_INIT(ctx);
     xc_topologyinfo_t tinfo;
     DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
     DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
@@ -3213,7 +3214,8 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nr)
     if (max_cpus == 0)
     {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
-        return NULL;
+        ret = NULL;
+        goto out;
     }
 
     coremap = xc_hypercall_buffer_alloc
@@ -3258,6 +3260,8 @@ fail:
 
     if (ret)
         *nr = max_cpus;
+ out:
+    GC_FREE;
     return ret;
 }
 
-- 
1.7.2.5


* [PATCH 14/21] libxl: Do not pass NULL as gc_opt; introduce NOGC
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (12 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 13/21] libxl: Add a gc to libxl_get_cpu_topology Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 15/21] libxl: Get compiler to warn about gc_opt==NULL Ian Jackson
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Bamvor Jian Zhang

In 25182:6c3345d7e9d9 the practice of passing NULL to gc-using memory
allocation functions was introduced.  However, the arrangements there
were not correct as committed, because the error handling and logging
depends on getting a ctx from the gc - so an allocation error would in
fact result in libxl dereferencing NULL.

Instead, provide a special dummy gc in the ctx, called `nogc_gc'.  It
is marked out specially by having alloc_maxsize==-1, which is
otherwise invalid.

Functions which need to actually look into the gc use the new test
function gc_is_real (whose purpose is mainly clarity of the code) to
check whether the gc is the dummy one, and do nothing if it is.  And
we provide a helper macro NOGC which uses the in-scope real gc to find
the ctx and hence the dummy gc (and which replaces the previous
#define NOGC NULL).

Change all callers which pass 0 or NULL to an allocation function to
use NOGC or &ctx->nogc_gc, as applicable in the context.

We add a comment near the definition of LIBXL_INIT_GC pointing out
that it is no longer the only place a libxl__gc struct is
initialised, for the benefit of anyone changing the contents of gc's
in the future.

Also, actually document that libxl__ptr_add is legal with ptr==NULL,
and change a couple of calls not to check for NULL argument.
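
For readers who want the idea in isolation, here is a minimal sketch of
the dummy-gc arrangement.  It is not the libxl code: the toy_* names and
TOY_NOGC are hypothetical, and for brevity the toy gc uses alloc_maxsize
as a simple count of registered pointers, which the real gc does not.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_ctx toy_ctx;

typedef struct {
    int alloc_maxsize;   /* -1 marks the dummy non-gc gc */
    void **alloc_ptrs;
    toy_ctx *owner;      /* always valid, so error paths can find the ctx */
} toy_gc;

struct toy_ctx {
    toy_gc nogc_gc;      /* the dummy gc lives in the ctx */
};

static void toy_ctx_init(toy_ctx *ctx)
{
    ctx->nogc_gc.alloc_maxsize = -1;
    ctx->nogc_gc.alloc_ptrs = NULL;
    ctx->nogc_gc.owner = ctx;
}

static void toy_gc_init(toy_gc *gc, toy_ctx *ctx)
{
    gc->alloc_maxsize = 0;
    gc->alloc_ptrs = NULL;
    gc->owner = ctx;
}

static int toy_gc_is_real(const toy_gc *gc)
{
    return gc->alloc_maxsize >= 0;
}

/* Registers ptr for later freeing; a no-op for the dummy gc or NULL ptr. */
static void toy_ptr_add(toy_gc *gc, void *ptr)
{
    void **grown;
    if (!toy_gc_is_real(gc) || !ptr)
        return;
    grown = realloc(gc->alloc_ptrs,
                    (gc->alloc_maxsize + 1) * sizeof(*grown));
    if (!grown) abort();
    gc->alloc_ptrs = grown;
    gc->alloc_ptrs[gc->alloc_maxsize++] = ptr;
}

static void *toy_zalloc(toy_gc *gc, size_t bytes)
{
    void *p = calloc(1, bytes);
    if (!p) abort();     /* real code would log via gc->owner here */
    toy_ptr_add(gc, p);
    return p;
}

static void toy_free_all(toy_gc *gc)
{
    int i;
    assert(toy_gc_is_real(gc));   /* the dummy gc must never be freed */
    for (i = 0; i < gc->alloc_maxsize; i++)
        free(gc->alloc_ptrs[i]);
    free(gc->alloc_ptrs);
    gc->alloc_ptrs = NULL;
    gc->alloc_maxsize = 0;
}

/* Finds the dummy gc via an in-scope real gc, like the new NOGC. */
#define TOY_NOGC(gc) (&(gc)->owner->nogc_gc)
```

An allocation made with TOY_NOGC(&gc) is not tracked and must be released
with free() by the caller, while one made with &gc is released by
toy_free_all.  In both cases the allocator has a non-NULL gc from which
it can reach the ctx.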

Reported-by: Bamvor Jian Zhang <bjzhang@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Bamvor Jian Zhang <bjzhang@suse.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
---
 tools/libxl/libxl.c          |    3 +++
 tools/libxl/libxl_aoutils.c  |    3 ++-
 tools/libxl/libxl_create.c   |    2 +-
 tools/libxl/libxl_event.c    |    5 +++--
 tools/libxl/libxl_exec.c     |    2 +-
 tools/libxl/libxl_fork.c     |    2 +-
 tools/libxl/libxl_internal.c |   11 +++++++++--
 tools/libxl/libxl_internal.h |   37 +++++++++++++++++++++++--------------
 tools/libxl/libxl_utils.c    |    6 ++----
 tools/libxl/libxl_xshelp.c   |    7 ++-----
 10 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index a259d65..1a7404a 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -41,6 +41,9 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
 
     /* First initialise pointers etc. (cannot fail) */
 
+    ctx->nogc_gc.alloc_maxsize = -1;
+    ctx->nogc_gc.owner = ctx;
+
     LIBXL_TAILQ_INIT(&ctx->occurred);
 
     ctx->osevent_hooks = 0;
diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 7f8d6d3..99972a2 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -77,6 +77,7 @@ static void datacopier_check_state(libxl__egc *egc, libxl__datacopier_state *dc)
 void libxl__datacopier_prefixdata(libxl__egc *egc, libxl__datacopier_state *dc,
                                   const void *data, size_t len)
 {
+    EGC_GC;
     libxl__datacopier_buf *buf;
     /*
      * It is safe for this to be called immediately after _start, as
@@ -88,7 +89,7 @@ void libxl__datacopier_prefixdata(libxl__egc *egc, libxl__datacopier_state *dc,
 
     assert(len < dc->maxsz - dc->used);
 
-    buf = libxl__zalloc(0, sizeof(*buf) - sizeof(buf->buf) + len);
+    buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len);
     buf->used = len;
     memcpy(buf->buf, data, len);
 
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 7b92539..b95a2fe 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -163,7 +163,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
     }
 
     if (b_info->blkdev_start == NULL)
-        b_info->blkdev_start = libxl__strdup(0, "xvda");
+        b_info->blkdev_start = libxl__strdup(NOGC, "xvda");
 
     if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (!b_info->u.hvm.bios)
diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 565d2c2..eb23a93 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -772,7 +772,7 @@ static int beforepoll_internal(libxl__gc *gc, libxl__poller *poller,
         if (poller->fd_rindices_allocd < maxfd) {
             assert(ARRAY_SIZE_OK(poller->fd_rindices, maxfd));
             poller->fd_rindices =
-                libxl__realloc(0, poller->fd_rindices,
+                libxl__realloc(NOGC, poller->fd_rindices,
                                maxfd * sizeof(*poller->fd_rindices));
             memset(poller->fd_rindices + poller->fd_rindices_allocd,
                    0,
@@ -1099,9 +1099,10 @@ void libxl_event_free(libxl_ctx *ctx, libxl_event *event)
 libxl_event *libxl__event_new(libxl__egc *egc,
                               libxl_event_type type, uint32_t domid)
 {
+    EGC_GC;
     libxl_event *ev;
 
-    ev = libxl__zalloc(0,sizeof(*ev));
+    ev = libxl__zalloc(NOGC,sizeof(*ev));
     ev->type = type;
     ev->domid = domid;
 
diff --git a/tools/libxl/libxl_exec.c b/tools/libxl/libxl_exec.c
index 082bf44..cfa379c 100644
--- a/tools/libxl/libxl_exec.c
+++ b/tools/libxl/libxl_exec.c
@@ -280,7 +280,7 @@ int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
     int status, rc;
 
     libxl__spawn_init(ss);
-    ss->ssd = libxl__zalloc(0, sizeof(*ss->ssd));
+    ss->ssd = libxl__zalloc(NOGC, sizeof(*ss->ssd));
     libxl__ev_child_init(&ss->ssd->mid);
 
     rc = libxl__ev_time_register_rel(gc, &ss->timeout,
diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index 9ff99e0..044ddad 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -92,7 +92,7 @@ libxl__carefd *libxl__carefd_record(libxl_ctx *ctx, int fd)
     libxl__carefd *cf = 0;
 
     libxl_fd_set_cloexec(ctx, fd, 1);
-    cf = libxl__zalloc(NULL, sizeof(*cf));
+    cf = libxl__zalloc(&ctx->nogc_gc, sizeof(*cf));
     cf->fd = fd;
     LIBXL_LIST_INSERT_HEAD(&carefds, cf, entry);
     return cf;
diff --git a/tools/libxl/libxl_internal.c b/tools/libxl/libxl_internal.c
index 8139520..fbff7d0 100644
--- a/tools/libxl/libxl_internal.c
+++ b/tools/libxl/libxl_internal.c
@@ -30,11 +30,16 @@ void libxl__alloc_failed(libxl_ctx *ctx, const char *func,
 #undef L
 }
 
+static int gc_is_real(const libxl__gc *gc)
+{
+    return gc->alloc_maxsize >= 0;
+}
+
 void libxl__ptr_add(libxl__gc *gc, void *ptr)
 {
     int i;
 
-    if (!gc)
+    if (!gc_is_real(gc))
         return;
 
     if (!ptr)
@@ -66,6 +71,8 @@ void libxl__free_all(libxl__gc *gc)
     void *ptr;
     int i;
 
+    assert(gc_is_real(gc));
+
     for (i = 0; i < gc->alloc_maxsize; i++) {
         ptr = gc->alloc_ptrs[i];
         gc->alloc_ptrs[i] = NULL;
@@ -104,7 +111,7 @@ void *libxl__realloc(libxl__gc *gc, void *ptr, size_t new_size)
 
     if (ptr == NULL) {
         libxl__ptr_add(gc, new_ptr);
-    } else if (new_ptr != ptr && gc != NULL) {
+    } else if (new_ptr != ptr && gc_is_real(gc)) {
         for (i = 0; i < gc->alloc_maxsize; i++) {
             if (gc->alloc_ptrs[i] == ptr) {
                 gc->alloc_ptrs[i] = new_ptr;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c9b4189..aa150b5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -277,10 +277,18 @@ struct libxl__poller {
     int wakeup_pipe[2]; /* 0 means no fd allocated */
 };
 
+struct libxl__gc {
+    /* mini-GC */
+    int alloc_maxsize; /* -1 means this is the dummy non-gc gc */
+    void **alloc_ptrs;
+    libxl_ctx *owner;
+};
+
 struct libxl__ctx {
     xentoollog_logger *lg;
     xc_interface *xch;
     struct xs_handle *xsh;
+    libxl__gc nogc_gc;
 
     const libxl_event_hooks *event_hooks;
     void *event_hooks_user;
@@ -356,13 +364,6 @@ typedef struct {
 
 #define PRINTF_ATTRIBUTE(x, y) __attribute__((format(printf, x, y)))
 
-struct libxl__gc {
-    /* mini-GC */
-    int alloc_maxsize;
-    void **alloc_ptrs;
-    libxl_ctx *owner;
-};
-
 struct libxl__egc {
     /* For event-generating functions only.
      * The egc and its gc may be accessed only on the creating thread. */
@@ -420,6 +421,7 @@ struct libxl__ao {
         (gc).alloc_ptrs = 0;                    \
         (gc).owner = (ctx);                     \
     } while(0)
+    /* NB, also, a gc struct ctx->nogc_gc is initialised in libxl_ctx_alloc */
 
 static inline libxl_ctx *libxl__gc_owner(libxl__gc *gc)
 {
@@ -438,13 +440,20 @@ static inline libxl_ctx *libxl__gc_owner(libxl__gc *gc)
  * All pointers returned by these functions are registered for garbage
  * collection on exit from the outermost libxl callframe.
  *
- * However, where the argument is stated to be "gc_opt", NULL may be
- * passed instead, in which case no garbage collection will occur; the
- * pointer must later be freed with free().  This is for memory
- * allocations of types (b) and (c).
+ * However, where the argument is stated to be "gc_opt", &ctx->nogc_gc
+ * may be passed instead, in which case no garbage collection will
+ * occur; the pointer must later be freed with free().  (Passing NULL
+ * for gc_opt is not permitted.)  This is for memory allocations of
+ * types (b) and (c).  The convenience macro NOGC should be used where
+ * possible.
+ *
+ * NOGC (and ctx->nogc_gc) may ONLY be used with functions which
+ * explicitly declare that it's OK.  Use with nonconsenting functions
+ * may result in leaks of those functions' internal allocations on the
+ * psuedo-gc.
  */
-/* register @ptr in @gc for free on exit from outermost libxl callframe. */
-_hidden void libxl__ptr_add(libxl__gc *gc_opt, void *ptr);
+/* register ptr in gc for free on exit from outermost libxl callframe. */
+_hidden void libxl__ptr_add(libxl__gc *gc_opt, void *ptr /* may be NULL */);
 /* if this is the outermost libxl callframe then free all pointers in @gc */
 _hidden void libxl__free_all(libxl__gc *gc);
 /* allocate and zero @bytes. (similar to a gc'd malloc(3)+memzero()) */
@@ -2110,7 +2119,7 @@ _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 #define GC_INIT(ctx)  libxl__gc gc[1]; LIBXL_INIT_GC(gc[0],ctx)
 #define GC_FREE       libxl__free_all(gc)
 #define CTX           libxl__gc_owner(gc)
-#define NOGC          NULL
+#define NOGC          (&CTX->nogc_gc) /* pass only to consenting functions */
 
 /* Allocation macros all of which use the gc. */
 
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 67ef82c..08c7dac 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -58,8 +58,7 @@ char *libxl_domid_to_name(libxl_ctx *ctx, uint32_t domid)
 char *libxl__domid_to_name(libxl__gc *gc, uint32_t domid)
 {
     char *s = libxl_domid_to_name(libxl__gc_owner(gc), domid);
-    if ( s )
-        libxl__ptr_add(gc, s);
+    libxl__ptr_add(gc, s);
     return s;
 }
 
@@ -107,8 +106,7 @@ char *libxl_cpupoolid_to_name(libxl_ctx *ctx, uint32_t poolid)
 char *libxl__cpupoolid_to_name(libxl__gc *gc, uint32_t poolid)
 {
     char *s = libxl_cpupoolid_to_name(libxl__gc_owner(gc), poolid);
-    if ( s )
-        libxl__ptr_add(gc, s);
+    libxl__ptr_add(gc, s);
     return s;
 }
 
diff --git a/tools/libxl/libxl_xshelp.c b/tools/libxl/libxl_xshelp.c
index 993f527..7fdf164 100644
--- a/tools/libxl/libxl_xshelp.c
+++ b/tools/libxl/libxl_xshelp.c
@@ -86,11 +86,8 @@ char * libxl__xs_read(libxl__gc *gc, xs_transaction_t t, const char *path)
     char *ptr;
 
     ptr = xs_read(ctx->xsh, t, path, NULL);
-    if (ptr != NULL) {
-        libxl__ptr_add(gc, ptr);
-        return ptr;
-    }
-    return 0;
+    libxl__ptr_add(gc, ptr);
+    return ptr;
 }
 
 char *libxl__xs_get_dompath(libxl__gc *gc, uint32_t domid)
-- 
1.7.2.5


* [PATCH 15/21] libxl: Get compiler to warn about gc_opt==NULL
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (13 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 14/21] libxl: Do not pass NULL as gc_opt; introduce NOGC Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-28 17:56   ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 16/21] xl: Handle return value from libxl_domain_suspend correctly Ian Jackson
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

Since it used to be legal to pass gc_opt==NULL, and there are various
patches floating about and under development which do so, add a
compiler annotation which makes the build fail when that is done.

This turns a runtime crash into a build failure, and should ensure
that we don't accidentally commit a broken combination of patches.

This is something of an annoying approach because it adds a macro
invocation to the RHS of every declaration of a function taking a
gc_opt.  So it should be reverted after Xen 4.2rc1.
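
As a standalone illustration of the annotation (a sketch, assuming GCC or
Clang; toy_gc and toy_zalloc are hypothetical names, not libxl ones):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Argument 1 must not be NULL; the compiler may diagnose violations. */
#define NN1 __attribute__((nonnull(1)))

typedef struct { int dummy; } toy_gc;

/* The attribute goes on the declaration, after the parameter list. */
void *toy_zalloc(toy_gc *gc_opt, size_t bytes) NN1;

void *toy_zalloc(toy_gc *gc_opt, size_t bytes)
{
    (void)gc_opt;            /* the toy does not track allocations */
    assert(bytes > 0);
    return calloc(1, bytes);
}
```

A call such as toy_zalloc(NULL, 8) then draws a -Wnonnull diagnostic,
which stops the build only when warnings are treated as errors.  Note
that the compiler can only catch a NULL it can see at compile time; a
NULL arriving through a variable may still get through.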

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/libxl_internal.h |   21 +++++++++++++--------
 1 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index aa150b5..85f4bc6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -453,28 +453,33 @@ static inline libxl_ctx *libxl__gc_owner(libxl__gc *gc)
  * psuedo-gc.
  */
 /* register ptr in gc for free on exit from outermost libxl callframe. */
-_hidden void libxl__ptr_add(libxl__gc *gc_opt, void *ptr /* may be NULL */);
+
+#define NN1 __attribute__((nonnull(1)))
+ /* It used to be legal to pass NULL for gc_opt.  Get the compiler to
+  * warn about this if any slip through. */
+
+_hidden void libxl__ptr_add(libxl__gc *gc_opt, void *ptr /* may be NULL */) NN1;
 /* if this is the outermost libxl callframe then free all pointers in @gc */
 _hidden void libxl__free_all(libxl__gc *gc);
 /* allocate and zero @bytes. (similar to a gc'd malloc(3)+memzero()) */
-_hidden void *libxl__zalloc(libxl__gc *gc_opt, int bytes);
+_hidden void *libxl__zalloc(libxl__gc *gc_opt, int bytes) NN1;
 /* allocate and zero memory for an array of @nmemb members of @size each.
  * (similar to a gc'd calloc(3)). */
-_hidden void *libxl__calloc(libxl__gc *gc_opt, size_t nmemb, size_t size);
+_hidden void *libxl__calloc(libxl__gc *gc_opt, size_t nmemb, size_t size) NN1;
 /* change the size of the memory block pointed to by @ptr to @new_size bytes.
  * unlike other allocation functions here any additional space between the
  * oldsize and @new_size is not initialised (similar to a gc'd realloc(3)). */
-_hidden void *libxl__realloc(libxl__gc *gc_opt, void *ptr, size_t new_size);
+_hidden void *libxl__realloc(libxl__gc *gc_opt, void *ptr, size_t new_size) NN1;
 /* print @fmt into an allocated string large enoughto contain the result.
  * (similar to gc'd asprintf(3)). */
-_hidden char *libxl__sprintf(libxl__gc *gc_opt, const char *fmt, ...) PRINTF_ATTRIBUTE(2, 3);
+_hidden char *libxl__sprintf(libxl__gc *gc_opt, const char *fmt, ...) PRINTF_ATTRIBUTE(2, 3) NN1;
 /* duplicate the string @c (similar to a gc'd strdup(3)). */
-_hidden char *libxl__strdup(libxl__gc *gc_opt, const char *c);
+_hidden char *libxl__strdup(libxl__gc *gc_opt, const char *c) NN1;
 /* duplicate at most @n bytes of string @c (similar to a gc'd strndup(3)). */
-_hidden char *libxl__strndup(libxl__gc *gc_opt, const char *c, size_t n);
+_hidden char *libxl__strndup(libxl__gc *gc_opt, const char *c, size_t n) NN1;
 /* strip the last path component from @s and return as a newly allocated
  * string. (similar to a gc'd dirname(3)). */
-_hidden char *libxl__dirname(libxl__gc *gc_opt, const char *s);
+_hidden char *libxl__dirname(libxl__gc *gc_opt, const char *s) NN1;
 
 /* Each of these logs errors and returns a libxl error code.
  * They do not mind if path is already removed.
-- 
1.7.2.5


* [PATCH 16/21] xl: Handle return value from libxl_domain_suspend correctly
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (14 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 15/21] libxl: Get compiler to warn about gc_opt==NULL Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 17/21] libxl: do not leak dms->saved_state Ian Jackson
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

libxl_domain_suspend returns a libxl error code.  So it must be
wrapped with MUST and not CHK_ERRNO.
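
The difference between the two conventions can be sketched like this.
The helpers below are hypothetical and are not xl's actual macros (which
print and exit); they only show why an errno-style check reports the
wrong thing for a function that returns its own error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_ERROR_FAIL (-3)   /* hypothetical libxl-style error code */

/* errno-style check: the call signals failure with a negative return
 * and the detail is expected to be in errno. */
static int toy_chk_errno(int ret, char *msg, size_t len)
{
    assert(msg != NULL);
    if (ret >= 0) return 0;
    snprintf(msg, len, "failed: %s", strerror(errno));
    return 1;
}

/* rc-style check: the return value itself is the error code. */
static int toy_must(int rc, char *msg, size_t len)
{
    assert(msg != NULL);
    if (rc >= 0) return 0;
    snprintf(msg, len, "failed: rc=%d", rc);
    return 1;
}
```

Feeding TOY_ERROR_FAIL to toy_chk_errno produces a message built from
whatever errno happens to hold, which need not relate to the failure;
toy_must reports the code that was actually returned.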

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/xl_cmdimpl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 4aea1c7..56e51aa 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2817,7 +2817,7 @@ static int save_domain(const char *p, const char *filename, int checkpoint,
 
     save_domain_core_writeconfig(fd, filename, config_data, config_len);
 
-    CHK_ERRNO(libxl_domain_suspend(ctx, domid, fd, 0, NULL));
+    MUST(libxl_domain_suspend(ctx, domid, fd, 0, NULL));
     close(fd);
 
     if (checkpoint)
-- 
1.7.2.5


* [PATCH 17/21] libxl: do not leak dms->saved_state
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (15 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 16/21] xl: Handle return value from libxl_domain_suspend correctly Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 18/21] libxl: do not leak spawned middle children Ian Jackson
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

This was allocated using asprintf but never freed.  Use GCSPRINTF.
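
A rough sketch of why a gc-backed sprintf avoids the leak follows.  The
toy_* names are hypothetical and this is not how GCSPRINTF is
implemented; it only shows the formatted string being registered with an
owner that frees it later, so no caller has to remember to.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for libxl__gc. */
typedef struct {
    void *ptrs[8];
    int used;
} toy_gc;

/* Formats into a fresh string and hands ownership to the gc. */
static char *toy_gc_sprintf(toy_gc *gc, const char *fmt, ...)
{
    va_list ap;
    char *s;
    int n;

    assert(gc->used < 8);

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);   /* measure the result first */
    va_end(ap);
    if (n < 0) abort();

    s = malloc((size_t)n + 1);
    if (!s) abort();

    va_start(ap, fmt);
    vsnprintf(s, (size_t)n + 1, fmt, ap);
    va_end(ap);

    gc->ptrs[gc->used++] = s;          /* freed by toy_gc_free_all */
    return s;
}

static void toy_gc_free_all(toy_gc *gc)
{
    int i;
    for (i = 0; i < gc->used; i++)
        free(gc->ptrs[i]);
    gc->used = 0;
}
```

Because allocation failure aborts inside the helper, the caller also has
no error return to translate, which matches the removal of the
ret-mangling line in the diff below.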

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxl/libxl_create.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index b95a2fe..2d21be5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -801,9 +801,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
         goto out;
 
     if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
-        ret = asprintf(&state->saved_state,
+        state->saved_state = GCSPRINTF(
                        XC_DEVICE_MODEL_RESTORE_FILE".%d", domid);
-        ret = (ret < 0) ? ERROR_FAIL : 0;
     }
 
 out:
-- 
1.7.2.5


* [PATCH 18/21] libxl: do not leak spawned middle children
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (16 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 17/21] libxl: do not leak dms->saved_state Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 19/21] libxl: do not leak an event struct on ignored ao progress Ian Jackson
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

libxl__spawn_spawn would, when libxl__spawn_detach was called, make
the spawn become idle immediately.  However it still has a child
process which needs to be waited for: the `detachable' spawned
child.

This is wrong because the ultimate in-libxl caller may return to the
application with a child process still forked but not yet reaped,
contrary to the documented behaviour of libxl.

Instead, replace libxl__spawn_detach with libxl__spawn_initiate_detach
which is asynchronous.  The detachable spawned children are abolished;
instead, we defer calling back to the in-libxl user until the middle
child has been reaped.

Also, remove erroneous comment suggesting that `death' callback
parameter to libxl__ev_child_fork may be NULL.  It may not, and there
are no callers which pass NULL.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v4:
 * Clarify semantics of sub-states of Attached.
 * Fix reference to `Failing' sub-state in comment to `Failed'.

Changes in v3 of series:
 * Now also remove erroneous comment about NULL fork death callback.
---
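Illustrative note, not part of the patch: a toy model of the deferred
callback described above.  All demo_* names are hypothetical and exist
only in this note; the model keeps just the two flags and the "middle
child in use" bit, and omits the timeout, the xenstore watch and the
actual signalling.

```c
#include <assert.h>

/* Toy model of the flags described above; names are illustrative only. */
typedef struct demo_spawn demo_spawn;
typedef void demo_spawn_cb(demo_spawn *ss);

struct demo_spawn {
    int detaching;      /* detach was requested */
    int failed;         /* a failure was noticed */
    int mid_inuse;      /* middle child forked and not yet reaped */
    demo_spawn_cb *failure_cb;
    demo_spawn_cb *detached_cb;
    int failure_calls;  /* bookkeeping for the demonstration */
    int detached_calls;
};

void demo_count_failure(demo_spawn *ss)  { ss->failure_calls++; }
void demo_count_detached(demo_spawn *ss) { ss->detached_calls++; }

/* Enter the Attached state: the middle child now exists. */
void demo_spawn_start(demo_spawn *ss)
{
    ss->detaching = ss->failed = 0;
    ss->mid_inuse = 1;
    ss->failure_calls = ss->detached_calls = 0;
    ss->failure_cb = demo_count_failure;
    ss->detached_cb = demo_count_detached;
}

/* Request detach.  No callback yet: the child must be reaped first. */
void demo_spawn_initiate_detach(demo_spawn *ss)
{
    assert(ss->mid_inuse);
    ss->detaching = 1;
    /* real code would signal the middle child at this point */
}

/* Note a failure.  Again, the callback is deferred. */
void demo_spawn_fail(demo_spawn *ss)
{
    assert(ss->mid_inuse);
    ss->failed = 1;
}

/* The middle child has been reaped: only now report the outcome. */
void demo_spawn_middle_death(demo_spawn *ss)
{
    ss->mid_inuse = 0;                    /* state is now Idle */
    if (!ss->failed && !ss->detaching)
        ss->failed = 1;                   /* unexpected death */
    if (ss->failed && !ss->detaching) {
        ss->failure_cb(ss);
        return;
    }
    ss->detached_cb(ss);
}
```

The point of the sketch is the ordering: neither
demo_spawn_initiate_detach nor demo_spawn_fail invokes a callback, so
the user cannot return to the application while the child is still
unreaped.
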
 tools/libxl/libxl_dm.c       |   14 ++++-
 tools/libxl/libxl_exec.c     |  130 +++++++++++++++++++++++++-----------------
 tools/libxl/libxl_internal.h |   43 +++++++-------
 3 files changed, 110 insertions(+), 77 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 340fcfa..b3de145 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -908,6 +908,8 @@ static void device_model_confirm(libxl__egc *egc, libxl__spawn_state *spawn,
                                  const char *xsdata);
 static void device_model_startup_failed(libxl__egc *egc,
                                         libxl__spawn_state *spawn);
+static void device_model_detached(libxl__egc *egc,
+                                  libxl__spawn_state *spawn);
 
 /* our "next step" function, called from those callbacks and elsewhere */
 static void device_model_spawn_outcome(libxl__egc *egc,
@@ -1015,6 +1017,7 @@ retry_transaction:
     spawn->midproc_cb = libxl__spawn_record_pid;
     spawn->confirm_cb = device_model_confirm;
     spawn->failure_cb = device_model_startup_failed;
+    spawn->detached_cb = device_model_detached;
 
     rc = libxl__spawn_spawn(egc, spawn);
     if (rc < 0)
@@ -1048,9 +1051,7 @@ static void device_model_confirm(libxl__egc *egc, libxl__spawn_state *spawn,
     if (strcmp(xsdata, "running"))
         return;
 
-    libxl__spawn_detach(gc, spawn);
-
-    device_model_spawn_outcome(egc, dmss, 0);
+    libxl__spawn_initiate_detach(gc, spawn);
 }
 
 static void device_model_startup_failed(libxl__egc *egc,
@@ -1060,6 +1061,13 @@ static void device_model_startup_failed(libxl__egc *egc,
     device_model_spawn_outcome(egc, dmss, ERROR_FAIL);
 }
 
+static void device_model_detached(libxl__egc *egc,
+                                  libxl__spawn_state *spawn)
+{
+    libxl__dm_spawn_state *dmss = CONTAINER_OF(spawn, *dmss, spawn);
+    device_model_spawn_outcome(egc, dmss, 0);
+}
+
 static void device_model_spawn_outcome(libxl__egc *egc,
                                        libxl__dm_spawn_state *dmss,
                                        int rc)
diff --git a/tools/libxl/libxl_exec.c b/tools/libxl/libxl_exec.c
index cfa379c..0477386 100644
--- a/tools/libxl/libxl_exec.c
+++ b/tools/libxl/libxl_exec.c
@@ -238,15 +238,22 @@ err:
 /*
  * Full set of possible states of a libxl__spawn_state and its _detachable:
  *
- *               ss->        ss->        ss->    | ssd->       ssd->
- *               timeout     xswatch     ssd     |  mid         ss
- *  - Undefined   undef       undef       no     |  -           -
- *  - Idle        Idle        Idle        no     |  -           -
- *  - Active      Active      Active      yes    |  Active      yes
- *  - Partial     Active/Idle Active/Idle maybe  |  Active/Idle yes  (if exists)
- *  - Detached    -           -           -      |  Active      no
+ *                   detaching failed  mid     timeout      xswatch          
+ *  - Undefined         undef   undef   -        undef        undef
+ *  - Idle              any     any     Idle     Idle         Idle
+ *  - Attached OK       0       0       Active   Active       Active
+ *  - Attached Failed   0       1       Active   Idle         Idle
+ *  - Detaching         1       maybe   Active   Idle         Idle
+ *  - Partial           any     any     Idle     Active/Idle  Active/Idle
  *
- * When in state Detached, the middle process has been sent a SIGKILL.
+ * When in states Detaching or Attached Failed, the middle process has
+ * been sent a SIGKILL.
+ *
+ * The difference between Attached OK and Attached Failed is not
+ * directly visible to callers - callers see these two the same,
+ * although of course Attached OK will hopefully eventually result in
+ * a call to detached_cb, whereas Attached Failed will end up
+ * in a call to failure_cb.
  */
 
 /* Event callbacks. */
@@ -257,19 +264,18 @@ static void spawn_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void spawn_middle_death(libxl__egc *egc, libxl__ev_child *childw,
                                pid_t pid, int status);
 
-/* Precondition: Partial.  Results: Detached. */
+/* Precondition: Partial.  Results: Idle. */
 static void spawn_cleanup(libxl__gc *gc, libxl__spawn_state *ss);
 
-/* Precondition: Partial; caller has logged failure reason.
- * Results: Caller notified of failure;
- *  after return, ss may be completely invalid as caller may reuse it */
-static void spawn_failed(libxl__egc *egc, libxl__spawn_state *ss);
+/* Precondition: Attached or Detaching; caller has logged failure reason.
+ * Results: Detaching, or Attached Failed */
+static void spawn_fail(libxl__egc *egc, libxl__spawn_state *ss);
 
 void libxl__spawn_init(libxl__spawn_state *ss)
 {
+    libxl__ev_child_init(&ss->mid);
     libxl__ev_time_init(&ss->timeout);
     libxl__ev_xswatch_init(&ss->xswatch);
-    ss->ssd = 0;
 }
 
 int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
@@ -280,8 +286,7 @@ int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
     int status, rc;
 
     libxl__spawn_init(ss);
-    ss->ssd = libxl__zalloc(NOGC, sizeof(*ss->ssd));
-    libxl__ev_child_init(&ss->ssd->mid);
+    ss->failed = ss->detaching = 0;
 
     rc = libxl__ev_time_register_rel(gc, &ss->timeout,
                                      spawn_timeout, ss->timeout_ms);
@@ -291,7 +296,7 @@ int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
                                     spawn_watch_event, ss->xspath);
     if (rc) goto out_err;
 
-    pid_t middle = libxl__ev_child_fork(gc, &ss->ssd->mid, spawn_middle_death);
+    pid_t middle = libxl__ev_child_fork(gc, &ss->mid, spawn_middle_death);
     if (middle ==-1) { rc = ERROR_FAIL; goto out_err; }
 
     if (middle) {
@@ -344,54 +349,64 @@ int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
 
 static void spawn_cleanup(libxl__gc *gc, libxl__spawn_state *ss)
 {
+    assert(!libxl__ev_child_inuse(&ss->mid));
+    libxl__ev_time_deregister(gc, &ss->timeout);
+    libxl__ev_xswatch_deregister(gc, &ss->xswatch);
+}
+
+static void spawn_detach(libxl__gc *gc, libxl__spawn_state *ss)
+/* Precondition: Attached or Detaching, but caller must have just set
+ * at least one of detaching or failed.
+ * Results: Detaching or Attached Failed */
+{
     int r;
 
+    assert(libxl__ev_child_inuse(&ss->mid));
     libxl__ev_time_deregister(gc, &ss->timeout);
     libxl__ev_xswatch_deregister(gc, &ss->xswatch);
 
-    libxl__spawn_state_detachable *ssd = ss->ssd;
-    if (ssd) {
-        if (libxl__ev_child_inuse(&ssd->mid)) {
-            pid_t child = ssd->mid.pid;
-            r = kill(child, SIGKILL);
-            if (r && errno != ESRCH)
-                LOGE(WARN, "%s: failed to kill intermediate child (pid=%lu)",
-                     ss->what, (unsigned long)child);
-        }
+    pid_t child = ss->mid.pid;
+    r = kill(child, SIGKILL);
+    if (r && errno != ESRCH)
+        LOGE(WARN, "%s: failed to kill intermediate child (pid=%lu)",
+             ss->what, (unsigned long)child);
+}
 
-        /* disconnect the ss and ssd from each other */
-        ssd->ss = 0;
-        ss->ssd = 0;
-    }
+void libxl__spawn_initiate_detach(libxl__gc *gc, libxl__spawn_state *ss)
+{
+    ss->detaching = 1;
+    spawn_detach(gc, ss);
 }
 
-static void spawn_failed(libxl__egc *egc, libxl__spawn_state *ss)
+static void spawn_fail(libxl__egc *egc, libxl__spawn_state *ss)
+/* Caller must have logged.  Must be last thing in calling function,
+ * as it may make the callback.  Precondition: Attached or Detaching. */
 {
     EGC_GC;
-    spawn_cleanup(gc, ss);
-    ss->failure_cb(egc, ss); /* must be last; callback may do anything to ss */
+    ss->failed = 1;
+    spawn_detach(gc, ss);
 }
 
 static void spawn_timeout(libxl__egc *egc, libxl__ev_time *ev,
                           const struct timeval *requested_abs)
 {
-    /* Before event, was Active; is now Partial. */
+    /* Before event, was Attached. */
     EGC_GC;
     libxl__spawn_state *ss = CONTAINER_OF(ev, *ss, timeout);
     LOG(ERROR, "%s: startup timed out", ss->what);
-    spawn_failed(egc, ss); /* must be last */
+    spawn_fail(egc, ss); /* must be last */
 }
 
 static void spawn_watch_event(libxl__egc *egc, libxl__ev_xswatch *xsw,
                               const char *watch_path, const char *event_path)
 {
-    /* On entry, is Active. */
+    /* On entry, is Attached. */
     EGC_GC;
     libxl__spawn_state *ss = CONTAINER_OF(xsw, *ss, xswatch);
     char *p = libxl__xs_read(gc, 0, ss->xspath);
     if (!p && errno != ENOENT) {
         LOG(ERROR, "%s: xenstore read of %s failed", ss->what, ss->xspath);
-        spawn_failed(egc, ss); /* must be last */
+        spawn_fail(egc, ss); /* must be last */
         return;
     }
     ss->confirm_cb(egc, ss, p); /* must be last */
@@ -399,20 +414,22 @@ static void spawn_watch_event(libxl__egc *egc, libxl__ev_xswatch *xsw,
 
 static void spawn_middle_death(libxl__egc *egc, libxl__ev_child *childw,
                                pid_t pid, int status)
-    /* Before event, was Active or Detached;
-     * is now Active or Detached except that ssd->mid is Idle */
+    /* On entry, is Attached or Detaching */
 {
     EGC_GC;
-    libxl__spawn_state_detachable *ssd = CONTAINER_OF(childw, *ssd, mid);
-    libxl__spawn_state *ss = ssd->ss;
-
-    if (!WIFEXITED(status)) {
+    libxl__spawn_state *ss = CONTAINER_OF(childw, *ss, mid);
+
+    if ((ss->failed || ss->detaching) &&
+        ((WIFEXITED(status) && WEXITSTATUS(status)==0) ||
+         (WIFSIGNALED(status) && WTERMSIG(status)==SIGKILL))) {
+        /* as expected */
+    } else if (!WIFEXITED(status)) {
+        int loglevel = ss->detaching ? XTL_WARN : XTL_ERROR;
         const char *what =
-            GCSPRINTF("%s intermediate process (startup monitor)",
-                      ss ? ss->what : "(detached)");
-        int loglevel = ss ? XTL_ERROR : XTL_WARN;
+            GCSPRINTF("%s intermediate process (startup monitor)", ss->what);
         libxl_report_child_exitstatus(CTX, loglevel, what, pid, status);
-    } else if (ss) { /* otherwise it was supposed to be a daemon by now */
+        ss->failed = 1;
+    } else {
         if (!status)
             LOG(ERROR, "%s [%ld]: unexpectedly exited with exit status 0,"
                 " when we were waiting for it to confirm startup",
@@ -430,15 +447,22 @@ static void spawn_middle_death(libxl__egc *egc, libxl__ev_child *childw,
                 LOG(ERROR, "%s [%ld]: died during startup due to unknown fatal"
                     " signal number %d", ss->what, (unsigned long)pid, sig);
         }
-        ss->ssd = 0; /* detatch the ssd to make the ss be in state Partial */
-        spawn_failed(egc, ss); /* must be last use of ss */
+        ss->failed = 1;
     }
-    free(ssd);
-}
 
-void libxl__spawn_detach(libxl__gc *gc, libxl__spawn_state *ss)
-{
     spawn_cleanup(gc, ss);
+
+    if (ss->failed && !ss->detaching) {
+        ss->failure_cb(egc, ss); /* must be last */
+        return;
+    }
+    
+    if (ss->failed && ss->detaching)
+        LOG(WARN,"%s underlying machinery seemed to fail,"
+            " but its function seems to have been successful", ss->what);
+
+    assert(ss->detaching);
+    ss->detached_cb(egc, ss);
 }
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 85f4bc6..9df0db5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -690,8 +690,7 @@ static inline int libxl__ev_xswatch_isregistered(const libxl__ev_xswatch *xw)
  * the libxl event machinery.
  *
  * The parent may signal the child but it must not reap it.  That will
- * be done by the event machinery.  death may be NULL in which case
- * the child is still reaped but its death is ignored.
+ * be done by the event machinery.
  *
  * It is not possible to "deregister" the child death event source.
  * It will generate exactly one event callback; until then the childw
@@ -998,8 +997,8 @@ _hidden int libxl__device_pci_destroy_all(libxl__gc *gc, uint32_t domid);
  *
  * Higher-level double-fork and separate detach eg as for device models
  *
- * Each libxl__spawn_state is in one of the conventional states
- *    Undefined, Idle, Active
+ * Each libxl__spawn_state is in one of these states
+ *    Undefined, Idle, Attached, Detaching
  */
 
 typedef struct libxl__obsolete_spawn_starting libxl__spawn_starting;
@@ -1040,15 +1039,15 @@ _hidden void libxl__spawn_init(libxl__spawn_state*);
  * intermediate or final child; an error message will have been
  * logged.
  *
- * confirm_cb and failure_cb will not be called reentrantly from
- * within libxl__spawn_spawn.
+ * confirm_cb, failure_cb and detached_cb will not be called
+ * reentrantly from within libxl__spawn_spawn.
  *
  * what: string describing the spawned process, used for logging
  *
  * Logs errors.  A copy of "what" is taken. 
  * Return values:
  *  < 0   error, *spawn is now Idle and need not be detached
- *   +1   caller is the parent, *spawn is Active and must eventually be detached
+ *   +1   caller is the parent, *spawn is Attached and must be detached
  *    0   caller is now the inner child, should probably call libxl__exec
  *
  * The spawn state must be Undefined or Idle on entry.
@@ -1056,12 +1055,15 @@ _hidden void libxl__spawn_init(libxl__spawn_state*);
 _hidden int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *spawn);
 
 /*
- * libxl__spawn_detach - Detaches the daemonic child.
+ * libxl__spawn_initiate_detach - Detaches the daemonic child.
  *
  * Works by killing the intermediate process from spawn_spawn.
  * After this function returns, failures of either child are no
  * longer reported via failure_cb.
  *
+ * This is not synchronous: there will be a further callback when
+ * the detach is complete.
+ *
  * If called before the inner child has been created, this may prevent
  * it from running at all.  Thus this should be called only when the
  * inner child has notified that it is ready.  Normally it will be
@@ -1069,10 +1071,10 @@ _hidden int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *spawn);
  *
  * Logs errors.
  *
- * The spawn state must be Active or Idle on entry and will be Idle
+ * The spawn state must be Attached on entry and will be Detaching
  * on return.
  */
-_hidden void libxl__spawn_detach(libxl__gc *gc, libxl__spawn_state*);
+_hidden void libxl__spawn_initiate_detach(libxl__gc *gc, libxl__spawn_state*);
 
 /*
  * If successful, this should return 0.
@@ -1109,15 +1111,11 @@ typedef void libxl__spawn_failure_cb(libxl__egc*, libxl__spawn_state*);
 typedef void libxl__spawn_confirm_cb(libxl__egc*, libxl__spawn_state*,
                                      const char *xsdata);
 
-typedef struct {
-    /* Private to the spawn implementation.
-     */
-    /* This separate struct, from malloc, allows us to "detach"
-     * the child and reap it later, when our user has gone
-     * away and freed its libxl__spawn_state */
-    struct libxl__spawn_state *ss;
-    libxl__ev_child mid;
-} libxl__spawn_state_detachable;
+/*
+ * Called when the detach (requested by libxl__spawn_initiate_detach) has
+ * completed.  On entry to the callback the spawn state is Idle.
+ */
+typedef void libxl__spawn_detached_cb(libxl__egc*, libxl__spawn_state*);
 
 struct libxl__spawn_state {
     /* must be filled in by user and remain valid */
@@ -1129,15 +1127,18 @@ struct libxl__spawn_state {
     libxl__spawn_midproc_cb *midproc_cb;
     libxl__spawn_failure_cb *failure_cb;
     libxl__spawn_confirm_cb *confirm_cb;
+    libxl__spawn_detached_cb *detached_cb;
 
     /* remaining fields are private to libxl_spawn_... */
+    int detaching; /* we are in Detaching */
+    int failed; /* might be true whenever we are not Idle */
+    libxl__ev_child mid; /* always in use whenever we are not Idle */
     libxl__ev_time timeout;
     libxl__ev_xswatch xswatch;
-    libxl__spawn_state_detachable *ssd;
 };
 
 static inline int libxl__spawn_inuse(libxl__spawn_state *ss)
-    { return !!ss->ssd; }
+    { return libxl__ev_child_inuse(&ss->mid); }
 
 /*
  * libxl_spawner_record_pid - Record given pid in xenstore
-- 
1.7.2.5

* [PATCH 19/21] libxl: do not leak an event struct on ignored ao progress
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (17 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 18/21] libxl: do not leak spawned middle children Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 20/21] libxl: further fixups re LIBXL_DOMAIN_TYPE Ian Jackson
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

On entry to libxl__ao_progress_report, the caller has allocated an
event.  If the progress report is to be ignored, we need to free it.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
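Illustrative note, not part of the patch: a small sketch of the
ownership rule behind this fix.  A function that takes ownership of
an allocated event must dispose of it on every branch, including the
"ignore" branch.  The demo_* names and the live-event counter are
hypothetical, and the final branch is simplified (the sketch frees
where real code might queue the event instead).

```c
#include <assert.h>
#include <stdlib.h>

int demo_live_events = 0;  /* counts events allocated but not yet freed */

typedef struct { int payload; } demo_event;

demo_event *demo_event_new(int payload)
{
    demo_event *ev = malloc(sizeof(*ev));
    if (!ev) abort();
    ev->payload = payload;
    demo_live_events++;
    return ev;
}

void demo_event_free(demo_event *ev)
{
    free(ev);
    demo_live_events--;
}

typedef void demo_callback(demo_event *ev);  /* callee takes ownership */

void demo_consume(demo_event *ev) { demo_event_free(ev); }

/* Takes ownership of ev.  Every path must hand it on or free it. */
void demo_progress_report(demo_event *ev, int ignore, demo_callback *cb)
{
    if (ignore) {
        demo_event_free(ev);   /* the easily forgotten branch */
    } else if (cb) {
        cb(ev);                /* ownership passes to the callback */
    } else {
        demo_event_free(ev);   /* simplification: no queueing here */
    }
}
```

Counting live allocations, as demo_live_events does, is one cheap way
to check in a test that no branch drops the event.
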
 tools/libxl/libxl_event.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index eb23a93..1957505 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1602,6 +1602,7 @@ void libxl__ao_progress_report(libxl__egc *egc, libxl__ao *ao,
     ev->for_user = how->for_event;
     if (how->callback == dummy_asyncprogress_callback_ignore) {
         LOG(DEBUG,"ao %p: progress report: ignored",ao);
+        libxl_event_free(CTX,ev);
         /* ignore */
     } else if (how->callback) {
         libxl__aop_occurred *aop = libxl__zalloc(&egc->gc, sizeof(*aop));
-- 
1.7.2.5

* [PATCH 20/21] libxl: further fixups re LIBXL_DOMAIN_TYPE
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (18 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 19/21] libxl: do not leak an event struct on ignored ao progress Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 17:55 ` [PATCH 21/21] libxl: DO NOT APPLY enforce prohibition on internal Ian Jackson
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson

* Abolish the macro LIBXL__DOMAIN_IS_TYPE which had incorrect error
  handling.  At every call site, replace it with an open-coded call to
  libxl_domain_type and check against LIBXL_DOMAIN_TYPE_INVALID.

* This involves adding an `out:' to libxl_domain_unpause.

* In libxl_domain_destroy and do_pci_add, do not `default: abort();'
  if the domain type cannot be found.  Instead switch on
  LIBXL_DOMAIN_TYPE_INVALID specifically and do some actual error
  handling.

* In libxl__primary_console_find, remove a spurious default clause
  from the domain type switch.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Changes in v5:
 * Add `default: abort()' to libxl__domain_type switch.

Changes in v4 of series:
 * Hunk
     In libxl_domain_suspend (as reorganised) error check, check for
     LIBXL_DOMAIN_TYPE_INVALID and remove a pointless extra log message.
   merged into the earlier patch where the slightly-wrong code was
   introduced.
---
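Illustrative note, not part of the patch: a sketch of the error
handling problem with an "is type" macro.  A boolean comparison
cannot distinguish "the lookup failed" from "the domain is some other
type", whereas the open-coded form can return an error.  The enum
values, the lookup rule and DEMO_ERROR_FAIL are all invented for this
note.

```c
#include <assert.h>

typedef enum {
    DEMO_TYPE_INVALID = -1,
    DEMO_TYPE_HVM = 1,
    DEMO_TYPE_PV = 2
} demo_domain_type;

#define DEMO_ERROR_FAIL (-3)   /* hypothetical error code */

/* Hypothetical lookup: odd ids are HVM, even positive ids are PV,
 * anything else cannot be found. */
demo_domain_type demo_domain_type_of(int domid)
{
    if (domid <= 0) return DEMO_TYPE_INVALID;
    return (domid % 2) ? DEMO_TYPE_HVM : DEMO_TYPE_PV;
}

/* Macro style: a failed lookup is indistinguishable from "not this type". */
#define DEMO_DOMAIN_IS_TYPE(domid, type) \
    (demo_domain_type_of(domid) == DEMO_TYPE_##type)

/* Open-coded style: returns 1 or 0, or a negative error code. */
int demo_is_pv_checked(int domid)
{
    demo_domain_type t = demo_domain_type_of(domid);
    if (t == DEMO_TYPE_INVALID)
        return DEMO_ERROR_FAIL;
    return t == DEMO_TYPE_PV;
}
```

For an id that cannot be found, DEMO_DOMAIN_IS_TYPE(id, PV) simply
evaluates to 0 and the caller proceeds as if the domain were HVM,
while demo_is_pv_checked reports the failure.
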
 tools/libxl/libxl.c          |   28 +++++++++++++++++++++++-----
 tools/libxl/libxl_internal.h |    5 +++--
 tools/libxl/libxl_pci.c      |   18 +++++++++++++-----
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 1a7404a..1b84398 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -390,7 +390,13 @@ int libxl_domain_resume(libxl_ctx *ctx, uint32_t domid, int suspend_cancel)
         goto out;
     }
 
-    if (LIBXL__DOMAIN_IS_TYPE(gc,  domid, HVM)) {
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    if (type == LIBXL_DOMAIN_TYPE_INVALID) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (type == LIBXL_DOMAIN_TYPE_HVM) {
         rc = libxl__domain_resume_device_model(gc, domid);
         if (rc) {
             LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
@@ -788,7 +794,13 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
     char *state;
     int ret, rc = 0;
 
-    if (LIBXL__DOMAIN_IS_TYPE(gc,  domid, HVM)) {
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    if (type == LIBXL_DOMAIN_TYPE_INVALID) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (type == LIBXL_DOMAIN_TYPE_HVM) {
         path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
         state = libxl__xs_read(gc, XBT_NULL, path);
         if (state != NULL && !strcmp(state, "paused")) {
@@ -802,6 +814,7 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid)
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unpausing domain %d", domid);
         rc = ERROR_FAIL;
     }
+ out:
     GC_FREE;
     return rc;
 }
@@ -813,7 +826,11 @@ int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid)
     unsigned long pvdriver = 0;
     int ret;
 
-    if (LIBXL__DOMAIN_IS_TYPE(gc, domid, PV))
+    libxl_domain_type domtype = libxl__domain_type(gc, domid);
+    if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
+        return ERROR_FAIL;
+
+    if (domtype == LIBXL_DOMAIN_TYPE_PV)
         return 1;
 
     ret = xc_get_hvm_param(ctx->xch, domid, HVM_PARAM_CALLBACK_IRQ, &pvdriver);
@@ -1213,6 +1230,9 @@ int libxl_domain_destroy(libxl_ctx *ctx, uint32_t domid)
         pid = libxl__xs_read(gc, XBT_NULL, libxl__sprintf(gc, "/local/domain/%d/image/device-model-pid", domid));
         dm_present = (pid != NULL);
         break;
+    case LIBXL_DOMAIN_TYPE_INVALID:
+        rc = ERROR_FAIL;
+        goto out;
     default:
         abort();
     }
@@ -1362,8 +1382,6 @@ static int libxl__primary_console_find(libxl_ctx *ctx, uint32_t domid_vm,
         case LIBXL_DOMAIN_TYPE_INVALID:
             rc = ERROR_INVAL;
             goto out;
-        default:
-            abort();
         }
     }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9df0db5..36c75ed 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -797,8 +797,7 @@ _hidden int libxl__domain_cpupool(libxl__gc *gc, uint32_t domid);
 _hidden libxl_scheduler libxl__domain_scheduler(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__sched_set_params(libxl__gc *gc, uint32_t domid,
                                     libxl_domain_sched_params *scparams);
-#define LIBXL__DOMAIN_IS_TYPE(gc, domid, type) \
-    libxl__domain_type((gc), (domid)) == LIBXL_DOMAIN_TYPE_##type
+
 typedef struct {
     uint32_t store_port;
     uint32_t store_domid;
@@ -841,7 +840,9 @@ _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
 
 _hidden void libxl__userdata_destroyall(libxl__gc *gc, uint32_t domid);
 
+/* returns 0 or 1, or a libxl error code */
 _hidden int libxl__domain_pvcontrol_available(libxl__gc *gc, uint32_t domid);
+
 _hidden char * libxl__domain_pvcontrol_read(libxl__gc *gc,
                                             xs_transaction_t t, uint32_t domid);
 _hidden int libxl__domain_pvcontrol_write(libxl__gc *gc, xs_transaction_t t,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index de1b79f..81438be 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -128,7 +128,11 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d
     if (!num_devs)
         return libxl__create_pci_backend(gc, domid, pcidev, 1);
 
-    if (!starting && LIBXL__DOMAIN_IS_TYPE(gc, domid, PV)) {
+    libxl_domain_type domtype = libxl__domain_type(gc, domid);
+    if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
+        return ERROR_FAIL;
+
+    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
         if (libxl__wait_for_backend(gc, be_path, "4") < 0)
             return ERROR_FAIL;
     }
@@ -171,7 +175,11 @@ static int libxl__device_pci_remove_xenstore(libxl__gc *gc, uint32_t domid, libx
         return ERROR_INVAL;
     num = atoi(num_devs);
 
-    if (LIBXL__DOMAIN_IS_TYPE(gc, domid, PV)) {
+    libxl_domain_type domtype = libxl__domain_type(gc, domid);
+    if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
+        return ERROR_FAIL;
+
+    if (domtype == LIBXL_DOMAIN_TYPE_PV) {
         if (libxl__wait_for_backend(gc, be_path, "4") < 0) {
             LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "pci backend at %s is not ready", be_path);
             return ERROR_FAIL;
@@ -199,7 +207,7 @@ retry_transaction:
         if (errno == EAGAIN)
             goto retry_transaction;
 
-    if (LIBXL__DOMAIN_IS_TYPE(gc, domid, PV)) {
+    if (domtype == LIBXL_DOMAIN_TYPE_PV) {
         if (libxl__wait_for_backend(gc, be_path, "4") < 0) {
             LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "pci backend at %s is not ready", be_path);
             return ERROR_FAIL;
@@ -939,8 +947,8 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
         }
         break;
     }
-    default:
-        abort();
+    case LIBXL_DOMAIN_TYPE_INVALID:
+        return ERROR_FAIL;
     }
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-- 
1.7.2.5

* [PATCH 21/21] libxl: DO NOT APPLY enforce prohibition on internal
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (19 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 20/21] libxl: further fixups re LIBXL_DOMAIN_TYPE Ian Jackson
@ 2012-06-26 17:55 ` Ian Jackson
  2012-06-26 18:00 ` [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
  2012-06-28 13:38 ` [PATCH v6 " Ian Jackson
  22 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 17:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Ian Campbell, Roger Pau Monne

DO NOT APPLY THIS PATCH.
It contains -Wno-error.  Without that it would break the build.

Subject: [PATCH] libxl: enforce prohibitions of internal callers

libxl_internal.h says:

 * Functions using LIBXL__INIT_EGC may *not* generally be called from
 * within libxl, because libxl__egc_cleanup may call back into the
 * application. ...

and

 *                    ...  [Functions which take an ao_how] MAY NOT
 * be called from inside libxl, because they can cause reentrancy
 * callbacks.

However, this was not enforced.  Particularly the latter restriction
is easy to overlook, especially since during the transition period to
the new event system we have bent this rule a couple of times, and the
bad pattern simply involves passing 0 or NULL for the ao_how.

So use the compiler to enforce this property, as follows:

 - Mark all functions which take a libxl_asyncop_how, or which
   use EGC_INIT or LIBXL__INIT_EGC, with a new annotation
   LIBXL_EXTERNAL_CALLERS_ONLY in the public header.

 - Change the documentation comment for asynch operations and egcs to
   say that this should always be done.

 - Arrange that if libxl.h is included via libxl_internal.h,
   LIBXL_EXTERNAL_CALLERS_ONLY expands to __attribute__((warning(...))),
   which generates a message like this:
      libxl.c:1772: warning: call to 'libxl_device_disk_remove'
             declared with attribute warning:
             may not be called from within libxl
   Otherwise, the annotation expands to nothing, so external
   callers are unaffected.

 - Forbid inclusion of both libxl.h and libxl_internal.h unless
   libxl_internal.h came first, so that the above check doesn't have
   any loopholes.  Files which include libxl_internal.h should not
   include libxl.h as well.

   This is enforced explicitly using #error.  However, in practice
   with the current tree it just changes the error message when this
   mistake is made; otherwise we would carry on to immediately
   following #define which would cause the compiler to complain that
   LIBXL_EXTERNAL_CALLERS_ONLY was redefined.  Then the developer
   might be tempted to add a #ifndef which would be wrong - it would
   leave the affected translation unit unprotected by the new
   enforcement regime.  So let's be explicit.

 - Fix the one source of files which violate the above principle, the
   output from the idl compiler, by removing the redundant inclusion
   of libxl.h from the output.

In the tree I am using as a base at the time of writing, this new
restriction catches three errors: two in libxl_device_disk_remove and
one in libxl__device_disk_local_detach.  To avoid entirely breaking my
build I have also done this:

 - Temporarily change -Werror to -Wno-error in the libxl Makefile.

This patch should not be applied in this form.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Roger Pau Monne <roger.pau@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
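Illustrative note, not part of the patch: a reduced sketch of the
annotation scheme described above, assuming a GCC-compatible compiler
that supports the function attribute `warning`.  DEMO_INTERNAL and
the demo_* names are hypothetical; DEMO_INTERNAL stands in for "the
public header was reached via the internal header".

```c
#include <assert.h>

/* When building "inside the library" the annotation asks the compiler
 * to warn at each call site; otherwise it expands to nothing. */
#if defined(DEMO_INTERNAL) && defined(__GNUC__)
# define DEMO_EXTERNAL_CALLERS_ONLY \
    __attribute__((warning("may not be called from within the library")))
#else
# define DEMO_EXTERNAL_CALLERS_ONLY /* nothing: external callers unaffected */
#endif

/* Public entry point, annotated on its declaration. */
int demo_public_entry(int x) DEMO_EXTERNAL_CALLERS_ONLY;

int demo_public_entry(int x)
{
    return x + 1;
}

/* A caller.  Built without DEMO_INTERNAL this is an ordinary call;
 * built with -DDEMO_INTERNAL the compiler is asked to warn here. */
int demo_caller(int x)
{
    return demo_public_entry(x);
}
```

Because the macro expands to nothing by default, applications that
include only the public header see an unannotated prototype.
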
 .gitignore                        |    1 +
 .hgignore                         |    1 +
 tools/libxl/Makefile              |   16 +++++++++++++---
 tools/libxl/check-libxl-api-rules |   15 +++++++++++++++
 tools/libxl/gentypes.py           |    1 -
 tools/libxl/libxl.h               |   34 +++++++++++++++++++++++++---------
 tools/libxl/libxl_event.h         |   21 ++++++++++++++-------
 tools/libxl/libxl_internal.h      |   14 ++++++++++----
 8 files changed, 79 insertions(+), 24 deletions(-)
 create mode 100755 tools/libxl/check-libxl-api-rules

diff --git a/.gitignore b/.gitignore
index 3451e52..22faeaa 100644
--- a/.gitignore
+++ b/.gitignore
@@ -187,6 +187,7 @@ tools/libxl/xl
 tools/libxl/testenum
 tools/libxl/testenum.c
 tools/libxl/tmp.*
+tools/libxl/libxl.api-for-check
 tools/libaio/src/*.ol
 tools/libaio/src/*.os
 tools/misc/cpuperf/cpuperf-perfcntr
diff --git a/.hgignore b/.hgignore
index 05304ea..5756bf8 100644
--- a/.hgignore
+++ b/.hgignore
@@ -185,6 +185,7 @@
 ^tools/libxl/testidl\.c$
 ^tools/libxl/tmp\..*$
 ^tools/libxl/.*\.new$
+^tools/libxl/libxl\.api-for-check
 ^tools/libvchan/vchan-node[12]$
 ^tools/libaio/src/.*\.ol$
 ^tools/libaio/src/.*\.os$
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index ddc2624..1c8b62b 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -11,7 +11,7 @@ MINOR = 0
 XLUMAJOR = 1.0
 XLUMINOR = 0
 
-CFLAGS += -Werror -Wno-format-zero-length -Wmissing-declarations \
+CFLAGS += -Wno-error -Wno-format-zero-length -Wmissing-declarations \
 	-Wno-declaration-after-statement -Wformat-nonliteral
 CFLAGS += -I. -fPIC
 
@@ -74,7 +74,8 @@ LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 $(LIBXL_OBJS): CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenguest) $(CFLAGS_libxenstore) $(CFLAGS_libblktapctl) -include $(XEN_ROOT)/tools/config.h
 
 AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h \
-	_libxl_save_msgs_callout.h _libxl_save_msgs_helper.h
+	_libxl_save_msgs_callout.h _libxl_save_msgs_helper.h \
+        libxl.api-ok
 AUTOSRCS= libxlu_cfg_y.c libxlu_cfg_l.c
 AUTOSRCS += _libxl_save_msgs_callout.c _libxl_save_msgs_helper.c
 LIBXLU_OBJS = libxlu_cfg_y.o libxlu_cfg_l.o libxlu_cfg.o \
@@ -113,6 +114,15 @@ $(LIBXL_OBJS) $(LIBXLU_OBJS) $(XL_OBJS): $(AUTOINCS)
 genpath-target = $(call buildmakevars2file,_paths.h.tmp)
 $(eval $(genpath-target))
 
+libxl.api-ok: check-libxl-api-rules libxl.api-for-check
+	perl $^
+
+%.api-for-check: %.h
+	$(CC) $(CPPFLAGS) $(CFLAGS) $(CFLAGS_$*.o) -c -E $< $(APPEND_CFLAGS) \
+		-DLIBXL_EXTERNAL_CALLERS_ONLY=LIBXL_EXTERNAL_CALLERS_ONLY \
+		>$@.new
+	$(call move-if-changed,$@.new,$@)
+
 _paths.h: genpath
 	sed -e "s/\([^=]*\)=\(.*\)/#define \1 \2/g" $@.tmp >$@.2.tmp
 	rm -f $@.tmp
@@ -200,7 +210,7 @@ install: all
 .PHONY: clean
 clean:
 	$(RM) -f _*.h *.o *.so* *.a $(CLIENTS) $(DEPS)
-	$(RM) -f _*.c *.pyc _paths.*.tmp
+	$(RM) -f _*.c *.pyc _paths.*.tmp *.api-for-check
 	$(RM) -f testidl.c.new testidl.c
 #	$(RM) -f $(AUTOSRCS) $(AUTOINCS)
 
diff --git a/tools/libxl/check-libxl-api-rules b/tools/libxl/check-libxl-api-rules
new file mode 100755
index 0000000..e056573
--- /dev/null
+++ b/tools/libxl/check-libxl-api-rules
@@ -0,0 +1,15 @@
+#!/usr/bin/perl -w
+use strict;
+our $needed=0;
+while (<>) {
+      if (m/libxl_asyncop_how[^;]/) {
+         $needed=1;
+      }      
+      if (m/LIBXL_EXTERNAL_CALLERS_ONLY/) {
+          $needed=0;
+      }
+      next unless $needed;
+      if (m/\;/) {
+          die "$ARGV:$.:missing LIBXL_EXTERNAL_CALLERS_ONLY";
+      }
+}
diff --git a/tools/libxl/gentypes.py b/tools/libxl/gentypes.py
index 3c561ba..6e83b21 100644
--- a/tools/libxl/gentypes.py
+++ b/tools/libxl/gentypes.py
@@ -341,7 +341,6 @@ if __name__ == '__main__':
 #include <stdlib.h>
 #include <string.h>
 
-#include "libxl.h"
 #include "libxl_internal.h"
 
 #define LIBXL_DTOR_POISON 0xa5
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 10d7115..1a32d9e 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -266,6 +266,13 @@
 #endif
 #endif
 
+/* Functions annotated with LIBXL_EXTERNAL_CALLERS_ONLY may not be
+ * called from within libxl itself. Callers outside libxl, who
+ * do not #include libxl_internal.h, are fine. */
+#ifndef LIBXL_EXTERNAL_CALLERS_ONLY
+#define LIBXL_EXTERNAL_CALLERS_ONLY /* disappears for callers outside libxl */
+#endif
+
 typedef uint8_t libxl_mac[6];
 #define LIBXL_MAC_FMT "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx"
 #define LIBXL_MAC_FMTLEN ((2*6)+5) /* 6 hex bytes plus 5 colons */
@@ -495,11 +502,13 @@ int libxl_ctx_free(libxl_ctx *ctx /* 0 is OK */);
 int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             const libxl_asyncop_how *ao_how,
-                            const libxl_asyncprogress_how *aop_console_how);
+                            const libxl_asyncprogress_how *aop_console_how)
+                            LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
                                 uint32_t *domid, int restore_fd,
                                 const libxl_asyncop_how *ao_how,
-                                const libxl_asyncprogress_how *aop_console_how);
+                                const libxl_asyncprogress_how *aop_console_how)
+                                LIBXL_EXTERNAL_CALLERS_ONLY;
   /* A progress report will be made via ao_console_how, of type
    * domain_create_console_available, when the domain's primary
    * console is available and can be connected to.
@@ -510,7 +519,8 @@ void libxl_domain_config_dispose(libxl_domain_config *d_config);
 
 int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
                          int flags, /* LIBXL_SUSPEND_* */
-                         const libxl_asyncop_how *ao_how);
+                         const libxl_asyncop_how *ao_how)
+                         LIBXL_EXTERNAL_CALLERS_ONLY;
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
 
@@ -522,7 +532,8 @@ int libxl_domain_resume(libxl_ctx *ctx, uint32_t domid, int suspend_cancel);
 
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
                              uint32_t domid, int send_fd, int recv_fd,
-                             const libxl_asyncop_how *ao_how);
+                             const libxl_asyncop_how *ao_how)
+                             LIBXL_EXTERNAL_CALLERS_ONLY;
 
 int libxl_domain_shutdown(libxl_ctx *ctx, uint32_t domid);
 int libxl_domain_reboot(libxl_ctx *ctx, uint32_t domid);
@@ -544,7 +555,8 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid);
 
 int libxl_domain_core_dump(libxl_ctx *ctx, uint32_t domid,
                            const char *filename,
-                           const libxl_asyncop_how *ao_how);
+                           const libxl_asyncop_how *ao_how)
+                           LIBXL_EXTERNAL_CALLERS_ONLY;
 
 int libxl_domain_setmaxmem(libxl_ctx *ctx, uint32_t domid, uint32_t target_memkb);
 int libxl_set_memory_target(libxl_ctx *ctx, uint32_t domid, int32_t target_memkb, int relative, int enforce);
@@ -653,7 +665,8 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nr);
 int libxl_device_disk_add(libxl_ctx *ctx, uint32_t domid, libxl_device_disk *disk);
 int libxl_device_disk_remove(libxl_ctx *ctx, uint32_t domid,
                              libxl_device_disk *disk,
-                             const libxl_asyncop_how *ao_how);
+                             const libxl_asyncop_how *ao_how)
+                             LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_device_disk_destroy(libxl_ctx *ctx, uint32_t domid,
                               libxl_device_disk *disk);
 
@@ -671,7 +684,8 @@ int libxl_cdrom_insert(libxl_ctx *ctx, uint32_t domid, libxl_device_disk *disk);
 int libxl_device_nic_add(libxl_ctx *ctx, uint32_t domid, libxl_device_nic *nic);
 int libxl_device_nic_remove(libxl_ctx *ctx, uint32_t domid,
                             libxl_device_nic *nic,
-                            const libxl_asyncop_how *ao_how);
+                            const libxl_asyncop_how *ao_how)
+                            LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_device_nic_destroy(libxl_ctx *ctx, uint32_t domid, libxl_device_nic *nic);
 
 libxl_device_nic *libxl_device_nic_list(libxl_ctx *ctx, uint32_t domid, int *num);
@@ -682,14 +696,16 @@ int libxl_device_nic_getinfo(libxl_ctx *ctx, uint32_t domid,
 int libxl_device_vkb_add(libxl_ctx *ctx, uint32_t domid, libxl_device_vkb *vkb);
 int libxl_device_vkb_remove(libxl_ctx *ctx, uint32_t domid,
                             libxl_device_vkb *vkb,
-                            const libxl_asyncop_how *ao_how);
+                            const libxl_asyncop_how *ao_how)
+                            LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_device_vkb_destroy(libxl_ctx *ctx, uint32_t domid, libxl_device_vkb *vkb);
 
 /* Framebuffer */
 int libxl_device_vfb_add(libxl_ctx *ctx, uint32_t domid, libxl_device_vfb *vfb);
 int libxl_device_vfb_remove(libxl_ctx *ctx, uint32_t domid,
                             libxl_device_vfb *vfb,
-                            const libxl_asyncop_how *ao_how);
+                            const libxl_asyncop_how *ao_how)
+                            LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_device_vfb_destroy(libxl_ctx *ctx, uint32_t domid, libxl_device_vfb *vfb);
 
 /* PCI Passthrough */
diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 713d96d..3344bc8 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -37,7 +37,8 @@ typedef int libxl_event_predicate(const libxl_event*, void *user);
 
 int libxl_event_check(libxl_ctx *ctx, libxl_event **event_r,
                       uint64_t typemask,
-                      libxl_event_predicate *predicate, void *predicate_user);
+                      libxl_event_predicate *predicate, void *predicate_user)
+                      LIBXL_EXTERNAL_CALLERS_ONLY;
   /* Searches for an event, already-happened, which matches typemask
    * and predicate.  predicate==0 matches any event.
    * libxl_event_check returns the event, which must then later be
@@ -48,7 +49,8 @@ int libxl_event_check(libxl_ctx *ctx, libxl_event **event_r,
 
 int libxl_event_wait(libxl_ctx *ctx, libxl_event **event_r,
                      uint64_t typemask,
-                     libxl_event_predicate *predicate, void *predicate_user);
+                     libxl_event_predicate *predicate, void *predicate_user)
+                     LIBXL_EXTERNAL_CALLERS_ONLY;
   /* Like libxl_event_check but blocks if no suitable events are
    * available, until some are.  Uses libxl_osevent_beforepoll/
    * _afterpoll so may be inefficient if very many domains are being
@@ -256,7 +258,8 @@ struct pollfd;
  */
 int libxl_osevent_beforepoll(libxl_ctx *ctx, int *nfds_io,
                              struct pollfd *fds, int *timeout_upd,
-                             struct timeval now);
+                             struct timeval now)
+                             LIBXL_EXTERNAL_CALLERS_ONLY;
 
 /* nfds and fds[0..nfds] must be from the most recent call to
  * _beforepoll, as modified by poll.  (It is therefore not possible
@@ -271,7 +274,8 @@ int libxl_osevent_beforepoll(libxl_ctx *ctx, int *nfds_io,
  * libxl_event_check.
  */
 void libxl_osevent_afterpoll(libxl_ctx *ctx, int nfds, const struct pollfd *fds,
-                             struct timeval now);
+                             struct timeval now)
+                             LIBXL_EXTERNAL_CALLERS_ONLY;
 
 
 typedef struct libxl_osevent_hooks {
@@ -357,14 +361,16 @@ void libxl_osevent_register_hooks(libxl_ctx *ctx,
  */
 
 void libxl_osevent_occurred_fd(libxl_ctx *ctx, void *for_libxl,
-                               int fd, short events, short revents);
+                               int fd, short events, short revents)
+                               LIBXL_EXTERNAL_CALLERS_ONLY;
 
 /* Implicitly, on entry to this function the timeout has been
  * deregistered.  If _occurred_timeout is called, libxl will not
  * call timeout_deregister; if it wants to requeue the timeout it
  * will call timeout_register again.
  */
-void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl);
+void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
+                                    LIBXL_EXTERNAL_CALLERS_ONLY;
 
 
 /*======================================================================*/
@@ -506,7 +512,8 @@ void libxl_childproc_setmode(libxl_ctx *ctx, const libxl_childproc_hooks *hooks,
  * certainly need to use the self-pipe trick (or a working pselect or
  * ppoll) to implement this.
  */
-int libxl_childproc_reaped(libxl_ctx *ctx, pid_t, int status);
+int libxl_childproc_reaped(libxl_ctx *ctx, pid_t, int status)
+                           LIBXL_EXTERNAL_CALLERS_ONLY;
 
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 36c75ed..6c859bc 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -52,6 +52,12 @@
 
 #include <xen/io/xenbus.h>
 
+#ifdef LIBXL_H
+# error libxl.h should be included via libxl_internal.h, not separately
+#endif
+#define LIBXL_EXTERNAL_CALLERS_ONLY \
+    __attribute__((warning("may not be called from within libxl")))
+
 #include "libxl.h"
 #include "_paths.h"
 #include "_libxl_save_msgs_callout.h"
@@ -1538,10 +1544,10 @@ libxl__device_model_version_running(libxl__gc *gc, uint32_t domid);
  *
  * Functions using LIBXL__INIT_EGC may *not* generally be called from
  * within libxl, because libxl__egc_cleanup may call back into the
- * application.  This should be documented near the function
- * prototype(s) for callers of LIBXL__INIT_EGC and EGC_INIT.  You
- * should in any case not find it necessary to call egc-creators from
- * within libxl.
+ * application.  This should be enforced by declaring all such
+ * functions in libxl.h or libxl_event.h with
+ * LIBXL_EXTERNAL_CALLERS_ONLY.  You should in any case not find it
+ * necessary to call egc-creators from within libxl.
  *
  * The callbacks must all take place with the ctx unlocked because
  * the application is entitled to reenter libxl from them.  This
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (20 preceding siblings ...)
  2012-06-26 17:55 ` [PATCH 21/21] libxl: DO NOT APPLY enforce prohibition on internal Ian Jackson
@ 2012-06-26 18:00 ` Ian Jackson
  2012-06-26 18:44   ` Shriram Rajagopalan
  2012-06-28 13:38 ` [PATCH v6 " Ian Jackson
  22 siblings, 1 reply; 40+ messages in thread
From: Ian Jackson @ 2012-06-26 18:00 UTC (permalink / raw)
  To: Shriram Rajagopalan; +Cc: xen-devel

I wrote:
> This is v5 of my series to asyncify save/restore, rebased to tip and
> retested.  There are minor changes to 3 patches, as discussed on-list,
> marked with "*" below:
> 
>     01/21 libxc: xc_domain_restore, make toolstack_restore const-correct
>     02/21 libxc: Do not segfault if (e.g.) switch_qemu_logdirty fails
>     03/21 libxl: domain save: rename variables etc.
>     04/21 libxl: domain restore: reshuffle, preparing for ao
>   * 05/21 libxl: domain save: API changes for asynchrony
>   * 06/21 libxl: domain save/restore: run in a separate process
>     07/21 libxl: rename libxl_dom:save_helper to physmap_path
>     08/21 libxl: provide libxl__xs_*_checked and libxl__xs_transaction_*
>     09/21 libxl: wait for qemu to acknowledge logdirty command
>     10/21 libxl: datacopier: provide "prefix data" facility
>     11/21 libxl: prepare for asynchronous writing of qemu save file
>     12/21 libxl: Make libxl__domain_save_device_model asynchronous
>     13/21 libxl: Add a gc to libxl_get_cpu_topology
>     14/21 libxl: Do not pass NULL as gc_opt; introduce NOGC
>     15/21 libxl: Get compiler to warn about gc_opt==NULL
>     16/21 xl: Handle return value from libxl_domain_suspend correctly
>     17/21 libxl: do not leak dms->saved_state
>     18/21 libxl: do not leak spawned middle children
>     19/21 libxl: do not leak an event struct on ignored ao progress
>   * 20/21 libxl: further fixups re LIBXL_DOMAIN_TYPE
>   ! 21/21 libxl: DO NOT APPLY enforce prohibition on internal
> 
> All of these apart from the last have been acked and I intend to
> commit those to xen-unstable.hg soon.
> 
> However, first I will invite Shriram to check that Remus is still
> working.  (I can't conveniently do this with this message due to
> shoddiness in git-send-email.)

Shriram, would you care to take a look at this series and perhaps
retest it ?

If you would prefer a git branch to a series of patches, you can find
it here:
  http://xenbits.xen.org/gitweb/?p=people/iwj/xen-unstable.git;a=shortlog;h=refs/heads/for-shriram
  git://xenbits.xen.org/people/iwj/xen-unstable.git#shriram
NB that branch is REBASING.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-26 18:00 ` [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
@ 2012-06-26 18:44   ` Shriram Rajagopalan
  2012-06-27  1:25     ` Shriram Rajagopalan
  2012-06-27 13:17     ` Ian Jackson
  0 siblings, 2 replies; 40+ messages in thread
From: Shriram Rajagopalan @ 2012-06-26 18:44 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 654 bytes --]

>
> Shriram, would you care to take a look at this series and perhaps
> retest it ?
>
>
Sure will do.


> If you would prefer a git branch to a series of patches, you can find
> it here:
>
> http://xenbits.xen.org/gitweb/?p=people/iwj/xen-unstable.git;a=shortlog;h=refs/heads/for-shriram
>  git://xenbits.xen.org/people/iwj/xen-unstable.git#shriram
> NB that branch is REBASING.
>
>
I am not too familiar with the git lingo.. What did you mean by "branch is
rebasing" ?
Am I supposed to do something special, apart from the normal process below:
git clone git://xen....
git checkout -b for-shriram origin/for-shriram

thanks
shriram


> Thanks,
> Ian.
>
>

[-- Attachment #1.2: Type: text/html, Size: 1462 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-26 18:44   ` Shriram Rajagopalan
@ 2012-06-27  1:25     ` Shriram Rajagopalan
  2012-06-27 13:46       ` Ian Jackson
  2012-06-27 13:17     ` Ian Jackson
  1 sibling, 1 reply; 40+ messages in thread
From: Shriram Rajagopalan @ 2012-06-27  1:25 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3778 bytes --]

Ian,
 The code segfaults. Here are the system details and error traces from gdb.

My setup:

dom0 : ubuntu 64bit, 2.6.32-39 (pvops kernel),
           running latest xen-4.2-unstable (built from your repo)
           tools stack also built from your repo (which I hope has all the
latest patches).

domU: ubuntu 32bit PV, xenolinux kernel (2.6.32.2 - Novell SUSE version)
           with suspend event channel support

As a sanity check, I tested xl remus with latest tip from xen-unstable
mercurial repo, c/s: 25496:e08cf97e76f0

Blackhole replication (to /dev/null) and localhost replication worked as
expected
and the guest recovered properly without any issues.

These are the commands, just in case you wish to try them yourself on any
guest.

 nohup xl remus -b -i 100 domU dummy >logfile 2>&1 &
 nohup xl remus -i 100 -e domU localhost >logfile 2>&1 &

With your repo, both blackhole replication and localhost replication
segfault.
I haven't tested remote replication. [I don't know if the segfault is from
your patches or someone else's :) ]

The source domain is left in ---ss- state.
With localhost replication, the target domain (drbd-vm--incoming) becomes
operational, but is not renamed.

Blackhole replication:
================
xl error:
----------
xc: error: Could not get domain info (3 = No such process): Internal error
libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for
domain 4154075147: No such process
 libxl: error: libxl_dom.c:1184:libxl__domain_save_device_model: unable to
open qemu save file ?8b: No such file or directory

I also ran xl in GDB to get a stack trace and hopefully some useful debug
info.
gdb traces: http://pastebin.com/7zFwFjW4


Localhost replication: Partial success, but xl still segfaults
 dmesg shows
 [ 1399.254849] xl[4716]: segfault at 0 ip 00007f979483a417 sp
00007fffe06043e0 error 6 in libxenlight.so.2.0.0[7f9794807000+4d000]

xl error:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/487)
Loading new save file <incoming migration stream> (new xl fmt info
0x0/0x0/487)
 Savefile contains xl domain config
xc: error: Could not get domain info (3 = No such process): Internal error
libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for
domain 2491594763: No such process
libxl: error: libxl_dom.c:1184:libxl__domain_save_device_model: unable to
open qemu save file `??: No such file or directory
xc: error: 0-length read: Internal error
xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
xc: error: Error when reading batch size (0 = Success): Internal error
xc: error: error when buffering batch, finishing (0 = Success): Internal
error
migration target: Remus Failover for domain 3
libxl: error: libxl.c:313:libxl__domain_rename: domain with name "drbd-vm"
already exists.
migration target (Remus): Failed to rename domain from drbd-vm--incoming to
drbd-vm:-6

I see calls related to qemu, but I am running a PV guest!

thanks
shriram


On Tue, Jun 26, 2012 at 2:44 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca> wrote:

>> Shriram, would you care to take a look at this series and perhaps
>> retest it ?
>>
>>
> Sure will do.
>
>
>> If you would prefer a git branch to a series of patches, you can find
>> it here:
>>
>> http://xenbits.xen.org/gitweb/?p=people/iwj/xen-unstable.git;a=shortlog;h=refs/heads/for-shriram
>>  git://xenbits.xen.org/people/iwj/xen-unstable.git#shriram
>> NB that branch is REBASING.
>>
>>
> I am not too familiar with the git lingo.. What did you mean by "branch is
> rebasing" ?
> Am I supposed to do something special, apart from the normal process
> below:
> git clone git://xen....
> git checkout -b for-shriram origin/for-shriram
>
> thanks
> shriram
>
>
>> Thanks,
>> Ian.
>>
>>
>

[-- Attachment #1.2: Type: text/html, Size: 5877 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-26 18:44   ` Shriram Rajagopalan
  2012-06-27  1:25     ` Shriram Rajagopalan
@ 2012-06-27 13:17     ` Ian Jackson
  2012-06-27 13:28       ` Shriram Rajagopalan
  1 sibling, 1 reply; 40+ messages in thread
From: Ian Jackson @ 2012-06-27 13:17 UTC (permalink / raw)
  To: rshriram; +Cc: xen-devel

Shriram Rajagopalan writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process"):
> Shriram, would you care to take a look at this series and perhaps
> retest it ?
...
> If you would prefer a git branch to a series of patches, you can find
> it here:
>  http://xenbits.xen.org/gitweb/?p=people/iwj/xen-unstable.git;a=shortlog;h=refs/heads/for-shriram
>  git://xenbits.xen.org/people/iwj/xen-unstable.git#shriram
> NB that branch is REBASING.
...
> I am not too familiar with the git lingo.. What did you mean by "branch is rebasing" ?
> Am I supposed to do something special, apart from the normal process below:
> git clone git://xen....
> git checkout -b for-shriram origin/for-shriram

Yes, that will work just fine.

The warning about rebasing is because this branch may have its
history rewritten, so if I update it in future you won't be able to
just pull from it to update.  In that case I'll be happy to help out
with instructions.

Regards,
Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-27 13:17     ` Ian Jackson
@ 2012-06-27 13:28       ` Shriram Rajagopalan
  0 siblings, 0 replies; 40+ messages in thread
From: Shriram Rajagopalan @ 2012-06-27 13:28 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1285 bytes --]

On Wed, Jun 27, 2012 at 9:17 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Shriram Rajagopalan writes ("Re: [PATCH v5 00/21] libxl: domain
> save/restore: run in a separate process"):
> > Shriram, would you care to take a look at this series and perhaps
> > retest it ?
> ...
> > If you would prefer a git branch to a series of patches, you can find
> > it here:
> >
> http://xenbits.xen.org/gitweb/?p=people/iwj/xen-unstable.git;a=shortlog;h=refs/heads/for-shriram
> >  git://xenbits.xen.org/people/iwj/xen-unstable.git#shriram
> > NB that branch is REBASING.
> ...
> > I am not too familiar with the git lingo.. What did you mean by "branch
> is rebasing" ?
> > Am I supposed to do something special, apart from the normal process
> below:
> > git clone git://xen....
> > git checkout -b for-shriram origin/for-shriram
>
> Yes, that will work just fine.
>
> The warning about rebasing is because this branch may have its
> history rewritten, so if I update it in future you won't be able to
> just pull from it to update.  In that case I'll be happy to help out
> with instructions.
>
>
Yep got that. Pulled from that branch and tested. Sent out the results
yesterday (under this thread).


> Regards,
> Ian.
>
>

[-- Attachment #1.2: Type: text/html, Size: 2233 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-27  1:25     ` Shriram Rajagopalan
@ 2012-06-27 13:46       ` Ian Jackson
  2012-06-27 15:59         ` Ian Jackson
  2012-06-27 16:06         ` Shriram Rajagopalan
  0 siblings, 2 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-27 13:46 UTC (permalink / raw)
  To: rshriram; +Cc: xen-devel

Shriram Rajagopalan writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process"):
> Ian,
>  The code segfaults. Here are the system details and error traces from gdb.

Thanks.

> My setup:
> 
> dom0 : ubuntu 64bit, 2.6.32-39 (pvops kernel),
>            running latest xen-4.2-unstable (built from your repo)
>            tools stack also built from your repo (which I hope has all the latest patches).
> 
> domU: ubuntu 32bit PV, xenolinux kernel (2.6.32.2 - Novell SUSE version)
>            with suspend event channel support
> 
> As a sanity check, I tested xl remus with latest tip from xen-unstable
> mercurial repo, c/s: 25496:e08cf97e76f0
> 
> Blackhole replication (to /dev/null) and localhost replication worked as expected
> and the guest recovered properly without any issues.

Thanks for the test runes.  That didn't work entirely properly for
me, even with the xen-unstable baseline.

I did this
   xl -vvvv remus -b -i 100 debian.guest.osstest dummy >remus.log 2>&1 &
The result was that the guest's networking broke.  The guest shows up
in xl list as
   debian.guest.osstest                      7   512     1     ---ss-       5.2
and is still responsive on its pv console.  After I killed the remus
process, the guest's networking was still broken.

At the start, the guest prints this on its console:
  [   36.017241] WARNING: g.e. still in use!
  [   36.021056] WARNING: g.e. still in use!
  [   36.024740] WARNING: g.e. still in use!
  [   36.024763] WARNING: g.e. still in use!

If I try the rune with "localhost" I would have expected, surely, to
see a domain with the incoming migration ?  But I don't.  I tried
killing the `xl remus' process and the guest became wedged.


However, when I apply my series, I can indeed produce an assertion
failure:

 xc: detail: All memory is saved
 xc: error: Could not get domain info (3 = No such process): Internal error
 libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for domain 3077579968: No such process
 xl: libxl_event.c:1426: libxl__ao_inprogress_gc: Assertion `ao->magic == 0xA0FACE00ul' failed.

So I have indeed made matters worse.
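
For readers unfamiliar with this kind of assertion: it is a
magic-number check, which typically fires when a pointer does not
refer to a live object of the expected type.  A minimal sketch of
the idea follows.  The demo_* names and the "destroyed" value are
invented for illustration; only 0xA0FACE00ul comes from the message
above, and this is not libxl's actual implementation.

```c
#include <assert.h>

#define DEMO_MAGIC           0xA0FACE00ul /* value seen in the assertion */
#define DEMO_MAGIC_DESTROYED 0xA0DEAD00ul /* invented for this sketch */

typedef struct {
    unsigned long magic;  /* set on init, overwritten on destroy */
    int value;
} demo_op;

void demo_op_init(demo_op *op, int value)
{
    op->magic = DEMO_MAGIC;
    op->value = value;
}

/* Returns nonzero if the object still carries the live magic value. */
int demo_op_is_live(const demo_op *op)
{
    return op->magic == DEMO_MAGIC;
}

/* Every accessor checks the magic first, so a stale or mistyped
 * pointer is caught here rather than corrupting state later. */
int demo_op_value(const demo_op *op)
{
    assert(op->magic == DEMO_MAGIC);
    return op->value;
}

void demo_op_destroy(demo_op *op)
{
    assert(op->magic == DEMO_MAGIC);
    op->magic = DEMO_MAGIC_DESTROYED;
}
```

After demo_op_destroy, or if some other structure is mistakenly
passed in, demo_op_value aborts with a message of the same shape as
the one quoted above.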


> Blackhole replication:
> ================
> xl error:
> ----------
> xc: error: Could not get domain info (3 = No such process): Internal error
> libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for domain 4154075147: No such process
> libxl: error: libxl_dom.c:1184:libxl__domain_save_device_model: unable to open qemu save file ?8b: No such file or directory

I don't see that at all.

NB that PV guests may have a qemu for certain disk backends, or
consoles, depending on the configuration.  Can you show me your domain
config ?  Mine is below.

> I also ran xl in GDB to get a stack trace and hopefully some useful debug info.
> gdb traces: http://pastebin.com/7zFwFjW4

I get a different crash - see above.

> Localhost replication: Partial success, but xl still segfaults
>  dmesg shows
>  [ 1399.254849] xl[4716]: segfault at 0 ip 00007f979483a417 sp 00007fffe06043e0 error 6 in libxenlight.so.2.0.0[7f9794807000+4d000]

I see exactly the same thing with `localhost' instead of `dummy'.  And
I see no incoming domain.

I will investigate the crash I see.  In the meantime can you try to
help me see why it doesn't work for me even with the baseline ?

Thanks,
Ian.

#
# Configuration file for the Xen instance debian.guest.osstest, created
# by xen-tools 4.2 on Thu Apr  5 16:43:43 2012.
#

#
#  Kernel + memory size
#
#kernel      = '/boot/vmlinuz-2.6.32.57'
#ramdisk     = '/boot/initrd.img-2.6.32.57'

#bootloader = 'pygrub'
bootloader = '/root/strace-pygrub'


memory      = '512'

#
#  Disk device(s).
#
root        = '/dev/xvda2 ro'
disk        = [
                  'phy:/dev/bedbug/debian.guest.osstest-disk,xvda2,w',
                  'phy:/dev/bedbug/debian.guest.osstest-swap,xvda1,w',
              ]


#
#  Physical volumes
#


#
#  Hostname
#
name        = 'debian.guest.osstest'

#
#  Networking
#
#dhcp        = 'dhcp'
vif         = [ 'mac=5a:36:0e:26:00:01' ]

#
#  Behaviour
#
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash='preserve'




vcpus = 1

extra='console=hvc0 earlyprintk=xen'

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-27 13:46       ` Ian Jackson
@ 2012-06-27 15:59         ` Ian Jackson
  2012-06-27 16:09           ` Shriram Rajagopalan
  2012-06-27 16:06         ` Shriram Rajagopalan
  1 sibling, 1 reply; 40+ messages in thread
From: Ian Jackson @ 2012-06-27 15:59 UTC (permalink / raw)
  To: rshriram, xen-devel

Ian Jackson writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process"):
> However, when I apply my series, I can indeed produce an assertion
> failure:
...
> So I have indeed made matters worse.

I found two bugs:

1. The void* passed to the callback was being treated as a
libxl__domain_suspend_state* by the remus callbacks; this is a
holdover from a much earlier version of the series.  It should be
converted to a libxl__save_helper_state and then the dss extracted
with CONTAINER_OF.

2. The way remus works means that the toolstack save callback is
invoked more than once, which the helper's implementation was not
prepared to deal with.  Fix this by moving the rewind of the fd into
the helper.
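
The conversion described in item 1 can be sketched as follows.  This
is a simplified illustration: the demo_* types are invented, and
DEMO_CONTAINER_OF is a plain offsetof-based macro, whereas libxl's
own CONTAINER_OF is invoked differently (see the diff below).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container-of: step back from a pointer to an embedded
 * member to the start of the enclosing structure. */
#define DEMO_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct {
    int helper_fd;
} demo_helper_state;

typedef struct {
    int domid;
    demo_helper_state shs;  /* embedded, and not at offset 0 */
} demo_suspend_state;

/* The callback is handed a pointer to the embedded helper state.
 * Casting it directly to demo_suspend_state* would be wrong; the
 * outer structure must be recovered from the member's address. */
int demo_callback(void *data)
{
    demo_helper_state *shs = data;
    demo_suspend_state *dss =
        DEMO_CONTAINER_OF(shs, demo_suspend_state, shs);
    return dss->domid;
}

/* Sets up an outer structure and invokes the callback the way a
 * helper would, passing only the inner pointer. */
int demo_run(void)
{
    demo_suspend_state dss = { .domid = 5, .shs = { .helper_fd = -1 } };
    return demo_callback(&dss.shs);
}
```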

Fixes for these are below.  With this, on top of my series, I seem to
get the same behaviour as with the baseline.  Would you like to try it ?
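
To illustrate item 2: a callback which may be invoked more than once
needs to rewind the fd itself on every call, rather than relying on
a single rewind done beforehand.  A minimal sketch, assuming a POSIX
system; the demo_* names are invented and this is not the helper's
actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read exactly len bytes; 0 on success, -1 on error or early EOF.
 * (A sketch: EINTR is not retried.) */
static int demo_read_exactly(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t r = read(fd, p, len);
        if (r <= 0) return -1;
        p += r;
        len -= (size_t)r;
    }
    return 0;
}

/* May be called any number of times: rewind first, then read. */
int demo_save_cb(int fd, void *buf, size_t len)
{
    if (lseek(fd, 0, SEEK_SET) != 0) return -1;
    return demo_read_exactly(fd, buf, len);
}

/* Writes a payload to a temporary file, then invokes the callback
 * `rounds' times; returns how many calls returned the full payload,
 * or -1 if the temporary file could not be set up. */
int demo_repeat(int rounds)
{
    static const char payload[] = "toolstack data";
    char buf[sizeof payload];
    FILE *tmp = tmpfile();
    if (!tmp) return -1;
    int fd = fileno(tmp);
    if (write(fd, payload, sizeof payload) != (ssize_t)sizeof payload) {
        fclose(tmp);
        return -1;
    }
    int ok = 0;
    for (int i = 0; i < rounds; i++) {
        memset(buf, 0, sizeof buf);
        if (demo_save_cb(fd, buf, sizeof buf) == 0 &&
            memcmp(buf, payload, sizeof payload) == 0)
            ok++;
    }
    fclose(tmp);
    return ok;
}
```

Without the lseek in demo_save_cb, every call would start at end of
file and fail, since the offset is left at the end by the preceding
write or read.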

Thanks,
Ian.

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index abc5932..069aca1 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -984,7 +984,8 @@ static int libxl__remus_domain_suspend_callback(void *data)
 
 static int libxl__remus_domain_resume_callback(void *data)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
     /* Resumes the domain and the device model */
@@ -1002,7 +1003,8 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,
 
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     libxl__egc *egc = dss->shs.egc;
     STATE_AO_GC(dss->ao);
 
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 6332beb..078b7ee 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -105,13 +105,6 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss,
                                 toolstack_data_buf, toolstack_data_len,
                                 "toolstack data tmpfile", 0);
         if (r) { rc = ERROR_FAIL; goto out; }
-
-        r = lseek(toolstack_data_fd, 0, SEEK_SET);
-        if (r) {
-            LOGE(ERROR, "rewind toolstack data tmpfile");
-            rc = ERROR_FAIL;
-            goto out;
-        }
     }
 
     const unsigned long argnums[] = {
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 3bdfa28..772251a 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -171,12 +171,14 @@ static int toolstack_save_cb(uint32_t domid, uint8_t **buf,
 {
     assert(toolstack_save_fd > 0);
 
+    int r = lseek(toolstack_save_fd, 0, SEEK_SET);
+    if (r) fail(errno,"rewind toolstack data tmpfile");
+
     *buf = xmalloc(toolstack_save_len);
-    int r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
+    r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
     if (r<0) fail(errno,"read toolstack data");
     if (r==0) fail(0,"read toolstack data eof");
 
-    toolstack_save_fd = -1;
     *len = toolstack_save_len;
     return 0;
 }

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-27 13:46       ` Ian Jackson
  2012-06-27 15:59         ` Ian Jackson
@ 2012-06-27 16:06         ` Shriram Rajagopalan
  1 sibling, 0 replies; 40+ messages in thread
From: Shriram Rajagopalan @ 2012-06-27 16:06 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 7105 bytes --]

On Wed, Jun 27, 2012 at 9:46 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Shriram Rajagopalan writes ("Re: [PATCH v5 00/21] libxl: domain
> save/restore: run in a separate process"):
> > Ian,
> >  The code segfaults. Here are the system details and error traces from gdb.
>
> Thanks.
>
> > My setup:
> >
> > dom0 : ubuntu 64bit, 2.6.32-39 (pvops kernel),
> >            running latest xen-4.2-unstable (built from your repo)
> >            tools stack also built from your repo (which I hope has all
> the latest patches).
> >
> > domU: ubuntu 32bit PV, xenolinux kernel (2.6.32.2 - novel suse version)
> >            with suspend event channel support
> >
> > As a sanity check, I tested xl remus with latest tip from xen-unstable
> > mercurial repo, c/s: 25496:e08cf97e76f0
> >
> > Blackhole replication (to /dev/null) and localhost replication worked as
> expected
> > and the guest recovered properly without any issues.
>
> Thanks for the test runes.  That didn't work entirely properly for
> me, even with the xen-unstable baseline.
>
> I did this
>   xl -vvvv remus -b -i 100 debian.guest.osstest dummy >remus.log 2>&1 &
> The result was that the guest's networking broke.  The guest shows up
> in xl list as
>   debian.guest.osstest                      7   512     1     ---ss-     5.2
> and is still responsive on its pv console.


This is normal. You are suspending every 100ms. So, when you see ---ss-,
you just ended up doing "xl list" right when the guest was suspended. :)

Run xl top and you will see the guest's state oscillate between --b-- and
--s--, depending on the checkpoint interval. Or run xl list several times.



> After I killed the remus
> process, the guest's networking was still broken.
>
>
That is strange.  xl remus has literally no networking support on the
remus front, so it shouldn't affect anything in the guest. In fact I
repeated your test on my box, where the guest was continuously pinging a
host. Pings continued to work, and so did ssh.



> At the start, the guest prints this on its console:
>  [   36.017241] WARNING: g.e. still in use!
>  [   36.021056] WARNING: g.e. still in use!
>  [   36.024740] WARNING: g.e. still in use!
>  [   36.024763] WARNING: g.e. still in use!
>
> If I try the rune with "localhost" I would have expected, surely, to
> see a domain with the incoming migration ?  But I don't.  I tried
> killing the `xl remus' process and the guest became wedged.
>
>
With the "-b" option the second argument (localhost|dummy) is ignored. Did
you try the command without the -b option, i.e.
xl remus -vvv -e domU localhost

But I was partially able to reproduce some of your test results without your
patches (i.e. on xen-unstable baseline). See end of mail for more details.


> However, when I apply my series, I can indeed produce an assertion
> failure:
>
>  xc: detail: All memory is saved
>  xc: error: Could not get domain info (3 = No such process): Internal error
>  libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for domain 3077579968: No such process
>  xl: libxl_event.c:1426: libxl__ao_inprogress_gc: Assertion `ao->magic == 0xA0FACE00ul' failed.
>
> So I have indeed made matters worse.
>
>
> > Blackhole replication:
> > ================
> > xl error:
> > ----------
> > xc: error: Could not get domain info (3 = No such process): Internal error
> > libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for domain 4154075147: No such process
> > libxl: error: libxl_dom.c:1184:libxl__domain_save_device_model: unable to open qemu save file ?8b: No such file or directory
>
> I don't see that at all.
>
> NB that PV guests may have a qemu for certain disk backends, or
> consoles, depending on the configuration.  Can you show me your domain
> config ?  Mine is below.
>
>
Ah that explains the qemu related calls.

My Guest config: (from tests on 32bit PV domU w/ suspend event channel
support)

kernel = "/home/kernels/vmlinuz-2.6.32.2-xenu"
memory = 1024
name = "xltest2"
vcpus = 2
vif = [ 'mac=00:16:3e:00:00:01,bridge=eth0' ]
disk = [ 'phy:/dev/drbd1,xvda1,w']
hostname= "rshriram-vm3"
root = "/dev/xvda1 ro"
extra = "console=xvc0 3"
on_poweroff = 'destroy'
on_reboot   = 'destroy'
on_crash    = 'coredump-destroy'

NB: This guest kernel has suspend-event-channel support
which is available in all suse-kernels I suppose. If you would
just like to use mine, the source tarball (2.6.32.2 version + kernel config)
is at http://aramis.nss.cs.ubc.ca/xenolinux-2.6.32.2.tar.gz


> > I also ran xl in GDB to get a stack trace and hopefully some useful debug info.
> > gdb traces: http://pastebin.com/7zFwFjW4
>
> I get a different crash - see above.
>
> > Localhost replication: Partial success, but xl still segfaults
> >  dmesg shows
> >  [ 1399.254849] xl[4716]: segfault at 0 ip 00007f979483a417 sp 00007fffe06043e0 error 6 in libxenlight.so.2.0.0[7f9794807000+4d000]
>
> I see exactly the same thing with `localhost' instead of `dummy'.  And
> I see no incoming domain.
>
> I will investigate the crash I see.  In the meantime can you try to
> help me see why it doesn't work for me even with the baseline ?
>
>

I also tested with 64-bit 3.3.0 PV kernel (w/o suspend-event channel
support)

guest config:
kernel = "/home/kernels/vmlinuz-3.3.0-rc1-xenu"
memory = 1024
name = "xl-ubuntu-pv64"
vcpus = 2
vif = [ 'mac=00:16:3e:00:00:03, bridge=eth0' ]
disk = [ 'phy:/dev/vgdrbd/ubuntu-pv64,xvda1,w' ]
hostname= "rshriram-vm1"
root = "/dev/xvda1 ro"
extra = "console=hvc0 3"

With xen-unstable baseline,
Test 1. Blackhole replication
 command: nohup xl remus -vvv -e -b -i 100 xl-ubuntu-pv64 dummy >blackhole.log 2>&1 &
 result: works (networking included)
debug output:
libxl: debug: libxl_dom.c:687:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:691:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:738:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:742:libxl__domain_suspend_common_callback: wait for the guest to suspend
libxl: debug: libxl_dom.c:754:libxl__domain_suspend_common_callback: guest has suspended

 caveat: killing remus doesn't do a proper cleanup, i.e. if you kill it
         while the domain is suspended, it leaves the domain in the
         suspended state (where libxl waits for the guest to suspend).
         It's a pain. In the xend/python version I added a SIGUSR1
         handler, so that one could do pkill -USR1 -f remus and exit
         remus gracefully, without wedging the domU.

         * I do not know if adding signal handlers is frowned upon in
           xl land :)  If there is some protocol in place to handle such
           things, I would be happy to send a patch that ensures that
           the guest is "resumed" while doing blackhole replication.

Test 2. Localhost replication w/ failover by destroying primary VM
 command: nohup xl remus -vvv -b -i 100 xl-ubuntu-pv64 localhost >blackhole.log 2>&1 &
 result: works (networking included)

[-- Attachment #1.2: Type: text/html, Size: 9993 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-27 15:59         ` Ian Jackson
@ 2012-06-27 16:09           ` Shriram Rajagopalan
  2012-06-27 16:42             ` Shriram Rajagopalan
  0 siblings, 1 reply; 40+ messages in thread
From: Shriram Rajagopalan @ 2012-06-27 16:09 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3673 bytes --]

On Wed, Jun 27, 2012 at 11:59 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Ian Jackson writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run
> in a separate process"):
> > However, when I apply my series, I can indeed produce an assertion
> > failure:
> ...
> > So I have indeed made matters worse.
>
> I found two bugs:
>
> 1. The void* passed to the callback was being treated as a
> libxl__domain_suspend_state* by the remus callbacks; this is a
> holdover from a much earlier version of the series.  It should be
> converted to a libxl__save_helper_state and then the dss extracted
> with CONTAINER_OF.
>
> 2. The way remus works means that the toolstack save callback is
> invoked more than once, which the helper's implementation was not
> prepared to deal with.  Fix this by moving the rewind of the fd into
> the helper.
>
> Fixes for these are below.  With this, on top of my series, I seem to
> get the same behaviour as with the baseline.  Would you like to try it ?
>
>
Sure, I'll give it a shot.
Btw, my earlier mail was in response to remus not
working on the baseline setup in your dev environment.



> Thanks,
> Ian.
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index abc5932..069aca1 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -984,7 +984,8 @@ static int libxl__remus_domain_suspend_callback(void
> *data)
>
>  static int libxl__remus_domain_resume_callback(void *data)
>  {
> -    libxl__domain_suspend_state *dss = data;
> +    libxl__save_helper_state *shs = data;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>     STATE_AO_GC(dss->ao);
>
>      /* Resumes the domain and the device model */
> @@ -1002,7 +1003,8 @@ static void remus_checkpoint_dm_saved(libxl__egc
> *egc,
>
>  static void libxl__remus_domain_checkpoint_callback(void *data)
>  {
> -    libxl__domain_suspend_state *dss = data;
> +    libxl__save_helper_state *shs = data;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>     libxl__egc *egc = dss->shs.egc;
>     STATE_AO_GC(dss->ao);
>
> diff --git a/tools/libxl/libxl_save_callout.c
> b/tools/libxl/libxl_save_callout.c
> index 6332beb..078b7ee 100644
> --- a/tools/libxl/libxl_save_callout.c
> +++ b/tools/libxl/libxl_save_callout.c
> @@ -105,13 +105,6 @@ void libxl__xc_domain_save(libxl__egc *egc,
> libxl__domain_suspend_state *dss,
>                                 toolstack_data_buf, toolstack_data_len,
>                                  "toolstack data tmpfile", 0);
>          if (r) { rc = ERROR_FAIL; goto out; }
> -
> -        r = lseek(toolstack_data_fd, 0, SEEK_SET);
> -        if (r) {
> -            LOGE(ERROR, "rewind toolstack data tmpfile");
> -            rc = ERROR_FAIL;
> -            goto out;
> -        }
>     }
>
>     const unsigned long argnums[] = {
> diff --git a/tools/libxl/libxl_save_helper.c
> b/tools/libxl/libxl_save_helper.c
> index 3bdfa28..772251a 100644
> --- a/tools/libxl/libxl_save_helper.c
> +++ b/tools/libxl/libxl_save_helper.c
> @@ -171,12 +171,14 @@ static int toolstack_save_cb(uint32_t domid, uint8_t
> **buf,
>  {
>     assert(toolstack_save_fd > 0);
>
> +    int r = lseek(toolstack_save_fd, 0, SEEK_SET);
> +    if (r) fail(errno,"rewind toolstack data tmpfile");
> +
>      *buf = xmalloc(toolstack_save_len);
> -    int r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
> +    r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
>     if (r<0) fail(errno,"read toolstack data");
>      if (r==0) fail(0,"read toolstack data eof");
>
> -    toolstack_save_fd = -1;
>     *len = toolstack_save_len;
>     return 0;
>  }
>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-27 16:09           ` Shriram Rajagopalan
@ 2012-06-27 16:42             ` Shriram Rajagopalan
  2012-06-28 11:24               ` Ian Jackson
  0 siblings, 1 reply; 40+ messages in thread
From: Shriram Rajagopalan @ 2012-06-27 16:42 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 6212 bytes --]

On Wed, Jun 27, 2012 at 12:09 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca> wrote:

> On Wed, Jun 27, 2012 at 11:59 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>
>> Ian Jackson writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run
>> in a separate process"):
>> > However, when I apply my series, I can indeed produce an assertion
>> > failure:
>> ...
>> > So I have indeed made matters worse.
>>
>> I found two bugs:
>>
>> 1. The void* passed to the callback was being treated as a
>> libxl__domain_suspend_state* by the remus callbacks; this is a
>> holdover from a much earlier version of the series.  It should be
>> converted to a libxl__save_helper_state and then the dss extracted
>> with CONTAINER_OF.
>>
>> 2. The way remus works means that the toolstack save callback is
>> invoked more than once, which the helper's implementation was not
>> prepared to deal with.  Fix this by moving the rewind of the fd into
>> the helper.
>>
>> Fixes for these are below.  With this, on top of my series, I seem to
>> get the same behaviour as with the baseline.  Would you like to try it ?
>>
>>
> Sure, I'll give it a shot.
> Btw, my earlier mail was in response to remus not
> working on the baseline setup in your dev environment.
>
>
The fix works for 2 out of 3 cases:
 blackhole replication (xl remus -b)
 localhost replication with failover, i.e. destroy primary (xl remus domU localhost)

However, it crashes the guest for localhost replication when I destroy
the backup, i.e. xl destroy domU--incoming. The primary guest would
generally resume, but in this case it's in --sc- state.
NB: This seems to happen in baseline xen-unstable too!

xc: error: unexpected PFN mapping failure pfn 180e map_mfn 43b808 p2m_mfn 43b808: Internal error
libxl: error: libxl_create.c:760:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:844:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1220:libxl_domain_destroy: non-existant domain 17
libxl: error: libxl_create.c:995:domcreate_complete: unable to destroy domain 17 following failed creation
migration target: Domain creation failed (code -3).
..
Total Data Sent= 12.597 MB
libxl: debug: libxl_dom.c:801:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:805:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:852:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:856:libxl__domain_suspend_common_callback: wait for the guest to suspend
libxl: debug: libxl_dom.c:870:libxl__domain_suspend_common_callback: guest has suspended
pagetables=2,cache_misses=0,emptypages=45
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 16 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 16 save/restore helper [3148] died due to fatal signal Broken pipe
libxl: debug: libxl_event.c:1434:libxl__ao_complete: ao 0x1b08c80: complete, rc=-3
libxl: debug: libxl_event.c:1406:libxl__ao__destroy: ao 0x1b08c80: destroy
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
xc: debug: hypercall buffer: total allocations:2116 total releases:2116
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:1729 misses:2 toobig:385




>
>
>> Thanks,
>> Ian.
>>
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index abc5932..069aca1 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -984,7 +984,8 @@ static int libxl__remus_domain_suspend_callback(void
>> *data)
>>
>>  static int libxl__remus_domain_resume_callback(void *data)
>>  {
>> -    libxl__domain_suspend_state *dss = data;
>> +    libxl__save_helper_state *shs = data;
>> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>>     STATE_AO_GC(dss->ao);
>>
>>      /* Resumes the domain and the device model */
>> @@ -1002,7 +1003,8 @@ static void remus_checkpoint_dm_saved(libxl__egc
>> *egc,
>>
>>  static void libxl__remus_domain_checkpoint_callback(void *data)
>>  {
>> -    libxl__domain_suspend_state *dss = data;
>> +    libxl__save_helper_state *shs = data;
>> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>>     libxl__egc *egc = dss->shs.egc;
>>     STATE_AO_GC(dss->ao);
>>
>> diff --git a/tools/libxl/libxl_save_callout.c
>> b/tools/libxl/libxl_save_callout.c
>> index 6332beb..078b7ee 100644
>> --- a/tools/libxl/libxl_save_callout.c
>> +++ b/tools/libxl/libxl_save_callout.c
>> @@ -105,13 +105,6 @@ void libxl__xc_domain_save(libxl__egc *egc,
>> libxl__domain_suspend_state *dss,
>>                                 toolstack_data_buf, toolstack_data_len,
>>                                  "toolstack data tmpfile", 0);
>>          if (r) { rc = ERROR_FAIL; goto out; }
>> -
>> -        r = lseek(toolstack_data_fd, 0, SEEK_SET);
>> -        if (r) {
>> -            LOGE(ERROR, "rewind toolstack data tmpfile");
>> -            rc = ERROR_FAIL;
>> -            goto out;
>> -        }
>>     }
>>
>>     const unsigned long argnums[] = {
>> diff --git a/tools/libxl/libxl_save_helper.c
>> b/tools/libxl/libxl_save_helper.c
>> index 3bdfa28..772251a 100644
>> --- a/tools/libxl/libxl_save_helper.c
>> +++ b/tools/libxl/libxl_save_helper.c
>> @@ -171,12 +171,14 @@ static int toolstack_save_cb(uint32_t domid,
>> uint8_t **buf,
>>  {
>>     assert(toolstack_save_fd > 0);
>>
>> +    int r = lseek(toolstack_save_fd, 0, SEEK_SET);
>> +    if (r) fail(errno,"rewind toolstack data tmpfile");
>> +
>>      *buf = xmalloc(toolstack_save_len);
>> -    int r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
>> +    r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
>>     if (r<0) fail(errno,"read toolstack data");
>>      if (r==0) fail(0,"read toolstack data eof");
>>
>> -    toolstack_save_fd = -1;
>>     *len = toolstack_save_len;
>>     return 0;
>>  }
>>
>>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
  2012-06-27 16:42             ` Shriram Rajagopalan
@ 2012-06-28 11:24               ` Ian Jackson
  0 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-28 11:24 UTC (permalink / raw)
  To: rshriram; +Cc: xen-devel

Shriram Rajagopalan writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process"):
> Btw, my earlier mail was in response to remus not
> working on the baseline setup on your dev environment.

Right, thanks.

> The fix works for 2 out of 3 cases
>  blackhole replication (xl remus -b)
>  localhost replication with failover i.e. destroy primary (xl remus domU localhost)

Good.

> However, it crashes the guest for localhost replication, when I destroy the backup
> i.e. xl destroy domU--incoming . The primary guest would generally resume, but in this
> case its in --sc- state.
> NB: This seems to happen in baseline xen-unstable too!.

So in this case my series, with the fixup patch I sent, is no worse
than baseline xen-unstable ?

In that case I think the best thing would probably be for me to resend
my series with the fixups integrated, and commit it to xen-unstable
while we look into the remus problems.  Would that be OK with you ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v6 00/21] libxl: domain save/restore: run in a separate process
  2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
                   ` (21 preceding siblings ...)
  2012-06-26 18:00 ` [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
@ 2012-06-28 13:38 ` Ian Jackson
  2012-06-28 13:50   ` Ian Campbell
  2012-06-28 17:45   ` Ian Jackson
  22 siblings, 2 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-28 13:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Shriram Rajagopalan, Ian Campbell

I wrote:
> This is v5 of my series to asyncify save/restore, rebased to tip and
> retested.  There are minor changes to 3 patches, as discussed on-list,
> marked with "*" below:
...
>   * 06/21 libxl: domain save/restore: run in a separate process
...
> However, first I will invite Shriram to check that Remus is still
> working.  (I can't conveniently do this with this message due to
> shoddiness in git-send-email.)

Following testing by Shriram (thanks) I have an updated version of
06/21.  For the sake of everyone's sanity (and your MUAs) I shan't
repost the whole series.

Here is v6 of 06/21, which is simply the previous one with my earlier
fixup patch folded in.

CC Ian Campbell since he'd acked the previous one.  Ian, I have left
your ack on this version.  I trust that's OK.

Thanks,
Ian.


From: Ian Jackson <ian.jackson@eu.citrix.com>
Subject: [PATCH] libxl: domain save/restore: run in a separate process

libxenctrl expects to be able to simply run the save or restore
operation synchronously.  This won't work well in a process which is
trying to handle multiple domains.

The options are:

 - Block such a whole process (eg, the whole of libvirt) while
   migration completes (or until it fails).

 - Create a thread to run xc_domain_save and xc_domain_restore on.
   This is quite unpalatable.  Multithreaded programming is error
   prone enough without generating threads in libraries, particularly
   if the thread does some very complex operation.

 - Fork and run the operation in the child without execing.  This is
   no good because we would need to negotiate with the caller about
   fds we would inherit (and we might be a very large process).

 - Fork and exec a helper.

Of these options the last is the most palatable.
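
As a rough illustration of the fork-and-exec approach, here is a minimal
sketch that runs a helper with its stdout on a pipe and collects its
output and exit status.  The name run_helper_sketch is hypothetical, and
unlike the real code this version blocks in the parent; it only shows
the shape of the mechanism, not how libxl does it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] as a child process with its stdout connected to a pipe.
 * Copies up to bufsz-1 bytes of the child's output into buf (always
 * NUL-terminated) and returns the child's exit status, or -1 on error
 * or if the child did not exit normally. */
static int run_helper_sketch(char *const argv[], char *buf, size_t bufsz)
{
    int pipefd[2];
    if (bufsz == 0 || pipe(pipefd) < 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(pipefd[0]); close(pipefd[1]); return -1; }

    if (pid == 0) {
        /* Child: only async-signal-safe calls between fork and exec. */
        close(pipefd[0]);
        if (pipefd[1] != STDOUT_FILENO) {
            dup2(pipefd[1], STDOUT_FILENO);
            close(pipefd[1]);
        }
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    /* Parent: read until the buffer is full or the child closes the pipe. */
    close(pipefd[1]);
    size_t used = 0;
    ssize_t n;
    while (used < bufsz - 1 &&
           (n = read(pipefd[0], buf + used, bufsz - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(pipefd[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child execs immediately, the only descriptors the caller
has to think about are the ones deliberately handed to the helper.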

Consequently:

 * A new helper program libxl-save-helper (which does both save and
   restore).  It will be installed in /usr/lib/xen/bin.  It does not
   link against libxl, only libxc, and its error handling does not
   need to be very advanced.  It does, however, plumb the logging
   interface through into the callback stream.

 * A small ad-hoc protocol between the helper and libxl which allows
   log messages and the libxc callbacks to be passed up and down.
   Protocol doc comment is in libxl_save_helper.c.

 * To avoid a lot of tedium the marshalling boilerplate (stubs for the
   helper and the callback decoder for libxl) is generated with a
   small perl script.

 * Implement new functionality to spawn the helper, monitor its
   output, provide responses, and check on its exit status.

 * The functions libxl__xc_domain_restore_done and
   libxl__xc_domain_save_done now turn out to want to be called in the
   same place.  So make their state argument a void* so that the two
   functions are type compatible.

The domain save path still writes the qemu savefile synchronously.
This will need to be fixed in a subsequent patch.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

-
Changes in v6:
 * The void* passed to the callback was being treated as a
   libxl__domain_suspend_state* by the remus callbacks; this is a
   holdover from a much earlier version of the series.  It is now
   properly converted to a libxl__save_helper_state and then the dss
   extracted with CONTAINER_OF.
 * The way remus works means that the toolstack save callback is
   invoked more than once, which the helper's implementation was not
   prepared to deal with.  Fix this by moving the rewind of the fd
   into the helper.
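
The second point amounts to making the read idempotent.  A minimal
sketch, assuming a seekable tmpfile fd (the function name
read_from_start is hypothetical and this is not the helper's actual
code):

```c
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly len bytes from the start of fd into buf.
 * Returns 0 on success, -1 on error or if the file is too short.
 * Rewinding here, rather than once in the caller, makes the function
 * safe to call repeatedly on the same fd. */
static int read_from_start(int fd, void *buf, size_t len)
{
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) return -1;

    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n <= 0) return -1;          /* error or unexpected EOF */
        got += (size_t)n;
    }
    return 0;
}
```

With the rewind done only once by the caller, a second invocation would
start at end of file and see a spurious EOF.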

Changes in v5:
 * assert that preserve_fds are >2.

Changes in v4:
 * Migration stream fd is handled specially by the run_helper
   function, rather than simply being a numarg.  Specifically:
     - dup it to a safe fd number if necessary.
     - clear cloexec flag on the fd before execing helper
 * Toolstack data fd argument to run_helper replaced with
   generic preserve_fds array, which get cloexec cleared.
 * libxl__xc_domain_save uses supplied callback function pointer,
   rather than calling libxl__toolstack_save directly;
   toolstack data save callback is only supplied to libxc if
   in-libxl caller supplied a callback.
 * libxl-save-helper is not needlessly linked against libxl.
 * Code which prepares pipes for helper clarified.
 * Deal properly with, and log properly, POLLPRI/POLLERR on
   pipe to save helper.
 * Spelling fix in perl script comment.
 * In message generator, use better names for the ends of serial
   conditional here documents.
 * Makefile does $(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR)
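
For reference, clearing cloexec on a set of fds that should survive the
exec can be sketched as follows.  The function name is hypothetical and
this is not the code in the patch; it only shows the fcntl calls
involved.

```c
#include <fcntl.h>

/* Clears FD_CLOEXEC on each of the nfds descriptors so that they stay
 * open across exec.  Other descriptor flags are left untouched.
 * Returns 0 on success, -1 on the first failure. */
static int preserve_fds_across_exec(const int *fds, int nfds)
{
    for (int i = 0; i < nfds; i++) {
        int flags = fcntl(fds[i], F_GETFD);
        if (flags < 0) return -1;
        if (fcntl(fds[i], F_SETFD, flags & ~FD_CLOEXEC) < 0) return -1;
    }
    return 0;
}
```

Reading the flags first and masking out only FD_CLOEXEC avoids
clobbering any other descriptor flags the platform may define.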

Changes in v3:
 * Suppress errno value in debug message when helper reports successful
   completion.
 * Significant consequential changes to cope with changes to
   earlier patches in the series.

Changes in v2:
 * Helper path can be overridden by an environment variable for testing.
 * Add a couple of debug logging messages re toolstack data.
 * Fixes from testing.
 * Helper protocol message lengths (and numbers) are 16-bit which
   more clearly avoids piling lots of junk on the stack.
 * Merged with remus changes.
 * Callback implementations in libxl now called via pointers
   so remus can have its own callbacks.
 * Better namespace prefixes on autogenerated names etc.
 * Autogenerator can generate debugging printfs too.

---
 .gitignore                         |    1 +
 .hgignore                          |    2 +
 tools/libxl/Makefile               |   22 ++-
 tools/libxl/libxl_create.c         |   22 ++-
 tools/libxl/libxl_dom.c            |   42 +++--
 tools/libxl/libxl_internal.h       |   56 +++++-
 tools/libxl/libxl_save_callout.c   |  361 +++++++++++++++++++++++++++++++-
 tools/libxl/libxl_save_helper.c    |  283 +++++++++++++++++++++++++
 tools/libxl/libxl_save_msgs_gen.pl |  397 ++++++++++++++++++++++++++++++++++++
 9 files changed, 1146 insertions(+), 40 deletions(-)

diff --git a/.gitignore b/.gitignore
index 7770e54..3451e52 100644
--- a/.gitignore
+++ b/.gitignore
@@ -353,6 +353,7 @@ tools/libxl/_*.[ch]
 tools/libxl/testidl
 tools/libxl/testidl.c
 tools/libxl/*.pyc
+tools/libxl/libxl-save-helper
 tools/blktap2/control/tap-ctl
 tools/firmware/etherboot/eb-roms.h
 tools/firmware/etherboot/gpxe-git-snapshot.tar.gz
diff --git a/.hgignore b/.hgignore
index 27d8f79..05304ea 100644
--- a/.hgignore
+++ b/.hgignore
@@ -180,9 +180,11 @@
 ^tools/libxl/_.*\.c$
 ^tools/libxl/libxlu_cfg_y\.output$
 ^tools/libxl/xl$
+^tools/libxl/libxl-save-helper$
 ^tools/libxl/testidl$
 ^tools/libxl/testidl\.c$
 ^tools/libxl/tmp\..*$
+^tools/libxl/.*\.new$
 ^tools/libvchan/vchan-node[12]$
 ^tools/libaio/src/.*\.ol$
 ^tools/libaio/src/.*\.os$
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 1d8b80a..ddc2624 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -67,25 +67,30 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
 			libxl_internal.o libxl_utils.o libxl_uuid.o \
 			libxl_json.o libxl_aoutils.o \
-			libxl_save_callout.o \
+			libxl_save_callout.o _libxl_save_msgs_callout.o \
 			libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
 $(LIBXL_OBJS): CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenguest) $(CFLAGS_libxenstore) $(CFLAGS_libblktapctl) -include $(XEN_ROOT)/tools/config.h
 
-AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h
+AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h \
+	_libxl_save_msgs_callout.h _libxl_save_msgs_helper.h
 AUTOSRCS= libxlu_cfg_y.c libxlu_cfg_l.c
+AUTOSRCS += _libxl_save_msgs_callout.c _libxl_save_msgs_helper.c
 LIBXLU_OBJS = libxlu_cfg_y.o libxlu_cfg_l.o libxlu_cfg.o \
 	libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o
 $(LIBXLU_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
 
-CLIENTS = xl testidl
+CLIENTS = xl testidl libxl-save-helper
 
 XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxenlight)
 $(XL_OBJS): CFLAGS += -include $(XEN_ROOT)/tools/config.h # libxl_json.h needs it.
 
+SAVE_HELPER_OBJS = libxl_save_helper.o _libxl_save_msgs_helper.o
+$(SAVE_HELPER_OBJS): CFLAGS += $(CFLAGS_libxenctrl)
+
 testidl.o: CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenlight)
 testidl.c: libxl_types.idl gentest.py libxl.h $(AUTOINCS)
 	$(PYTHON) gentest.py libxl_types.idl testidl.c.new
@@ -117,6 +122,12 @@ _libxl_list.h: $(XEN_INCLUDE)/xen-external/bsd-sys-queue-h-seddery $(XEN_INCLUDE
 	perl $^ --prefix=libxl >$@.new
 	$(call move-if-changed,$@.new,$@)
 
+_libxl_save_msgs_helper.c _libxl_save_msgs_callout.c \
+_libxl_save_msgs_helper.h _libxl_save_msgs_callout.h: \
+		libxl_save_msgs_gen.pl
+	$(PERL) -w $< $@ >$@.new
+	$(call move-if-changed,$@.new,$@)
+
 libxl.h: _libxl_types.h
 libxl_json.h: _libxl_types_json.h
 libxl_internal.h: _libxl_types_internal.h _paths.h
@@ -159,6 +170,9 @@ libxlutil.a: $(LIBXLU_OBJS)
 xl: $(XL_OBJS) libxlutil.so libxenlight.so
 	$(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl $(APPEND_LDFLAGS)
 
+libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so
+	$(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
+
 testidl: testidl.o libxlutil.so libxenlight.so
 	$(CC) $(LDFLAGS) -o $@ testidl.o libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
@@ -169,7 +183,9 @@ install: all
 	$(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)
 	$(INSTALL_DIR) $(DESTDIR)$(BASH_COMPLETION_DIR)
 	$(INSTALL_DIR) $(DESTDIR)$(XEN_RUN_DIR)
+	$(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR)
 	$(INSTALL_PROG) xl $(DESTDIR)$(SBINDIR)
+	$(INSTALL_PROG) libxl-save-helper $(DESTDIR)$(PRIVATE_BINDIR)
 	$(INSTALL_PROG) libxenlight.so.$(MAJOR).$(MINOR) $(DESTDIR)$(LIBDIR)
 	ln -sf libxenlight.so.$(MAJOR).$(MINOR) $(DESTDIR)$(LIBDIR)/libxenlight.so.$(MAJOR)
 	ln -sf libxenlight.so.$(MAJOR) $(DESTDIR)$(LIBDIR)/libxenlight.so
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 9c3c671..7b92539 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -662,7 +662,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl_domain_build_info *const info = &d_config->b_info;
     const int restore_fd = dcs->restore_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
-    struct restore_callbacks *const callbacks = &dcs->callbacks;
+    libxl__srm_restore_autogen_callbacks *const callbacks =
+        &dcs->shs.callbacks.restore.a;
 
     if (rc) domcreate_rebuild_done(egc, dcs, rc);
 
@@ -702,7 +703,6 @@ static void domcreate_bootloader_done(libxl__egc *egc,
         pae = libxl_defbool_val(info->u.hvm.pae);
         no_incr_generationid = !libxl_defbool_val(info->u.hvm.incr_generationid);
         callbacks->toolstack_restore = libxl__toolstack_restore;
-        callbacks->data = gc;
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         hvm = 0;
@@ -722,10 +722,24 @@ static void domcreate_bootloader_done(libxl__egc *egc,
     libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
 }
 
-void libxl__xc_domain_restore_done(libxl__egc *egc,
-                                   libxl__domain_create_state *dcs,
+void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
+          unsigned long console_mfn, unsigned long genidad, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    STATE_AO_GC(dcs->ao);
+    libxl__domain_build_state *const state = &dcs->build_state;
+
+    state->store_mfn =            store_mfn;
+    state->console_mfn =          console_mfn;
+    state->vm_generationid_addr = genidad;
+    shs->need_results =           0;
+}
+
+void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
                                    int ret, int retval, int errnoval)
 {
+    libxl__domain_create_state *dcs = dcs_void;
     STATE_AO_GC(dcs->ao);
     libxl_ctx *ctx = libxl__gc_owner(gc);
     char **vments = NULL, **localents = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index c44dec0..0e0dbee 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -467,16 +467,20 @@ static inline char *restore_helper(libxl__gc *gc, uint32_t domid,
 }
 
 int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
-        uint32_t size, void *data)
+                             uint32_t size, void *user)
 {
-    libxl__gc *gc = data;
-    libxl_ctx *ctx = gc->owner;
+    libxl__save_helper_state *shs = user;
+    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
+    STATE_AO_GC(dcs->ao);
+    libxl_ctx *ctx = CTX;
     int i, ret;
     const uint8_t *ptr = buf;
     uint32_t count = 0, version = 0;
     struct libxl__physmap_info* pi;
     char *xs_path;
 
+    LOG(DEBUG,"domain=%"PRIu32" toolstack data size=%"PRIu32, domid, size);
+
     if (size < sizeof(version) + sizeof(count)) {
         LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "wrong size");
         return -1;
@@ -529,9 +533,10 @@ static void domain_suspend_done(libxl__egc *egc,
 /*----- callbacks, called by xc_domain_save -----*/
 
 int libxl__domain_suspend_common_switch_qemu_logdirty
-                               (int domid, unsigned int enable, void *data)
+                               (int domid, unsigned enable, void *user)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = user;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
     char *path;
     bool rc;
@@ -597,9 +602,10 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
     return 0;
 }
 
-int libxl__domain_suspend_common_callback(void *data)
+int libxl__domain_suspend_common_callback(void *user)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = user;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
     unsigned long hvm_s_state = 0, hvm_pvdrv = 0;
     int ret;
@@ -739,9 +745,9 @@ static inline char *save_helper(libxl__gc *gc, uint32_t domid,
 }
 
 int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
-        uint32_t *len, void *data)
+        uint32_t *len, void *dss_void)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__domain_suspend_state *dss = dss_void;
     STATE_AO_GC(dss->ao);
     int i = 0;
     char *start_addr = NULL, *size = NULL, *phys_offset = NULL, *name = NULL;
@@ -810,6 +816,8 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
         ptr += sizeof(struct libxl__physmap_info) + namelen;
     }
 
+    LOG(DEBUG,"domain=%"PRIu32" toolstack data size=%"PRIu32, domid, *len);
+
     return 0;
 }
 
@@ -823,7 +831,8 @@ static int libxl__remus_domain_suspend_callback(void *data)
 
 static int libxl__remus_domain_resume_callback(void *data)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
     /* Resumes the domain and the device model */
@@ -836,7 +845,8 @@ static int libxl__remus_domain_resume_callback(void *data)
 
 static int libxl__remus_domain_checkpoint_callback(void *data)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
     /* This would go into tailbuf. */
@@ -864,7 +874,8 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     const int live = dss->live;
     const int debug = dss->debug;
     const libxl_domain_remus_info *const r_info = dss->remus;
-    struct save_callbacks *const callbacks = &dss->callbacks;
+    libxl__srm_save_autogen_callbacks *const callbacks =
+        &dss->shs.callbacks.save.a;
 
     switch (type) {
     case LIBXL_DOMAIN_TYPE_HVM: {
@@ -925,8 +936,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
         callbacks->suspend = libxl__domain_suspend_common_callback;
 
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
-    callbacks->toolstack_save = libxl__toolstack_save;
-    callbacks->data = dss;
+    dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save;
 
     libxl__xc_domain_save(egc, dss, vm_generationid_addr);
     return;
@@ -935,10 +945,10 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     domain_suspend_done(egc, dss, rc);
 }
 
-void libxl__xc_domain_save_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss,
+void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
                                 int rc, int retval, int errnoval)
 {
+    libxl__domain_suspend_state *dss = dss_void;
     STATE_AO_GC(dss->ao);
 
     /* Convenience aliases */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7cf1b04..1a7b526 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -54,6 +54,7 @@
 
 #include "libxl.h"
 #include "_paths.h"
+#include "_libxl_save_msgs_callout.h"
 
 #if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1)
 #define _hidden __attribute__((visibility("hidden")))
@@ -1773,6 +1774,51 @@ _hidden void libxl__datacopier_kill(libxl__datacopier_state *dc);
 _hidden int libxl__datacopier_start(libxl__datacopier_state *dc);
 
 
+/*----- Save/restore helper (used by creation and suspend) -----*/
+
+typedef struct libxl__srm_save_callbacks {
+    libxl__srm_save_autogen_callbacks a;
+    int (*toolstack_save)(uint32_t domid, uint8_t **buf,
+                          uint32_t *len, void *data);
+} libxl__srm_save_callbacks;
+
+typedef struct libxl__srm_restore_callbacks {
+    libxl__srm_restore_autogen_callbacks a;
+} libxl__srm_restore_callbacks;
+
+/* a pointer to this struct is also passed as "user" to the
+ * save callout helper callback functions */
+typedef struct libxl__save_helper_state {
+    /* public, caller of run_helper initialises */
+    libxl__ao *ao;
+    uint32_t domid;
+    union {
+        libxl__srm_save_callbacks save;
+        libxl__srm_restore_callbacks restore;
+    } callbacks;
+    int (*recv_callback)(const unsigned char *msg, uint32_t len, void *user);
+    void (*completion_callback)(libxl__egc *egc, void *caller_state,
+                                int rc, int retval, int errnoval);
+    void *caller_state;
+    int need_results; /* set to 0 or 1 by caller of run_helper;
+                       * if set to 1 then the ultimate caller's
+                       * results function must set it to 0 */
+    /* private */
+    int rc;
+    int completed; /* retval/errnoval valid iff completed */
+    int retval, errnoval; /* from xc_domain_save / xc_domain_restore */
+    libxl__carefd *pipes[2]; /* 0 = helper's stdin, 1 = helper's stdout */
+    libxl__ev_fd readable;
+    libxl__ev_child child;
+    const char *stdin_what, *stdout_what;
+    FILE *toolstack_data_file;
+
+    libxl__egc *egc; /* valid only for duration of each event callback;
+                      * is here in this struct for the benefit of the
+                      * marshalling and xc callback functions */
+} libxl__save_helper_state;
+
+
 /*----- Domain suspend (save) state structure -----*/
 
 typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
@@ -1798,7 +1844,7 @@ struct libxl__domain_suspend_state {
     int xcflags;
     int guest_responded;
     int interval; /* checkpoint interval (for Remus) */
-    struct save_callbacks callbacks;
+    libxl__save_helper_state shs;
 };
 
 
@@ -1910,7 +1956,7 @@ struct libxl__domain_create_state {
     libxl__stub_dm_spawn_state dmss;
         /* If we're not doing stubdom, we use only dmss.dm,
          * for the non-stubdom device model. */
-    struct restore_callbacks callbacks;
+    libxl__save_helper_state shs;
 };
 
 /*----- Domain suspend (save) functions -----*/
@@ -1926,8 +1972,7 @@ _hidden void libxl__xc_domain_save(libxl__egc*, libxl__domain_suspend_state*,
 /* If rc==0 then retval is the return value from xc_domain_save
  * and errnoval is the errno value it provided.
  * If rc!=0, retval and errnoval are undefined. */
-_hidden void libxl__xc_domain_save_done(libxl__egc*,
-                                        libxl__domain_suspend_state*,
+_hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void,
                                         int rc, int retval, int errnoval);
 
 _hidden int libxl__domain_suspend_common_callback(void *data);
@@ -1945,8 +1990,7 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc,
 /* If rc==0 then retval is the return value from xc_domain_save
  * and errnoval is the errno value it provided.
  * If rc!=0, retval and errnoval are undefined. */
-_hidden void libxl__xc_domain_restore_done(libxl__egc *egc,
-                                           libxl__domain_create_state *dcs,
+_hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
                                            int rc, int retval, int errnoval);
 
 
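Note for readers of the callback changes above: each callback now
receives a pointer to the embedded libxl__save_helper_state as "user"
and recovers the enclosing state with CONTAINER_OF.  Below is a minimal
standalone sketch of that idiom.  The struct names, the OUTER_OF macro
and callback_domid are hypothetical stand-ins for illustration, not
libxl code; libxl's own CONTAINER_OF takes its arguments in a different
form, as seen in the hunks above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl__save_helper_state. */
struct helper_state {
    int need_results;
};

/* Hypothetical stand-in for libxl__domain_suspend_state. */
struct suspend_state {
    int domid;
    struct helper_state shs;   /* embedded by value, not a pointer */
};

/* Simplified container-of: step back from the member's address by the
 * member's offset within the outer struct. */
#define OUTER_OF(inner_ptr, outer_type, member) \
    ((outer_type *)((char *)(inner_ptr) - offsetof(outer_type, member)))

/* A callback that is handed only the opaque "user" pointer, which is
 * assumed to point at the shs member of a struct suspend_state. */
static int callback_domid(void *user)
{
    assert(user != NULL);
    struct helper_state *shs = user;
    struct suspend_state *dss = OUTER_OF(shs, struct suspend_state, shs);
    return dss->domid;
}
```

The point of the embedding is that one "user" pointer serves both the
generic helper code, which only knows about the inner struct, and the
save- or restore-specific callbacks, which need the outer one.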
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 1b481ab..a6abcda 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -16,6 +16,30 @@
 
 #include "libxl_internal.h"
 
+/* stream_fd is as from the caller (eventually, the application).
+ * It may be 0, 1 or 2, in which case we need to dup it elsewhere.
+ * The actual fd value is not included in the supplied argnums; rather
+ * it will be automatically supplied by run_helper as the 2nd argument.
+ *
+ * preserve_fds are fds that the caller intends to pass to the helper,
+ * so they need their cloexec flag cleared.  They may not be 0, 1 or 2.
+ * An entry may be -1 in which case it will be ignored.
+ */
+static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
+                       const char *mode_arg,
+                       int stream_fd,
+                       const int *preserve_fds, int num_preserve_fds,
+                       const unsigned long *argnums, int num_argnums);
+
+static void helper_failed(libxl__egc*, libxl__save_helper_state *shs, int rc);
+static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
+                                   int fd, short events, short revents);
+static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
+                          pid_t pid, int status);
+static void helper_done(libxl__egc *egc, libxl__save_helper_state *shs);
+
+/*----- entrypoints -----*/
+
 void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
                               int hvm, int pae, int superpages,
                               int no_incr_generationid)
@@ -27,22 +51,337 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     const int restore_fd = dcs->restore_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
 
-    int r = xc_domain_restore(CTX->xch, restore_fd, domid,
-                              state->store_port, &state->store_mfn,
-                              state->store_domid, state->console_port,
-                              &state->console_mfn, state->console_domid,
-                              hvm, pae, superpages, no_incr_generationid,
-                              &state->vm_generationid_addr, &dcs->callbacks);
-    libxl__xc_domain_restore_done(egc, dcs, 0, r, errno);
+    unsigned cbflags = libxl__srm_callout_enumcallbacks_restore
+        (&dcs->shs.callbacks.restore.a);
+
+    const unsigned long argnums[] = {
+        domid,
+        state->store_port,
+        state->store_domid, state->console_port,
+        state->console_domid,
+        hvm, pae, superpages, no_incr_generationid,
+        cbflags,
+    };
+
+    dcs->shs.ao = ao;
+    dcs->shs.domid = domid;
+    dcs->shs.recv_callback = libxl__srm_callout_received_restore;
+    dcs->shs.completion_callback = libxl__xc_domain_restore_done;
+    dcs->shs.caller_state = dcs;
+    dcs->shs.need_results = 1;
+    dcs->shs.toolstack_data_file = 0;
+
+    run_helper(egc, &dcs->shs, "--restore-domain", restore_fd, 0,0,
+               argnums, ARRAY_SIZE(argnums));
 }
 
 void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss,
                            unsigned long vm_generationid_addr)
 {
     STATE_AO_GC(dss->ao);
-    int r;
+    int r, rc, toolstack_data_fd = -1;
+    uint32_t toolstack_data_len = 0;
+
+    /* Resources we need to free */
+    uint8_t *toolstack_data_buf = 0;
+
+    unsigned cbflags = libxl__srm_callout_enumcallbacks_save
+        (&dss->shs.callbacks.save.a);
+
+    if (dss->shs.callbacks.save.toolstack_save) {
+        r = dss->shs.callbacks.save.toolstack_save
+            (dss->domid, &toolstack_data_buf, &toolstack_data_len, dss);
+        if (r) { rc = ERROR_FAIL; goto out; }
+
+        dss->shs.toolstack_data_file = tmpfile();
+        if (!dss->shs.toolstack_data_file) {
+            LOGE(ERROR, "cannot create toolstack data tmpfile");
+            rc = ERROR_FAIL;
+            goto out;
+        }
+        toolstack_data_fd = fileno(dss->shs.toolstack_data_file);
+
+        r = libxl_write_exactly(CTX, toolstack_data_fd,
+                                toolstack_data_buf, toolstack_data_len,
+                                "toolstack data tmpfile", 0);
+        if (r) { rc = ERROR_FAIL; goto out; }
+    }
+
+    const unsigned long argnums[] = {
+        dss->domid, 0, 0, dss->xcflags, dss->hvm, vm_generationid_addr,
+        toolstack_data_fd, toolstack_data_len,
+        cbflags,
+    };
+
+    dss->shs.ao = ao;
+    dss->shs.domid = dss->domid;
+    dss->shs.recv_callback = libxl__srm_callout_received_save;
+    dss->shs.completion_callback = libxl__xc_domain_save_done;
+    dss->shs.caller_state = dss;
+    dss->shs.need_results = 0;
+
+    free(toolstack_data_buf);
+
+    run_helper(egc, &dss->shs, "--save-domain", dss->fd,
+               &toolstack_data_fd, 1,
+               argnums, ARRAY_SIZE(argnums));
+    return;
+
+ out:
+    free(toolstack_data_buf);
+    if (dss->shs.toolstack_data_file) fclose(dss->shs.toolstack_data_file);
+
+    libxl__xc_domain_save_done(egc, dss, rc, 0, 0);
+}
+
+
+/*----- helper execution -----*/
+
+static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
+                       const char *mode_arg, int stream_fd,
+                       const int *preserve_fds, int num_preserve_fds,
+                       const unsigned long *argnums, int num_argnums)
+{
+    STATE_AO_GC(shs->ao);
+    const char *args[4 + num_argnums];
+    const char **arg = args;
+    int i, rc;
+
+    /* Resources we must free */
+    libxl__carefd *childs_pipes[2] = { 0,0 };
+
+    /* Convenience aliases */
+    const uint32_t domid = shs->domid;
+
+    shs->rc = 0;
+    shs->completed = 0;
+    shs->pipes[0] = shs->pipes[1] = 0;
+    libxl__ev_fd_init(&shs->readable);
+    libxl__ev_child_init(&shs->child);
+
+    shs->stdin_what = GCSPRINTF("domain %"PRIu32" save/restore helper"
+                                " stdin pipe", domid);
+    shs->stdout_what = GCSPRINTF("domain %"PRIu32" save/restore helper"
+                                 " stdout pipe", domid);
+
+    *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC "/" "libxl-save-helper";
+    *arg++ = mode_arg;
+    const char **stream_fd_arg = arg++;
+    for (i=0; i<num_argnums; i++)
+        *arg++ = GCSPRINTF("%lu", argnums[i]);
+    *arg++ = 0;
+    assert(arg == args + ARRAY_SIZE(args));
+
+    libxl__carefd_begin();
+    int childfd;
+    for (childfd=0; childfd<2; childfd++) {
+        /* Setting up the pipe for the child's fd childfd */
+        int fds[2];
+        if (libxl_pipe(CTX,fds)) { rc = ERROR_FAIL; goto out; }
+        int childs_end = childfd==0 ? 0 /*read*/  : 1 /*write*/;
+        int our_end    = childfd==0 ? 1 /*write*/ : 0 /*read*/;
+        childs_pipes[childfd] = libxl__carefd_record(CTX, fds[childs_end]);
+        shs->pipes[childfd] =   libxl__carefd_record(CTX, fds[our_end]);
+    }
+    libxl__carefd_unlock();
+
+    pid_t pid = libxl__ev_child_fork(gc, &shs->child, helper_exited);
+    if (!pid) {
+        if (stream_fd <= 2) {
+            stream_fd = dup(stream_fd);
+            if (stream_fd < 0) {
+                LOGE(ERROR,"dup migration stream fd");
+                exit(-1);
+            }
+        }
+        libxl_fd_set_cloexec(CTX, stream_fd, 0);
+        *stream_fd_arg = GCSPRINTF("%d", stream_fd);
+
+        for (i=0; i<num_preserve_fds; i++)
+            if (preserve_fds[i] >= 0) {
+                assert(preserve_fds[i] > 2);
+                libxl_fd_set_cloexec(CTX, preserve_fds[i], 0);
+            }
+
+        libxl__exec(gc,
+                    libxl__carefd_fd(childs_pipes[0]),
+                    libxl__carefd_fd(childs_pipes[1]),
+                    -1,
+                    args[0], (char**)args, 0);
+    }
+
+    libxl__carefd_close(childs_pipes[0]);
+    libxl__carefd_close(childs_pipes[1]);
+
+    rc = libxl__ev_fd_register(gc, &shs->readable, helper_stdout_readable,
+                               libxl__carefd_fd(shs->pipes[1]), POLLIN|POLLPRI);
+    if (rc) goto out;
+    return;
+
+ out:
+    libxl__carefd_close(childs_pipes[0]);
+    libxl__carefd_close(childs_pipes[1]);
+    helper_failed(egc, shs, rc);
+}
+
+static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
+                          int rc)
+{
+    STATE_AO_GC(shs->ao);
+
+    if (!shs->rc)
+        shs->rc = rc;
+
+    libxl__ev_fd_deregister(gc, &shs->readable);
+
+    if (!libxl__ev_child_inuse(&shs->child)) {
+        helper_done(egc, shs);
+        return;
+    }
+
+    int r = kill(shs->child.pid, SIGKILL);
+    if (r) LOGE(WARN, "failed to kill save/restore helper [%lu]",
+                (unsigned long)shs->child.pid);
+}
+
+static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
+                                   int fd, short events, short revents)
+{
+    libxl__save_helper_state *shs = CONTAINER_OF(ev, *shs, readable);
+    STATE_AO_GC(shs->ao);
+    int rc, errnoval;
+
+    if (revents & (POLLERR|POLLPRI)) {
+        LOG(ERROR, "%s signaled POLLERR|POLLPRI (%#x)",
+            shs->stdout_what, revents);
+        rc = ERROR_FAIL;
+ out:
+        /* this is here because otherwise we bypass the decl of msg[] */
+        helper_failed(egc, shs, rc);
+        return;
+    }
+
+    uint16_t msglen;
+    errnoval = libxl_read_exactly(CTX, fd, &msglen, sizeof(msglen),
+                                  shs->stdout_what, "ipc msg header");
+    if (errnoval) { rc = ERROR_FAIL; goto out; }
+
+    unsigned char msg[msglen];
+    errnoval = libxl_read_exactly(CTX, fd, msg, msglen,
+                                  shs->stdout_what, "ipc msg body");
+    if (errnoval) { rc = ERROR_FAIL; goto out; }
+
+    shs->egc = egc;
+    shs->recv_callback(msg, msglen, shs);
+    shs->egc = 0;
+    return;
+}
+
+static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
+                          pid_t pid, int status)
+{
+    libxl__save_helper_state *shs = CONTAINER_OF(ch, *shs, child);
+    STATE_AO_GC(shs->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = shs->domid;
+
+    const char *what =
+        GCSPRINTF("domain %"PRIu32" save/restore helper", domid);
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, XTL_ERROR, what, pid, status);
+        shs->rc = ERROR_FAIL;
+    }
+
+    if (shs->need_results) {
+        if (!shs->rc)
+            LOG(ERROR,"%s exited without providing results",what);
+        shs->rc = ERROR_FAIL;
+    }
+
+    if (!shs->completed) {
+        if (!shs->rc)
+            LOG(ERROR,"%s exited without signaling completion",what);
+        shs->rc = ERROR_FAIL;
+    }
+
+    helper_done(egc, shs);
+    return;
+}
+
+static void helper_done(libxl__egc *egc, libxl__save_helper_state *shs)
+{
+    STATE_AO_GC(shs->ao);
+
+    libxl__ev_fd_deregister(gc, &shs->readable);
+    libxl__carefd_close(shs->pipes[0]);  shs->pipes[0] = 0;
+    libxl__carefd_close(shs->pipes[1]);  shs->pipes[1] = 0;
+    assert(!libxl__ev_child_inuse(&shs->child));
+    if (shs->toolstack_data_file) fclose(shs->toolstack_data_file);
+
+    shs->egc = egc;
+    shs->completion_callback(egc, shs->caller_state,
+                             shs->rc, shs->retval, shs->errnoval);
+    shs->egc = 0;
+}
+
+/*----- generic helpers for the autogenerated code -----*/
+
+const libxl__srm_save_autogen_callbacks*
+libxl__srm_callout_get_callbacks_save(void *user)
+{
+    libxl__save_helper_state *shs = user;
+    return &shs->callbacks.save.a;
+}
+
+const libxl__srm_restore_autogen_callbacks*
+libxl__srm_callout_get_callbacks_restore(void *user)
+{
+    libxl__save_helper_state *shs = user;
+    return &shs->callbacks.restore.a;
+}
+
+void libxl__srm_callout_sendreply(int r, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__egc *egc = shs->egc;
+    STATE_AO_GC(shs->ao);
+    int errnoval;
+
+    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
+                                   &r, sizeof(r), shs->stdin_what,
+                                   "callback return value");
+    if (errnoval)
+        helper_failed(egc, shs, ERROR_FAIL);
+}
+
+void libxl__srm_callout_callback_log(uint32_t level, uint32_t errnoval,
+                  const char *context, const char *formatted, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    STATE_AO_GC(shs->ao);
+    xtl_log(CTX->lg, level, errnoval, context, "%s", formatted);
+}
+
+void libxl__srm_callout_callback_progress(const char *context,
+                   const char *doing_what, unsigned long done,
+                   unsigned long total, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    STATE_AO_GC(shs->ao);
+    xtl_progress(CTX->lg, context, doing_what, done, total);
+}
+
+int libxl__srm_callout_callback_complete(int retval, int errnoval,
+                                         void *user)
+{
+    libxl__save_helper_state *shs = user;
+    STATE_AO_GC(shs->ao);
 
-    r = xc_domain_save(CTX->xch, dss->fd, dss->domid, 0, 0, dss->xcflags,
-                       &dss->callbacks, dss->hvm, vm_generationid_addr);
-    libxl__xc_domain_save_done(egc, dss, 0, r, errno);
+    shs->completed = 1;
+    shs->retval = retval;
+    shs->errnoval = errnoval;
+    libxl__ev_fd_deregister(gc, &shs->readable);
+    return 0;
 }
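Note on the framing used by helper_stdout_readable above: it reads a
16-bit length and then that many bytes of message body.  Below is a
minimal sketch of that framing over a plain file descriptor, assuming
host byte order on both ends (parent and helper run on the same
machine).  send_frame, recv_frame, write_all and read_all are
hypothetical names for illustration; libxl itself uses
libxl_read_exactly and libxl_write_exactly, and this sketch does not
retry on EINTR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len) {
        ssize_t r = write(fd, p, len);
        if (r < 0) return -1;
        p += r;
        len -= (size_t)r;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len) {
        ssize_t r = read(fd, p, len);
        if (r <= 0) return -1;
        p += r;
        len -= (size_t)r;
    }
    return 0;
}

/* Sender: 16-bit length in host byte order, then the body. */
static int send_frame(int fd, const void *msg, uint16_t len)
{
    if (write_all(fd, &len, sizeof(len))) return -1;
    return write_all(fd, msg, len);
}

/* Receiver: returns the body length, or -1 on error, EOF, or if the
 * announced length does not fit in the caller's buffer. */
static int recv_frame(int fd, void *buf, size_t bufsize)
{
    uint16_t len;
    assert(buf != NULL);
    if (read_all(fd, &len, sizeof(len))) return -1;
    if (len > bufsize) return -1;
    if (read_all(fd, buf, len)) return -1;
    return len;
}
```

A 16-bit length caps each message below 64KiB.  The sketch reads into a
caller-supplied buffer and rejects oversized frames, which is a design
choice of the sketch rather than something taken from the patch.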
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
new file mode 100644
index 0000000..772251a
--- /dev/null
+++ b/tools/libxl/libxl_save_helper.c
@@ -0,0 +1,283 @@
+/*
+ * Copyright (C) 2012      Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+/*
+ * The libxl-save-helper utility speaks a protocol to its caller for
+ * the callbacks.  The protocol is as follows.
+ *
+ * The helper talks on stdin and stdout, in binary in machine
+ * endianness.  The helper speaks first, and only when it has a
+ * callback to make.  It writes a 16-bit number being the message
+ * length, and then the message body.
+ *
+ * Each message starts with a 16-bit number indicating which of the
+ * messages it is, and then some arguments in a binary marshalled form.
+ * If the callback does not need a reply (it returns void), the helper
+ * just continues.  Otherwise the helper waits for its caller to send a
+ * single int which is to be the return value from the callback.
+ *
+ * Where feasible the stubs and callbacks have prototypes identical to
+ * those required by xc_domain_save and xc_domain_restore, so that the
+ * autogenerated functions can be used/provided directly.
+ *
+ * The actual messages are in the array @msgs in libxl_save_msgs_gen.pl
+ */
+
+#include "libxl_osdeps.h"
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <assert.h>
+#include <inttypes.h>
+
+#include "libxl.h"
+
+#include "xenctrl.h"
+#include "xenguest.h"
+#include "_libxl_save_msgs_helper.h"
+
+/*----- globals -----*/
+
+static const char *program = "libxl-save-helper";
+static xentoollog_logger *logger;
+static xc_interface *xch;
+
+/*----- error handling -----*/
+
+static void fail(int errnoval, const char *fmt, ...)
+    __attribute__((noreturn,format(printf,2,3)));
+static void fail(int errnoval, const char *fmt, ...)
+{
+    va_list al;
+    va_start(al,fmt);
+    xtl_logv(logger,XTL_ERROR,errnoval,program,fmt,al);
+    exit(-1);
+}
+
+static int read_exactly(int fd, void *buf, size_t len)
+/* returns 1 if ok, 0 on eof (even midway through), <0 on read error */
+{
+    while (len) {
+        ssize_t r = read(fd, buf, len);
+        if (r<=0) return r;
+        assert(r <= len);
+        len -= r;
+        buf = (char*)buf + r;
+    }
+    return 1;
+}
+
+static void *xmalloc(size_t sz)
+{
+    if (!sz) return 0;
+    void *r = malloc(sz);
+    if (!r) { perror("memory allocation failed"); exit(-1); }
+    return r;
+}
+
+/*----- logger -----*/
+
+typedef struct {
+    xentoollog_logger vtable;
+} xentoollog_logger_tellparent;
+
+static void tellparent_vmessage(xentoollog_logger *logger_in,
+                                xentoollog_level level,
+                                int errnoval,
+                                const char *context,
+                                const char *format,
+                                va_list al)
+{
+    char *formatted;
+    int r = vasprintf(&formatted, format, al);
+    if (r < 0) { perror("memory allocation failed during logging"); exit(-1); }
+    helper_stub_log(level, errnoval, context, formatted, 0);
+    free(formatted);
+}
+
+static void tellparent_progress(struct xentoollog_logger *logger_in,
+                                const char *context,
+                                const char *doing_what, int percent,
+                                unsigned long done, unsigned long total)
+{
+    helper_stub_progress(context, doing_what, done, total, 0);
+}
+
+static void tellparent_destroy(struct xentoollog_logger *logger_in)
+{
+    abort();
+}
+
+static xentoollog_logger_tellparent *createlogger_tellparent(void)
+{
+    xentoollog_logger_tellparent newlogger;
+    return XTL_NEW_LOGGER(tellparent, newlogger);
+}
+
+/*----- helper functions called by autogenerated stubs -----*/
+
+unsigned char * helper_allocbuf(int len, void *user)
+{
+    return xmalloc(len);
+}
+
+static void transmit(const unsigned char *msg, int len, void *user)
+{
+    while (len) {
+        int r = write(1, msg, len);
+        if (r<0) { perror("write"); exit(-1); }
+        assert(r >= 0);
+        assert(r <= len);
+        len -= r;
+        msg += r;
+    }
+}
+
+void helper_transmitmsg(unsigned char *msg_freed, int len_in, void *user)
+{
+    assert(len_in < 64*1024);
+    uint16_t len = len_in;
+    transmit((const void*)&len, sizeof(len), user);
+    transmit(msg_freed, len, user);
+    free(msg_freed);
+}
+
+int helper_getreply(void *user)
+{
+    int v;
+    int r = read_exactly(0, &v, sizeof(v));
+    if (r<=0) exit(-2);
+    return v;
+}
+
+/*----- other callbacks -----*/
+
+static int toolstack_save_fd;
+static uint32_t toolstack_save_len;
+
+static int toolstack_save_cb(uint32_t domid, uint8_t **buf,
+                             uint32_t *len, void *data)
+{
+    assert(toolstack_save_fd > 0);
+
+    int r = lseek(toolstack_save_fd, 0, SEEK_SET);
+    if (r) fail(errno,"rewind toolstack data tmpfile");
+
+    *buf = xmalloc(toolstack_save_len);
+    r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
+    if (r<0) fail(errno,"read toolstack data");
+    if (r==0) fail(0,"read toolstack data eof");
+
+    *len = toolstack_save_len;
+    return 0;
+}
+
+static void startup(const char *op) {
+    logger = (xentoollog_logger*)createlogger_tellparent();
+    if (!logger) {
+        fprintf(stderr, "%s: cannot initialise logger\n", program);
+        exit(-1);
+    }
+
+    xtl_log(logger,XTL_DEBUG,0,program,"starting %s",op);
+
+    xch = xc_interface_open(logger,logger,0);
+    if (!xch) fail(errno,"xc_interface_open failed");
+}
+
+static void complete(int retval) {
+    int errnoval = retval ? errno : 0; /* suppress irrelevant errnos */
+    xtl_log(logger,XTL_DEBUG,errnoval,program,"complete r=%d",retval);
+    helper_stub_complete(retval,errnoval,0);
+    exit(0);
+}
+
+static struct save_callbacks helper_save_callbacks;
+static struct restore_callbacks helper_restore_callbacks;
+
+int main(int argc, char **argv)
+{
+    int r;
+
+#define NEXTARG (++argv, assert(*argv), *argv)
+
+    const char *mode = *++argv;
+    assert(mode);
+
+    if (!strcmp(mode,"--save-domain")) {
+
+        int io_fd =                atoi(NEXTARG);
+        uint32_t dom =             strtoul(NEXTARG,0,10);
+        uint32_t max_iters =       strtoul(NEXTARG,0,10);
+        uint32_t max_factor =      strtoul(NEXTARG,0,10);
+        uint32_t flags =           strtoul(NEXTARG,0,10);
+        int hvm =                  atoi(NEXTARG);
+        unsigned long genidad =    strtoul(NEXTARG,0,10);
+        toolstack_save_fd  =       atoi(NEXTARG);
+        toolstack_save_len =       strtoul(NEXTARG,0,10);
+        unsigned cbflags =         strtoul(NEXTARG,0,10);
+        assert(!*++argv);
+
+        if (toolstack_save_fd >= 0)
+            helper_save_callbacks.toolstack_save = toolstack_save_cb;
+
+        helper_setcallbacks_save(&helper_save_callbacks, cbflags);
+
+        startup("save");
+        r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
+                           &helper_save_callbacks, hvm, genidad);
+        complete(r);
+
+    } else if (!strcmp(mode,"--restore-domain")) {
+
+        int io_fd =                atoi(NEXTARG);
+        uint32_t dom =             strtoul(NEXTARG,0,10);
+        unsigned store_evtchn =    strtoul(NEXTARG,0,10);
+        domid_t store_domid =      strtoul(NEXTARG,0,10);
+        unsigned console_evtchn =  strtoul(NEXTARG,0,10);
+        domid_t console_domid =    strtoul(NEXTARG,0,10);
+        unsigned int hvm =         strtoul(NEXTARG,0,10);
+        unsigned int pae =         strtoul(NEXTARG,0,10);
+        int superpages =           strtoul(NEXTARG,0,10);
+        int no_incr_genidad =      strtoul(NEXTARG,0,10);
+        unsigned cbflags =         strtoul(NEXTARG,0,10);
+        assert(!*++argv);
+
+        helper_setcallbacks_restore(&helper_restore_callbacks, cbflags);
+
+        unsigned long store_mfn = 0;
+        unsigned long console_mfn = 0;
+        unsigned long genidad = 0;
+
+        startup("restore");
+        r = xc_domain_restore(xch, io_fd, dom, store_evtchn, &store_mfn,
+                              store_domid, console_evtchn, &console_mfn,
+                              console_domid, hvm, pae, superpages,
+                              no_incr_genidad, &genidad,
+                              &helper_restore_callbacks);
+        helper_stub_restore_results(store_mfn,console_mfn,genidad,0);
+        complete(r);
+
+    } else {
+        assert(!"unexpected mode argument");
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
new file mode 100755
index 0000000..c45986e
--- /dev/null
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -0,0 +1,397 @@
+#!/usr/bin/perl -w
+
+use warnings;
+use strict;
+use POSIX;
+
+our $debug = 0; # produce copious debugging output at run-time?
+
+our @msgs = (
+    # flags:
+    #   s  - applicable to save
+    #   r  - applicable to restore
+    #   c  - function pointer in callbacks struct rather than fixed function
+    #   x  - function pointer is in struct {save,restore}_callbacks
+    #         and its null-ness needs to be passed through to the helper's xc
+    #   W  - needs a return value; callback is synchronous
+    [  1, 'sr',     "log",                   [qw(uint32_t level
+                                                 uint32_t errnoval
+                                                 STRING context
+                                                 STRING formatted)] ],
+    [  2, 'sr',     "progress",              [qw(STRING context
+                                                 STRING doing_what),
+                                                'unsigned long', 'done',
+                                                'unsigned long', 'total'] ],
+    [  3, 'scxW',   "suspend", [] ],         
+    [  4, 'scxW',   "postcopy", [] ],        
+    [  5, 'scxW',   "checkpoint", [] ],      
+    [  6, 'scxW',   "switch_qemu_logdirty",  [qw(int domid
+                                              unsigned enable)] ],
+    #                toolstack_save          done entirely `by hand'
+    [  7, 'rcxW',   "toolstack_restore",     [qw(uint32_t domid
+                                                BLOCK tsdata)] ],
+    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
+                                              'unsigned long', 'console_mfn',
+                                              'unsigned long', 'genidad'] ],
+    [  9, 'srW',    "complete",              [qw(int retval
+                                                 int errnoval)] ],
+);
+
+#----------------------------------------
+
+our %cbs;
+our %func;
+our %func_ah;
+our @outfuncs;
+our %out_decls;
+our %out_body;
+our %msgnum_used;
+
+die unless @ARGV==1;
+die if $ARGV[0] =~ m/^-/;
+
+our ($intendedout) = @ARGV;
+
+$intendedout =~ m/([a-z]+)\.([ch])$/ or die;
+my ($want_ah, $ch) = ($1, $2);
+
+my $declprefix = '';
+
+foreach my $ah (qw(callout helper)) {
+    $out_body{$ah} .=
+        <<END_BOTH.($ah eq 'callout' ? <<END_CALLOUT : <<END_HELPER);
+#include "libxl_osdeps.h"
+
+#include <assert.h>
+#include <string.h>
+#include <stdint.h>
+#include <limits.h>
+END_BOTH
+
+#include "libxl_internal.h"
+
+END_CALLOUT
+
+#include "_libxl_save_msgs_${ah}.h"
+#include <xenctrl.h>
+#include <xenguest.h>
+
+END_HELPER
+}
+
+die $want_ah unless defined $out_body{$want_ah};
+
+sub f_decl ($$$$) {
+    my ($name, $ah, $c_rtype, $c_decl) = @_;
+    $out_decls{$name} = "${declprefix}$c_rtype $name$c_decl;\n";
+    $func{$name} = "$c_rtype $name$c_decl\n{\n" . ($func{$name} || '');
+    $func_ah{$name} = $ah;
+}
+
+sub f_more ($$) {
+    my ($name, $addbody) = @_;
+    $func{$name} ||= '';
+    $func{$name} .= $addbody;
+    push @outfuncs, $name;
+}
+
+our $libxl = "libxl__srm";
+our $callback = "${libxl}_callout_callback";
+our $receiveds = "${libxl}_callout_received";
+our $sendreply = "${libxl}_callout_sendreply";
+our $getcallbacks = "${libxl}_callout_get_callbacks";
+our $enumcallbacks = "${libxl}_callout_enumcallbacks";
+sub cbtype ($) { "${libxl}_".$_[0]."_autogen_callbacks"; };
+
+f_decl($sendreply, 'callout', 'void', "(int r, void *user)");
+
+our $helper = "helper";
+our $encode = "${helper}_stub";
+our $allocbuf = "${helper}_allocbuf";
+our $transmit = "${helper}_transmitmsg";
+our $getreply = "${helper}_getreply";
+our $setcallbacks = "${helper}_setcallbacks";
+
+f_decl($allocbuf, 'helper', 'unsigned char *', '(int len, void *user)');
+f_decl($transmit, 'helper', 'void',
+       '(unsigned char *msg_freed, int len, void *user)');
+f_decl($getreply, 'helper', 'int', '(void *user)');
+
+sub typeid ($) { my ($t) = @_; $t =~ s/\W/_/; return $t; };
+
+$out_body{'callout'} .= <<END;
+static int bytes_get(const unsigned char **msg,
+		     const unsigned char *const endmsg,
+		     void *result, int rlen)
+{
+    if (endmsg - *msg < rlen) return 0;
+    memcpy(result,*msg,rlen);
+    *msg += rlen;
+    return 1;
+}
+
+END
+$out_body{'helper'} .= <<END;
+static void bytes_put(unsigned char *const buf, int *len,
+		      const void *value, int vlen)
+{
+    assert(vlen < INT_MAX/2 - *len);
+    if (buf)
+	memcpy(buf + *len, value, vlen);
+    *len += vlen;
+}
+
+END
+
+foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long') {
+    my $typeid = typeid($simpletype);
+    $out_body{'callout'} .= <<END;
+static int ${typeid}_get(const unsigned char **msg,
+                        const unsigned char *const endmsg,
+                        $simpletype *result)
+{
+    return bytes_get(msg, endmsg, result, sizeof(*result));
+}
+
+END
+    $out_body{'helper'} .= <<END;
+static void ${typeid}_put(unsigned char *const buf, int *len,
+			 const $simpletype value)
+{
+    bytes_put(buf, len, &value, sizeof(value));
+}
+
+END
+}
+
+$out_body{'callout'} .= <<END;
+static int BLOCK_get(const unsigned char **msg,
+                      const unsigned char *const endmsg,
+                      const uint8_t **result, uint32_t *result_size)
+{
+    if (!uint32_t_get(msg,endmsg,result_size)) return 0;
+    if (endmsg - *msg < *result_size) return 0;
+    *result = (const void*)*msg;
+    *msg += *result_size;
+    return 1;
+}
+
+static int STRING_get(const unsigned char **msg,
+                      const unsigned char *const endmsg,
+                      const char **result)
+{
+    const uint8_t *data;
+    uint32_t datalen;
+    if (!BLOCK_get(msg,endmsg,&data,&datalen)) return 0;
+    if (datalen == 0) return 0;
+    if (data[datalen-1] != '\\0') return 0;
+    *result = (const void*)data;
+    return 1;
+}
+
+END
+$out_body{'helper'} .= <<END;
+static void BLOCK_put(unsigned char *const buf,
+                      int *len,
+		      const uint8_t *bytes, uint32_t size)
+{
+    uint32_t_put(buf, len, size);
+    bytes_put(buf, len, bytes, size);
+}
+    
+static void STRING_put(unsigned char *const buf,
+		       int *len,
+		       const char *string)
+{
+    size_t slen = strlen(string);
+    assert(slen < INT_MAX / 4);
+    assert(slen < (uint32_t)0x40000000);
+    BLOCK_put(buf, len, (const void*)string, slen+1);
+}
+    
+END
+
+foreach my $sr (qw(save restore)) {
+    f_decl("${getcallbacks}_${sr}", 'callout',
+           "const ".cbtype($sr)." *",
+           "(void *data)");
+
+    f_decl("${receiveds}_${sr}", 'callout', 'int',
+	   "(const unsigned char *msg, uint32_t len, void *user)");
+
+    f_decl("${enumcallbacks}_${sr}", 'callout', 'unsigned',
+           "(const ".cbtype($sr)." *cbs)");
+    f_more("${enumcallbacks}_${sr}", "    unsigned cbflags = 0;\n");
+
+    f_decl("${setcallbacks}_${sr}", 'helper', 'void',
+           "(struct ${sr}_callbacks *cbs, unsigned cbflags)");
+
+    f_more("${receiveds}_${sr}",
+           <<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS);
+    const unsigned char *const endmsg = msg + len;
+    uint16_t mtype;
+    if (!uint16_t_get(&msg,endmsg,&mtype)) return 0;
+END_ALWAYS
+    fprintf(stderr,"libxl callout receiver: got len=%u mtype=%u\\n",len,mtype);
+END_DEBUG
+    switch (mtype) {
+
+END_ALWAYS
+
+    $cbs{$sr} = "typedef struct ".cbtype($sr)." {\n";
+}
+
+foreach my $msginfo (@msgs) {
+    my ($msgnum, $flags, $name, $args) = @$msginfo;
+    die if $msgnum_used{$msgnum}++;
+
+    my $f_more_sr = sub {
+        my ($contents_spec, $fnamebase) = @_;
+        $fnamebase ||= "${receiveds}";
+        foreach my $sr (qw(save restore)) {
+            $sr =~ m/^./;
+            next unless $flags =~ m/$&/;
+            my $contents = (!ref $contents_spec) ? $contents_spec :
+                $contents_spec->($sr);
+            f_more("${fnamebase}_${sr}", $contents);
+        }
+    };
+
+    $f_more_sr->("    case $msgnum: { /* $name */\n");
+    if ($flags =~ m/W/) {
+        $f_more_sr->("        int r;\n");
+    }
+
+    my $c_rtype_helper = $flags =~ m/W/ ? 'int' : 'void';
+    my $c_rtype_callout = $flags =~ m/W/ ? 'int' : 'void';
+    my $c_decl = '(';
+    my $c_callback_args = '';
+
+    f_more("${encode}_$name",
+           <<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS);
+    unsigned char *buf = 0;
+    int len = 0, allocd = 0;
+
+END_ALWAYS
+    fprintf(stderr,"libxl-save-helper: encoding $name\\n");
+END_DEBUG
+    for (;;) {
+        uint16_t_put(buf, &len, $msgnum /* $name */);
+END_ALWAYS
+
+    my @args = @$args;
+    my $c_recv = '';
+    my ($argtype, $arg);
+    while (($argtype, $arg, @args) = @args) {
+	my $typeid = typeid($argtype);
+        my $c_args = "$arg";
+        my $c_get_args = "&$arg";
+	if ($argtype eq 'STRING') {
+	    $c_decl .= "const char *$arg, ";
+	    $f_more_sr->("        const char *$arg;\n");
+        } elsif ($argtype eq 'BLOCK') {
+            $c_decl .= "const uint8_t *$arg, uint32_t ${arg}_size, ";
+            $c_args .= ", ${arg}_size";
+            $c_get_args .= ",&${arg}_size";
+	    $f_more_sr->("        const uint8_t *$arg;\n".
+                         "        uint32_t ${arg}_size;\n");
+	} else {
+	    $c_decl .= "$argtype $arg, ";
+	    $f_more_sr->("        $argtype $arg;\n");
+	}
+	$c_callback_args .= "$c_args, ";
+	$c_recv.=
+            "        if (!${typeid}_get(&msg,endmsg,$c_get_args)) return 0;\n";
+        f_more("${encode}_$name", "	${typeid}_put(buf, &len, $c_args);\n");
+    }
+    $f_more_sr->($c_recv);
+    $c_decl .= "void *user)";
+    $c_callback_args .= "user";
+
+    $f_more_sr->("        if (msg != endmsg) return 0;\n");
+
+    my $c_callback;
+    if ($flags !~ m/c/) {
+        $c_callback = "${callback}_$name";
+    } else {
+        $f_more_sr->(sub {
+            my ($sr) = @_;
+            $cbs{$sr} .= "    $c_rtype_callout (*${name})$c_decl;\n";
+            return
+          "        const ".cbtype($sr)." *const cbs =\n".
+            "            ${getcallbacks}_${sr}(user);\n";
+                       });
+        $c_callback = "cbs->${name}";
+    }
+    my $c_make_callback = "$c_callback($c_callback_args)";
+    if ($flags !~ m/W/) {
+	$f_more_sr->("        $c_make_callback;\n");
+    } else {
+        $f_more_sr->("        r = $c_make_callback;\n".
+                     "        $sendreply(r, user);\n");
+	f_decl($sendreply, 'callout', 'void', '(int r, void *user)');
+    }
+    if ($flags =~ m/x/) {
+        my $c_v = "(1u<<$msgnum)";
+        my $c_cb = "cbs->$name";
+        $f_more_sr->("    if ($c_cb) cbflags |= $c_v;\n", $enumcallbacks);
+        $f_more_sr->("    $c_cb = (cbflags & $c_v) ? ${encode}_${name} : 0;\n",
+                     $setcallbacks);
+    }
+    $f_more_sr->("        return 1;\n    }\n\n");
+    f_decl("${callback}_$name", 'callout', $c_rtype_callout, $c_decl);
+    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_decl);
+    f_more("${encode}_$name",
+"        if (buf) break;
+        buf = ${helper}_allocbuf(len, user);
+        assert(buf);
+        allocd = len;
+        len = 0;
+    }
+    assert(len == allocd);
+    ${transmit}(buf, len, user);
+");
+    if ($flags =~ m/W/) {
+	f_more("${encode}_$name",
+               (<<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS));
+    int r = ${helper}_getreply(user);
+END_ALWAYS
+    fprintf(stderr,"libxl-save-helper: $name got reply %d\\n",r);
+END_DEBUG
+    return r;
+END_ALWAYS
+    }
+}
+
+print "/* AUTOGENERATED by $0 DO NOT EDIT */\n\n" or die $!;
+
+foreach my $sr (qw(save restore)) {
+    f_more("${enumcallbacks}_${sr}",
+           "    return cbflags;\n");
+    f_more("${receiveds}_${sr}",
+           "    default:\n".
+           "        return 0;\n".
+           "    }");
+    $cbs{$sr} .= "} ".cbtype($sr).";\n\n";
+    if ($ch eq 'h') {
+        print $cbs{$sr} or die $!;
+        print "struct ${sr}_callbacks;\n";
+    }
+}
+
+if ($ch eq 'c') {
+    foreach my $name (@outfuncs) {
+        next unless defined $func{$name};
+        $func{$name} .= "}\n\n";
+        $out_body{$func_ah{$name}} .= $func{$name};
+        delete $func{$name};
+    }
+    print $out_body{$want_ah} or die $!;
+} else {
+    foreach my $name (sort keys %out_decls) {
+        next unless $func_ah{$name} eq $want_ah;
+        print $out_decls{$name} or die $!;
+    }
+}
+
+close STDOUT or die $!;
-- 
tg: (52b6131..) t/xen/xc.save-restore-protocol (depends on: t/xen/xl.ao.suspend.pre)
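
The stubs generated by libxl_save_msgs_gen.pl above build each message in two
passes over the same *_put calls: the first pass runs with buf == NULL and only
counts bytes, the second fills the freshly allocated buffer. helper_transmitmsg
then sends a native-endian 16-bit length ahead of the message body. The
following is a minimal standalone sketch of that pattern, not the generated
code itself. The names encode_example and frame_example, and the message
layout (type, retval, errnoval), are assumptions made for illustration;
malloc stands in for helper_allocbuf and an in-memory buffer stands in for
the write to stdout.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Append vlen bytes at offset *len; with buf == NULL, only count them. */
static void bytes_put(unsigned char *buf, int *len,
                      const void *value, int vlen)
{
    assert(vlen < INT_MAX/2 - *len);
    if (buf)
        memcpy(buf + *len, value, vlen);
    *len += vlen;
}

/* Hypothetical message: uint16_t type, then int retval, int errnoval.
 * Returns a malloc'd buffer the caller must free; *len_out gets its size. */
static unsigned char *encode_example(uint16_t mtype, int retval,
                                     int errnoval, int *len_out)
{
    unsigned char *buf = 0;
    int len = 0, allocd = 0;

    for (;;) {
        bytes_put(buf, &len, &mtype, sizeof(mtype));
        bytes_put(buf, &len, &retval, sizeof(retval));
        bytes_put(buf, &len, &errnoval, sizeof(errnoval));
        if (buf) break;          /* second pass: buffer is now filled */
        buf = malloc(len);       /* first pass told us the exact size */
        assert(buf);
        allocd = len;
        len = 0;
    }
    assert(len == allocd);       /* both passes must agree on the size */
    *len_out = len;
    return buf;
}

/* Write a 16-bit native-endian length followed by the message into out.
 * out must have room for sizeof(uint16_t) + len_in bytes. */
static size_t frame_example(unsigned char *out,
                            const unsigned char *msg, int len_in)
{
    assert(len_in < 64*1024);
    uint16_t len = len_in;
    memcpy(out, &len, sizeof(len));
    memcpy(out + sizeof(len), msg, len);
    return sizeof(len) + len;
}
```

Running the same sequence of puts twice keeps the size calculation and the
serialisation from drifting apart, which is presumably why the generator
emits a single loop body rather than separate "measure" and "fill" functions.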


* Re: [PATCH v6 00/21] libxl: domain save/restore: run in a separate process
  2012-06-28 13:38 ` [PATCH v6 " Ian Jackson
@ 2012-06-28 13:50   ` Ian Campbell
  2012-06-28 14:24     ` Ian Jackson
  2012-06-28 17:45   ` Ian Jackson
  1 sibling, 1 reply; 40+ messages in thread
From: Ian Campbell @ 2012-06-28 13:50 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Shriram Rajagopalan, xen-devel

On Thu, 2012-06-28 at 14:38 +0100, Ian Jackson wrote:
> I wrote:
> > This is v5 of my series to asyncify save/restore, rebased to tip and
> > retested.  There are minor changes to 3 patches, as discussed on-list,
> > marked with "*" below:
> ...
> >   * 06/21 libxl: domain save/restore: run in a separate process
> ...
> > However, first I will invite Shriram to check that Remus is still
> > working.  (I can't conveniently do this with this message due to
> > shoddiness in git-send-email.)
> 
> Following testing by Shriram (thanks) I have an updated version of
> 06/21.  For the sake of everyone's sanity (and your MUAs) I shan't
> repost the whole series.
> 
> Here is v6 of 06/21, which is simply the previous one with my earlier
> fixup patch folded in.
> 
> CC Ian Campbell since he'd acked the previous one.  Ian, I have left
> your ack on this version.  I trust that's OK.

Absolutely fine.

Does this mean this series is now ready to go in?

I did wonder when I saw the incremental patch if some of those internal
callback pointers could perhaps be properly typed instead of void
(because they all end up taking the same pointer type), but let's not
worry about that here.

Ian.
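
For readers following the void* discussion: the v6 change note quoted below
says the callback's void* is converted to a libxl__save_helper_state and the
enclosing state is then recovered with CONTAINER_OF. The sketch below shows
that general idiom in isolation. It is an assumption-laden illustration, not
libxl code: the struct names and callback_example are hypothetical, and
container_of_example is a generic offsetof-based macro that takes a type name,
whereas the diff below calls libxl's own macro as CONTAINER_OF(shs, *dss, shs).

```c
#include <assert.h>
#include <stddef.h>

/* Generic offsetof-based helper (hypothetical name): given a pointer to
 * a member, return a pointer to the struct that contains it. */
#define container_of_example(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the embedded helper state. */
struct helper_state_example {
    int need_results;
};

/* Stand-in for an operation state that embeds the helper state. */
struct suspend_state_example {
    int domid;
    struct helper_state_example shs;
};

/* A callback that receives the embedded member as an untyped pointer
 * and steps back out to the enclosing operation state. */
static int callback_example(void *user)
{
    struct helper_state_example *shs = user;
    struct suspend_state_example *dss =
        container_of_example(shs, struct suspend_state_example, shs);
    return dss->domid;
}
```

The pointer handed to the callback must really point at the shs member of the
expected outer struct. Nothing in the type system checks this once it has
passed through void*, which is the hazard a properly typed parameter would
remove.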

> 
> Thanks,
> Ian.
> 
> 
> From: Ian Jackson <ian.jackson@eu.citrix.com>
> Subject: [PATCH] libxl: domain save/restore: run in a separate process
> 
> libxenctrl expects to be able to simply run the save or restore
> operation synchronously.  This won't work well in a process which is
> trying to handle multiple domains.
> 
> The options are:
> 
>  - Block such a whole process (eg, the whole of libvirt) while
>    migration completes (or until it fails).
> 
>  - Create a thread to run xc_domain_save and xc_domain_restore on.
>    This is quite unpalatable.  Multithreaded programming is error
>    prone enough without generating threads in libraries, particularly
>    if the thread does some very complex operation.
> 
>  - Fork and run the operation in the child without execing.  This is
>    no good because we would need to negotiate with the caller about
>    fds we would inherit (and we might be a very large process).
> 
>  - Fork and exec a helper.
> 
> Of these options the last is the most palatable.
> 
> Consequently:
> 
>  * A new helper program libxl-save-helper (which does both save and
>    restore).  It will be installed in /usr/lib/xen/bin.  It does not
>    link against libxl, only libxc, and its error handling does not
>    need to be very advanced.  It does contain a plumbing through of
>    the logging interface into the callback stream.
> 
>  * A small ad-hoc protocol between the helper and libxl which allows
>    log messages and the libxc callbacks to be passed up and down.
>    Protocol doc comment is in libxl_save_helper.c.
> 
>  * To avoid a lot of tedium the marshalling boilerplate (stubs for the
>    helper and the callback decoder for libxl) is generated with a
>    small perl script.
> 
>  * Implement new functionality to spawn the helper, monitor its
>    output, provide responses, and check on its exit status.
> 
>  * The functions libxl__xc_domain_restore_done and
>    libxl__xc_domain_save_done now turn out to want to be called in the
>    same place.  So make their state argument a void* so that the two
>    functions are type compatible.
> 
> The domain save path still writes the qemu savefile synchronously.
> This will need to be fixed in a subsequent patch.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
> -
> Changes in v6:
>  * The void* passed to the callback was being treated as a
>    libxl__domain_suspend_state* by the remus callbacks; this is a
>    holdover from a much earlier version of the series.  It is now
>    properly converted to a libxl__save_helper_state and then the dss
>    extracted with CONTAINER_OF.
>  * The way remus works means that the toolstack save callback is
>    invoked more than once, which the helper's implementation was not
>    prepared to deal with.  Fix this by moving the rewind of the fd
>    into the helper.
> 
> Changes in v5:
>  * assert that preserve_fds are >2.
> 
> Changes in v4:
>  * Migration stream fd is handled specially by the run_helper
>    function, rather than simply being a numarg.  Specifically:
>      - dup it to a safe fd number if necessary.
>      - clear cloexec flag on the fd before execing helper
>  * Toolstack data fd argument to run_helper replaced with
>    generic preserve_fds array, which get cloexec cleared.
>  * libxl__xc_domain_save uses supplied callback function pointer,
>    rather than calling libxl__toolstack_save directly;
>    toolstack data save callback is only supplied to libxc if
>    in-libxl caller supplied a callback.
>  * libxl-save-helper is not needlessly linked against libxl.
>  * Code which prepares pipes for helper clarified.
>  * Deal properly with, and log properly, POLLPRI/POLLERR on
>    pipe to save helper.
>  * Spelling fix in perl script comment.
>  * In message generator, use better names for the ends of serial
>    conditional here documents.
>  * Makefile does $(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR)
> 
> Changes in v3:
>  * Suppress errno value in debug message when helper reports successful
>    completion.
>  * Significant consequential changes to cope with changes to
>    earlier patches in the series.
> 
> Changes in v2:
>  * Helper path can be overridden by an environment variable for testing.
>  * Add a couple of debug logging messages re toolstack data.
>  * Fixes from testing.
>  * Helper protocol message lengths (and numbers) are 16-bit which
>    more clearly avoids piling lots of junk on the stack.
>  * Merged with remus changes.
>  * Callback implementations in libxl now called via pointers
>    so remus can have its own callbacks.
>  * Better namespace prefixes on autogenerated names etc.
>  * Autogenerator can generate debugging printfs too.
> 
> ---
>  .gitignore                         |    1 +
>  .hgignore                          |    2 +
>  tools/libxl/Makefile               |   22 ++-
>  tools/libxl/libxl_create.c         |   22 ++-
>  tools/libxl/libxl_dom.c            |   42 +++--
>  tools/libxl/libxl_internal.h       |   56 +++++-
>  tools/libxl/libxl_save_callout.c   |  361 +++++++++++++++++++++++++++++++-
>  tools/libxl/libxl_save_helper.c    |  283 +++++++++++++++++++++++++
>  tools/libxl/libxl_save_msgs_gen.pl |  397 ++++++++++++++++++++++++++++++++++++
>  9 files changed, 1146 insertions(+), 40 deletions(-)
> 
> diff --git a/.gitignore b/.gitignore
> index 7770e54..3451e52 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -353,6 +353,7 @@ tools/libxl/_*.[ch]
>  tools/libxl/testidl
>  tools/libxl/testidl.c
>  tools/libxl/*.pyc
> +tools/libxl/libxl-save-helper
>  tools/blktap2/control/tap-ctl
>  tools/firmware/etherboot/eb-roms.h
>  tools/firmware/etherboot/gpxe-git-snapshot.tar.gz
> diff --git a/.hgignore b/.hgignore
> index 27d8f79..05304ea 100644
> --- a/.hgignore
> +++ b/.hgignore
> @@ -180,9 +180,11 @@
>  ^tools/libxl/_.*\.c$
>  ^tools/libxl/libxlu_cfg_y\.output$
>  ^tools/libxl/xl$
> +^tools/libxl/libxl-save-helper$
>  ^tools/libxl/testidl$
>  ^tools/libxl/testidl\.c$
>  ^tools/libxl/tmp\..*$
> +^tools/libxl/.*\.new$
>  ^tools/libvchan/vchan-node[12]$
>  ^tools/libaio/src/.*\.ol$
>  ^tools/libaio/src/.*\.os$
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index 1d8b80a..ddc2624 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -67,25 +67,30 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
>                         libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
>                         libxl_internal.o libxl_utils.o libxl_uuid.o \
>                         libxl_json.o libxl_aoutils.o \
> -                       libxl_save_callout.o \
> +                       libxl_save_callout.o _libxl_save_msgs_callout.o \
>                         libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
>  LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
> 
>  $(LIBXL_OBJS): CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenguest) $(CFLAGS_libxenstore) $(CFLAGS_libblktapctl) -include $(XEN_ROOT)/tools/config.h
> 
> -AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h
> +AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h \
> +       _libxl_save_msgs_callout.h _libxl_save_msgs_helper.h
>  AUTOSRCS= libxlu_cfg_y.c libxlu_cfg_l.c
> +AUTOSRCS += _libxl_save_msgs_callout.c _libxl_save_msgs_helper.c
>  LIBXLU_OBJS = libxlu_cfg_y.o libxlu_cfg_l.o libxlu_cfg.o \
>         libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o
>  $(LIBXLU_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
> 
> -CLIENTS = xl testidl
> +CLIENTS = xl testidl libxl-save-helper
> 
>  XL_OBJS = xl.o xl_cmdimpl.o xl_cmdtable.o xl_sxp.o
>  $(XL_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
>  $(XL_OBJS): CFLAGS += $(CFLAGS_libxenlight)
>  $(XL_OBJS): CFLAGS += -include $(XEN_ROOT)/tools/config.h # libxl_json.h needs it.
> 
> +SAVE_HELPER_OBJS = libxl_save_helper.o _libxl_save_msgs_helper.o
> +$(SAVE_HELPER_OBJS): CFLAGS += $(CFLAGS_libxenctrl)
> +
>  testidl.o: CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenlight)
>  testidl.c: libxl_types.idl gentest.py libxl.h $(AUTOINCS)
>         $(PYTHON) gentest.py libxl_types.idl testidl.c.new
> @@ -117,6 +122,12 @@ _libxl_list.h: $(XEN_INCLUDE)/xen-external/bsd-sys-queue-h-seddery $(XEN_INCLUDE
>         perl $^ --prefix=libxl >$@.new
>         $(call move-if-changed,$@.new,$@)
> 
> +_libxl_save_msgs_helper.c _libxl_save_msgs_callout.c \
> +_libxl_save_msgs_helper.h _libxl_save_msgs_callout.h: \
> +               libxl_save_msgs_gen.pl
> +       $(PERL) -w $< $@ >$@.new
> +       $(call move-if-changed,$@.new,$@)
> +
>  libxl.h: _libxl_types.h
>  libxl_json.h: _libxl_types_json.h
>  libxl_internal.h: _libxl_types_internal.h _paths.h
> @@ -159,6 +170,9 @@ libxlutil.a: $(LIBXLU_OBJS)
>  xl: $(XL_OBJS) libxlutil.so libxenlight.so
>         $(CC) $(LDFLAGS) -o $@ $(XL_OBJS) libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) -lyajl $(APPEND_LDFLAGS)
> 
> +libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so
> +       $(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(APPEND_LDFLAGS)
> +
>  testidl: testidl.o libxlutil.so libxenlight.so
>         $(CC) $(LDFLAGS) -o $@ testidl.o libxlutil.so $(LDLIBS_libxenlight) $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
> 
> @@ -169,7 +183,9 @@ install: all
>         $(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)
>         $(INSTALL_DIR) $(DESTDIR)$(BASH_COMPLETION_DIR)
>         $(INSTALL_DIR) $(DESTDIR)$(XEN_RUN_DIR)
> +       $(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR)
>         $(INSTALL_PROG) xl $(DESTDIR)$(SBINDIR)
> +       $(INSTALL_PROG) libxl-save-helper $(DESTDIR)$(PRIVATE_BINDIR)
>         $(INSTALL_PROG) libxenlight.so.$(MAJOR).$(MINOR) $(DESTDIR)$(LIBDIR)
>         ln -sf libxenlight.so.$(MAJOR).$(MINOR) $(DESTDIR)$(LIBDIR)/libxenlight.so.$(MAJOR)
>         ln -sf libxenlight.so.$(MAJOR) $(DESTDIR)$(LIBDIR)/libxenlight.so
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 9c3c671..7b92539 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -662,7 +662,8 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>      libxl_domain_build_info *const info = &d_config->b_info;
>      const int restore_fd = dcs->restore_fd;
>      libxl__domain_build_state *const state = &dcs->build_state;
> -    struct restore_callbacks *const callbacks = &dcs->callbacks;
> +    libxl__srm_restore_autogen_callbacks *const callbacks =
> +        &dcs->shs.callbacks.restore.a;
> 
>      if (rc) domcreate_rebuild_done(egc, dcs, rc);
> 
> @@ -702,7 +703,6 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>          pae = libxl_defbool_val(info->u.hvm.pae);
>          no_incr_generationid = !libxl_defbool_val(info->u.hvm.incr_generationid);
>          callbacks->toolstack_restore = libxl__toolstack_restore;
> -        callbacks->data = gc;
>          break;
>      case LIBXL_DOMAIN_TYPE_PV:
>          hvm = 0;
> @@ -722,10 +722,24 @@ static void domcreate_bootloader_done(libxl__egc *egc,
>      libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
>  }
> 
> -void libxl__xc_domain_restore_done(libxl__egc *egc,
> -                                   libxl__domain_create_state *dcs,
> +void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
> +          unsigned long console_mfn, unsigned long genidad, void *user)
> +{
> +    libxl__save_helper_state *shs = user;
> +    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
> +    STATE_AO_GC(dcs->ao);
> +    libxl__domain_build_state *const state = &dcs->build_state;
> +
> +    state->store_mfn =            store_mfn;
> +    state->console_mfn =          console_mfn;
> +    state->vm_generationid_addr = genidad;
> +    shs->need_results =           0;
> +}
> +
> +void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>                                     int ret, int retval, int errnoval)
>  {
> +    libxl__domain_create_state *dcs = dcs_void;
>      STATE_AO_GC(dcs->ao);
>      libxl_ctx *ctx = libxl__gc_owner(gc);
>      char **vments = NULL, **localents = NULL;
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index c44dec0..0e0dbee 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -467,16 +467,20 @@ static inline char *restore_helper(libxl__gc *gc, uint32_t domid,
>  }
> 
>  int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
> -        uint32_t size, void *data)
> +                             uint32_t size, void *user)
>  {
> -    libxl__gc *gc = data;
> -    libxl_ctx *ctx = gc->owner;
> +    libxl__save_helper_state *shs = user;
> +    libxl__domain_create_state *dcs = CONTAINER_OF(shs, *dcs, shs);
> +    STATE_AO_GC(dcs->ao);
> +    libxl_ctx *ctx = CTX;
>      int i, ret;
>      const uint8_t *ptr = buf;
>      uint32_t count = 0, version = 0;
>      struct libxl__physmap_info* pi;
>      char *xs_path;
> 
> +    LOG(DEBUG,"domain=%"PRIu32" toolstack data size=%"PRIu32, domid, size);
> +
>      if (size < sizeof(version) + sizeof(count)) {
>          LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "wrong size");
>          return -1;
> @@ -529,9 +533,10 @@ static void domain_suspend_done(libxl__egc *egc,
>  /*----- callbacks, called by xc_domain_save -----*/
> 
>  int libxl__domain_suspend_common_switch_qemu_logdirty
> -                               (int domid, unsigned int enable, void *data)
> +                               (int domid, unsigned enable, void *user)
>  {
> -    libxl__domain_suspend_state *dss = data;
> +    libxl__save_helper_state *shs = user;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>      STATE_AO_GC(dss->ao);
>      char *path;
>      bool rc;
> @@ -597,9 +602,10 @@ int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid)
>      return 0;
>  }
> 
> -int libxl__domain_suspend_common_callback(void *data)
> +int libxl__domain_suspend_common_callback(void *user)
>  {
> -    libxl__domain_suspend_state *dss = data;
> +    libxl__save_helper_state *shs = user;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>      STATE_AO_GC(dss->ao);
>      unsigned long hvm_s_state = 0, hvm_pvdrv = 0;
>      int ret;
> @@ -739,9 +745,9 @@ static inline char *save_helper(libxl__gc *gc, uint32_t domid,
>  }
> 
>  int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
> -        uint32_t *len, void *data)
> +        uint32_t *len, void *dss_void)
>  {
> -    libxl__domain_suspend_state *dss = data;
> +    libxl__domain_suspend_state *dss = dss_void;
>      STATE_AO_GC(dss->ao);
>      int i = 0;
>      char *start_addr = NULL, *size = NULL, *phys_offset = NULL, *name = NULL;
> @@ -810,6 +816,8 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf,
>          ptr += sizeof(struct libxl__physmap_info) + namelen;
>      }
> 
> +    LOG(DEBUG,"domain=%"PRIu32" toolstack data size=%"PRIu32, domid, *len);
> +
>      return 0;
>  }
> 
> @@ -823,7 +831,8 @@ static int libxl__remus_domain_suspend_callback(void *data)
> 
>  static int libxl__remus_domain_resume_callback(void *data)
>  {
> -    libxl__domain_suspend_state *dss = data;
> +    libxl__save_helper_state *shs = data;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>      STATE_AO_GC(dss->ao);
> 
>      /* Resumes the domain and the device model */
> @@ -836,7 +845,8 @@ static int libxl__remus_domain_resume_callback(void *data)
> 
>  static int libxl__remus_domain_checkpoint_callback(void *data)
>  {
> -    libxl__domain_suspend_state *dss = data;
> +    libxl__save_helper_state *shs = data;
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>      STATE_AO_GC(dss->ao);
> 
>      /* This would go into tailbuf. */
> @@ -864,7 +874,8 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
>      const int live = dss->live;
>      const int debug = dss->debug;
>      const libxl_domain_remus_info *const r_info = dss->remus;
> -    struct save_callbacks *const callbacks = &dss->callbacks;
> +    libxl__srm_save_autogen_callbacks *const callbacks =
> +        &dss->shs.callbacks.save.a;
> 
>      switch (type) {
>      case LIBXL_DOMAIN_TYPE_HVM: {
> @@ -925,8 +936,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
>          callbacks->suspend = libxl__domain_suspend_common_callback;
> 
>      callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
> -    callbacks->toolstack_save = libxl__toolstack_save;
> -    callbacks->data = dss;
> +    dss->shs.callbacks.save.toolstack_save = libxl__toolstack_save;
> 
>      libxl__xc_domain_save(egc, dss, vm_generationid_addr);
>      return;
> @@ -935,10 +945,10 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
>      domain_suspend_done(egc, dss, rc);
>  }
> 
> -void libxl__xc_domain_save_done(libxl__egc *egc,
> -                                libxl__domain_suspend_state *dss,
> +void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
>                                  int rc, int retval, int errnoval)
>  {
> +    libxl__domain_suspend_state *dss = dss_void;
>      STATE_AO_GC(dss->ao);
> 
>      /* Convenience aliases */
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 7cf1b04..1a7b526 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -54,6 +54,7 @@
> 
>  #include "libxl.h"
>  #include "_paths.h"
> +#include "_libxl_save_msgs_callout.h"
> 
>  #if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1)
>  #define _hidden __attribute__((visibility("hidden")))
> @@ -1773,6 +1774,51 @@ _hidden void libxl__datacopier_kill(libxl__datacopier_state *dc);
>  _hidden int libxl__datacopier_start(libxl__datacopier_state *dc);
> 
> 
> +/*----- Save/restore helper (used by creation and suspend) -----*/
> +
> +typedef struct libxl__srm_save_callbacks {
> +    libxl__srm_save_autogen_callbacks a;
> +    int (*toolstack_save)(uint32_t domid, uint8_t **buf,
> +                          uint32_t *len, void *data);
> +} libxl__srm_save_callbacks;
> +
> +typedef struct libxl__srm_restore_callbacks {
> +    libxl__srm_restore_autogen_callbacks a;
> +} libxl__srm_restore_callbacks;
> +
> +/* a pointer to this struct is also passed as "user" to the
> + * save callout helper callback functions */
> +typedef struct libxl__save_helper_state {
> +    /* public, caller of run_helper initialises */
> +    libxl__ao *ao;
> +    uint32_t domid;
> +    union {
> +        libxl__srm_save_callbacks save;
> +        libxl__srm_restore_callbacks restore;
> +    } callbacks;
> +    int (*recv_callback)(const unsigned char *msg, uint32_t len, void *user);
> +    void (*completion_callback)(libxl__egc *egc, void *caller_state,
> +                                int rc, int retval, int errnoval);
> +    void *caller_state;
> +    int need_results; /* set to 0 or 1 by caller of run_helper;
> +                       * if set to 1 then the ultimate caller's
> +                       * results function must set it to 0 */
> +    /* private */
> +    int rc;
> +    int completed; /* retval/errnoval valid iff completed */
> +    int retval, errnoval; /* from xc_domain_save / xc_domain_restore */
> +    libxl__carefd *pipes[2]; /* 0 = helper's stdin, 1 = helper's stdout */
> +    libxl__ev_fd readable;
> +    libxl__ev_child child;
> +    const char *stdin_what, *stdout_what;
> +    FILE *toolstack_data_file;
> +
> +    libxl__egc *egc; /* valid only for duration of each event callback;
> +                      * is here in this struct for the benefit of the
> +                      * marshalling and xc callback functions */
> +} libxl__save_helper_state;
> +
> +
>  /*----- Domain suspend (save) state structure -----*/
> 
>  typedef struct libxl__domain_suspend_state libxl__domain_suspend_state;
> @@ -1798,7 +1844,7 @@ struct libxl__domain_suspend_state {
>      int xcflags;
>      int guest_responded;
>      int interval; /* checkpoint interval (for Remus) */
> -    struct save_callbacks callbacks;
> +    libxl__save_helper_state shs;
>  };
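
For readers following the callback changes above: the save callbacks now receive a pointer to the embedded shs member as "user" and recover the enclosing dss from it. Below is a minimal standalone sketch of that idiom. The names MY_CONTAINER_OF, struct inner, struct outer and callback are hypothetical, and the macro is the usual offsetof-based form, which is an assumption here; libxl's own CONTAINER_OF takes its arguments differently (inner pointer, an lvalue of the outer type, member name).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl's CONTAINER_OF; not the libxl definition. */
#define MY_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct inner { int rc; };

struct outer {
    int domid;
    struct inner shs;   /* embedded by value, as dss->shs is in the patch */
};

/* A callback that is handed only the embedded member as "user". */
static int callback(void *user)
{
    struct inner *in = user;
    struct outer *out = MY_CONTAINER_OF(in, struct outer, shs);
    assert(&out->shs == in);   /* round trip must land on the same member */
    return out->domid;
}
```

This only works because shs is embedded by value; if it were a pointer member the subtraction would yield garbage.
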
> 
> 
> @@ -1910,7 +1956,7 @@ struct libxl__domain_create_state {
>      libxl__stub_dm_spawn_state dmss;
>          /* If we're not doing stubdom, we use only dmss.dm,
>           * for the non-stubdom device model. */
> -    struct restore_callbacks callbacks;
> +    libxl__save_helper_state shs;
>  };
> 
>  /*----- Domain suspend (save) functions -----*/
> @@ -1926,8 +1972,7 @@ _hidden void libxl__xc_domain_save(libxl__egc*, libxl__domain_suspend_state*,
>  /* If rc==0 then retval is the return value from xc_domain_save
>   * and errnoval is the errno value it provided.
>   * If rc!=0, retval and errnoval are undefined. */
> -_hidden void libxl__xc_domain_save_done(libxl__egc*,
> -                                        libxl__domain_suspend_state*,
> +_hidden void libxl__xc_domain_save_done(libxl__egc*, void *dss_void,
>                                          int rc, int retval, int errnoval);
> 
>  _hidden int libxl__domain_suspend_common_callback(void *data);
> @@ -1945,8 +1990,7 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc,
>  /* If rc==0 then retval is the return value from xc_domain_save
>   * and errnoval is the errno value it provided.
>   * If rc!=0, retval and errnoval are undefined. */
> -_hidden void libxl__xc_domain_restore_done(libxl__egc *egc,
> -                                           libxl__domain_create_state *dcs,
> +_hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
>                                             int rc, int retval, int errnoval);
> 
> 
> diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
> index 1b481ab..a6abcda 100644
> --- a/tools/libxl/libxl_save_callout.c
> +++ b/tools/libxl/libxl_save_callout.c
> @@ -16,6 +16,30 @@
> 
>  #include "libxl_internal.h"
> 
> +/* stream_fd is as from the caller (eventually, the application).
> + * It may be 0, 1 or 2, in which case we need to dup it elsewhere.
> + * The actual fd value is not included in the supplied argnums; rather
> + * it will be automatically supplied by run_helper as the 2nd argument.
> + *
> + * preserve_fds are fds that the caller is intending to pass to the
> + * helper and which therefore need cloexec clearing.  They may not be
> + * 0, 1 or 2.
> + * An entry may be -1 in which case it will be ignored.
> + */
> +static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
> +                       const char *mode_arg,
> +                       int stream_fd,
> +                       const int *preserve_fds, int num_preserve_fds,
> +                       const unsigned long *argnums, int num_argnums);
> +
> +static void helper_failed(libxl__egc*, libxl__save_helper_state *shs, int rc);
> +static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
> +                                   int fd, short events, short revents);
> +static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
> +                          pid_t pid, int status);
> +static void helper_done(libxl__egc *egc, libxl__save_helper_state *shs);
> +
> +/*----- entrypoints -----*/
> +
>  void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
>                                int hvm, int pae, int superpages,
>                                int no_incr_generationid)
> @@ -27,22 +51,337 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
>      const int restore_fd = dcs->restore_fd;
>      libxl__domain_build_state *const state = &dcs->build_state;
> 
> -    int r = xc_domain_restore(CTX->xch, restore_fd, domid,
> -                              state->store_port, &state->store_mfn,
> -                              state->store_domid, state->console_port,
> -                              &state->console_mfn, state->console_domid,
> -                              hvm, pae, superpages, no_incr_generationid,
> -                              &state->vm_generationid_addr, &dcs->callbacks);
> -    libxl__xc_domain_restore_done(egc, dcs, 0, r, errno);
> +    unsigned cbflags = libxl__srm_callout_enumcallbacks_restore
> +        (&dcs->shs.callbacks.restore.a);
> +
> +    const unsigned long argnums[] = {
> +        domid,
> +        state->store_port,
> +        state->store_domid, state->console_port,
> +        state->console_domid,
> +        hvm, pae, superpages, no_incr_generationid,
> +        cbflags,
> +    };
> +
> +    dcs->shs.ao = ao;
> +    dcs->shs.domid = domid;
> +    dcs->shs.recv_callback = libxl__srm_callout_received_restore;
> +    dcs->shs.completion_callback = libxl__xc_domain_restore_done;
> +    dcs->shs.caller_state = dcs;
> +    dcs->shs.need_results = 1;
> +    dcs->shs.toolstack_data_file = 0;
> +
> +    run_helper(egc, &dcs->shs, "--restore-domain", restore_fd, 0,0,
> +               argnums, ARRAY_SIZE(argnums));
>  }
> 
>  void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss,
>                             unsigned long vm_generationid_addr)
>  {
>      STATE_AO_GC(dss->ao);
> -    int r;
> +    int r, rc, toolstack_data_fd = -1;
> +    uint32_t toolstack_data_len = 0;
> +
> +    /* Resources we need to free */
> +    uint8_t *toolstack_data_buf = 0;
> +
> +    unsigned cbflags = libxl__srm_callout_enumcallbacks_save
> +        (&dss->shs.callbacks.save.a);
> +
> +    if (dss->shs.callbacks.save.toolstack_save) {
> +        r = dss->shs.callbacks.save.toolstack_save
> +            (dss->domid, &toolstack_data_buf, &toolstack_data_len, dss);
> +        if (r) { rc = ERROR_FAIL; goto out; }
> +
> +        dss->shs.toolstack_data_file = tmpfile();
> +        if (!dss->shs.toolstack_data_file) {
> +            LOGE(ERROR, "cannot create toolstack data tmpfile");
> +            rc = ERROR_FAIL;
> +            goto out;
> +        }
> +        toolstack_data_fd = fileno(dss->shs.toolstack_data_file);
> +
> +        r = libxl_write_exactly(CTX, toolstack_data_fd,
> +                                toolstack_data_buf, toolstack_data_len,
> +                                "toolstack data tmpfile", 0);
> +        if (r) { rc = ERROR_FAIL; goto out; }
> +    }
> +
> +    const unsigned long argnums[] = {
> +        dss->domid, 0, 0, dss->xcflags, dss->hvm, vm_generationid_addr,
> +        toolstack_data_fd, toolstack_data_len,
> +        cbflags,
> +    };
> +
> +    dss->shs.ao = ao;
> +    dss->shs.domid = dss->domid;
> +    dss->shs.recv_callback = libxl__srm_callout_received_save;
> +    dss->shs.completion_callback = libxl__xc_domain_save_done;
> +    dss->shs.caller_state = dss;
> +    dss->shs.need_results = 0;
> +
> +    free(toolstack_data_buf);
> +
> +    run_helper(egc, &dss->shs, "--save-domain", dss->fd,
> +               &toolstack_data_fd, 1,
> +               argnums, ARRAY_SIZE(argnums));
> +    return;
> +
> + out:
> +    free(toolstack_data_buf);
> +    if (dss->shs.toolstack_data_file) fclose(dss->shs.toolstack_data_file);
> +
> +    libxl__xc_domain_save_done(egc, dss, rc, 0, 0);
> +}
> +
> +
> +/*----- helper execution -----*/
> +
> +static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
> +                       const char *mode_arg, int stream_fd,
> +                       const int *preserve_fds, int num_preserve_fds,
> +                       const unsigned long *argnums, int num_argnums)
> +{
> +    STATE_AO_GC(shs->ao);
> +    const char *args[4 + num_argnums];
> +    const char **arg = args;
> +    int i, rc;
> +
> +    /* Resources we must free */
> +    libxl__carefd *childs_pipes[2] = { 0,0 };
> +
> +    /* Convenience aliases */
> +    const uint32_t domid = shs->domid;
> +
> +    shs->rc = 0;
> +    shs->completed = 0;
> +    shs->pipes[0] = shs->pipes[1] = 0;
> +    libxl__ev_fd_init(&shs->readable);
> +    libxl__ev_child_init(&shs->child);
> +
> +    shs->stdin_what = GCSPRINTF("domain %"PRIu32" save/restore helper"
> +                                " stdin pipe", domid);
> +    shs->stdout_what = GCSPRINTF("domain %"PRIu32" save/restore helper"
> +                                 " stdout pipe", domid);
> +
> +    *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC "/" "libxl-save-helper";
> +    *arg++ = mode_arg;
> +    const char **stream_fd_arg = arg++;
> +    for (i=0; i<num_argnums; i++)
> +        *arg++ = GCSPRINTF("%lu", argnums[i]);
> +    *arg++ = 0;
> +    assert(arg == args + ARRAY_SIZE(args));
> +
> +    libxl__carefd_begin();
> +    int childfd;
> +    for (childfd=0; childfd<2; childfd++) {
> +        /* Setting up the pipe for the child's fd childfd */
> +        int fds[2];
> +        if (libxl_pipe(CTX,fds)) { rc = ERROR_FAIL; goto out; }
> +        int childs_end = childfd==0 ? 0 /*read*/  : 1 /*write*/;
> +        int our_end    = childfd==0 ? 1 /*write*/ : 0 /*read*/;
> +        childs_pipes[childfd] = libxl__carefd_record(CTX, fds[childs_end]);
> +        shs->pipes[childfd] =   libxl__carefd_record(CTX, fds[our_end]);
> +    }
> +    libxl__carefd_unlock();
> +
> +    pid_t pid = libxl__ev_child_fork(gc, &shs->child, helper_exited);
> +    if (!pid) {
> +        if (stream_fd <= 2) {
> +            stream_fd = dup(stream_fd);
> +            if (stream_fd < 0) {
> +                LOGE(ERROR,"dup migration stream fd");
> +                exit(-1);
> +            }
> +        }
> +        libxl_fd_set_cloexec(CTX, stream_fd, 0);
> +        *stream_fd_arg = GCSPRINTF("%d", stream_fd);
> +
> +        for (i=0; i<num_preserve_fds; i++)
> +            if (preserve_fds[i] >= 0) {
> +                assert(preserve_fds[i] > 2);
> +                libxl_fd_set_cloexec(CTX, preserve_fds[i], 0);
> +            }
> +
> +        libxl__exec(gc,
> +                    libxl__carefd_fd(childs_pipes[0]),
> +                    libxl__carefd_fd(childs_pipes[1]),
> +                    -1,
> +                    args[0], (char**)args, 0);
> +    }
> +
> +    libxl__carefd_close(childs_pipes[0]);
> +    libxl__carefd_close(childs_pipes[1]);
> +
> +    rc = libxl__ev_fd_register(gc, &shs->readable, helper_stdout_readable,
> +                               libxl__carefd_fd(shs->pipes[1]), POLLIN|POLLPRI);
> +    if (rc) goto out;
> +    return;
> +
> + out:
> +    libxl__carefd_close(childs_pipes[0]);
> +    libxl__carefd_close(childs_pipes[1]);
> +    helper_failed(egc, shs, rc);
> +}
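
On the stream_fd handling in the child: the patch uses dup() and relies on 0, 1 and 2 being open at that point, so that the duplicate lands above 2. A possible alternative, sketched below, is fcntl(F_DUPFD, 3), which asks for the lowest free descriptor at or above 3 regardless of what is open. This is a swapped-in technique, not what the patch does, and fd_for_helper is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns an fd > 2 referring to the same open file as fd, with
 * close-on-exec cleared so that it survives exec; -1 on error
 * (errno set by fcntl). */
static int fd_for_helper(int fd)
{
    assert(fd >= 0);
    if (fd <= 2) {
        fd = fcntl(fd, F_DUPFD, 3);   /* lowest free fd >= 3 */
        if (fd < 0) return -1;
    }
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) return -1;
    if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) return -1;
    return fd;
}
```

The original descriptor is left open in the sketch; in the patch that does not matter because libxl__exec rearranges 0, 1 and 2 in the child anyway.
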
> +
> +static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
> +                          int rc)
> +{
> +    STATE_AO_GC(shs->ao);
> +
> +    if (!shs->rc)
> +        shs->rc = rc;
> +
> +    libxl__ev_fd_deregister(gc, &shs->readable);
> +
> +    if (!libxl__ev_child_inuse(&shs->child)) {
> +        helper_done(egc, shs);
> +        return;
> +    }
> +
> +    int r = kill(shs->child.pid, SIGKILL);
> +    if (r) LOGE(WARN, "failed to kill save/restore helper [%lu]",
> +                (unsigned long)shs->child.pid);
> +}
> +
> +static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
> +                                   int fd, short events, short revents)
> +{
> +    libxl__save_helper_state *shs = CONTAINER_OF(ev, *shs, readable);
> +    STATE_AO_GC(shs->ao);
> +    int rc, errnoval;
> +
> +    if (revents & (POLLERR|POLLPRI)) {
> +        LOG(ERROR, "%s signaled POLLERR|POLLPRI (%#x)",
> +            shs->stdout_what, revents);
> +        rc = ERROR_FAIL;
> + out:
> +        /* this is here because otherwise we bypass the decl of msg[] */
> +        helper_failed(egc, shs, rc);
> +        return;
> +    }
> +
> +    uint16_t msglen;
> +    errnoval = libxl_read_exactly(CTX, fd, &msglen, sizeof(msglen),
> +                                  shs->stdout_what, "ipc msg header");
> +    if (errnoval) { rc = ERROR_FAIL; goto out; }
> +
> +    unsigned char msg[msglen];
> +    errnoval = libxl_read_exactly(CTX, fd, msg, msglen,
> +                                  shs->stdout_what, "ipc msg body");
> +    if (errnoval) { rc = ERROR_FAIL; goto out; }
> +
> +    shs->egc = egc;
> +    shs->recv_callback(msg, msglen, shs);
> +    shs->egc = 0;
> +    return;
> +}
> +
> +static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
> +                          pid_t pid, int status)
> +{
> +    libxl__save_helper_state *shs = CONTAINER_OF(ch, *shs, child);
> +    STATE_AO_GC(shs->ao);
> +
> +    /* Convenience aliases */
> +    const uint32_t domid = shs->domid;
> +
> +    const char *what =
> +        GCSPRINTF("domain %"PRIu32" save/restore helper", domid);
> +
> +    if (status) {
> +        libxl_report_child_exitstatus(CTX, XTL_ERROR, what, pid, status);
> +        shs->rc = ERROR_FAIL;
> +    }
> +
> +    if (shs->need_results) {
> +        if (!shs->rc)
> +            LOG(ERROR,"%s exited without providing results",what);
> +        shs->rc = ERROR_FAIL;
> +    }
> +
> +    if (!shs->completed) {
> +        if (!shs->rc)
> +            LOG(ERROR,"%s exited without signaling completion",what);
> +        shs->rc = ERROR_FAIL;
> +    }
> +
> +    helper_done(egc, shs);
> +    return;
> +}
> +
> +static void helper_done(libxl__egc *egc, libxl__save_helper_state *shs)
> +{
> +    STATE_AO_GC(shs->ao);
> +
> +    libxl__ev_fd_deregister(gc, &shs->readable);
> +    libxl__carefd_close(shs->pipes[0]);  shs->pipes[0] = 0;
> +    libxl__carefd_close(shs->pipes[1]);  shs->pipes[1] = 0;
> +    assert(!libxl__ev_child_inuse(&shs->child));
> +    if (shs->toolstack_data_file) fclose(shs->toolstack_data_file);
> +
> +    shs->egc = egc;
> +    shs->completion_callback(egc, shs->caller_state,
> +                             shs->rc, shs->retval, shs->errnoval);
> +    shs->egc = 0;
> +}
> +
> +/*----- generic helpers for the autogenerated code -----*/
> +
> +const libxl__srm_save_autogen_callbacks*
> +libxl__srm_callout_get_callbacks_save(void *user)
> +{
> +    libxl__save_helper_state *shs = user;
> +    return &shs->callbacks.save.a;
> +}
> +
> +const libxl__srm_restore_autogen_callbacks*
> +libxl__srm_callout_get_callbacks_restore(void *user)
> +{
> +    libxl__save_helper_state *shs = user;
> +    return &shs->callbacks.restore.a;
> +}
> +
> +void libxl__srm_callout_sendreply(int r, void *user)
> +{
> +    libxl__save_helper_state *shs = user;
> +    libxl__egc *egc = shs->egc;
> +    STATE_AO_GC(shs->ao);
> +    int errnoval;
> +
> +    errnoval = libxl_write_exactly(CTX, libxl__carefd_fd(shs->pipes[0]),
> +                                   &r, sizeof(r), shs->stdin_what,
> +                                   "callback return value");
> +    if (errnoval)
> +        helper_failed(egc, shs, ERROR_FAIL);
> +}
> +
> +void libxl__srm_callout_callback_log(uint32_t level, uint32_t errnoval,
> +                  const char *context, const char *formatted, void *user)
> +{
> +    libxl__save_helper_state *shs = user;
> +    STATE_AO_GC(shs->ao);
> +    xtl_log(CTX->lg, level, errnoval, context, "%s", formatted);
> +}
> +
> +void libxl__srm_callout_callback_progress(const char *context,
> +                   const char *doing_what, unsigned long done,
> +                   unsigned long total, void *user)
> +{
> +    libxl__save_helper_state *shs = user;
> +    STATE_AO_GC(shs->ao);
> +    xtl_progress(CTX->lg, context, doing_what, done, total);
> +}
> +
> +int libxl__srm_callout_callback_complete(int retval, int errnoval,
> +                                         void *user)
> +{
> +    libxl__save_helper_state *shs = user;
> +    STATE_AO_GC(shs->ao);
> 
> -    r = xc_domain_save(CTX->xch, dss->fd, dss->domid, 0, 0, dss->xcflags,
> -                       &dss->callbacks, dss->hvm, vm_generationid_addr);
> -    libxl__xc_domain_save_done(egc, dss, 0, r, errno);
> +    shs->completed = 1;
> +    shs->retval = retval;
> +    shs->errnoval = errnoval;
> +    libxl__ev_fd_deregister(gc, &shs->readable);
> +    return 0;
>  }
> diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
> new file mode 100644
> index 0000000..772251a
> --- /dev/null
> +++ b/tools/libxl/libxl_save_helper.c
> @@ -0,0 +1,283 @@
> +/*
> + * Copyright (C) 2012      Citrix Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */
> +
> +/*
> + * The libxl-save-helper utility speaks a protocol to its caller for
> + * the callbacks.  The protocol is as follows.
> + *
> + * The helper talks on stdin and stdout, in binary in machine
> + * endianness.  The helper speaks first, and only when it has a
> + * callback to make.  It writes a 16-bit number being the message
> + * length, and then the message body.
> + *
> + * Each message starts with a 16-bit number indicating which of the
> + * messages it is, and then some arguments in a binary marshalled form.
> + * If the callback does not need a reply (it returns void), the helper
> + * just continues.  Otherwise the helper waits for its caller to send a
> + * single int which is to be the return value from the callback.
> + *
> + * Where feasible the stubs and callbacks have prototypes identical to
> + * those required by xc_domain_save and xc_domain_restore, so that the
> + * autogenerated functions can be used/provided directly.
> + *
> + * The actual messages are in the array @msgs in libxl_save_msgs_gen.pl
> + */
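
To make the framing described in this comment concrete, here is a small sketch of one message on the wire: a 16-bit length, then a body that starts with a 16-bit message number, all in machine endianness. The function names frame_msg and unframe_msg are hypothetical, and the argument bytes are treated as opaque; the real marshalling of arguments is done by the autogenerated code and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes [uint16 body length][uint16 message number][args] into out,
 * in machine endianness, and returns the total number of bytes written.
 * out must have room for 4 + arglen bytes. */
static size_t frame_msg(unsigned char *out, uint16_t msgnum,
                        const void *args, uint16_t arglen)
{
    assert(arglen <= UINT16_MAX - sizeof(msgnum));
    uint16_t bodylen = (uint16_t)(sizeof(msgnum) + arglen);
    memcpy(out, &bodylen, sizeof(bodylen));
    memcpy(out + sizeof(bodylen), &msgnum, sizeof(msgnum));
    if (arglen)
        memcpy(out + sizeof(bodylen) + sizeof(msgnum), args, arglen);
    return sizeof(bodylen) + bodylen;
}

/* Reads the header of one framed message: returns the message number
 * and stores the length of the argument bytes in *arglen. */
static uint16_t unframe_msg(const unsigned char *in, uint16_t *arglen)
{
    uint16_t bodylen, msgnum;
    memcpy(&bodylen, in, sizeof(bodylen));
    assert(bodylen >= sizeof(msgnum));
    memcpy(&msgnum, in + sizeof(bodylen), sizeof(msgnum));
    *arglen = (uint16_t)(bodylen - sizeof(msgnum));
    return msgnum;
}
```

The 16-bit length is why helper_transmitmsg asserts that the message is shorter than 64*1024 bytes.
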
> +
> +#include "libxl_osdeps.h"
> +
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <assert.h>
> +#include <inttypes.h>
> +
> +#include "libxl.h"
> +
> +#include "xenctrl.h"
> +#include "xenguest.h"
> +#include "_libxl_save_msgs_helper.h"
> +
> +/*----- globals -----*/
> +
> +static const char *program = "libxl-save-helper";
> +static xentoollog_logger *logger;
> +static xc_interface *xch;
> +
> +/*----- error handling -----*/
> +
> +static void fail(int errnoval, const char *fmt, ...)
> +    __attribute__((noreturn,format(printf,2,3)));
> +static void fail(int errnoval, const char *fmt, ...)
> +{
> +    va_list al;
> +    va_start(al,fmt);
> +    xtl_logv(logger,XTL_ERROR,errnoval,program,fmt,al);
> +    exit(-1);
> +}
> +
> +static int read_exactly(int fd, void *buf, size_t len)
> +/* returns 0 if we get eof, even if we got it midway through;
> + * -1 on read error (errno set); 1 if ok */
> +{
> +    while (len) {
> +        ssize_t r = read(fd, buf, len);
> +        if (r<=0) return r;
> +        assert(r <= len);
> +        len -= r;
> +        buf = (char*)buf + r;
> +    }
> +    return 1;
> +}
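
As toolstack_save_cb later in this file uses it, read_exactly has three outcomes: negative on error, 0 on eof, 1 on success. The sketch below spells those out explicitly and can be exercised against a pipe. read_exactly_sketch is a hypothetical name, and like the original it does not retry on EINTR, which is an assumption that the helper runs without interrupting signal handlers.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly len bytes from fd into buf.
 * Returns 1 on success, 0 on eof (even midway through), -1 on error. */
static int read_exactly_sketch(int fd, void *buf, size_t len)
{
    while (len) {
        ssize_t r = read(fd, buf, len);
        if (r < 0) return -1;          /* errno is left as set by read() */
        if (r == 0) return 0;          /* eof before len bytes arrived */
        assert((size_t)r <= len);
        len -= (size_t)r;
        buf = (char *)buf + r;         /* continue after a short read */
    }
    return 1;
}
```
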
> +
> +static void *xmalloc(size_t sz)
> +{
> +    if (!sz) return 0;
> +    void *r = malloc(sz);
> +    if (!r) { perror("memory allocation failed"); exit(-1); }
> +    return r;
> +}
> +
> +/*----- logger -----*/
> +
> +typedef struct {
> +    xentoollog_logger vtable;
> +} xentoollog_logger_tellparent;
> +
> +static void tellparent_vmessage(xentoollog_logger *logger_in,
> +                                xentoollog_level level,
> +                                int errnoval,
> +                                const char *context,
> +                                const char *format,
> +                                va_list al)
> +{
> +    char *formatted;
> +    int r = vasprintf(&formatted, format, al);
> +    if (r < 0) { perror("memory allocation failed during logging"); exit(-1); }
> +    helper_stub_log(level, errnoval, context, formatted, 0);
> +    free(formatted);
> +}
> +
> +static void tellparent_progress(struct xentoollog_logger *logger_in,
> +                                const char *context,
> +                                const char *doing_what, int percent,
> +                                unsigned long done, unsigned long total)
> +{
> +    helper_stub_progress(context, doing_what, done, total, 0);
> +}
> +
> +static void tellparent_destroy(struct xentoollog_logger *logger_in)
> +{
> +    abort();
> +}
> +
> +static xentoollog_logger_tellparent *createlogger_tellparent(void)
> +{
> +    xentoollog_logger_tellparent newlogger;
> +    return XTL_NEW_LOGGER(tellparent, newlogger);
> +}
> +
> +/*----- helper functions called by autogenerated stubs -----*/
> +
> +unsigned char * helper_allocbuf(int len, void *user)
> +{
> +    return xmalloc(len);
> +}
> +
> +static void transmit(const unsigned char *msg, int len, void *user)
> +{
> +    while (len) {
> +        int r = write(1, msg, len);
> +        if (r<0) { perror("write"); exit(-1); }
> +        assert(r >= 0);
> +        assert(r <= len);
> +        len -= r;
> +        msg += r;
> +    }
> +}
> +
> +void helper_transmitmsg(unsigned char *msg_freed, int len_in, void *user)
> +{
> +    assert(len_in < 64*1024);
> +    uint16_t len = len_in;
> +    transmit((const void*)&len, sizeof(len), user);
> +    transmit(msg_freed, len, user);
> +    free(msg_freed);
> +}
> +
> +int helper_getreply(void *user)
> +{
> +    int v;
> +    int r = read_exactly(0, &v, sizeof(v));
> +    if (r<=0) exit(-2);
> +    return v;
> +}
> +
> +/*----- other callbacks -----*/
> +
> +static int toolstack_save_fd;
> +static uint32_t toolstack_save_len;
> +
> +static int toolstack_save_cb(uint32_t domid, uint8_t **buf,
> +                             uint32_t *len, void *data)
> +{
> +    assert(toolstack_save_fd > 0);
> +
> +    int r = lseek(toolstack_save_fd, 0, SEEK_SET);
> +    if (r) fail(errno,"rewind toolstack data tmpfile");
> +
> +    *buf = xmalloc(toolstack_save_len);
> +    r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
> +    if (r<0) fail(errno,"read toolstack data");
> +    if (r==0) fail(0,"read toolstack data eof");
> +
> +    *len = toolstack_save_len;
> +    return 0;
> +}
> +
> +static void startup(const char *op) {
> +    logger = (xentoollog_logger*)createlogger_tellparent();
> +    if (!logger) {
> +        fprintf(stderr, "%s: cannot initialise logger\n", program);
> +        exit(-1);
> +    }
> +
> +    xtl_log(logger,XTL_DEBUG,0,program,"starting %s",op);
> +
> +    xch = xc_interface_open(logger,logger,0);
> +    if (!xch) fail(errno,"xc_interface_open failed");
> +}
> +
> +static void complete(int retval) {
> +    int errnoval = retval ? errno : 0; /* suppress irrelevant errnos */
> +    xtl_log(logger,XTL_DEBUG,errnoval,program,"complete r=%d",retval);
> +    helper_stub_complete(retval,errnoval,0);
> +    exit(0);
> +}
> +
> +static struct save_callbacks helper_save_callbacks;
> +static struct restore_callbacks helper_restore_callbacks;
> +
> +int main(int argc, char **argv)
> +{
> +    int r;
> +
> +#define NEXTARG (++argv, assert(*argv), *argv)
> +
> +    const char *mode = *++argv;
> +    assert(mode);
> +
> +    if (!strcmp(mode,"--save-domain")) {
> +
> +        int io_fd =                atoi(NEXTARG);
> +        uint32_t dom =             strtoul(NEXTARG,0,10);
> +        uint32_t max_iters =       strtoul(NEXTARG,0,10);
> +        uint32_t max_factor =      strtoul(NEXTARG,0,10);
> +        uint32_t flags =           strtoul(NEXTARG,0,10);
> +        int hvm =                  atoi(NEXTARG);
> +        unsigned long genidad =    strtoul(NEXTARG,0,10);
> +        toolstack_save_fd  =       atoi(NEXTARG);
> +        toolstack_save_len =       strtoul(NEXTARG,0,10);
> +        unsigned cbflags =         strtoul(NEXTARG,0,10);
> +        assert(!*++argv);
> +
> +        if (toolstack_save_fd >= 0)
> +            helper_save_callbacks.toolstack_save = toolstack_save_cb;
> +
> +        helper_setcallbacks_save(&helper_save_callbacks, cbflags);
> +
> +        startup("save");
> +        r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
> +                           &helper_save_callbacks, hvm, genidad);
> +        complete(r);
> +
> +    } else if (!strcmp(mode,"--restore-domain")) {
> +
> +        int io_fd =                atoi(NEXTARG);
> +        uint32_t dom =             strtoul(NEXTARG,0,10);
> +        unsigned store_evtchn =    strtoul(NEXTARG,0,10);
> +        domid_t store_domid =      strtoul(NEXTARG,0,10);
> +        unsigned console_evtchn =  strtoul(NEXTARG,0,10);
> +        domid_t console_domid =    strtoul(NEXTARG,0,10);
> +        unsigned int hvm =         strtoul(NEXTARG,0,10);
> +        unsigned int pae =         strtoul(NEXTARG,0,10);
> +        int superpages =           strtoul(NEXTARG,0,10);
> +        int no_incr_genidad =      strtoul(NEXTARG,0,10);
> +        unsigned cbflags =         strtoul(NEXTARG,0,10);
> +        assert(!*++argv);
> +
> +        helper_setcallbacks_restore(&helper_restore_callbacks, cbflags);
> +
> +        unsigned long store_mfn = 0;
> +        unsigned long console_mfn = 0;
> +        unsigned long genidad = 0;
> +
> +        startup("restore");
> +        r = xc_domain_restore(xch, io_fd, dom, store_evtchn, &store_mfn,
> +                              store_domid, console_evtchn, &console_mfn,
> +                              console_domid, hvm, pae, superpages,
> +                              no_incr_genidad, &genidad,
> +                              &helper_restore_callbacks);
> +        helper_stub_restore_results(store_mfn,console_mfn,genidad,0);
> +        complete(r);
> +
> +    } else {
> +        assert(!"unexpected mode argument");
> +    }
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl
> new file mode 100755
> index 0000000..c45986e
> --- /dev/null
> +++ b/tools/libxl/libxl_save_msgs_gen.pl
> @@ -0,0 +1,397 @@
> +#!/usr/bin/perl -w
> +
> +use warnings;
> +use strict;
> +use POSIX;
> +
> +our $debug = 0; # produce copious debugging output at run-time?
> +
> +our @msgs = (
> +    # flags:
> +    #   s  - applicable to save
> +    #   r  - applicable to restore
> +    #   c  - function pointer in callbacks struct rather than fixed function
> +    #   x  - function pointer is in struct {save,restore}_callbacks
> +    #         and its null-ness needs to be passed through to the helper's xc
> +    #   W  - needs a return value; callback is synchronous
> +    [  1, 'sr',     "log",                   [qw(uint32_t level
> +                                                 uint32_t errnoval
> +                                                 STRING context
> +                                                 STRING formatted)] ],
> +    [  2, 'sr',     "progress",              [qw(STRING context
> +                                                 STRING doing_what),
> +                                                'unsigned long', 'done',
> +                                                'unsigned long', 'total'] ],
> +    [  3, 'scxW',   "suspend", [] ],
> +    [  4, 'scxW',   "postcopy", [] ],
> +    [  5, 'scxW',   "checkpoint", [] ],
> +    [  6, 'scxW',   "switch_qemu_logdirty",  [qw(int domid
> +                                              unsigned enable)] ],
> +    #                toolstack_save          done entirely `by hand'
> +    [  7, 'rcxW',   "toolstack_restore",     [qw(uint32_t domid
> +                                                BLOCK tsdata)] ],
> +    [  8, 'r',      "restore_results",       ['unsigned long', 'store_mfn',
> +                                              'unsigned long', 'console_mfn',
> +                                              'unsigned long', 'genidad'] ],
> +    [  9, 'srW',    "complete",              [qw(int retval
> +                                                 int errnoval)] ],
> +);
> +
> +#----------------------------------------
> +
> +our %cbs;
> +our %func;
> +our %func_ah;
> +our @outfuncs;
> +our %out_decls;
> +our %out_body;
> +our %msgnum_used;
> +
> +die unless @ARGV==1;
> +die if $ARGV[0] =~ m/^-/;
> +
> +our ($intendedout) = @ARGV;
> +
> +$intendedout =~ m/([a-z]+)\.([ch])$/ or die;
> +my ($want_ah, $ch) = ($1, $2);
> +
> +my $declprefix = '';
> +
> +foreach my $ah (qw(callout helper)) {
> +    $out_body{$ah} .=
> +        <<END_BOTH.($ah eq 'callout' ? <<END_CALLOUT : <<END_HELPER);
> +#include "libxl_osdeps.h"
> +
> +#include <assert.h>
> +#include <string.h>
> +#include <stdint.h>
> +#include <limits.h>
> +END_BOTH
> +
> +#include "libxl_internal.h"
> +
> +END_CALLOUT
> +
> +#include "_libxl_save_msgs_${ah}.h"
> +#include <xenctrl.h>
> +#include <xenguest.h>
> +
> +END_HELPER
> +}
> +
> +die $want_ah unless defined $out_body{$want_ah};
> +
> +sub f_decl ($$$$) {
> +    my ($name, $ah, $c_rtype, $c_decl) = @_;
> +    $out_decls{$name} = "${declprefix}$c_rtype $name$c_decl;\n";
> +    $func{$name} = "$c_rtype $name$c_decl\n{\n" . ($func{$name} || '');
> +    $func_ah{$name} = $ah;
> +}
> +
> +sub f_more ($$) {
> +    my ($name, $addbody) = @_;
> +    $func{$name} ||= '';
> +    $func{$name} .= $addbody;
> +    push @outfuncs, $name;
> +}
> +
> +our $libxl = "libxl__srm";
> +our $callback = "${libxl}_callout_callback";
> +our $receiveds = "${libxl}_callout_received";
> +our $sendreply = "${libxl}_callout_sendreply";
> +our $getcallbacks = "${libxl}_callout_get_callbacks";
> +our $enumcallbacks = "${libxl}_callout_enumcallbacks";
> +sub cbtype ($) { "${libxl}_".$_[0]."_autogen_callbacks"; };
> +
> +f_decl($sendreply, 'callout', 'void', "(int r, void *user)");
> +
> +our $helper = "helper";
> +our $encode = "${helper}_stub";
> +our $allocbuf = "${helper}_allocbuf";
> +our $transmit = "${helper}_transmitmsg";
> +our $getreply = "${helper}_getreply";
> +our $setcallbacks = "${helper}_setcallbacks";
> +
> +f_decl($allocbuf, 'helper', 'unsigned char *', '(int len, void *user)');
> +f_decl($transmit, 'helper', 'void',
> +       '(unsigned char *msg_freed, int len, void *user)');
> +f_decl($getreply, 'helper', 'int', '(void *user)');
> +
> +sub typeid ($) { my ($t) = @_; $t =~ s/\W/_/; return $t; };
> +
> +$out_body{'callout'} .= <<END;
> +static int bytes_get(const unsigned char **msg,
> +                    const unsigned char *const endmsg,
> +                    void *result, int rlen)
> +{
> +    if (endmsg - *msg < rlen) return 0;
> +    memcpy(result,*msg,rlen);
> +    *msg += rlen;
> +    return 1;
> +}
> +
> +END
> +$out_body{'helper'} .= <<END;
> +static void bytes_put(unsigned char *const buf, int *len,
> +                     const void *value, int vlen)
> +{
> +    assert(vlen < INT_MAX/2 - *len);
> +    if (buf)
> +       memcpy(buf + *len, value, vlen);
> +    *len += vlen;
> +}
> +
> +END
> +
> +foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long') {
> +    my $typeid = typeid($simpletype);
> +    $out_body{'callout'} .= <<END;
> +static int ${typeid}_get(const unsigned char **msg,
> +                        const unsigned char *const endmsg,
> +                        $simpletype *result)
> +{
> +    return bytes_get(msg, endmsg, result, sizeof(*result));
> +}
> +
> +END
> +    $out_body{'helper'} .= <<END;
> +static void ${typeid}_put(unsigned char *const buf, int *len,
> +                        const $simpletype value)
> +{
> +    bytes_put(buf, len, &value, sizeof(value));
> +}
> +
> +END
> +}
> +
> +$out_body{'callout'} .= <<END;
> +static int BLOCK_get(const unsigned char **msg,
> +                      const unsigned char *const endmsg,
> +                      const uint8_t **result, uint32_t *result_size)
> +{
> +    if (!uint32_t_get(msg,endmsg,result_size)) return 0;
> +    if (endmsg - *msg < *result_size) return 0;
> +    *result = (const void*)*msg;
> +    *msg += *result_size;
> +    return 1;
> +}
> +
> +static int STRING_get(const unsigned char **msg,
> +                      const unsigned char *const endmsg,
> +                      const char **result)
> +{
> +    const uint8_t *data;
> +    uint32_t datalen;
> +    if (!BLOCK_get(msg,endmsg,&data,&datalen)) return 0;
> +    if (datalen == 0) return 0;
> +    if (data[datalen-1] != '\\0') return 0;
> +    *result = (const void*)data;
> +    return 1;
> +}
> +
> +END
> +$out_body{'helper'} .= <<END;
> +static void BLOCK_put(unsigned char *const buf,
> +                      int *len,
> +                     const uint8_t *bytes, uint32_t size)
> +{
> +    uint32_t_put(buf, len, size);
> +    bytes_put(buf, len, bytes, size);
> +}
> +
> +static void STRING_put(unsigned char *const buf,
> +                      int *len,
> +                      const char *string)
> +{
> +    size_t slen = strlen(string);
> +    assert(slen < INT_MAX / 4);
> +    assert(slen < (uint32_t)0x40000000);
> +    BLOCK_put(buf, len, (const void*)string, slen+1);
> +}
> +
> +END
> +
> +foreach my $sr (qw(save restore)) {
> +    f_decl("${getcallbacks}_${sr}", 'callout',
> +           "const ".cbtype($sr)." *",
> +           "(void *data)");
> +
> +    f_decl("${receiveds}_${sr}", 'callout', 'int',
> +          "(const unsigned char *msg, uint32_t len, void *user)");
> +
> +    f_decl("${enumcallbacks}_${sr}", 'callout', 'unsigned',
> +           "(const ".cbtype($sr)." *cbs)");
> +    f_more("${enumcallbacks}_${sr}", "    unsigned cbflags = 0;\n");
> +
> +    f_decl("${setcallbacks}_${sr}", 'helper', 'void',
> +           "(struct ${sr}_callbacks *cbs, unsigned cbflags)");
> +
> +    f_more("${receiveds}_${sr}",
> +           <<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS);
> +    const unsigned char *const endmsg = msg + len;
> +    uint16_t mtype;
> +    if (!uint16_t_get(&msg,endmsg,&mtype)) return 0;
> +END_ALWAYS
> +    fprintf(stderr,"libxl callout receiver: got len=%u mtype=%u\\n",len,mtype);
> +END_DEBUG
> +    switch (mtype) {
> +
> +END_ALWAYS
> +
> +    $cbs{$sr} = "typedef struct ".cbtype($sr)." {\n";
> +}
> +
> +foreach my $msginfo (@msgs) {
> +    my ($msgnum, $flags, $name, $args) = @$msginfo;
> +    die if $msgnum_used{$msgnum}++;
> +
> +    my $f_more_sr = sub {
> +        my ($contents_spec, $fnamebase) = @_;
> +        $fnamebase ||= "${receiveds}";
> +        foreach my $sr (qw(save restore)) {
> +            $sr =~ m/^./;
> +            next unless $flags =~ m/$&/;
> +            my $contents = (!ref $contents_spec) ? $contents_spec :
> +                $contents_spec->($sr);
> +            f_more("${fnamebase}_${sr}", $contents);
> +        }
> +    };
> +
> +    $f_more_sr->("    case $msgnum: { /* $name */\n");
> +    if ($flags =~ m/W/) {
> +        $f_more_sr->("        int r;\n");
> +    }
> +
> +    my $c_rtype_helper = $flags =~ m/W/ ? 'int' : 'void';
> +    my $c_rtype_callout = $flags =~ m/W/ ? 'int' : 'void';
> +    my $c_decl = '(';
> +    my $c_callback_args = '';
> +
> +    f_more("${encode}_$name",
> +           <<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS);
> +    unsigned char *buf = 0;
> +    int len = 0, allocd = 0;
> +
> +END_ALWAYS
> +    fprintf(stderr,"libxl-save-helper: encoding $name\\n");
> +END_DEBUG
> +    for (;;) {
> +        uint16_t_put(buf, &len, $msgnum /* $name */);
> +END_ALWAYS
> +
> +    my @args = @$args;
> +    my $c_recv = '';
> +    my ($argtype, $arg);
> +    while (($argtype, $arg, @args) = @args) {
> +       my $typeid = typeid($argtype);
> +        my $c_args = "$arg";
> +        my $c_get_args = "&$arg";
> +       if ($argtype eq 'STRING') {
> +           $c_decl .= "const char *$arg, ";
> +           $f_more_sr->("        const char *$arg;\n");
> +        } elsif ($argtype eq 'BLOCK') {
> +            $c_decl .= "const uint8_t *$arg, uint32_t ${arg}_size, ";
> +            $c_args .= ", ${arg}_size";
> +            $c_get_args .= ",&${arg}_size";
> +           $f_more_sr->("        const uint8_t *$arg;\n".
> +                         "        uint32_t ${arg}_size;\n");
> +       } else {
> +           $c_decl .= "$argtype $arg, ";
> +           $f_more_sr->("        $argtype $arg;\n");
> +       }
> +       $c_callback_args .= "$c_args, ";
> +       $c_recv.=
> +            "        if (!${typeid}_get(&msg,endmsg,$c_get_args)) return 0;\n";
> +        f_more("${encode}_$name", "    ${typeid}_put(buf, &len, $c_args);\n");
> +    }
> +    $f_more_sr->($c_recv);
> +    $c_decl .= "void *user)";
> +    $c_callback_args .= "user";
> +
> +    $f_more_sr->("        if (msg != endmsg) return 0;\n");
> +
> +    my $c_callback;
> +    if ($flags !~ m/c/) {
> +        $c_callback = "${callback}_$name";
> +    } else {
> +        $f_more_sr->(sub {
> +            my ($sr) = @_;
> +            $cbs{$sr} .= "    $c_rtype_callout (*${name})$c_decl;\n";
> +            return
> +          "        const ".cbtype($sr)." *const cbs =\n".
> +            "            ${getcallbacks}_${sr}(user);\n";
> +                       });
> +        $c_callback = "cbs->${name}";
> +    }
> +    my $c_make_callback = "$c_callback($c_callback_args)";
> +    if ($flags !~ m/W/) {
> +       $f_more_sr->("        $c_make_callback;\n");
> +    } else {
> +        $f_more_sr->("        r = $c_make_callback;\n".
> +                     "        $sendreply(r, user);\n");
> +       f_decl($sendreply, 'callout', 'void', '(int r, void *user)');
> +    }
> +    if ($flags =~ m/x/) {
> +        my $c_v = "(1u<<$msgnum)";
> +        my $c_cb = "cbs->$name";
> +        $f_more_sr->("    if ($c_cb) cbflags |= $c_v;\n", $enumcallbacks);
> +        $f_more_sr->("    $c_cb = (cbflags & $c_v) ? ${encode}_${name} : 0;\n",
> +                     $setcallbacks);
> +    }
> +    $f_more_sr->("        return 1;\n    }\n\n");
> +    f_decl("${callback}_$name", 'callout', $c_rtype_callout, $c_decl);
> +    f_decl("${encode}_$name", 'helper', $c_rtype_helper, $c_decl);
> +    f_more("${encode}_$name",
> +"        if (buf) break;
> +        buf = ${helper}_allocbuf(len, user);
> +        assert(buf);
> +        allocd = len;
> +        len = 0;
> +    }
> +    assert(len == allocd);
> +    ${transmit}(buf, len, user);
> +");
> +    if ($flags =~ m/W/) {
> +       f_more("${encode}_$name",
> +               (<<END_ALWAYS.($debug ? <<END_DEBUG : '').<<END_ALWAYS));
> +    int r = ${helper}_getreply(user);
> +END_ALWAYS
> +    fprintf(stderr,"libxl-save-helper: $name got reply %d\\n",r);
> +END_DEBUG
> +    return r;
> +END_ALWAYS
> +    }
> +}
> +
> +print "/* AUTOGENERATED by $0 DO NOT EDIT */\n\n" or die $!;
> +
> +foreach my $sr (qw(save restore)) {
> +    f_more("${enumcallbacks}_${sr}",
> +           "    return cbflags;\n");
> +    f_more("${receiveds}_${sr}",
> +           "    default:\n".
> +           "        return 0;\n".
> +           "    }");
> +    $cbs{$sr} .= "} ".cbtype($sr).";\n\n";
> +    if ($ch eq 'h') {
> +        print $cbs{$sr} or die $!;
> +        print "struct ${sr}_callbacks;\n";
> +    }
> +}
> +
> +if ($ch eq 'c') {
> +    foreach my $name (@outfuncs) {
> +        next unless defined $func{$name};
> +        $func{$name} .= "}\n\n";
> +        $out_body{$func_ah{$name}} .= $func{$name};
> +        delete $func{$name};
> +    }
> +    print $out_body{$want_ah} or die $!;
> +} else {
> +    foreach my $name (sort keys %out_decls) {
> +        next unless $func_ah{$name} eq $want_ah;
> +        print $out_decls{$name} or die $!;
> +    }
> +}
> +
> +close STDOUT or die $!;
> --
> tg: (52b6131..) t/xen/xc.save-restore-protocol (depends on: t/xen/xl.ao.suspend.pre)
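
For readers skimming the generator above: each generated helper_stub_* function
runs its body twice. The first pass has buf == NULL and only measures the
message; the second pass, after allocation, fills it. The callout side then
decodes with bounds checks. The following is a minimal hand-written C sketch of
that pattern, not the generator's actual output. encode_example and its single
string argument are hypothetical, and malloc stands in for helper_allocbuf.
Values are copied in native byte order, as in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Append vlen bytes at offset *len; when buf is NULL, only count them. */
static void bytes_put(unsigned char *buf, int *len,
                      const void *value, int vlen)
{
    assert(vlen < INT_MAX / 2 - *len);
    if (buf)
        memcpy(buf + *len, value, vlen);
    *len += vlen;
}

/* A string travels as a uint32_t size (including the NUL) plus the bytes. */
static void string_put(unsigned char *buf, int *len, const char *s)
{
    size_t slen = strlen(s);
    uint32_t size;
    assert(slen < INT_MAX / 4);
    size = (uint32_t)slen + 1;
    bytes_put(buf, len, &size, sizeof(size));
    bytes_put(buf, len, s, (int)size);
}

/* Hypothetical message: a uint16_t type followed by one string argument.
 * Pass 1 runs with buf == NULL to measure; pass 2 fills the buffer.
 * The caller frees the result. */
static unsigned char *encode_example(uint16_t mtype, const char *context,
                                     int *len_out)
{
    unsigned char *buf = NULL;
    int len = 0, allocd = 0;

    for (;;) {
        bytes_put(buf, &len, &mtype, sizeof(mtype));
        string_put(buf, &len, context);
        if (buf) break;
        buf = malloc(len);
        assert(buf);
        allocd = len;
        len = 0;
    }
    assert(len == allocd);
    *len_out = len;
    return buf;
}

/* Receiver side: bounds-checked read; returns 0 on a short message. */
static int bytes_get(const unsigned char **msg, const unsigned char *endmsg,
                     void *result, int rlen)
{
    if (endmsg - *msg < rlen) return 0;
    memcpy(result, *msg, rlen);
    *msg += rlen;
    return 1;
}

/* Returns 1 and points *result into the message if the string is valid. */
static int string_get(const unsigned char **msg, const unsigned char *endmsg,
                      const char **result)
{
    uint32_t size;
    if (!bytes_get(msg, endmsg, &size, sizeof(size))) return 0;
    if (size == 0 || (size_t)(endmsg - *msg) < size) return 0;
    if ((*msg)[size - 1] != '\0') return 0;
    *result = (const char *)*msg;
    *msg += size;
    return 1;
}
```

Running the encoder body twice keeps the size computation and the
serialisation in one place, so the two cannot drift apart; the final
len == allocd assertion catches any mismatch.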

* Re: [PATCH v6 00/21] libxl: domain save/restore: run in a separate process
  2012-06-28 13:50   ` Ian Campbell
@ 2012-06-28 14:24     ` Ian Jackson
  2012-06-28 14:44       ` Ian Campbell
  2012-06-28 15:17       ` Shriram Rajagopalan
  0 siblings, 2 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-28 14:24 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Shriram Rajagopalan, xen-devel

Ian Campbell writes ("Re: [PATCH v6 00/21] libxl: domain save/restore: run in a separate process"):
> Does this mean this series is now ready to go in?

I think so.  I'm just giving Shriram a chance to object.

> I did wonder when I saw the incremental patch if some of those internal
> callback pointers could perhaps be properly typed instead of void
> (because they all end up taking the same pointer type), but let's not
> worry about that here.

The void*'s come from the libxc API.

Ian.

* Re: [PATCH v6 00/21] libxl: domain save/restore: run in a separate process
  2012-06-28 14:24     ` Ian Jackson
@ 2012-06-28 14:44       ` Ian Campbell
  2012-06-28 15:17       ` Shriram Rajagopalan
  1 sibling, 0 replies; 40+ messages in thread
From: Ian Campbell @ 2012-06-28 14:44 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Shriram Rajagopalan, xen-devel

On Thu, 2012-06-28 at 15:24 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH v6 00/21] libxl: domain save/restore: run in a separate process"):
> > Does this mean this series is now ready to go in?
> 
> I think so.  I'm just giving Shriram a chance to object.

OK.

> > I did wonder when I saw the incremental patch if some of those internal
> > callback pointers could perhaps be properly typed instead of void
> > (because they all end up taking the same pointer type), but let's not
> > worry about that here.
> 
> The void*'s come from the libxc API.

Ah, I thought they were internal; never mind.

> 
> Ian.

* Re: [PATCH v6 00/21] libxl: domain save/restore: run in a separate process
  2012-06-28 14:24     ` Ian Jackson
  2012-06-28 14:44       ` Ian Campbell
@ 2012-06-28 15:17       ` Shriram Rajagopalan
  1 sibling, 0 replies; 40+ messages in thread
From: Shriram Rajagopalan @ 2012-06-28 15:17 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Ian Campbell, xen-devel


On Thu, Jun 28, 2012 at 10:24 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Ian Campbell writes ("Re: [PATCH v6 00/21] libxl: domain save/restore: run
> in a separate process"):
> > Does this mean this series is now ready to go in?
>
> I think so.  I'm just giving Shriram a chance to object.
>
>
I have no objections. I just finished testing the series with xm
(to ensure xend/remus was not broken). xl remus also fails over
properly. Things are good on that front.

But for the test case where I kill the backup (even with remote
host replication), xl still crashes the primary [xend works properly
in this case]. The xl error output is at the end of the mail.

Either way, I have no objections to this series.
Tested-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>


There is a more pressing matter that I just noticed:
the performance is abysmal, especially with xl.
Here is a comparison between xend/remus and xl/remus
for a PV domU with and without a suspend event channel.

What is being measured? Time to suspend + time to resume; these are
what I am primarily concerned with.
With an event channel, this should be on the order of 1ms. With
xenstore, I would expect it to be at most 5-7ms.
NB: This does not include the memcpy phase in xc_domain_save.

Results: 32-bit PV domU w/ suspend event channel (2.6.32.2 xenolinux kernel)
xl-remus: ~1ms
xend-remus: ~1ms

So, for guests with suspend event channel support,
remus with xl/xend has the same suspend/resume overhead.

64-bit PV domU w/o suspend event channel (3.3.0 upstream kernel)
xl-remus: ~202ms !!!
xend-remus: ~2.2ms

This 202ms figure is the same in both IanJ's tree and the baseline
xen-unstable.

Looking back at the logs, this has been the same since January.
Is there some fixed timeout lurking in the code somewhere?


====
xl error output, when killing backup VM (it crashes primary VM instead of
resuming it properly)
libxl: error: libxl_create.c:760:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:844:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1220:libxl_domain_destroy: non-existant domain 24
libxl: error: libxl_create.c:995:domcreate_complete: unable to destroy domain 24 following failed creation
migration target: Domain creation failed (code -3).
pagetables=2,cache_misses=0,emptypages=41
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 3 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 3 save/restore helper [5620] died due to fatal signal Broken pipe
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.


thanks
shriram

* [PATCH v6 00/21] libxl: domain save/restore: run in a separate process
  2012-06-28 13:38 ` [PATCH v6 " Ian Jackson
  2012-06-28 13:50   ` Ian Campbell
@ 2012-06-28 17:45   ` Ian Jackson
  1 sibling, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-28 17:45 UTC (permalink / raw)
  To: xen-devel, Shriram Rajagopalan, Ian Campbell

Ian Jackson writes ("[PATCH v6 00/21] libxl: domain save/restore: run in a separate process"):
> Following testing by Shriram (thanks) I have an updated version of
> 06/21.  For the sake of everyone's sanity (and your MUAs) I shan't
> repost the whole series.
> 
> Here is v6 of 06/21, which is simply the previous one with my earlier
> fixup patch folded in.

I have now rebased these to cope with a couple of minor conflicts,
dealt with the one missing NOGC conversion, and pushed it.

Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

Thanks everyone.

Ian.

* Re: [PATCH 15/21] libxl: Get compiler to warn about gc_opt==NULL
  2012-06-26 17:55 ` [PATCH 15/21] libxl: Get compiler to warn about gc_opt==NULL Ian Jackson
@ 2012-06-28 17:56   ` Ian Jackson
  0 siblings, 0 replies; 40+ messages in thread
From: Ian Jackson @ 2012-06-28 17:56 UTC (permalink / raw)
  To: xen-devel

Ian Jackson writes ("[PATCH 15/21] libxl: Get compiler to warn about gc_opt==NULL"):
> Since it used to be legal to pass gc_opt==NULL, and there are various
> patches floating about and under development which do so, add a
> compiler annotation which makes the build fail when that is done.
> 
> This turns a runtime crash into a build failure, and should ensure
> that we don't accidentally commit a broken combination of patches.

I would just like to mention that this did indeed today save me from
committing a broken combination of patches :-).

Ian.
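
For readers unfamiliar with the mechanism, here is a minimal sketch of how a
GCC-style nonnull attribute can turn a NULL gc argument into a build failure.
It is an illustration only: example_gc, example_zalloc, example_gc_free_all and
NONNULL_ARG1 are hypothetical names, not libxl's, and the "tracking" flag is
merely one assumed way to model a non-NULL "no gc" object. It assumes GCC or
Clang with warnings promoted to errors (for example -Wall -Werror).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__GNUC__)
# define NONNULL_ARG1 __attribute__((nonnull(1)))
#else
# define NONNULL_ARG1 /* no compile-time check on other compilers */
#endif

/* Hypothetical garbage-collection context. */
typedef struct example_gc {
    void **ptrs;     /* allocations to free later */
    size_t count;    /* number of tracked allocations */
    int tracking;    /* 0 models a "no gc" object: caller frees */
} example_gc;

/* Zeroed allocation; passing a literal NULL as gc triggers -Wnonnull. */
NONNULL_ARG1
static void *example_zalloc(example_gc *gc, size_t bytes)
{
    void *p = calloc(1, bytes ? bytes : 1);
    assert(p);
    if (gc->tracking) {
        void **np = realloc(gc->ptrs, (gc->count + 1) * sizeof(*np));
        assert(np);
        gc->ptrs = np;
        gc->ptrs[gc->count++] = p;
    }
    return p;
}

/* Free everything the context has tracked. */
static void example_gc_free_all(example_gc *gc)
{
    size_t i;
    for (i = 0; i < gc->count; i++)
        free(gc->ptrs[i]);
    free(gc->ptrs);
    gc->ptrs = NULL;
    gc->count = 0;
}
```

With this in place, a call such as example_zalloc(NULL, 16) is diagnosed at
compile time rather than crashing at run time, provided the NULL is visible
to the compiler at the call site.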

Thread overview: 40+ messages
2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
2012-06-26 17:54 ` [PATCH 01/21] libxc: xc_domain_restore, make toolstack_restore const-correct Ian Jackson
2012-06-26 17:54 ` [PATCH 02/21] libxc: Do not segfault if (e.g.) switch_qemu_logdirty fails Ian Jackson
2012-06-26 17:55 ` [PATCH 03/21] libxl: domain save: rename variables etc Ian Jackson
2012-06-26 17:55 ` [PATCH 04/21] libxl: domain restore: reshuffle, preparing for ao Ian Jackson
2012-06-26 17:55 ` [PATCH 05/21] libxl: domain save: API changes for asynchrony Ian Jackson
2012-06-26 17:55 ` [PATCH 06/21] libxl: domain save/restore: run in a separate process Ian Jackson
2012-06-26 17:55 ` [PATCH 07/21] libxl: rename libxl_dom:save_helper to physmap_path Ian Jackson
2012-06-26 17:55 ` [PATCH 08/21] libxl: provide libxl__xs_*_checked and libxl__xs_transaction_* Ian Jackson
2012-06-26 17:55 ` [PATCH 09/21] libxl: wait for qemu to acknowledge logdirty command Ian Jackson
2012-06-26 17:55 ` [PATCH 10/21] libxl: datacopier: provide "prefix data" facility Ian Jackson
2012-06-26 17:55 ` [PATCH 11/21] libxl: prepare for asynchronous writing of qemu save file Ian Jackson
2012-06-26 17:55 ` [PATCH 12/21] libxl: Make libxl__domain_save_device_model asynchronous Ian Jackson
2012-06-26 17:55 ` [PATCH 13/21] libxl: Add a gc to libxl_get_cpu_topology Ian Jackson
2012-06-26 17:55 ` [PATCH 14/21] libxl: Do not pass NULL as gc_opt; introduce NOGC Ian Jackson
2012-06-26 17:55 ` [PATCH 15/21] libxl: Get compiler to warn about gc_opt==NULL Ian Jackson
2012-06-28 17:56   ` Ian Jackson
2012-06-26 17:55 ` [PATCH 16/21] xl: Handle return value from libxl_domain_suspend correctly Ian Jackson
2012-06-26 17:55 ` [PATCH 17/21] libxl: do not leak dms->saved_state Ian Jackson
2012-06-26 17:55 ` [PATCH 18/21] libxl: do not leak spawned middle children Ian Jackson
2012-06-26 17:55 ` [PATCH 19/21] libxl: do not leak an event struct on ignored ao progress Ian Jackson
2012-06-26 17:55 ` [PATCH 20/21] libxl: further fixups re LIBXL_DOMAIN_TYPE Ian Jackson
2012-06-26 17:55 ` [PATCH 21/21] libxl: DO NOT APPLY enforce prohibition on internal Ian Jackson
2012-06-26 18:00 ` [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
2012-06-26 18:44   ` Shriram Rajagopalan
2012-06-27  1:25     ` Shriram Rajagopalan
2012-06-27 13:46       ` Ian Jackson
2012-06-27 15:59         ` Ian Jackson
2012-06-27 16:09           ` Shriram Rajagopalan
2012-06-27 16:42             ` Shriram Rajagopalan
2012-06-28 11:24               ` Ian Jackson
2012-06-27 16:06         ` Shriram Rajagopalan
2012-06-27 13:17     ` Ian Jackson
2012-06-27 13:28       ` Shriram Rajagopalan
2012-06-28 13:38 ` [PATCH v6 " Ian Jackson
2012-06-28 13:50   ` Ian Campbell
2012-06-28 14:24     ` Ian Jackson
2012-06-28 14:44       ` Ian Campbell
2012-06-28 15:17       ` Shriram Rajagopalan
2012-06-28 17:45   ` Ian Jackson
