* [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
@ 2015-02-10 20:09 Ian Jackson
  2015-02-10 20:09 ` [PATCH 01/29] libxl: Further fix exit paths from libxl_device_events_handler Ian Jackson
                   ` (31 more replies)
  0 siblings, 32 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC
  To: xen-devel; +Cc: Euan Harris

This is v2 of my work-in-progress series to support cancellation of
long-running libxl operations.

There are many improvements since v1, but the basic structure remains
the same and the external API remains unchanged.

I have rebased this onto current staging.  I have compiled it but
NOT EXECUTED IT AT ALL.  Euan, I thought it would be useful to give
you something you could start to work on building against.

I wouldn't recommend testing it until I've at least smoke-tested it
to see that things still work if you don't cancel them.

Here's a list of the patches:

  01/29  libxl: Further fix exit paths from libxl_device_events_handler
  02/29  libxl: Comment cleanups
  03/29  libxl: suspend: switch_logdirty_done takes rc
  04/29  libxl: suspend: common suspend callbacks take rc
  05/29  libxl: suspend: Return correct error from callbacks
  06/29  libxl: Use libxl__xswait* in libxl__ao_device
  07/29  libxl: xswait/devstate: Move xswait to before devstate
  08/29  libxl: devstate: Use libxl__xswait*
  09/29  libxl: New error codes CANCELLED etc.
  10/29  libxl: events: Make timeout and async exec setup take an ao, not a gc
  11/29  libxl: events: Make libxl__async_exec_* pass caller an rc
  12/29  libxl: events: Permit timeouts to signal cancellation
  13/29  libxl: domain create: Do not destroy on cancellation
  14/29  libxl: ao: Record ultimate parent of a nested ao
  15/29  libxl: ao: Count the nested progeny of an ao
  16/29  libxl: ao: Provide manip_refcnt
  17/29  libxl: cancellation: Provide public ao cancellation API
  18/29  libxl: cancellation: Provide explicit internal cancel check API
  19/29  libxl: cancellation: Make timeouts cancellable
  20/29  libxl: cancellation: Note that driver domain task cannot be cancelled
  21/29  libxl: cancellation: Make spawns cancellable
  22/29  libxl: Introduce DOMAIN_DESTROYED error code
  23/29  libxl: cancellation: Support cancellation where we spot domain death
  24/29  libxl: Introduce FILLZERO
  25/29  libxl: cancellation: Preparations for save/restore cancellation
  26/29  libxl: cancellation: Handle SIGTERM in save/restore helper
  27/29  libxl: cancellation: Cancel libxc save/restore
  28/29  libxl: ao: datacopier callback gets an rc
  29/29  libxl: cancellation: Make datacopiers cancellable


* [PATCH 01/29] libxl: Further fix exit paths from libxl_device_events_handler
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-02-18 16:23   ` Roger Pau Monné
  2015-02-10 20:09 ` [PATCH 02/29] libxl: Comment cleanups Ian Jackson
                   ` (30 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC
  To: xen-devel; +Cc: Ian Jackson, Euan Harris, Roger Pau Monne

On the success path, do not call GC_FREE explicitly.  Instead, call
AO_INPROGRESS.

GC_FREE will free the gc underlying the long-term ao, which is then
subsequently referenced in backend_watch_callback's call to
libxl__nested_ao_create.  It is a miracle that this ever works at all.

Also, add an `if (rc) goto out;' after the xswatch registration.

After this, libxl_device_events_handler has the conventional and
correct ao initiation pattern.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Roger Pau Monne <roger.pau@citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index b9a1941..ae0c7e1 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4540,11 +4540,12 @@ int libxl_device_events_handler(libxl_ctx *ctx,
     be_path = GCSPRINTF("/local/domain/%u/backend", domid);
     rc = libxl__ev_xswatch_register(gc, &ddomain.watch, backend_watch_callback,
                                     be_path);
+    if (rc) goto out;
 
-out:
-    GC_FREE;
-    if (rc) return AO_ABORT(rc);
     return AO_INPROGRESS;
+
+out:
+    return AO_ABORT(rc);
 }
 
 /******************************************************************************/
-- 
1.7.10.4
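As a rough illustration of the exit-path shape this patch arrives at
(success returns via AO_INPROGRESS without freeing anything, failure
goes through a single `out:' label to AO_ABORT), here is a minimal
self-contained sketch.  Everything prefixed toy_ or TOY_ is a
hypothetical stand-in, not the real libxl ao machinery, and the error
value is illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libxl's ao machinery; none of these are
 * the real definitions, and the error value is illustrative only. */
enum { TOY_ERROR_FAIL = -3 };

typedef struct { int in_progress; } toy_ao;

/* Stand-in for the xswatch registration; fails on request. */
static int toy_watch_register(int make_it_fail)
{
    return make_it_fail ? TOY_ERROR_FAIL : 0;
}

/* Success exit: the ao (and everything hanging off it) stays alive. */
static int toy_ao_inprogress(toy_ao *ao)
{
    ao->in_progress = 1;
    return 0;
}

/* Failure exit: the ao is torn down and the error is returned. */
static int toy_ao_abort(toy_ao *ao, int rc)
{
    assert(rc);
    free(ao);
    return rc;
}

/* Same control flow as the patched function: check rc after the
 * registration, never free on the success path. */
int toy_events_handler(int make_it_fail, toy_ao **ao_out)
{
    int rc;
    toy_ao *ao = calloc(1, sizeof(*ao));

    *ao_out = NULL;
    if (!ao) return TOY_ERROR_FAIL;

    rc = toy_watch_register(make_it_fail);
    if (rc) goto out;

    *ao_out = ao;
    return toy_ao_inprogress(ao);

 out:
    return toy_ao_abort(ao, rc);
}
```

The point of the sketch is only that the long-lived object is released
on the error path and nowhere else; in the toy the caller frees it
once it is finished with the operation.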


* [PATCH 02/29] libxl: Comment cleanups
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
  2015-02-10 20:09 ` [PATCH 01/29] libxl: Further fix exit paths from libxl_device_events_handler Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-02-11  1:46   ` Hongyang Yang
  2015-03-24 10:50   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 03/29] libxl: suspend: switch_logdirty_done takes rc Ian Jackson
                   ` (29 subsequent siblings)
  31 siblings, 2 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC
  To: xen-devel
  Cc: Ian Jackson, Yang Hongyang, Euan Harris, Lai Jiangshan, Wen Congyang

* Add two comments in libxl_remus_disk_drbd documenting buggy handling
  of the hotplug script exit status.

* Add a section heading for async exec in libxl_aoutils.c

* Mention the right function name (libxl__ev_child_fork, not
  libxl__ev_fork) in libxl_internal.h

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
---
v2: New patch in this version of the series.
---
 tools/libxl/libxl_aoutils.c         |    2 ++
 tools/libxl/libxl_internal.h        |    2 +-
 tools/libxl/libxl_remus_disk_drbd.c |    2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index b10d2e1..44dc222 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -451,6 +451,8 @@ int libxl__openptys(libxl__openpty_state *op,
     return rc;
 }
 
+/*----- async exec -----*/
+
 static void async_exec_timeout(libxl__egc *egc,
                                libxl__ev_time *ev,
                                const struct timeval *requested_abs)
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 934465a..99db92a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1393,7 +1393,7 @@ _hidden int libxl__spawn_record_pid(libxl__gc*, libxl__spawn_state*,
  * This is a NOT function for waiting for ordinary child processes.
  * If you want to run (fork/exec/wait) subprocesses from libxl:
  *  - Make your libxl entrypoint use the ao machinery
- *  - Use libxl__ev_fork, and use the callback programming style
+ *  - Use libxl__ev_child_fork, and use the callback programming style
  *
  * This function is intended for interprocess communication with a
  * service process.  If the service process does not respond quickly,
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index 3215f93..afe9b61 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -145,6 +145,8 @@ static void match_async_exec_cb(libxl__egc *egc,
 
     if (status) {
         rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
+        /* BUG: seems to assume that any exit status means `no match' */
+        /* BUG: exit status will have been logged as an error */
         goto out;
     }
 
-- 
1.7.10.4


* [PATCH 03/29] libxl: suspend: switch_logdirty_done takes rc
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
  2015-02-10 20:09 ` [PATCH 01/29] libxl: Further fix exit paths from libxl_device_events_handler Ian Jackson
  2015-02-10 20:09 ` [PATCH 02/29] libxl: Comment cleanups Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 10:53   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 04/29] libxl: suspend: common suspend callbacks take rc Ian Jackson
                   ` (28 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

switch_logdirty_done used to take the value to pass to
libxl__xc_domain_saverestore_async_callback_done (i.e., the return value
from the callback).  (This was mistakenly described as "ok" in the
prototype, but in the definition it is "broke" and all the call sites
passed 0 for success or -1 for error.)

Instead, make it take a libxl error code (rc).  Convert this to the
suspend callback value at the end.

No functional change in this patch.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_dom.c |   23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 48d661a..010061e 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -947,7 +947,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
-                                 libxl__domain_suspend_state *dss, int ok);
+                                 libxl__domain_suspend_state *dss, int rc);
 
 static void logdirty_init(libxl__logdirty_switch *lds)
 {
@@ -1024,7 +1024,7 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
  out:
     LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
     libxl__xs_transaction_abort(gc, &t);
-    switch_logdirty_done(egc,dss,-1);
+    switch_logdirty_done(egc,dss,rc);
 }
 
 static void domain_suspend_switch_qemu_xen_logdirty
@@ -1072,7 +1072,7 @@ static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
     libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
     STATE_AO_GC(dss->ao);
     LOG(ERROR,"logdirty switch: wait for device model timed out");
-    switch_logdirty_done(egc,dss,-1);
+    switch_logdirty_done(egc,dss,ERROR_FAIL);
 }
 
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
@@ -1124,17 +1124,16 @@ static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch *watch,
      */
     libxl__xs_transaction_abort(gc, &t);
 
-    if (!rc) {
-        switch_logdirty_done(egc,dss,0);
-    } else if (rc < 0) {
-        LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
-        switch_logdirty_done(egc,dss,-1);
+    if (rc <= 0) {
+        if (rc < 0)
+            LOG(ERROR,"logdirty switch: failed (rc=%d)",rc);
+        switch_logdirty_done(egc,dss,rc);
     }
 }
 
 static void switch_logdirty_done(libxl__egc *egc,
                                  libxl__domain_suspend_state *dss,
-                                 int broke)
+                                 int rc)
 {
     STATE_AO_GC(dss->ao);
     libxl__logdirty_switch *lds = &dss->logdirty;
@@ -1142,6 +1141,12 @@ static void switch_logdirty_done(libxl__egc *egc,
     libxl__ev_xswatch_deregister(gc, &lds->watch);
     libxl__ev_time_deregister(gc, &lds->timeout);
 
+    int broke;
+    if (rc) {
+        broke = -1;
+    } else {
+        broke = 0;
+    }
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, broke);
 }
 
-- 
1.7.10.4
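For readers skimming the series, the conversion this patch moves to
the end of switch_logdirty_done can be sketched in isolation as below.
The toy_ names and the error value are assumptions made for the
example; only the rule "nonzero rc becomes -1, zero stays 0" is taken
from the patch.

```c
#include <assert.h>

/* Illustrative values only; the real error codes live in libxl.h. */
enum { TOY_OK = 0, TOY_ERROR_FAIL = -3 };

/* Records what the toy "libxc callback" was finally told. */
static int last_callback_value = 1;

/* Hypothetical stand-in for
 * libxl__xc_domain_saverestore_async_callback_done(). */
static void toy_callback_done(int broke)
{
    last_callback_value = broke;
}

/* Takes a libxl-style rc and converts it only at the boundary:
 * any nonzero rc becomes -1, zero stays 0. */
void toy_switch_logdirty_done(int rc)
{
    int broke = rc ? -1 : 0;
    toy_callback_done(broke);
}
```

Callers can then pass whatever specific rc they have in hand, and the
loss of detail happens in exactly one place.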


* [PATCH 04/29] libxl: suspend: common suspend callbacks take rc
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (2 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 03/29] libxl: suspend: switch_logdirty_done takes rc Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 10:55   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 05/29] libxl: suspend: Return correct error from callbacks Ian Jackson
                   ` (27 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Change the following functions to take a libxl error code rather than
a boolean "ok" value, and translate that value to the boolean expected
by libxc at the last moment:
  domain_suspend_callback_common_done        } dss->callback_common_done
  remus_domain_suspend_callback_common_done  }
  domain_suspend_common_done

Also, abolish domain_suspend_common_failed as
domain_suspend_common_done can easily do its job and the call sites
now have to supply the right rc value anyway.

In domain_suspend_common_guest_suspended, change "ret" to "rc"
as it contains a libxl error code.

There is no functional change in this patch: the proper rc value now
propagates further, but is still eventually smashed to a boolean.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Fix a leftover comment referring to domain_suspend_common_failed
---
 tools/libxl/libxl_dom.c |   54 +++++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 010061e..9971cdc 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -930,7 +930,7 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf,
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok);
+                                libxl__domain_suspend_state *dss, int rc);
 
 /*----- complicated callback, called by xc_domain_save -----*/
 
@@ -1217,11 +1217,9 @@ static void suspend_common_wait_guest_check(libxl__egc *egc,
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
       libxl__ev_time *ev, const struct timeval *requested_abs);
 
-static void domain_suspend_common_failed(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss);
 static void domain_suspend_common_done(libxl__egc *egc,
                                        libxl__domain_suspend_state *dss,
-                                       bool ok);
+                                       int rc);
 
 static bool domain_suspend_pvcontrol_acked(const char *state) {
     /* any value other than "suspend", including ENOENT (i.e. !state), is OK */
@@ -1251,6 +1249,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
         ret = xc_evtchn_notify(CTX->xce, dss->guest_evtchn.port);
         if (ret < 0) {
             LOG(ERROR, "xc_evtchn_notify failed ret=%d", ret);
+            rc = ERROR_FAIL;
             goto err;
         }
 
@@ -1271,6 +1270,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
         ret = xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend);
         if (ret < 0) {
             LOGE(ERROR, "xc_domain_shutdown failed");
+            rc = ERROR_FAIL;
             goto err;
         }
         /* The guest does not (need to) respond to this sort of request. */
@@ -1285,7 +1285,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
     libxl__domain_pvcontrol_write(gc, XBT_NULL, domid, "suspend");
 
     dss->pvcontrol.path = libxl__domain_pvcontrol_xspath(gc, domid);
-    if (!dss->pvcontrol.path) goto err;
+    if (!dss->pvcontrol.path) { rc = ERROR_FAIL; goto err; }
 
     dss->pvcontrol.ao = ao;
     dss->pvcontrol.what = "guest acknowledgement of suspend request";
@@ -1295,7 +1295,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_done(egc, dss, rc);
 }
 
 static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
@@ -1305,8 +1305,8 @@ static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
     STATE_AO_GC(dss->ao);
     /* If we should be done waiting, suspend_common_wait_guest_check
      * will end up calling domain_suspend_common_guest_suspended or
-     * domain_suspend_common_failed, both of which cancel the evtchn
-     * wait.  So re-enable it now. */
+     * domain_suspend_common_done, both of which cancel the evtchn
+     * wait as needed.  So re-enable it now. */
     libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
     suspend_common_wait_guest_check(egc, dss);
 }
@@ -1371,7 +1371,7 @@ static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
 
  err:
     libxl__xs_transaction_abort(gc, &t);
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_done(egc, dss, rc);
     return;
 }
 
@@ -1395,7 +1395,7 @@ static void domain_suspend_common_wait_guest(libxl__egc *egc,
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_done(egc, dss, rc);
 }
 
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
@@ -1445,7 +1445,7 @@ static void suspend_common_wait_guest_check(libxl__egc *egc,
     return;
 
  err:
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_done(egc, dss, ERROR_FAIL);
 }
 
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
@@ -1454,46 +1454,40 @@ static void suspend_common_wait_guest_timeout(libxl__egc *egc,
     libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, guest_timeout);
     STATE_AO_GC(dss->ao);
     LOG(ERROR, "guest did not suspend, timed out");
-    domain_suspend_common_failed(egc, dss);
+    domain_suspend_common_done(egc, dss, ERROR_GUEST_TIMEDOUT);
 }
 
 static void domain_suspend_common_guest_suspended(libxl__egc *egc,
                                          libxl__domain_suspend_state *dss)
 {
     STATE_AO_GC(dss->ao);
-    int ret;
+    int rc;
 
     libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
     libxl__ev_xswatch_deregister(gc, &dss->guest_watch);
     libxl__ev_time_deregister(gc, &dss->guest_timeout);
 
     if (dss->hvm) {
-        ret = libxl__domain_suspend_device_model(gc, dss);
-        if (ret) {
-            LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", ret);
-            domain_suspend_common_failed(egc, dss);
+        rc = libxl__domain_suspend_device_model(gc, dss);
+        if (rc) {
+            LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", rc);
+            domain_suspend_common_done(egc, dss, rc);
             return;
         }
     }
-    domain_suspend_common_done(egc, dss, 1);
-}
-
-static void domain_suspend_common_failed(libxl__egc *egc,
-                                         libxl__domain_suspend_state *dss)
-{
     domain_suspend_common_done(egc, dss, 0);
 }
 
 static void domain_suspend_common_done(libxl__egc *egc,
                                        libxl__domain_suspend_state *dss,
-                                       bool ok)
+                                       int rc)
 {
     EGC_GC;
     assert(!libxl__xswait_inuse(&dss->pvcontrol));
     libxl__ev_evtchn_cancel(gc, &dss->guest_evtchn);
     libxl__ev_xswatch_deregister(gc, &dss->guest_watch);
     libxl__ev_time_deregister(gc, &dss->guest_timeout);
-    dss->callback_common_done(egc, dss, ok);
+    dss->callback_common_done(egc, dss, rc);
 }
 
 static inline char *physmap_path(libxl__gc *gc, uint32_t domid,
@@ -1591,9 +1585,9 @@ static void libxl__domain_suspend_callback(void *data)
 }
 
 static void domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok)
+                                libxl__domain_suspend_state *dss, int rc)
 {
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, !rc);
 }
 
 /*----- remus callbacks -----*/
@@ -1617,9 +1611,9 @@ static void libxl__remus_domain_suspend_callback(void *data)
 }
 
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
-                                libxl__domain_suspend_state *dss, int ok)
+                                libxl__domain_suspend_state *dss, int rc)
 {
-    if (!ok)
+    if (rc)
         goto out;
 
     libxl__remus_devices_state *const rds = &dss->rds;
@@ -1628,7 +1622,7 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     return;
 
 out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, !rc);
 }
 
 static void remus_devices_postsuspend_cb(libxl__egc *egc,
-- 
1.7.10.4
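A minimal sketch of the shape this patch gives the suspend completion
path: one "done" function serves success and failure alike, does the
common cleanup, and the rc is only collapsed to a boolean at the very
last step.  The struct, the toy_ names and the error values are
hypothetical and chosen for the example; they are not the libxl
definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real error codes live in libxl.h. */
enum { TOY_ERROR_FAIL = -3, TOY_ERROR_GUEST_TIMEDOUT = -8 };

/* Hypothetical, much reduced stand-in for the suspend state. */
struct toy_suspend_state {
    bool waits_cancelled; /* set once the common cleanup has run */
    int reported_ok;      /* boolean finally handed to the callback */
};

/* Last step: the only place where rc collapses to a boolean. */
static void toy_callback_common_done(struct toy_suspend_state *s, int rc)
{
    s->reported_ok = !rc;
}

/* Single completion path; no separate "failed" helper is needed
 * because the caller supplies the rc either way. */
void toy_suspend_common_done(struct toy_suspend_state *s, int rc)
{
    s->waits_cancelled = true;
    toy_callback_common_done(s, rc);
}
```

With this arrangement a timeout and a generic failure reach the same
function with different rc values, even though both currently end up
as "not ok".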


* [PATCH 05/29] libxl: suspend: Return correct error from callbacks
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (3 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 04/29] libxl: suspend: common suspend callbacks take rc Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 10:58   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device Ian Jackson
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

If a suspend callback fails, it has a libxl error code in its hand.
However, we must return to libxc the values that libxc expects.  So we
stash the libxl error code in dss->rc and fish it out again after
libxc returns from the suspend call.

While we're here, abolish the now-redundant `ok' variable in
remus_devices_postsuspend_cb.

The overall functional change is that libxl_domain_save now completes
with the correct error code as determined when the underlying failure
happened.  (Usually this is, still, ERROR_FAIL.)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Add cleanup in remus_devices_postsuspend_cb.
---
 tools/libxl/libxl_dom.c      |   22 ++++++++++++++++------
 tools/libxl/libxl_internal.h |    1 +
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 9971cdc..90877d6 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1041,6 +1041,7 @@ static void domain_suspend_switch_qemu_xen_logdirty
         libxl__xc_domain_saverestore_async_callback_done(egc, shs, 0);
     } else {
         LOG(ERROR,"logdirty switch failed (rc=%d), aborting suspend",rc);
+        dss->rc = rc;
         libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
     }
 }
@@ -1063,6 +1064,7 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
     default:
         LOG(ERROR,"logdirty switch failed"
             ", no valid device model version found, aborting suspend");
+        dss->rc = ERROR_FAIL;
         libxl__xc_domain_saverestore_async_callback_done(egc, shs, -1);
     }
 }
@@ -1144,6 +1146,7 @@ static void switch_logdirty_done(libxl__egc *egc,
     int broke;
     if (rc) {
         broke = -1;
+        dss->rc = rc;
     } else {
         broke = 0;
     }
@@ -1587,6 +1590,7 @@ static void libxl__domain_suspend_callback(void *data)
 static void domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int rc)
 {
+    dss->rc = rc;
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, !rc);
 }
 
@@ -1622,6 +1626,7 @@ static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
     return;
 
 out:
+    dss->rc = rc;
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, !rc);
 }
 
@@ -1629,16 +1634,17 @@ static void remus_devices_postsuspend_cb(libxl__egc *egc,
                                          libxl__remus_devices_state *rds,
                                          int rc)
 {
-    int ok = 0;
     libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
 
     if (rc)
         goto out;
 
-    ok = 1;
+    rc = 0;
 
 out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+    if (rc)
+        dss->rc = rc;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, !rc);
 }
 
 static void libxl__remus_domain_resume_callback(void *data)
@@ -1657,7 +1663,6 @@ static void remus_devices_preresume_cb(libxl__egc *egc,
                                        libxl__remus_devices_state *rds,
                                        int rc)
 {
-    int ok = 0;
     libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
     STATE_AO_GC(dss->ao);
 
@@ -1669,10 +1674,12 @@ static void remus_devices_preresume_cb(libxl__egc *egc,
     if (rc)
         goto out;
 
-    ok = 1;
+    rc = 0;
 
 out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+    if (rc)
+        dss->rc = rc;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, !rc);
 }
 
 /*----- remus asynchronous checkpoint callback -----*/
@@ -1790,6 +1797,7 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss)
     libxl__srm_save_autogen_callbacks *const callbacks =
         &dss->shs.callbacks.save.a;
 
+    dss->rc = 0;
     logdirty_init(&dss->logdirty);
     libxl__xswait_init(&dss->pvcontrol);
     libxl__ev_evtchn_init(&dss->guest_evtchn);
@@ -1877,6 +1885,8 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
                          "domain did not respond to suspend request");
         if ( !dss->guest_responded )
             rc = ERROR_GUEST_TIMEDOUT;
+        else if (dss->rc)
+            rc = dss->rc;
         else
             rc = ERROR_FAIL;
         goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 99db92a..2862c69 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2800,6 +2800,7 @@ struct libxl__domain_suspend_state {
     int debug;
     const libxl_domain_remus_info *remus;
     /* private */
+    int rc;
     libxl__ev_evtchn guest_evtchn;
     int guest_evtchn_lockfd;
     int hvm;
-- 
1.7.10.4
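The stash-and-recover idea in this patch can be sketched on its own as
follows.  The callback side remembers the precise error and reports
only ok / not ok; the completion side, reached after the libxc-style
call has failed, picks the most specific code available in the same
order as the hunk in libxl__xc_domain_save_done.  The struct, the toy_
names and the error values are assumptions for the example.

```c
#include <assert.h>

/* Illustrative values only; the real error codes live in libxl.h. */
enum {
    TOY_ERROR_FAIL = -3,
    TOY_ERROR_INVAL = -6,
    TOY_ERROR_GUEST_TIMEDOUT = -8
};

/* Hypothetical, much reduced stand-in for the suspend state. */
struct toy_dss {
    int rc;              /* stashed libxl-style error, 0 if none */
    int guest_responded; /* nonzero once the guest has answered */
};

/* Callback side: keep the detailed rc, hand back only a boolean. */
int toy_suspend_callback_done(struct toy_dss *dss, int rc)
{
    dss->rc = rc;
    return !rc;
}

/* Completion side, called only when the save has failed:
 * choose the most specific error we know about. */
int toy_save_failed_rc(const struct toy_dss *dss)
{
    if (!dss->guest_responded)
        return TOY_ERROR_GUEST_TIMEDOUT;
    if (dss->rc)
        return dss->rc;
    return TOY_ERROR_FAIL;
}
```

The generic failure code remains the fallback, which matches the
remark in the commit message that the result is usually still
ERROR_FAIL.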


* [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (4 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 05/29] libxl: suspend: Return correct error from callbacks Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 11:04   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 07/29] libxl: xswait/devstate: Move xswait to before devstate Ian Jackson
                   ` (25 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Replace the separate timeout and xenstore watch with use of
libxl__xswait*.

Different control flow, but no ultimate functional change apart from
slight changes to the text of error messages.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_device.c   |   64 ++++++++++++------------------------------
 tools/libxl/libxl_internal.h |    2 +-
 2 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 0f50d04..64ee541 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -450,7 +450,7 @@ void libxl__prepare_ao_device(libxl__ao *ao, libxl__ao_device *aodev)
      * Initialize xs_watch, because it's not used on all possible
      * execution paths, but it's unconditionally destroyed when finished.
      */
-    libxl__ev_xswatch_init(&aodev->xs_watch);
+    libxl__xswait_init(&aodev->xswait);
     aodev->active = 1;
     /* We init this here because we might call device_hotplug_done
      * without actually calling any hotplug script */
@@ -731,13 +731,9 @@ static void device_hotplug_child_death_cb(libxl__egc *egc,
                                           libxl__async_exec_state *aes,
                                           int status);
 
-static void device_destroy_be_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
-                                         const struct timeval *requested_abs);
-
 static void device_destroy_be_watch_cb(libxl__egc *egc,
-                                       libxl__ev_xswatch *watch,
-                                       const char *watch_path,
-                                       const char *event_path);
+                                       libxl__xswait_state *xswait,
+                                       int rc, const char *data);
 
 static void device_hotplug_done(libxl__egc *egc, libxl__ao_device *aodev);
 
@@ -988,22 +984,14 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
         if (aodev->action != LIBXL__DEVICE_ACTION_REMOVE)
             goto out;
 
-        rc = libxl__ev_time_register_rel(gc, &aodev->timeout,
-                                         device_destroy_be_timeout_cb,
-                                         LIBXL_DESTROY_TIMEOUT * 1000);
-        if (rc) {
-            LOG(ERROR, "setup of xs watch timeout failed");
-            goto out;
-        }
-
-        rc = libxl__ev_xswatch_register(gc, &aodev->xs_watch,
-                                        device_destroy_be_watch_cb,
-                                        be_path);
-        if (rc) {
-            LOG(ERROR, "setup of xs watch for %s failed", be_path);
-            libxl__ev_time_deregister(gc, &aodev->timeout);
+        aodev->xswait.ao = ao;
+        aodev->xswait.what = "removal of backend path";
+        aodev->xswait.path = be_path;
+        aodev->xswait.timeout_ms = LIBXL_DESTROY_TIMEOUT * 1000;
+        aodev->xswait.callback = device_destroy_be_watch_cb;
+        rc = libxl__xswait_start(gc, &aodev->xswait);
+        if (rc)
             goto out;
-        }
         return;
     }
 
@@ -1101,37 +1089,21 @@ error:
     device_hotplug_done(egc, aodev);
 }
 
-static void device_destroy_be_timeout_cb(libxl__egc *egc, libxl__ev_time *ev,
-                                         const struct timeval *requested_abs)
-{
-    libxl__ao_device *aodev = CONTAINER_OF(ev, *aodev, timeout);
-    STATE_AO_GC(aodev->ao);
-
-    LOG(ERROR, "timed out while waiting for %s to be removed",
-               libxl__device_backend_path(gc, aodev->dev));
-
-    aodev->rc = ERROR_TIMEDOUT;
-
-    device_hotplug_done(egc, aodev);
-    return;
-}
-
 static void device_destroy_be_watch_cb(libxl__egc *egc,
-                                       libxl__ev_xswatch *watch,
-                                       const char *watch_path,
-                                       const char *event_path)
+                                       libxl__xswait_state *xswait,
+                                       int rc, const char *dir)
 {
-    libxl__ao_device *aodev = CONTAINER_OF(watch, *aodev, xs_watch);
+    libxl__ao_device *aodev = CONTAINER_OF(xswait, *aodev, xswait);
     STATE_AO_GC(aodev->ao);
-    const char *dir;
-    int rc;
 
-    rc = libxl__xs_read_checked(gc, XBT_NULL, watch_path, &dir);
     if (rc) {
-        LOG(ERROR, "unable to read backend path: %s", watch_path);
+        if (rc == ERROR_TIMEDOUT)
+            LOG(ERROR, "timed out while waiting for %s to be removed",
+                xswait->path);
         aodev->rc = rc;
         goto out;
     }
+
     if (dir) {
         /* backend path still exists, wait a little longer... */
         return;
@@ -1164,7 +1136,7 @@ static void device_hotplug_clean(libxl__gc *gc, libxl__ao_device *aodev)
 {
     /* Clean events and check reentrancy */
     libxl__ev_time_deregister(gc, &aodev->timeout);
-    libxl__ev_xswatch_deregister(gc, &aodev->xs_watch);
+    libxl__xswait_stop(gc, &aodev->xswait);
     assert(!libxl__async_exec_inuse(&aodev->aes));
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2862c69..5a76d51 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2152,7 +2152,7 @@ struct libxl__ao_device {
     /* Bodge for Qemu devices */
     libxl__ev_time timeout;
     /* xenstore watch for backend path of driver domains */
-    libxl__ev_xswatch xs_watch;
+    libxl__xswait_state xswait;
     int num_exec;
     /* for calling hotplug scripts */
     libxl__async_exec_state aes;
-- 
1.7.10.4


* [PATCH 07/29] libxl: xswait/devstate: Move xswait to before devstate
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (5 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 11:05   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 08/29] libxl: devstate: Use libxl__xswait* Ian Jackson
                   ` (24 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Pure code motion.  We are going to make devstate use xswait.
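
For readers who have not met xswait before, the callback contract
described in the comment being moved can be sketched roughly as below.
This is a standalone illustration, not libxl code: every sketch_* and
SKETCH_* name is hypothetical, and SKETCH_ERROR_TIMEDOUT is a
placeholder rather than the real value of ERROR_TIMEDOUT.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for illustration only; the real constant comes from the
 * libxl error enumeration. */
#define SKETCH_ERROR_TIMEDOUT (-1000)

/* Hypothetical outcome type, not part of libxl. */
typedef enum {
    SKETCH_KEEP_WAITING, /* rc==0, not satisfied: xswait calls back later */
    SKETCH_SATISFIED,    /* rc==0, satisfied: caller MUST stop the xswait */
    SKETCH_TIMED_OUT,    /* NOT yet logged: caller should log it */
    SKETCH_FAILED        /* some other error, already logged by xswait */
} sketch_outcome;

/* Classify one callback invocation for a caller that is waiting for a
 * xenstore node to be removed. */
sketch_outcome sketch_wait_for_removal(int rc, const char *data)
{
    assert(rc <= 0); /* libxl rc values are zero or negative */

    if (rc == SKETCH_ERROR_TIMEDOUT)
        return SKETCH_TIMED_OUT;
    if (rc)
        return SKETCH_FAILED;

    /* data == NULL means the xenstore read gave ENOENT: node is gone. */
    return data ? SKETCH_KEEP_WAITING : SKETCH_SATISFIED;
}
```

In the two error cases xswait has already stopped by itself, although
calling libxl__xswait_stop again is harmless because it is idempotent.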

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_internal.h |  109 +++++++++++++++++++++---------------------
 1 file changed, 55 insertions(+), 54 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5a76d51..edc33bb 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1090,6 +1090,61 @@ _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
 
 _hidden int libxl__get_domid(libxl__gc *gc, uint32_t *domid);
 
+/*----- xswait: wait for a xenstore node to be suitable -----*/
+
+typedef struct libxl__xswait_state libxl__xswait_state;
+
+/*
+ * rc describes the circumstances of this callback:
+ *
+ * rc==0
+ *
+ *     The xenstore path (may have) changed.  It has been read for
+ *     you.  The result is in data (allocated from the ao gc).
+ *     data may be NULL, which means that the xenstore read gave
+ *     ENOENT.
+ *
+ *     If you are satisfied, you MUST call libxl__xswait_stop.
+ *     Otherwise, xswait will continue waiting and watching and
+ *     will call you back later.
+ *
+ * rc==ERROR_TIMEDOUT
+ *
+ *     The specified timeout was reached.
+ *     This has NOT been logged (except to the debug log).
+ *     xswait will not continue (but calling libxl__xswait_stop is OK).
+ *
+ * rc!=0, !=ERROR_TIMEDOUT
+ *
+ *     Some other error occurred.
+ *     This HAS been logged.
+ *     xswait will not continue (but calling libxl__xswait_stop is OK).
+ *
+ * xswait.path may start with with '@', in which case no read is done
+ * and the callback will always get data==0.
+ */
+typedef void libxl__xswait_callback(libxl__egc *egc,
+      libxl__xswait_state *xswa, int rc, const char *data);
+
+struct libxl__xswait_state {
+    /* caller must fill these in, and they must all remain valid */
+    libxl__ao *ao;
+    const char *what; /* for error msgs: noun phrase, what we're waiting for */
+    const char *path;
+    int timeout_ms; /* as for poll(2) */
+    libxl__xswait_callback *callback;
+    /* remaining fields are private to xswait */
+    libxl__ev_time time_ev;
+    libxl__ev_xswatch watch_ev;
+};
+
+void libxl__xswait_init(libxl__xswait_state*);
+void libxl__xswait_stop(libxl__gc*, libxl__xswait_state*); /*idempotent*/
+bool libxl__xswait_inuse(const libxl__xswait_state *ss);
+
+int libxl__xswait_start(libxl__gc*, libxl__xswait_state*);
+
+
 /*
  * libxl__ev_devstate - waits a given time for a device to
  * reach a given state.  Follows the libxl_ev_* conventions.
@@ -1177,60 +1232,6 @@ _hidden int libxl__create_pci_backend(libxl__gc *gc, uint32_t domid,
                                       libxl_device_pci *pcidev, int num);
 _hidden int libxl__device_pci_destroy_all(libxl__gc *gc, uint32_t domid);
 
-/*----- xswait: wait for a xenstore node to be suitable -----*/
-
-typedef struct libxl__xswait_state libxl__xswait_state;
-
-/*
- * rc describes the circumstances of this callback:
- *
- * rc==0
- *
- *     The xenstore path (may have) changed.  It has been read for
- *     you.  The result is in data (allocated from the ao gc).
- *     data may be NULL, which means that the xenstore read gave
- *     ENOENT.
- *
- *     If you are satisfied, you MUST call libxl__xswait_stop.
- *     Otherwise, xswait will continue waiting and watching and
- *     will call you back later.
- *
- * rc==ERROR_TIMEDOUT
- *
- *     The specified timeout was reached.
- *     This has NOT been logged (except to the debug log).
- *     xswait will not continue (but calling libxl__xswait_stop is OK).
- *
- * rc!=0, !=ERROR_TIMEDOUT
- *
- *     Some other error occurred.
- *     This HAS been logged.
- *     xswait will not continue (but calling libxl__xswait_stop is OK).
- *
- * xswait.path may start with with '@', in which case no read is done
- * and the callback will always get data==0.
- */
-typedef void libxl__xswait_callback(libxl__egc *egc,
-      libxl__xswait_state *xswa, int rc, const char *data);
-
-struct libxl__xswait_state {
-    /* caller must fill these in, and they must all remain valid */
-    libxl__ao *ao;
-    const char *what; /* for error msgs: noun phrase, what we're waiting for */
-    const char *path;
-    int timeout_ms; /* as for poll(2) */
-    libxl__xswait_callback *callback;
-    /* remaining fields are private to xswait */
-    libxl__ev_time time_ev;
-    libxl__ev_xswatch watch_ev;
-};
-
-void libxl__xswait_init(libxl__xswait_state*);
-void libxl__xswait_stop(libxl__gc*, libxl__xswait_state*); /*idempotent*/
-bool libxl__xswait_inuse(const libxl__xswait_state *ss);
-
-int libxl__xswait_start(libxl__gc*, libxl__xswait_state*);
-
 /*
  *----- spawn -----
  *
-- 
1.7.10.4


* [PATCH 08/29] libxl: devstate: Use libxl__xswait*
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (6 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 07/29] libxl: xswait/devstate: Move xswait to before devstate Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 11:07   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 09/29] libxl: New error codes CANCELLED etc Ian Jackson
                   ` (23 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_device.c   |    4 +--
 tools/libxl/libxl_event.c    |   78 ++++++++++++++++++------------------------
 tools/libxl/libxl_internal.h |   11 +++---
 3 files changed, 40 insertions(+), 53 deletions(-)

diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 64ee541..0455134 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -758,7 +758,7 @@ void libxl__wait_device_connection(libxl__egc *egc, libxl__ao_device *aodev)
         return;
     }
 
-    rc = libxl__ev_devstate_wait(gc, &aodev->backend_ds,
+    rc = libxl__ev_devstate_wait(ao, &aodev->backend_ds,
                                  device_backend_callback,
                                  state_path, XenbusStateInitWait,
                                  LIBXL_INIT_TIMEOUT * 1000);
@@ -859,7 +859,7 @@ void libxl__initiate_device_remove(libxl__egc *egc,
         if (rc < 0) goto out;
     }
 
-    rc = libxl__ev_devstate_wait(gc, &aodev->backend_ds,
+    rc = libxl__ev_devstate_wait(ao, &aodev->backend_ds,
                                  device_backend_callback,
                                  state_path, XenbusStateClosed,
                                  LIBXL_DESTROY_TIMEOUT * 1000);
diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 595da2b..fb6daeb 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -805,68 +805,58 @@ void libxl__ev_evtchn_cancel(libxl__gc *gc, libxl__ev_evtchn *evev)
  * waiting for device state
  */
 
-static void devstate_watch_callback(libxl__egc *egc, libxl__ev_xswatch *watch,
-                                const char *watch_path, const char *event_path)
+static void devstate_callback(libxl__egc *egc, libxl__xswait_state *xsw,
+                              int rc, const char *sstate)
 {
     EGC_GC;
-    libxl__ev_devstate *ds = CONTAINER_OF(watch, *ds, watch);
-    int rc;
+    libxl__ev_devstate *ds = CONTAINER_OF(xsw, *ds, w);
 
-    char *sstate = libxl__xs_read(gc, XBT_NULL, watch_path);
+    if (rc) {
+        if (rc == ERROR_TIMEDOUT)
+            LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "backend %s wanted state %d "
+                       " timed out", ds->w.path, ds->wanted);
+        goto out;
+    }
     if (!sstate) {
-        if (errno == ENOENT) {
-            LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "backend %s wanted state %d"
-                       " but it was removed", watch_path, ds->wanted);
-            rc = ERROR_INVAL;
-        } else {
-            LIBXL__LOG_ERRNO(CTX, LIBXL__LOG_ERROR, "backend %s wanted state"
-                             " %d but read failed", watch_path, ds->wanted);
-            rc = ERROR_FAIL;
-        }
+        LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "backend %s wanted state %d"
+                   " but it was removed", ds->w.path, ds->wanted);
+        rc = ERROR_INVAL;
+        goto out;
+    }
+
+    int got = atoi(sstate);
+    if (got == ds->wanted) {
+        LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "backend %s wanted state %d ok",
+                   ds->w.path, ds->wanted);
+        rc = 0;
     } else {
-        int got = atoi(sstate);
-        if (got == ds->wanted) {
-            LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "backend %s wanted state %d ok",
-                       watch_path, ds->wanted);
-            rc = 0;
-        } else {
-            LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "backend %s wanted state %d"
-                       " still waiting state %d", watch_path, ds->wanted, got);
-            return;
-        }
+        LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "backend %s wanted state %d"
+                   " still waiting state %d", ds->w.path, ds->wanted, got);
+        return;
     }
-    libxl__ev_devstate_cancel(gc, ds);
-    ds->callback(egc, ds, rc);
-}
 
-static void devstate_timeout(libxl__egc *egc, libxl__ev_time *ev,
-                             const struct timeval *requested_abs)
-{
-    EGC_GC;
-    libxl__ev_devstate *ds = CONTAINER_OF(ev, *ds, timeout);
-    LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "backend %s wanted state %d "
-               " timed out", ds->watch.path, ds->wanted);
+ out:
     libxl__ev_devstate_cancel(gc, ds);
-    ds->callback(egc, ds, ERROR_TIMEDOUT);
+    ds->callback(egc, ds, rc);
 }
 
-int libxl__ev_devstate_wait(libxl__gc *gc, libxl__ev_devstate *ds,
+int libxl__ev_devstate_wait(libxl__ao *ao, libxl__ev_devstate *ds,
                             libxl__ev_devstate_callback cb,
                             const char *state_path, int state, int milliseconds)
 {
+    AO_GC;
     int rc;
 
-    libxl__ev_time_init(&ds->timeout);
-    libxl__ev_xswatch_init(&ds->watch);
+    libxl__xswait_init(&ds->w);
     ds->wanted = state;
     ds->callback = cb;
 
-    rc = libxl__ev_time_register_rel(gc, &ds->timeout, devstate_timeout,
-                                     milliseconds);
-    if (rc) goto out;
-
-    rc = libxl__ev_xswatch_register(gc, &ds->watch, devstate_watch_callback,
-                                    state_path);
+    ds->w.what = GCSPRINTF("backend %s (hoping for state change to %d)",
+                           state_path, state);
+    ds->w.path = state_path;
+    ds->w.timeout_ms = milliseconds;
+    ds->w.callback = devstate_callback;
+    rc = libxl__xswait_start(gc, &ds->w);
     if (rc) goto out;
 
     return 0;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index edc33bb..6bb208c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1163,24 +1163,21 @@ struct libxl__ev_devstate {
     libxl__ev_devstate_callback *callback;
     /* as for the remainder, read-only public parts may also be
      * read by the caller (notably, watch.path), but only when waiting: */
-    libxl__ev_xswatch watch;
-    libxl__ev_time timeout;
+    libxl__xswait_state w;
 };
 
 static inline void libxl__ev_devstate_init(libxl__ev_devstate *ds)
 {
-    libxl__ev_time_init(&ds->timeout);
-    libxl__ev_xswatch_init(&ds->watch);
+    libxl__xswait_init(&ds->w);
 }
 
 static inline void libxl__ev_devstate_cancel(libxl__gc *gc,
                                              libxl__ev_devstate *ds)
 {
-    libxl__ev_time_deregister(gc,&ds->timeout);
-    libxl__ev_xswatch_deregister(gc,&ds->watch);
+    libxl__xswait_stop(gc,&ds->w);
 }
 
-_hidden int libxl__ev_devstate_wait(libxl__gc *gc, libxl__ev_devstate *ds,
+_hidden int libxl__ev_devstate_wait(libxl__ao *ao, libxl__ev_devstate *ds,
                                     libxl__ev_devstate_callback cb,
                                     const char *state_path,
                                     int state, int milliseconds);
-- 
1.7.10.4


* [PATCH 09/29] libxl: New error codes CANCELLED etc.
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (7 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 08/29] libxl: devstate: Use libxl__xswait* Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 11:08   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc Ian Jackson
                   ` (22 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

We introduce ERROR_CANCELLED now, so that we can write code to handle
it, and decree that functions might return it, even though currently
nothing generates this error.

While we're here, provide ERROR_NOTFOUND and ERROR_NOTIMPLEMENTED,
which will also be used later, but only as part of the public API.
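
As a rough illustration of code written to handle the new error, a
caller might separate "the operation was cancelled" from "the
operation went wrong".  This is only a sketch: the numeric values are
copied from the libxl_types.idl hunk in this patch, but the SKETCH_*
names and sketch_classify are hypothetical and not part of libxl.

```c
#include <assert.h>

/* Values copied from the libxl_types.idl hunk in this patch. */
enum {
    SKETCH_ERROR_CANCELLED      = -20,
    SKETCH_ERROR_NOTFOUND       = -21,
    SKETCH_ERROR_NOTIMPLEMENTED = -22
};

/* Hypothetical result categories a caller might care about. */
typedef enum {
    SKETCH_DONE,      /* operation completed */
    SKETCH_ABANDONED, /* operation was cancelled on request */
    SKETCH_BROKEN     /* any other error */
} sketch_result;

sketch_result sketch_classify(int rc)
{
    assert(rc <= 0); /* libxl rc values are zero or negative */

    if (rc == 0)
        return SKETCH_DONE;
    if (rc == SKETCH_ERROR_CANCELLED)
        return SKETCH_ABANDONED;
    return SKETCH_BROKEN;
}
```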

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Rebase means new errors have bigger (more negative) numbers.
---
 tools/libxl/libxl_types.idl |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 02be466..193f22a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -63,6 +63,9 @@ libxl_error = Enumeration("error", [
     (-17, "DEVICE_EXISTS"),
     (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
     (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
+    (-20, "CANCELLED"),
+    (-21, "NOTFOUND"),
+    (-22, "NOTIMPLEMENTED"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
-- 
1.7.10.4


* [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (8 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 09/29] libxl: New error codes CANCELLED etc Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-02-11  1:04   ` Wen Congyang
  2015-03-24 11:09   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc Ian Jackson
                   ` (21 subsequent siblings)
  31 siblings, 2 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Jackson, Yang Hongyang, Euan Harris, Lai Jiangshan, Wen Congyang

Change the timeout setup functions to take a libxl__ao, not a
libxl__gc.  This is going to be needed for ao cancellation, because
timeouts are going to be a main hook for ao cancellation - so the
timeouts need to be associated with an ao.

This means that timeouts can only occur as part of a long-running
libxl function (but this is of course correct, as libxl shouldn't have
any global timeouts, and indeed all the call sites have an ao).

Also remove the gc parameter from libxl__async_exec_start.  It can
just use the gc from the ao supplied in the aes.

All the callers follow the obvious patterns and therefore supply the
ao's gc to libxl__async_exec_start and the timeout setup functions.
There is therefore no functional change in this patch.
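
The shape of the change, reduced to a standalone sketch: the setup
function receives the ao and derives the gc from it, much as AO_GC
does in the hunks below.  All sketch_* types and SKETCH_AO_GC are
stand-ins invented for this illustration; in particular, recording
the owning ao in the timeout is an assumption about where this is
heading, not something this patch does.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int alloc_count; } sketch_gc;            /* stand-in for libxl__gc */
typedef struct { sketch_gc gc; int timeouts; } sketch_ao; /* stand-in for libxl__ao */
typedef struct { sketch_ao *owner; int ms; } sketch_ev_time;

/* Rough analogue of AO_GC: obtain the gc belonging to the ao in scope. */
#define SKETCH_AO_GC sketch_gc *gc = &ao->gc

int sketch_time_register_rel(sketch_ao *ao, sketch_ev_time *ev, int ms)
{
    SKETCH_AO_GC;
    assert(ev != NULL);

    gc->alloc_count++; /* pretend the registration allocated from the gc */
    ev->owner = ao;    /* the timeout is now associated with an ao */
    ev->ms = ms;
    ao->timeouts++;
    return 0;
}
```

Because the gc is reachable from the ao, callers lose nothing by
passing the ao instead, which is why no call site needs more than a
one-word change.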

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Yang Hongyang <yanghy@cn.fujitsu.com>
CC: Wen Congyang <wency@cn.fujitsu.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
---
v2: This patch split off from "Permit timeouts to signal cancellation".
    Rebased; consequently, deal with libxl__async_exec_start.
    CC'd authors of the libxl__async_exec_* functions.
---
 tools/libxl/libxl_aoutils.c         |    8 +++++---
 tools/libxl/libxl_device.c          |    4 ++--
 tools/libxl/libxl_dom.c             |    8 ++++----
 tools/libxl/libxl_event.c           |    6 ++++--
 tools/libxl/libxl_internal.h        |    6 +++---
 tools/libxl/libxl_remus_disk_drbd.c |    2 +-
 tools/libxl/libxl_test_timedereg.c  |    9 +++++----
 7 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 44dc222..754e2d1 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -46,7 +46,7 @@ int libxl__xswait_start(libxl__gc *gc, libxl__xswait_state *xswa)
 {
     int rc;
 
-    rc = libxl__ev_time_register_rel(gc, &xswa->time_ev,
+    rc = libxl__ev_time_register_rel(xswa->ao, &xswa->time_ev,
                                      xswait_timeout_callback, xswa->timeout_ms);
     if (rc) goto err;
 
@@ -496,16 +496,18 @@ void libxl__async_exec_init(libxl__async_exec_state *aes)
     libxl__ev_child_init(&aes->child);
 }
 
-int libxl__async_exec_start(libxl__gc *gc, libxl__async_exec_state *aes)
+int libxl__async_exec_start(libxl__async_exec_state *aes)
 {
     pid_t pid;
 
     /* Convenience aliases */
+    libxl__ao *ao = aes->ao;
+    AO_GC;
     libxl__ev_child *const child = &aes->child;
     char ** const args = aes->args;
 
     /* Set execution timeout */
-    if (libxl__ev_time_register_rel(gc, &aes->time,
+    if (libxl__ev_time_register_rel(ao, &aes->time,
                                     async_exec_timeout,
                                     aes->timeout_ms)) {
         LOG(ERROR, "unable to register timeout for executing: %s", aes->what);
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 0455134..c80749f 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -808,7 +808,7 @@ void libxl__initiate_device_remove(libxl__egc *egc,
              * TODO: 4.2 Bodge due to QEMU, see comment on top of
              * libxl__initiate_device_remove in libxl_internal.h
              */
-            rc = libxl__ev_time_register_rel(gc, &aodev->timeout,
+            rc = libxl__ev_time_register_rel(ao, &aodev->timeout,
                                              device_qemu_timeout,
                                              LIBXL_QEMU_BODGE_TIMEOUT * 1000);
             if (rc) {
@@ -1034,7 +1034,7 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
     aes->stdfds[1] = 2;
     aes->stdfds[2] = -1;
 
-    rc = libxl__async_exec_start(gc, aes);
+    rc = libxl__async_exec_start(aes);
     if (rc)
         goto out;
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 90877d6..e292cb3 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -980,7 +980,7 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
                                 switch_logdirty_xswatch, lds->ret_path);
     if (rc) goto out;
 
-    rc = libxl__ev_time_register_rel(gc, &lds->timeout,
+    rc = libxl__ev_time_register_rel(ao, &lds->timeout,
                                 switch_logdirty_timeout, 10*1000);
     if (rc) goto out;
 
@@ -1260,7 +1260,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
         rc = libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
         if (rc) goto err;
 
-        rc = libxl__ev_time_register_rel(gc, &dss->guest_timeout,
+        rc = libxl__ev_time_register_rel(ao, &dss->guest_timeout,
                                          suspend_common_wait_guest_timeout,
                                          60*1000);
         if (rc) goto err;
@@ -1391,7 +1391,7 @@ static void domain_suspend_common_wait_guest(libxl__egc *egc,
                                     "@releaseDomain");
     if (rc) goto err;
 
-    rc = libxl__ev_time_register_rel(gc, &dss->guest_timeout,
+    rc = libxl__ev_time_register_rel(ao, &dss->guest_timeout,
                                      suspend_common_wait_guest_timeout,
                                      60*1000);
     if (rc) goto err;
@@ -1751,7 +1751,7 @@ static void remus_devices_commit_cb(libxl__egc *egc,
      */
 
     /* Set checkpoint interval timeout */
-    rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
+    rc = libxl__ev_time_register_rel(ao, &dss->checkpoint_timeout,
                                      remus_next_checkpoint,
                                      dss->interval);
 
diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index fb6daeb..1a97cf8 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -309,10 +309,11 @@ static void time_done_debug(libxl__gc *gc, const char *func,
 #endif
 }
 
-int libxl__ev_time_register_abs(libxl__gc *gc, libxl__ev_time *ev,
+int libxl__ev_time_register_abs(libxl__ao *ao, libxl__ev_time *ev,
                                 libxl__ev_time_callback *func,
                                 struct timeval absolute)
 {
+    AO_GC;
     int rc;
 
     CTX_LOCK;
@@ -333,10 +334,11 @@ int libxl__ev_time_register_abs(libxl__gc *gc, libxl__ev_time *ev,
 }
 
 
-int libxl__ev_time_register_rel(libxl__gc *gc, libxl__ev_time *ev,
+int libxl__ev_time_register_rel(libxl__ao *ao, libxl__ev_time *ev,
                                 libxl__ev_time_callback *func,
                                 int milliseconds /* as for poll(2) */)
 {
+    AO_GC;
     struct timeval absolute;
     int rc;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6bb208c..b615fc5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -770,10 +770,10 @@ static inline void libxl__ev_fd_init(libxl__ev_fd *efd)
 static inline int libxl__ev_fd_isregistered(const libxl__ev_fd *efd)
                     { return efd->fd >= 0; }
 
-_hidden int libxl__ev_time_register_rel(libxl__gc*, libxl__ev_time *ev_out,
+_hidden int libxl__ev_time_register_rel(libxl__ao*, libxl__ev_time *ev_out,
                                         libxl__ev_time_callback*,
                                         int milliseconds /* as for poll(2) */);
-_hidden int libxl__ev_time_register_abs(libxl__gc*, libxl__ev_time *ev_out,
+_hidden int libxl__ev_time_register_abs(libxl__ao*, libxl__ev_time *ev_out,
                                         libxl__ev_time_callback*,
                                         struct timeval);
 _hidden int libxl__ev_time_modify_rel(libxl__gc*, libxl__ev_time *ev,
@@ -2108,7 +2108,7 @@ struct libxl__async_exec_state {
 };
 
 void libxl__async_exec_init(libxl__async_exec_state *aes);
-int libxl__async_exec_start(libxl__gc *gc, libxl__async_exec_state *aes);
+int libxl__async_exec_start(libxl__async_exec_state *aes);
 bool libxl__async_exec_inuse(const libxl__async_exec_state *aes);
 
 /*----- device addition/removal -----*/
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index afe9b61..5e0c9a6 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -120,7 +120,7 @@ static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
     aes->stdfds[1] = -1;
     aes->stdfds[2] = -1;
 
-    rc = libxl__async_exec_start(gc, aes);
+    rc = libxl__async_exec_start(aes);
     if (rc)
         goto out;
 
diff --git a/tools/libxl/libxl_test_timedereg.c b/tools/libxl/libxl_test_timedereg.c
index a44639f..e2cc27d 100644
--- a/tools/libxl/libxl_test_timedereg.c
+++ b/tools/libxl/libxl_test_timedereg.c
@@ -30,12 +30,13 @@ static int seq;
 static void occurs(libxl__egc *egc, libxl__ev_time *ev,
                    const struct timeval *requested_abs);
 
-static void regs(libxl__gc *gc, int j)
+static void regs(libxl__ao *ao, int j)
 {
+    AO_GC;
     int rc, i;
     LOG(DEBUG,"regs(%d)", j);
     for (i=0; i<NTIMES; i++) {
-        rc = libxl__ev_time_register_rel(gc, &et[j][i], occurs, ms[j][i]);
+        rc = libxl__ev_time_register_rel(ao, &et[j][i], occurs, ms[j][i]);
         assert(!rc);
     }    
 }
@@ -52,7 +53,7 @@ int libxl_test_timedereg(libxl_ctx *ctx, libxl_asyncop_how *ao_how)
         libxl__ev_time_init(&et[1][i]);
     }
 
-    regs(gc, 0);
+    regs(ao, 0);
 
     return AO_INPROGRESS;
 }
@@ -71,7 +72,7 @@ static void occurs(libxl__egc *egc, libxl__ev_time *ev,
         assert(ev == &et[0][1]);
         libxl__ev_time_deregister(gc, &et[0][0]);
         libxl__ev_time_deregister(gc, &et[0][2]);
-        regs(gc, 1);
+        regs(tao, 1);
         libxl__ev_time_deregister(gc, &et[0][1]);
         break;
 
-- 
1.7.10.4


* [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (9 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 11:20   ` Ian Campbell
  2015-02-10 20:09 ` [PATCH 12/29] libxl: events: Permit timeouts to signal cancellation Ian Jackson
                   ` (20 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

The internal user of libxl__async_exec_start et al now gets an rc as
well as the process's exit status.

For now this is always either 0 or ERROR_FAIL, but with ao
cancellation this will possibly be CANCELLED or TIMEDOUT too.
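
A callback that only cares whether the whole exec succeeded can fold
the two values into a single rc, as the netbuffer callbacks in this
patch do.  A minimal standalone sketch follows; sketch_fold_exec_result
is a hypothetical helper, and SKETCH_ERROR_FAIL is assumed to be -3
purely for illustration.

```c
#include <assert.h>

#define SKETCH_ERROR_FAIL (-3) /* assumed value, for illustration only */

/* Combine the aes rc and the child's wait status into a single rc.
 * An rc supplied by the aes machinery takes precedence; a nonzero
 * status only becomes an error if nothing else went wrong. */
int sketch_fold_exec_result(int rc, int status)
{
    assert(rc <= 0); /* libxl rc values are zero or negative */

    if (status && !rc)
        rc = SKETCH_ERROR_FAIL;
    return rc;
}
```

Keeping an existing nonzero rc means that a later CANCELLED or
TIMEDOUT is not overwritten by the generic failure code when the
child was killed as a consequence.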

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New patch due to rebase; v1 had changes to device_hotplug_*
     scripts instead.
    Callback now gets unambiguous information about error situation:
     previously, if the only thing that went wrong was that the child
     died badly, rc would be FAILED, which was ambiguous; now rc=0.
    Add a comment documenting the meaning of the rc and status
     parameters to the callback.
---
 tools/libxl/libxl_aoutils.c         |    9 ++++++---
 tools/libxl/libxl_device.c          |   13 +++++++++----
 tools/libxl/libxl_internal.h        |   11 ++++++++++-
 tools/libxl/libxl_netbuffer.c       |   19 ++++++++++---------
 tools/libxl/libxl_remus_disk_drbd.c |    8 +++++---
 5 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 754e2d1..891cdb8 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -483,11 +483,12 @@ static void async_exec_done(libxl__egc *egc,
     libxl__ev_time_deregister(gc, &aes->time);
 
     if (status) {
-        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
-                                      aes->what, pid, status);
+        if (!aes->rc)
+            libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                          aes->what, pid, status);
     }
 
-    aes->callback(egc, aes, status);
+    aes->callback(egc, aes, aes->rc, status);
 }
 
 void libxl__async_exec_init(libxl__async_exec_state *aes)
@@ -506,6 +507,8 @@ int libxl__async_exec_start(libxl__async_exec_state *aes)
     libxl__ev_child *const child = &aes->child;
     char ** const args = aes->args;
 
+    aes->rc = 0;
+
     /* Set execution timeout */
     if (libxl__ev_time_register_rel(ao, &aes->time,
                                     async_exec_timeout,
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index c80749f..84114ff 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -729,7 +729,7 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev);
 
 static void device_hotplug_child_death_cb(libxl__egc *egc,
                                           libxl__async_exec_state *aes,
-                                          int status);
+                                          int rc, int status);
 
 static void device_destroy_be_watch_cb(libxl__egc *egc,
                                        libxl__xswait_state *xswait,
@@ -1052,7 +1052,7 @@ out:
 
 static void device_hotplug_child_death_cb(libxl__egc *egc,
                                           libxl__async_exec_state *aes,
-                                          int status)
+                                          int rc, int status)
 {
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
     STATE_AO_GC(aodev->ao);
@@ -1061,12 +1061,17 @@ static void device_hotplug_child_death_cb(libxl__egc *egc,
 
     device_hotplug_clean(gc, aodev);
 
-    if (status) {
+    if (status && !rc) {
         hotplug_error = libxl__xs_read(gc, XBT_NULL,
                                        GCSPRINTF("%s/hotplug-error", be_path));
         if (hotplug_error)
             LOG(ERROR, "script: %s", hotplug_error);
-        aodev->rc = ERROR_FAIL;
+        rc = ERROR_FAIL;
+    }
+
+    if (rc) {
+        if (!aodev->rc)
+            aodev->rc = rc;
         if (aodev->action == LIBXL__DEVICE_ACTION_ADD)
             /*
              * Only fail on device connection, on disconnection
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b615fc5..02cac7b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2089,7 +2089,15 @@ _hidden const char *libxl__run_dir_path(void);
 typedef struct libxl__async_exec_state libxl__async_exec_state;
 
 typedef void libxl__async_exec_callback(libxl__egc *egc,
-                        libxl__async_exec_state *aes, int status);
+                        libxl__async_exec_state *aes, int rc, int status);
+/*
+ * Meaning of status and rc:
+ *  rc==0, status==0    all went well
+ *  rc==0, status!=0    everything OK except child exited nonzero (logged)
+ *  rc!=0               something else went wrong (status is real
+ *                       exit status, maybe reflecting SIGKILL if aes
+ *                       code killed the child).  Logged unless CANCELLED.
+ */
 
 struct libxl__async_exec_state {
     /* caller must fill these in */
@@ -2105,6 +2113,7 @@ struct libxl__async_exec_state {
     /* private */
     libxl__ev_time time;
     libxl__ev_child child;
+    int rc;
 };
 
 void libxl__async_exec_init(libxl__async_exec_state *aes);
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index edc6843..ff2d6c7 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -219,10 +219,10 @@ out:
 
 static void netbuf_setup_script_cb(libxl__egc *egc,
                                    libxl__async_exec_state *aes,
-                                   int status);
+                                   int rc, int status);
 static void netbuf_teardown_script_cb(libxl__egc *egc,
                                       libxl__async_exec_state *aes,
-                                      int status);
+                                      int rc, int status);
 
 /*
  * the script needs the following env & args
@@ -327,14 +327,13 @@ out:
  */
 static void netbuf_setup_script_cb(libxl__egc *egc,
                                    libxl__async_exec_state *aes,
-                                   int status)
+                                   int rc, int status)
 {
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
     libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
     libxl__remus_devices_state *rds = dev->rds;
     const char *out_path_base, *hotplug_error = NULL;
-    int rc;
 
     STATE_AO_GC(rds->ao);
 
@@ -344,6 +343,11 @@ static void netbuf_setup_script_cb(libxl__egc *egc,
     const char *const vif = remus_nic->vif;
     const char **const ifb = &remus_nic->ifb;
 
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    if (rc)
+        goto out;
+
     /*
      * we need to get ifb first because it's needed for teardown
      */
@@ -411,17 +415,14 @@ out:
 
 static void netbuf_teardown_script_cb(libxl__egc *egc,
                                       libxl__async_exec_state *aes,
-                                      int status)
+                                      int rc, int status)
 {
-    int rc;
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
     libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_device_nic *remus_nic = dev->concrete_data;
 
-    if (status)
+    if (status && !rc)
         rc = ERROR_FAIL;
-    else
-        rc = 0;
 
     free_qdisc(remus_nic);
 
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
index 5e0c9a6..fc76b89 100644
--- a/tools/libxl/libxl_remus_disk_drbd.c
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -78,7 +78,7 @@ out:
 /* callbacks */
 static void match_async_exec_cb(libxl__egc *egc,
                                 libxl__async_exec_state *aes,
-                                int status);
+                                int rc, int status);
 
 /* implementations */
 
@@ -133,9 +133,8 @@ out:
 
 static void match_async_exec_cb(libxl__egc *egc,
                                 libxl__async_exec_state *aes,
-                                int status)
+                                int rc, int status)
 {
-    int rc;
     libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
     libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
     libxl__remus_drbd_disk *drbd_disk;
@@ -143,6 +142,9 @@ static void match_async_exec_cb(libxl__egc *egc,
 
     STATE_AO_GC(aodev->ao);
 
+    if (rc)
+        goto out;
+
     if (status) {
         rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
         /* BUG: seems to assume that any exit status means `no match' */
-- 
1.7.10.4


* [PATCH 12/29] libxl: events: Permit timeouts to signal cancellation
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (10 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc Ian Jackson
@ 2015-02-10 20:09 ` Ian Jackson
  2015-03-24 11:22   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 13/29] libxl: domain create: Do not destroy on cancellation Ian Jackson
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

The callback functions provided by users must take an rc value.  This
rc value can be ERROR_TIMEDOUT or ERROR_CANCELLED.

Users of xswait are now expected to deal correctly with
ERROR_CANCELLED.  When they receive it, xswait has not logged it,
and the caller won't log it either since it's not TIMEDOUT.
Luckily this is the correct behaviour, so we can just change the doc
comment.

Currently nothing generates ERROR_CANCELLED; in particular the
timeouts cannot in fact signal cancellation.

There should be no publicly visible change except that some error
returns from libxl will change from ERROR_FAIL to ERROR_TIMEDOUT, and
some changes to debugging messages.
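
For illustration only, here is a minimal standalone sketch of how a
timeout callback might treat the new rc argument, modelled on
suspend_common_wait_guest_timeout in this patch.  The SKETCH_* names
and their numeric values are stand-ins invented for this example and
are not taken from libxl.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in error codes: the names and values are hypothetical and
 * exist only so that this sketch compiles on its own. */
enum {
    SKETCH_ERROR_GUEST_TIMEDOUT = -7,
    SKETCH_ERROR_TIMEDOUT       = -8,
    SKETCH_ERROR_CANCELLED      = -100,
};

/* A real timeout is logged and translated into a more specific code;
 * anything else (that is, cancellation) is passed through unlogged. */
static int sketch_guest_timeout_rc(int rc)
{
    /* A timeout callback is only ever told TIMEDOUT or CANCELLED. */
    assert(rc == SKETCH_ERROR_TIMEDOUT || rc == SKETCH_ERROR_CANCELLED);

    if (rc == SKETCH_ERROR_TIMEDOUT) {
        fprintf(stderr, "guest did not suspend, timed out\n");
        rc = SKETCH_ERROR_GUEST_TIMEDOUT;
    }
    return rc;
}
```

The point of the pattern is that only the TIMEDOUT branch logs, so a
cancellation reaches the caller without a spurious error message.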

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_aoutils.c        |   11 ++++++++---
 tools/libxl/libxl_device.c         |    8 +++++---
 tools/libxl/libxl_dom.c            |   29 ++++++++++++++++++++---------
 tools/libxl/libxl_event.c          |    8 ++++----
 tools/libxl/libxl_internal.h       |   12 +++++++-----
 tools/libxl/libxl_test_timedereg.c |    8 +++++---
 6 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 891cdb8..0b6d750 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -80,12 +80,13 @@ void xswait_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *xsw,
 }
 
 void xswait_timeout_callback(libxl__egc *egc, libxl__ev_time *ev,
-                             const struct timeval *requested_abs)
+                             const struct timeval *requested_abs,
+                             int rc)
 {
     EGC_GC;
     libxl__xswait_state *xswa = CONTAINER_OF(ev, *xswa, time_ev);
     LOG(DEBUG, "%s: xswait timeout (path=%s)", xswa->what, xswa->path);
-    xswait_report_error(egc, xswa, ERROR_TIMEDOUT);
+    xswait_report_error(egc, xswa, rc);
 }
 
 static void xswait_report_error(libxl__egc *egc, libxl__xswait_state *xswa,
@@ -455,11 +456,15 @@ int libxl__openptys(libxl__openpty_state *op,
 
 static void async_exec_timeout(libxl__egc *egc,
                                libxl__ev_time *ev,
-                               const struct timeval *requested_abs)
+                               const struct timeval *requested_abs,
+                               int rc)
 {
     libxl__async_exec_state *aes = CONTAINER_OF(ev, *aes, time);
     STATE_AO_GC(aes->ao);
 
+    if (!aes->rc)
+        aes->rc = rc;
+
     libxl__ev_time_deregister(gc, &aes->time);
 
     assert(libxl__ev_child_inuse(&aes->child));
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 84114ff..3b1c3b2 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -717,7 +717,7 @@ out:
 
 /* This callback is part of the Qemu devices Badge */
 static void device_qemu_timeout(libxl__egc *egc, libxl__ev_time *ev,
-                                const struct timeval *requested_abs);
+                                const struct timeval *requested_abs, int rc);
 
 static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds,
                                    int rc);
@@ -880,7 +880,7 @@ out:
 }
 
 static void device_qemu_timeout(libxl__egc *egc, libxl__ev_time *ev,
-                                const struct timeval *requested_abs)
+                                const struct timeval *requested_abs, int rc)
 {
     libxl__ao_device *aodev = CONTAINER_OF(ev, *aodev, timeout);
     STATE_AO_GC(aodev->ao);
@@ -888,7 +888,9 @@ static void device_qemu_timeout(libxl__egc *egc, libxl__ev_time *ev,
     char *state_path = GCSPRINTF("%s/state", be_path);
     const char *xs_state;
     xs_transaction_t t = 0;
-    int rc = 0;
+
+    if (rc != ERROR_TIMEDOUT)
+        goto out;
 
     libxl__ev_time_deregister(gc, &aodev->timeout);
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e292cb3..87b5fdd 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -943,7 +943,8 @@ static void domain_suspend_callback_common_done(libxl__egc *egc,
  */
 
 static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
-                                    const struct timeval *requested_abs);
+                                    const struct timeval *requested_abs,
+                                    int rc);
 static void switch_logdirty_xswatch(libxl__egc *egc, libxl__ev_xswatch*,
                             const char *watch_path, const char *event_path);
 static void switch_logdirty_done(libxl__egc *egc,
@@ -1069,7 +1070,8 @@ void libxl__domain_suspend_common_switch_qemu_logdirty
     }
 }
 static void switch_logdirty_timeout(libxl__egc *egc, libxl__ev_time *ev,
-                                    const struct timeval *requested_abs)
+                                    const struct timeval *requested_abs,
+                                    int rc)
 {
     libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, logdirty.timeout);
     STATE_AO_GC(dss->ao);
@@ -1218,7 +1220,7 @@ static void suspend_common_wait_guest_watch(libxl__egc *egc,
 static void suspend_common_wait_guest_check(libxl__egc *egc,
         libxl__domain_suspend_state *dss);
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
-      libxl__ev_time *ev, const struct timeval *requested_abs);
+      libxl__ev_time *ev, const struct timeval *requested_abs, int rc);
 
 static void domain_suspend_common_done(libxl__egc *egc,
                                        libxl__domain_suspend_state *dss,
@@ -1452,12 +1454,15 @@ static void suspend_common_wait_guest_check(libxl__egc *egc,
 }
 
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
-      libxl__ev_time *ev, const struct timeval *requested_abs)
+      libxl__ev_time *ev, const struct timeval *requested_abs, int rc)
 {
     libxl__domain_suspend_state *dss = CONTAINER_OF(ev, *dss, guest_timeout);
     STATE_AO_GC(dss->ao);
-    LOG(ERROR, "guest did not suspend, timed out");
-    domain_suspend_common_done(egc, dss, ERROR_GUEST_TIMEDOUT);
+    if (rc == ERROR_TIMEDOUT) {
+        LOG(ERROR, "guest did not suspend, timed out");
+        rc = ERROR_GUEST_TIMEDOUT;
+    }
+    domain_suspend_common_done(egc, dss, rc);
 }
 
 static void domain_suspend_common_guest_suspended(libxl__egc *egc,
@@ -1690,7 +1695,8 @@ static void remus_devices_commit_cb(libxl__egc *egc,
                                     libxl__remus_devices_state *rds,
                                     int rc);
 static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
-                                  const struct timeval *requested_abs);
+                                  const struct timeval *requested_abs,
+                                  int rc);
 
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
@@ -1765,7 +1771,8 @@ out:
 }
 
 static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
-                                  const struct timeval *requested_abs)
+                                  const struct timeval *requested_abs,
+                                  int rc)
 {
     libxl__domain_suspend_state *dss =
                             CONTAINER_OF(ev, *dss, checkpoint_timeout);
@@ -1777,7 +1784,11 @@ static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
      * (xc_domain_save.c). in order to continue executing the infinite loop
      * (suspend, checkpoint, resume) in xc_domain_save().
      */
-    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
+
+    if (rc)
+        dss->rc = rc;
+
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, !rc);
 }
 
 /*----- main code for suspending, in order of execution -----*/
diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 1a97cf8..7b4b141 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -383,7 +383,7 @@ void libxl__ev_time_deregister(libxl__gc *gc, libxl__ev_time *ev)
     return;
 }
 
-static void time_occurs(libxl__egc *egc, libxl__ev_time *etime)
+static void time_occurs(libxl__egc *egc, libxl__ev_time *etime, int rc)
 {
     DBG("ev_time=%p occurs abs=%lu.%06lu",
         etime, (unsigned long)etime->abs.tv_sec,
@@ -391,7 +391,7 @@ static void time_occurs(libxl__egc *egc, libxl__ev_time *etime)
 
     libxl__ev_time_callback *func = etime->func;
     etime->func = 0;
-    func(egc, etime, &etime->abs);
+    func(egc, etime, &etime->abs, rc);
 }
 
 
@@ -1187,7 +1187,7 @@ static void afterpoll_internal(libxl__egc *egc, libxl__poller *poller,
 
         time_deregister(gc, etime);
 
-        time_occurs(egc, etime);
+        time_occurs(egc, etime, ERROR_TIMEDOUT);
     }
 }
 
@@ -1271,7 +1271,7 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
 
     LIBXL_TAILQ_REMOVE(&CTX->etimes, ev, entry);
 
-    time_occurs(egc, ev);
+    time_occurs(egc, ev, ERROR_TIMEDOUT);
 
  out:
     CTX_UNLOCK;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 02cac7b..accbab8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -175,7 +175,8 @@ struct libxl__ev_fd {
 
 typedef struct libxl__ev_time libxl__ev_time;
 typedef void libxl__ev_time_callback(libxl__egc *egc, libxl__ev_time *ev,
-                                     const struct timeval *requested_abs);
+                                     const struct timeval *requested_abs,
+                                     int rc); /* TIMEDOUT or CANCELLED */
 struct libxl__ev_time {
     /* caller should include this in their own struct */
     /* read-only for caller, who may read only when registered: */
@@ -1108,13 +1109,13 @@ typedef struct libxl__xswait_state libxl__xswait_state;
  *     Otherwise, xswait will continue waiting and watching and
  *     will call you back later.
  *
- * rc==ERROR_TIMEDOUT
+ * rc==ERROR_TIMEDOUT, rc==ERROR_CANCELLED
  *
  *     The specified timeout was reached.
  *     This has NOT been logged (except to the debug log).
  *     xswait will not continue (but calling libxl__xswait_stop is OK).
  *
- * rc!=0, !=ERROR_TIMEDOUT
+ * rc!=0, !=ERROR_TIMEDOUT, !=ERROR_CANCELLED
  *
  *     Some other error occurred.
  *     This HAS been logged.
@@ -1154,8 +1155,9 @@ int libxl__xswait_start(libxl__gc*, libxl__xswait_state*);
 typedef struct libxl__ev_devstate libxl__ev_devstate;
 typedef void libxl__ev_devstate_callback(libxl__egc *egc, libxl__ev_devstate*,
                                          int rc);
-  /* rc will be 0, ERROR_TIMEDOUT, ERROR_INVAL (meaning path was removed),
-   * or ERROR_FAIL if other stuff went wrong (in which latter case, logged) */
+  /* rc will be 0, ERROR_TIMEDOUT, ERROR_CANCELLED, ERROR_INVAL
+   * (meaning path was removed), or ERROR_FAIL if other stuff went
+   * wrong (in which latter case, logged) */
 
 struct libxl__ev_devstate {
     /* read-only for caller, who may read only when waiting: */
diff --git a/tools/libxl/libxl_test_timedereg.c b/tools/libxl/libxl_test_timedereg.c
index e2cc27d..c464663 100644
--- a/tools/libxl/libxl_test_timedereg.c
+++ b/tools/libxl/libxl_test_timedereg.c
@@ -28,7 +28,7 @@ static libxl__ao *tao;
 static int seq;
 
 static void occurs(libxl__egc *egc, libxl__ev_time *ev,
-                   const struct timeval *requested_abs);
+                   const struct timeval *requested_abs, int rc);
 
 static void regs(libxl__ao *ao, int j)
 {
@@ -59,13 +59,15 @@ int libxl_test_timedereg(libxl_ctx *ctx, libxl_asyncop_how *ao_how)
 }
 
 static void occurs(libxl__egc *egc, libxl__ev_time *ev,
-                   const struct timeval *requested_abs)
+                   const struct timeval *requested_abs, int rc)
 {
     EGC_GC;
     int i;
 
     int off = ev - &et[0][0];
-    LOG(DEBUG,"occurs[%d][%d] seq=%d", off/NTIMES, off%NTIMES, seq);
+    LOG(DEBUG,"occurs[%d][%d] seq=%d rc=%d", off/NTIMES, off%NTIMES, seq, rc);
+
+    assert(rc == ERROR_TIMEDOUT);
 
     switch (seq) {
     case 0:
-- 
1.7.10.4


* [PATCH 13/29] libxl: domain create: Do not destroy on cancellation
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (11 preceding siblings ...)
  2015-02-10 20:09 ` [PATCH 12/29] libxl: events: Permit timeouts to signal cancellation Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:24   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 14/29] libxl: ao: Record ultimate parent of a nested ao Ian Jackson
                   ` (18 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

If we cancelled the domain creation, do not try to tear it down again.
Document this.

This is a backwards-compatible API change since old libxl users will
never cancel any operations.

In the current code, there is no functional change, because
ERROR_CANCELLED is never generated anywhere yet.
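
A rough standalone sketch of the decision made in domcreate_complete,
for illustration.  struct sketch_domcreate, its fields and the
SKETCH_* codes are invented for this example; only the retain_domain
expression mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in error codes; names and values are hypothetical. */
enum { SKETCH_ERROR_FAIL = -3, SKETCH_ERROR_CANCELLED = -100 };

/* Hypothetical cut-down version of the domain creation state. */
struct sketch_domcreate {
    int rc;
    uint32_t guest_domid;
    bool destroyed;      /* stands in for "destroy was started" */
};

static void sketch_domcreate_complete(struct sketch_domcreate *dcs)
{
    assert(!dcs->destroyed);   /* completion runs at most once */

    /* Keep the domain on success and on cancellation. */
    bool retain_domain = !dcs->rc || dcs->rc == SKETCH_ERROR_CANCELLED;

    if (!retain_domain && dcs->guest_domid)
        dcs->destroyed = true;
}
```

After a cancellation the domain is therefore left for the caller to
deal with, which is what the new comment in libxl.h documents.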

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl.h        |    4 ++++
 tools/libxl/libxl_create.c |    6 ++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index c219f59..9385e82 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -914,6 +914,10 @@ int libxl_ctx_free(libxl_ctx *ctx /* 0 is OK */);
 
 /* domain related functions */
 
+/* If the result is ERROR_CANCELLED, the domain may or may not exist
+ * (in a half-created state).  *domid will be valid and will be the
+ * domain id, or -1, as appropriate */
+
 int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
                             uint32_t *domid,
                             const libxl_asyncop_how *ao_how,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 98687bd..f12ed72 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1446,7 +1446,9 @@ static void domcreate_complete(libxl__egc *egc,
     if (!rc && d_config->b_info.exec_ssidref)
         rc = xc_flask_relabel_domain(CTX->xch, dcs->guest_domid, d_config->b_info.exec_ssidref);
 
-    if (!rc) {
+    bool retain_domain = !rc || rc == ERROR_CANCELLED;
+
+    if (retain_domain) {
         libxl__domain_userdata_lock *lock;
 
         /* Note that we hold CTX lock at this point so only need to
@@ -1465,7 +1467,7 @@ static void domcreate_complete(libxl__egc *egc,
 
     libxl_domain_config_dispose(d_config_saved);
 
-    if (rc) {
+    if (!retain_domain) {
         if (dcs->guest_domid) {
             dcs->dds.ao = ao;
             dcs->dds.domid = dcs->guest_domid;
-- 
1.7.10.4


* [PATCH 14/29] libxl: ao: Record ultimate parent of a nested ao
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (12 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 13/29] libxl: domain create: Do not destroy on cancellation Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:26   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 15/29] libxl: ao: Count the nested progeny of an ao Ian Jackson
                   ` (17 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

This will be used by the cancellation machinery.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_event.c    |   25 +++++++++++++++----------
 tools/libxl/libxl_internal.h |    3 ++-
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 7b4b141..1385a94 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -31,6 +31,9 @@
 #define DBG(args, ...) LIBXL__DBG_LOG(CTX, args, __VA_ARGS__)
 
 
+static libxl__ao *ao_nested_root(libxl__ao *ao);
+
+
 /*
  * The counter osevent_in_hook is used to ensure that the application
  * honours the reentrancy restriction documented in libxl_event.h.
@@ -1758,7 +1761,7 @@ void libxl__ao_complete(libxl__egc *egc, libxl__ao *ao, int rc)
     LOG(DEBUG,"ao %p: complete, rc=%d",ao,rc);
     assert(ao->magic == LIBXL__AO_MAGIC);
     assert(!ao->complete);
-    assert(!ao->nested);
+    assert(!ao->nested_root);
     ao->complete = 1;
     ao->rc = rc;
 
@@ -1929,7 +1932,7 @@ void libxl__ao_progress_report(libxl__egc *egc, libxl__ao *ao,
         const libxl_asyncprogress_how *how, libxl_event *ev)
 {
     AO_GC;
-    assert(!ao->nested);
+    assert(!ao->nested_root);
     if (how->callback == dummy_asyncprogress_callback_ignore) {
         LOG(DEBUG,"ao %p: progress report: ignored",ao);
         libxl_event_free(CTX,ev);
@@ -1952,21 +1955,23 @@ void libxl__ao_progress_report(libxl__egc *egc, libxl__ao *ao,
 
 /* nested ao */
 
+static libxl__ao *ao_nested_root(libxl__ao *ao) {
+    libxl__ao *root = ao->nested_root ? : ao;
+    assert(!root->nested_root);
+    return root;
+}
+
 _hidden libxl__ao *libxl__nested_ao_create(libxl__ao *parent)
 {
-    /* We only use the parent to get the ctx.  However, we require the
-     * caller to provide us with an ao, not just a ctx, to prove that
-     * they are already in an asynchronous operation.  That will avoid
-     * people using this to (for example) make an ao in a non-ao_how
-     * function somewhere in the middle of libxl. */
-    libxl__ao *child = NULL;
+    libxl__ao *child = NULL, *root;
     libxl_ctx *ctx = libxl__gc_owner(&parent->gc);
 
     assert(parent->magic == LIBXL__AO_MAGIC);
+    root = ao_nested_root(parent);
 
     child = libxl__zalloc(&ctx->nogc_gc, sizeof(*child));
     child->magic = LIBXL__AO_MAGIC;
-    child->nested = 1;
+    child->nested_root = root;
     LIBXL_INIT_GC(child->gc, ctx);
     libxl__gc *gc = &child->gc;
 
@@ -1977,7 +1982,7 @@ _hidden libxl__ao *libxl__nested_ao_create(libxl__ao *parent)
 _hidden void libxl__nested_ao_free(libxl__ao *child)
 {
     assert(child->magic == LIBXL__AO_MAGIC);
-    assert(child->nested);
+    assert(child->nested_root);
     libxl_ctx *ctx = libxl__gc_owner(&child->gc);
     libxl__ao__destroy(ctx, child);
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index accbab8..fe5c94f 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -448,7 +448,8 @@ struct libxl__ao {
      * only in libxl__ao_complete.)
      */
     uint32_t magic;
-    unsigned constructing:1, in_initiator:1, complete:1, notified:1, nested:1;
+    unsigned constructing:1, in_initiator:1, complete:1, notified:1;
+    libxl__ao *nested_root;
     int progress_reports_outstanding;
     int rc;
     libxl__gc gc;
-- 
1.7.10.4


* [PATCH 15/29] libxl: ao: Count the nested progeny of an ao
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (13 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 14/29] libxl: ao: Record ultimate parent of a nested ao Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:27   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 16/29] libxl: ao: Provide manip_refcnt Ian Jackson
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

This will detect any "escaped" nested aos.
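
To illustrate the idea, here is a standalone sketch of progeny
counting on the root ao.  struct sketch_ao and the sketch_* functions
are invented for this example and use plain calloc/free rather than
libxl's allocators; only the nested_root and nested_progeny fields
follow the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical cut-down ao. */
struct sketch_ao {
    struct sketch_ao *nested_root; /* NULL for a top-level ao */
    int nested_progeny;            /* only meaningful on the root */
};

static struct sketch_ao *sketch_nested_root(struct sketch_ao *ao)
{
    struct sketch_ao *root = ao->nested_root ? ao->nested_root : ao;
    assert(!root->nested_root);
    return root;
}

static struct sketch_ao *sketch_nested_create(struct sketch_ao *parent)
{
    struct sketch_ao *root = sketch_nested_root(parent);
    struct sketch_ao *child = calloc(1, sizeof(*child));
    if (!child) abort();
    child->nested_root = root;
    assert(root->nested_progeny < INT_MAX);
    root->nested_progeny++;
    return child;
}

static void sketch_nested_free(struct sketch_ao *child)
{
    struct sketch_ao *root = child->nested_root;
    assert(root);
    assert(root->nested_progeny > 0);
    root->nested_progeny--;
    free(child);
}

/* Completing the root while a nested ao is still alive trips the
 * second assertion: that ao has "escaped". */
static void sketch_complete(struct sketch_ao *ao)
{
    assert(!ao->nested_root);
    assert(!ao->nested_progeny);
}
```

Because every nested ao is counted against the root rather than its
immediate parent, a single check at completion covers the whole tree.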

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_event.c    |    8 +++++++-
 tools/libxl/libxl_internal.h |    1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 1385a94..37c3d2f 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1762,6 +1762,7 @@ void libxl__ao_complete(libxl__egc *egc, libxl__ao *ao, int rc)
     assert(ao->magic == LIBXL__AO_MAGIC);
     assert(!ao->complete);
     assert(!ao->nested_root);
+    assert(!ao->nested_progeny);
     ao->complete = 1;
     ao->rc = rc;
 
@@ -1972,6 +1973,8 @@ _hidden libxl__ao *libxl__nested_ao_create(libxl__ao *parent)
     child = libxl__zalloc(&ctx->nogc_gc, sizeof(*child));
     child->magic = LIBXL__AO_MAGIC;
     child->nested_root = root;
+    assert(root->nested_progeny < INT_MAX);
+    root->nested_progeny++;
     LIBXL_INIT_GC(child->gc, ctx);
     libxl__gc *gc = &child->gc;
 
@@ -1982,7 +1985,10 @@ _hidden libxl__ao *libxl__nested_ao_create(libxl__ao *parent)
 _hidden void libxl__nested_ao_free(libxl__ao *child)
 {
     assert(child->magic == LIBXL__AO_MAGIC);
-    assert(child->nested_root);
+    libxl__ao *root = child->nested_root;
+    assert(root);
+    assert(root->nested_progeny > 0);
+    root->nested_progeny--;
     libxl_ctx *ctx = libxl__gc_owner(&child->gc);
     libxl__ao__destroy(ctx, child);
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index fe5c94f..e29db43 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -450,6 +450,7 @@ struct libxl__ao {
     uint32_t magic;
     unsigned constructing:1, in_initiator:1, complete:1, notified:1;
     libxl__ao *nested_root;
+    int nested_progeny;
     int progress_reports_outstanding;
     int rc;
     libxl__gc gc;
-- 
1.7.10.4


* [PATCH 16/29] libxl: ao: Provide manip_refcnt
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (14 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 15/29] libxl: ao: Count the nested progeny of an ao Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:32   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API Ian Jackson
                   ` (15 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Previously we used in_initiator to stop the ao being freed while we
were still in the initiator function (which would result in the
initiator's call to libxl__ao_inprogress accessing the ao after it had
been freed).

We are going to introduce a new libxl entrypoint which finds, and
operates on, ongoing aos.  This function needs the same protection,
and might even end up running on the same ao multiple times
concurrently.

So do this with reference counting instead, with a new variable
ao->manip_refcnt.

We keep ao->in_initiator because that allows us to keep some useful
asserts about the sequencing of libxl__ao_inprogress, etc.
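
As an illustration, the reference counting can be sketched standalone
like this.  struct sketch_ao and the sketch_* functions are invented
for this example; a destroyed flag stands in for the real call to
libxl__ao__destroy.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical cut-down ao. */
struct sketch_ao {
    int manip_refcnt;
    bool complete, notified;
    bool destroyed;   /* stands in for actually freeing the ao */
};

/* Destroy only once nobody is manipulating the ao and the caller
 * has been notified of completion. */
static void sketch_check_destroy(struct sketch_ao *ao)
{
    if (!ao->manip_refcnt && ao->notified) {
        assert(ao->complete);
        ao->destroyed = true;
    }
}

static void sketch_manip_enter(struct sketch_ao *ao)
{
    assert(ao->manip_refcnt < INT_MAX);
    ao->manip_refcnt++;
}

static void sketch_manip_leave(struct sketch_ao *ao)
{
    assert(ao->manip_refcnt > 0);
    ao->manip_refcnt--;
    sketch_check_destroy(ao);
}
```

With two concurrent manipulators, the ao survives until the second
one leaves, which a single in_initiator flag could not express.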

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxl/libxl_event.c    |   43 +++++++++++++++++++++++++++++++++---------
 tools/libxl/libxl_internal.h |    1 +
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 37c3d2f..80677e0 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -33,6 +33,8 @@
 
 static libxl__ao *ao_nested_root(libxl__ao *ao);
 
+static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao);
+
 
 /*
  * The counter osevent_in_hook is used to ensure that the application
@@ -1344,8 +1346,7 @@ static void egc_run_callbacks(libxl__egc *egc)
         ao->how.callback(CTX, ao->rc, ao->how.u.for_callback);
         CTX_LOCK;
         ao->notified = 1;
-        if (!ao->in_initiator)
-            libxl__ao__destroy(CTX, ao);
+        ao__check_destroy(CTX, ao);
         CTX_UNLOCK;
     }
 }
@@ -1726,6 +1727,33 @@ int libxl_event_wait(libxl_ctx *ctx, libxl_event **event_r,
  *                              - destroy the ao
  */
 
+
+/*
+ * A "manip" is a libxl public function manipulating this ao, which
+ * has a pointer to it.  We have to not destroy it while that's the
+ * case, obviously.
+ */
+static void ao__manip_enter(libxl__ao *ao)
+{
+    assert(ao->manip_refcnt < INT_MAX);
+    ao->manip_refcnt++;
+}
+
+static void ao__manip_leave(libxl_ctx *ctx, libxl__ao *ao)
+{
+    assert(ao->manip_refcnt > 0);
+    ao->manip_refcnt--;
+    ao__check_destroy(ctx, ao);
+}
+
+static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao)
+{
+    if (!ao->manip_refcnt && ao->notified) {
+        assert(ao->complete);
+        libxl__ao__destroy(ctx,ao);
+    }
+}
+
 void libxl__ao__destroy(libxl_ctx *ctx, libxl__ao *ao)
 {
     AO_GC;
@@ -1807,8 +1835,8 @@ void libxl__ao_complete_check_progress_reports(libxl__egc *egc, libxl__ao *ao)
         }
         ao->notified = 1;
     }
-    if (!ao->in_initiator && ao->notified)
-        libxl__ao__destroy(ctx, ao);
+    
+    ao__check_destroy(ctx, ao);
 }
 
 libxl__ao *libxl__ao_create(libxl_ctx *ctx, uint32_t domid,
@@ -1823,6 +1851,7 @@ libxl__ao *libxl__ao_create(libxl_ctx *ctx, uint32_t domid,
     ao->magic = LIBXL__AO_MAGIC;
     ao->constructing = 1;
     ao->in_initiator = 1;
+    ao__manip_enter(ao);
     ao->poller = 0;
     ao->domid = domid;
     LIBXL_INIT_GC(ao->gc, ctx);
@@ -1903,11 +1932,7 @@ int libxl__ao_inprogress(libxl__ao *ao,
     }
 
     ao->in_initiator = 0;
-
-    if (ao->notified) {
-        assert(ao->complete);
-        libxl__ao__destroy(CTX,ao);
-    }
+    ao__manip_leave(CTX, ao);
 
     return rc;
 }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e29db43..d2c2637 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -449,6 +449,7 @@ struct libxl__ao {
      */
     uint32_t magic;
     unsigned constructing:1, in_initiator:1, complete:1, notified:1;
+    int manip_refcnt;
     libxl__ao *nested_root;
     int nested_progeny;
     int progress_reports_outstanding;
-- 
1.7.10.4


* [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (15 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 16/29] libxl: ao: Provide manip_refcnt Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:45   ` Ian Campbell
  2015-03-24 11:48   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 18/29] libxl: cancellation: Provide explicit internal cancel check API Ian Jackson
                   ` (14 subsequent siblings)
  31 siblings, 2 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Provide libxl_ao_cancel.

There is machinery to allow an ao to register an interest in its
cancellation, using a libxl__ao_cancellable.

This API is not currently very functional: attempting cancellation
will always return ERROR_NOTIMPLEMENTED and have no effect.
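
For callers, one possible way to interpret the documented return
values is sketched below.  This is an illustration only: the
SKETCH_* codes and sketch_cancel_was_noted are invented for this
example, and the classification is my reading of the doc comment
added to libxl.h, not something libxl provides.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in error codes; names and values are hypothetical. */
enum {
    SKETCH_ERROR_CANCELLED      = -100,
    SKETCH_ERROR_NOTFOUND       = -101,
    SKETCH_ERROR_NOTIMPLEMENTED = -102,
};

/* Returns true if the cancellation attempt has been recorded against
 * an ongoing operation, so that the operation might complete with
 * the CANCELLED error.  Either way the caller still has to wait for
 * the operation's own completion. */
static bool sketch_cancel_was_noted(int cancel_rc)
{
    assert(cancel_rc <= 0);   /* errors are reported as negative values */

    switch (cancel_rc) {
    case 0:                           /* will be cut short */
    case SKETCH_ERROR_NOTIMPLEMENTED: /* noted, may still finish normally */
    case SKETCH_ERROR_CANCELLED:      /* an earlier cancel was already noted */
        return true;
    case SKETCH_ERROR_NOTFOUND:       /* no matching operation, no effect */
    default:
        return false;
    }
}
```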

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: Minor comment improvements
---
 tools/libxl/libxl.c          |    3 ++
 tools/libxl/libxl.h          |   64 ++++++++++++++++++++++
 tools/libxl/libxl_event.c    |  123 ++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |   42 ++++++++++++++-
 4 files changed, 231 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index ae0c7e1..193493b 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -73,6 +73,8 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
     LIBXL_LIST_INIT(&ctx->evtchns_waiting);
     libxl__ev_fd_init(&ctx->evtchn_efd);
 
+    LIBXL_LIST_INIT(&ctx->aos_inprogress);
+
     LIBXL_TAILQ_INIT(&ctx->death_list);
     libxl__ev_xswatch_init(&ctx->death_watch);
 
@@ -174,6 +176,7 @@ int libxl_ctx_free(libxl_ctx *ctx)
     assert(LIBXL_LIST_EMPTY(&ctx->efds));
     assert(LIBXL_TAILQ_EMPTY(&ctx->etimes));
     assert(LIBXL_LIST_EMPTY(&ctx->evtchns_waiting));
+    assert(LIBXL_LIST_EMPTY(&ctx->aos_inprogress));
 
     if (ctx->xch) xc_interface_close(ctx->xch);
     libxl_version_info_dispose(&ctx->version_info);
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 9385e82..e8a2a91 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -642,6 +642,11 @@ typedef struct libxl__ctx libxl_ctx;
  */
 #define LIBXL_HAVE_DEVICE_CHANNEL 1
 
+/*
+ * LIBXL_HAVE_AO_CANCEL indicates the availability of libxl_ao_cancel
+ */
+#define LIBXL_HAVE_AO_CANCEL 1
+
 /* Functions annotated with LIBXL_EXTERNAL_CALLERS_ONLY may not be
  * called from within libxl itself. Callers outside libxl, who
  * do not #include libxl_internal.h, are fine. */
@@ -904,6 +909,65 @@ typedef struct {
     void *for_callback; /* passed to callback */
 } libxl_asyncprogress_how;
 
+/*
+ * It is sometimes possible to cancel an asynchronous operation.
+ *
+ * libxl_ao_cancel searches for an ongoing asynchronous operation whose
+ * ao_how is identical to *how, and tries to cancel it.  The return
+ * values from libxl_ao_cancel are as follows:
+ *
+ *  0
+ *
+ *     The operation in question has (at least some) support for
+ *     cancellation.  It will be cut short.  However, it may still
+ *     take some time to cancel.
+ *
+ *  ERROR_NOTFOUND
+ *
+ *      No matching ongoing operation was found.  This might happen
+ *      for an actual operation if the operation has already completed
+ *      (perhaps on another thread).  The call to libxl_ao_cancel has
+ *      had no effect.
+ *
+ *  ERROR_NOTIMPLEMENTED
+ *
+ *     As far as could be determined, the operation in question does
+ *     not support cancellation.  The operation may subsequently
+ *     complete normally, as if it had never been cancelled; however,
+ *     the cancellation attempt will still have been noted and it is
+ *     possible that the operation will be successfully cancelled.
+ *
+ *  ERROR_CANCELLED
+ *
+ *     The operation has already been the subject of at least one
+ *     call to libxl_ao_cancel.
+ *
+ * If the operation was indeed cut short due to the cancellation, it
+ * will complete, at some point in the future, with ERROR_CANCELLED.
+ * In that case, depending on the operation, it may have performed some of
+ * the work in question and left the operation half-done.  Consult the
+ * documentation for individual operations.
+ *
+ * Note that a cancelled operation might still fail for other reasons
+ * even after it has been cancelled.
+ *
+ * If your application is multithreaded you must not reuse an
+ * ao_how->for_event or ao_how->for_callback value (with a particular
+ * ao_how->callback) unless you are sure that none of your other
+ * threads are going to cancel the previous operation using that
+ * value; otherwise you risk cancelling the wrong operation if the
+ * intended target of the cancellation completes in the meantime.
+ *
+ * It is possible to cancel even an operation which is being performed
+ * synchronously, but since in that case how==NULL you had better only
+ * have one such operation, because it is not possible to tell them
+ * apart.  (And, if you want to do this, obviously the cancellation
+ * would have to be requested on a different thread.)
+ */
+int libxl_ao_cancel(libxl_ctx *ctx, const libxl_asyncop_how *how)
+                    LIBXL_EXTERNAL_CALLERS_ONLY;
+
+
 #define LIBXL_VERSION 0
 
 /* context functions */
diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 80677e0..55013ef 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1773,6 +1773,7 @@ void libxl__ao_abort(libxl__ao *ao)
     assert(ao->in_initiator);
     assert(!ao->complete);
     assert(!ao->progress_reports_outstanding);
+    assert(!ao->cancelling);
     libxl__ao__destroy(CTX, ao);
 }
 
@@ -1938,6 +1939,128 @@ int libxl__ao_inprogress(libxl__ao *ao,
 }
 
 
+/* cancellation */
+
+static int ao__cancel(libxl_ctx *ctx, libxl__ao *parent)
+/* Temporarily unlocks ctx, which must be locked exactly once on entry. */
+{
+    int rc;
+    ao__manip_enter(parent);
+
+    if (parent->cancelling) {
+        rc = ERROR_CANCELLED;
+        goto out;
+    }
+
+    parent->cancelling = 1;
+
+    if (LIBXL_LIST_EMPTY(&parent->cancellables)) {
+        LIBXL__LOG(ctx, XTL_DEBUG,
+                   "ao %p: cancellation requested, but not implemented",
+                   parent);
+        rc = ERROR_NOTIMPLEMENTED;
+        goto out;
+    }
+
+    /* We keep calling cancellation hooks until there are none left */
+    while (!LIBXL_LIST_EMPTY(&parent->cancellables)) {
+        libxl__egc egc;
+        LIBXL_INIT_EGC(egc,ctx);
+
+        assert(!parent->complete);
+
+        libxl__ao_cancellable *canc = LIBXL_LIST_FIRST(&parent->cancellables);
+        assert(parent == ao_nested_root(canc->ao));
+
+        LIBXL_LIST_REMOVE(canc, entry);
+        canc->registered = 0;
+
+        LIBXL__LOG(ctx, XTL_DEBUG, "ao %p: canc=%p: cancelling",
+                   parent, canc->ao);
+        canc->callback(&egc, canc, ERROR_CANCELLED);
+
+        libxl__ctx_unlock(ctx);
+        libxl__egc_cleanup(&egc);
+        libxl__ctx_lock(ctx);
+    }
+
+    rc = 0;
+
+ out:
+    ao__manip_leave(ctx, parent);
+    return rc;
+}
+
+_hidden int libxl_ao_cancel(libxl_ctx *ctx, const libxl_asyncop_how *how)
+{
+    libxl__ao *search;
+    libxl__ctx_lock(ctx);
+    int rc;
+
+    LIBXL_LIST_FOREACH(search, &ctx->aos_inprogress, inprogress_entry) {
+        if (how) {
+            /* looking for ao to be reported by callback or event */
+            if (search->poller)
+                /* sync */
+                continue;
+            if (how->callback != search->how.callback)
+                continue;
+            if (how->callback
+                ? (how->u.for_callback != search->how.u.for_callback)
+                : (how->u.for_event != search->how.u.for_event))
+                continue;
+        } else {
+            /* looking for synchronous call */
+            if (!search->poller)
+                /* async */
+                continue;
+        }
+        goto found;
+    }
+    rc = ERROR_NOTFOUND;
+    goto out;
+
+ found:
+    rc = ao__cancel(ctx, search);
+ out:
+    libxl__ctx_unlock(ctx);
+    return rc;
+}
+
+int libxl__ao_cancellable_register(libxl__ao_cancellable *canc)
+{
+    libxl__ao *ao = canc->ao;
+    libxl__ao *root = ao_nested_root(ao);
+    AO_GC;
+
+    if (root->cancelling) {
+ DBG("ao=%p: preemptively cancelling cancellable registration %p (root=%p)",
+            ao, canc, root);
+        return ERROR_CANCELLED;
+    }
+
+    DBG("ao=%p, canc=%p: registering (root=%p)", ao, canc, root);
+    LIBXL_LIST_INSERT_HEAD(&root->cancellables, canc, entry);
+    canc->registered = 1;
+
+    return 0;
+}
+
+_hidden void libxl__ao_cancellable_deregister(libxl__ao_cancellable *canc)
+{
+    if (!canc->registered)
+        return;
+
+    libxl__ao *ao = canc->ao;
+    libxl__ao *root __attribute__((unused)) = ao_nested_root(ao);
+    AO_GC;
+
+    DBG("ao=%p, canc=%p: deregistering (root=%p)", ao, canc, root);
+    LIBXL_LIST_REMOVE(canc, entry);
+    canc->registered = 0;
+}
+
+
 /* progress reporting */
 
 /* The application indicates a desire to ignore events by passing NULL
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index d2c2637..46383c4 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -173,6 +173,41 @@ struct libxl__ev_fd {
 };
 
 
+typedef struct libxl__ao_cancellable libxl__ao_cancellable;
+typedef void libxl__ao_cancellable_callback(libxl__egc *egc,
+                  libxl__ao_cancellable *cancellable, int rc /* CANCELLED */);
+
+struct libxl__ao_cancellable {
+    /* caller must fill this in and it must remain valid */
+    libxl__ao *ao;
+    libxl__ao_cancellable_callback *callback;
+    /* remainder is private for cancellation machinery */
+    bool registered;
+    LIBXL_LIST_ENTRY(libxl__ao_cancellable) entry;
+    /*
+     * For nested aos:
+     *  Semantically, cancellation affects the whole tree of aos,
+     *    not just the parent.
+     *  libxl__ao_cancellable.ao refers to the child, so
+     *    that the child callback sees the right ao.  (After all,
+     *    it was code dealing with the child that set .ao.)
+     *  But, the cancellable is recorded on the "cancellables" list
+     *    for the ultimate root ao, so that every possible child
+     *    cancellation occurs as a result of the cancellation of the
+     *    parent.
+     *  We set ao->cancelling only in the root.
+     */
+};
+
+_hidden int libxl__ao_cancellable_register(libxl__ao_cancellable*);
+_hidden void libxl__ao_cancellable_deregister(libxl__ao_cancellable*);
+
+static inline void libxl__ao_cancellable_init
+  (libxl__ao_cancellable *c) { c->registered = 0; }
+static inline bool libxl__ao_cancellable_isregistered
+  (const libxl__ao_cancellable *c) { return c->registered; }
+
+
 typedef struct libxl__ev_time libxl__ev_time;
 typedef void libxl__ev_time_callback(libxl__egc *egc, libxl__ev_time *ev,
                                      const struct timeval *requested_abs,
@@ -362,6 +397,8 @@ struct libxl__ctx {
     LIBXL_LIST_HEAD(, libxl__ev_evtchn) evtchns_waiting;
     libxl__ev_fd evtchn_efd;
 
+    LIBXL_LIST_HEAD(, libxl__ao) aos_inprogress;
+
     LIBXL_TAILQ_HEAD(libxl__evgen_domain_death_list, libxl_evgen_domain_death)
         death_list /* sorted by domid */,
         death_reported;
@@ -448,12 +485,15 @@ struct libxl__ao {
      * only in libxl__ao_complete.)
      */
     uint32_t magic;
-    unsigned constructing:1, in_initiator:1, complete:1, notified:1;
+    unsigned constructing:1, in_initiator:1, complete:1, notified:1,
+        cancelling:1;
     int manip_refcnt;
     libxl__ao *nested_root;
     int nested_progeny;
     int progress_reports_outstanding;
     int rc;
+    LIBXL_LIST_HEAD(, libxl__ao_cancellable) cancellables;
+    LIBXL_LIST_ENTRY(libxl__ao) inprogress_entry;
     libxl__gc gc;
     libxl_asyncop_how how;
     libxl__poller *poller;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH 18/29] libxl: cancellation: Provide explicit internal cancel check API
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (16 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:45   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 19/29] libxl: cancellation: Make timeouts cancellable Ian Jackson
                   ` (13 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Some places in libxl which can't handle cancellation via a
libxl__ao_cancellable callback might nevertheless benefit from being
able to explicitly check for cancellation.

Provide the (fairly trivial) internal API function to do this.
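
A sketch of how such an explicit check might be used between the
steps of a longer piece of work.  The types here are simplified
stand-ins rather than the real libxl__ao, the root lookup is an
assumption about how ao_nested_root behaves, and sketch_process is a
hypothetical caller.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ERROR_CANCELLED (-20)

/* Simplified stand-in for libxl__ao: only what the check needs. */
struct sketch_ao {
    struct sketch_ao *nested_root; /* NULL if this ao is itself the root */
    unsigned cancelling:1;         /* only ever set in the root */
};

/* Assumed behaviour of ao_nested_root: a root is its own root. */
struct sketch_ao *sketch_nested_root(struct sketch_ao *ao)
{
    struct sketch_ao *root = ao->nested_root ? ao->nested_root : ao;
    assert(!root->nested_root);
    return root;
}

/* Same contract as libxl__ao_cancelling: -> 0 or ERROR_CANCELLED. */
int sketch_ao_cancelling(struct sketch_ao *ao)
{
    return sketch_nested_root(ao)->cancelling ? SKETCH_ERROR_CANCELLED : 0;
}

/* Hypothetical caller: checks for cancellation before each step and
 * reports how many steps were actually carried out. */
int sketch_process(struct sketch_ao *ao, int nsteps, int *steps_done)
{
    int i, rc;

    *steps_done = 0;
    for (i = 0; i < nsteps; i++) {
        rc = sketch_ao_cancelling(ao);
        if (rc) return rc;
        (*steps_done)++;
    }
    return 0;
}
```

Because the flag lives only in the root, a check made on a nested ao
sees a cancellation requested against its ultimate parent.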

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_event.c    |   11 +++++++++++
 tools/libxl/libxl_internal.h |    2 ++
 2 files changed, 13 insertions(+)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 55013ef..d3658f1 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -2027,6 +2027,17 @@ _hidden int libxl_ao_cancel(libxl_ctx *ctx, const libxl_asyncop_how *how)
     return rc;
 }
 
+int libxl__ao_cancelling(libxl__ao *ao)
+{
+    libxl__ao *root = ao_nested_root(ao);
+    if (root->cancelling) {
+        DBG("ao=%p: cancelling at explicit check (root=%p)", ao, root);
+        return ERROR_CANCELLED;
+    }
+
+    return 0;
+}
+
 int libxl__ao_cancellable_register(libxl__ao_cancellable *canc)
 {
     libxl__ao *ao = canc->ao;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 46383c4..6caf042 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -207,6 +207,8 @@ static inline void libxl__ao_cancellable_init
 static inline bool libxl__ao_cancellable_isregistered
   (const libxl__ao_cancellable *c) { return c->registered; }
 
+int libxl__ao_cancelling(libxl__ao *ao); /* -> 0 or ERROR_CANCELLED */
+
 
 typedef struct libxl__ev_time libxl__ev_time;
 typedef void libxl__ev_time_callback(libxl__egc *egc, libxl__ev_time *ev,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH 19/29] libxl: cancellation: Make timeouts cancellable
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (17 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 18/29] libxl: cancellation: Provide explicit internal cancel check API Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:50   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled Ian Jackson
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Make libxl__ev_time* register with the cancellation machinery, so that
libxl_ao_cancel can cancel any operation which has a timeout.
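
The general shape is that the cancellable is embedded in the timeout
structure, and the cancellation callback recovers the enclosing
timeout from it.  A minimal standalone sketch of that pattern
follows.  All the sketch_* names are hypothetical, and
SKETCH_CONTAINER_OF is a plain offsetof-based variant rather than
libxl's own CONTAINER_OF macro.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ERROR_CANCELLED (-20)

/* Recover a pointer to the enclosing struct from a pointer to one of
 * its members. */
#define SKETCH_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_cancellable;
typedef void sketch_cancel_cb(struct sketch_cancellable *canc, int rc);

struct sketch_cancellable {
    sketch_cancel_cb *callback;
    int registered;
};

struct sketch_timer {
    int armed;
    int result; /* rc delivered to the timer's owner; 1 means "none yet" */
    struct sketch_cancellable cancel; /* embedded, as in libxl__ev_time */
};

/* Cancellation hook: tear the timer down, then tell its owner why. */
void sketch_timer_cancelled(struct sketch_cancellable *canc, int rc)
{
    struct sketch_timer *t =
        SKETCH_CONTAINER_OF(canc, struct sketch_timer, cancel);

    t->armed = 0;
    canc->registered = 0;
    t->result = rc;
}

void sketch_timer_arm(struct sketch_timer *t)
{
    t->armed = 1;
    t->result = 1;
    t->cancel.callback = sketch_timer_cancelled;
    t->cancel.registered = 1;
}
```

The owner of the timer therefore sees a cancellation through the same
callback it already uses for expiry, distinguished only by the rc.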

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_event.c    |   27 +++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h |    3 ++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index d3658f1..ba549f2 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -292,6 +292,8 @@ static int time_register_finite(libxl__gc *gc, libxl__ev_time *ev,
 
 static void time_deregister(libxl__gc *gc, libxl__ev_time *ev)
 {
+    libxl__ao_cancellable_deregister(&ev->cancel);
+
     if (!ev->infinite) {
         struct timeval right_away = { 0, 0 };
         if (ev->nexus) /* only set if app provided hooks */
@@ -314,6 +316,23 @@ static void time_done_debug(libxl__gc *gc, const char *func,
 #endif
 }
 
+static void time_cancelled(libxl__egc *egc, libxl__ao_cancellable *canc, int rc)
+{
+    libxl__ev_time *ev = CONTAINER_OF(canc, *ev, cancel);
+    EGC_GC;
+
+    time_deregister(gc, ev);
+    DBG("ev_time=%p cancelled", ev);
+    ev->func(egc, ev, &ev->abs, rc);
+}
+
+static int time_register_cancel(libxl__ao *ao, libxl__ev_time *ev)
+{
+    ev->cancel.ao = ao;
+    ev->cancel.callback = time_cancelled;
+    return libxl__ao_cancellable_register(&ev->cancel);
+}
+
 int libxl__ev_time_register_abs(libxl__ao *ao, libxl__ev_time *ev,
                                 libxl__ev_time_callback *func,
                                 struct timeval absolute)
@@ -326,6 +345,9 @@ int libxl__ev_time_register_abs(libxl__ao *ao, libxl__ev_time *ev,
     DBG("ev_time=%p register abs=%lu.%06lu",
         ev, (unsigned long)absolute.tv_sec, (unsigned long)absolute.tv_usec);
 
+    rc = time_register_cancel(ao, ev);
+    if (rc) goto out;
+
     rc = time_register_finite(gc, ev, absolute);
     if (rc) goto out;
 
@@ -333,6 +355,7 @@ int libxl__ev_time_register_abs(libxl__ao *ao, libxl__ev_time *ev,
 
     rc = 0;
  out:
+    libxl__ao_cancellable_deregister(&ev->cancel);
     time_done_debug(gc,__func__,ev,rc);
     CTX_UNLOCK;
     return rc;
@@ -351,6 +374,9 @@ int libxl__ev_time_register_rel(libxl__ao *ao, libxl__ev_time *ev,
 
     DBG("ev_time=%p register ms=%d", ev, milliseconds);
 
+    rc = time_register_cancel(ao, ev);
+    if (rc) goto out;
+
     if (milliseconds < 0) {
         ev->infinite = 1;
     } else {
@@ -365,6 +391,7 @@ int libxl__ev_time_register_rel(libxl__ao *ao, libxl__ev_time *ev,
     rc = 0;
 
  out:
+    libxl__ao_cancellable_deregister(&ev->cancel);
     time_done_debug(gc,__func__,ev,rc);
     CTX_UNLOCK;
     return rc;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6caf042..790a489 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -223,6 +223,7 @@ struct libxl__ev_time {
     LIBXL_TAILQ_ENTRY(libxl__ev_time) entry;
     struct timeval abs;
     libxl__osevent_hook_nexus *nexus;
+    libxl__ao_cancellable cancel;
 };
 
 typedef struct libxl__ev_xswatch libxl__ev_xswatch;
@@ -828,7 +829,7 @@ _hidden int libxl__ev_time_modify_abs(libxl__gc*, libxl__ev_time *ev,
                                       struct timeval);
 _hidden void libxl__ev_time_deregister(libxl__gc*, libxl__ev_time *ev);
 static inline void libxl__ev_time_init(libxl__ev_time *ev)
-                { ev->func = 0; }
+                { ev->func = 0; libxl__ao_cancellable_init(&ev->cancel); }
 static inline int libxl__ev_time_isregistered(const libxl__ev_time *ev)
                 { return !!ev->func; }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (18 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 19/29] libxl: cancellation: Make timeouts cancellable Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-02-18 16:24   ` Roger Pau Monné
  2015-03-24 11:51   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 21/29] libxl: cancellation: Make spawns cancellable Ian Jackson
                   ` (11 subsequent siblings)
  31 siblings, 2 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris, Roger Pau Monne

In practice, cancelling this task will cause all subsequent actual
backend operations to fail, but will not actually cause the
libxl_device_events_handler operation to complete.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Roger Pau Monne <roger.pau@citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index e8a2a91..08a87aed 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1353,6 +1353,9 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid,
  * From a libxl API point of view, this starts a long-running
  * operation.  That operation consists of "being a driver domain"
  * and never completes.
+ *
+ * Attempting to cancel this operation is not advisable; proper
+ * shutdown of the driver domain task is not supported.
  */
 int libxl_device_events_handler(libxl_ctx *ctx,
                                 const libxl_asyncop_how *ao_how)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH 21/29] libxl: cancellation: Make spawns cancellable
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (19 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:53   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code Ian Jackson
                   ` (10 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

The libxl__spawn_spawn internal API permits the caller to specify
.timeout_ms==-1, meaning to wait forever.  Provide an explicit
cancellable to allow spawns to be cancelled.

I think there are not currently any internal callers which do use
spawn with an infinite timeout, but this should not be left as a
lacuna for later.

Also, this change means that in practice anything which is cancelled
while spawning ought to be queued for two cancellation notifications:
one from its timeout, and one from the explicit cancellation
registration.  Hopefully this will mean that more exciting paths are
exercised during testing.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_exec.c     |   17 +++++++++++++++++
 tools/libxl/libxl_internal.h |    1 +
 2 files changed, 18 insertions(+)

diff --git a/tools/libxl/libxl_exec.c b/tools/libxl/libxl_exec.c
index 478b4c2..c005b7d 100644
--- a/tools/libxl/libxl_exec.c
+++ b/tools/libxl/libxl_exec.c
@@ -261,6 +261,7 @@ static void spawn_watch_event(libxl__egc *egc, libxl__xswait_state *xswa,
                               int rc, const char *xsdata);
 static void spawn_middle_death(libxl__egc *egc, libxl__ev_child *childw,
                                pid_t pid, int status);
+static void spawn_cancel(libxl__egc *egc, libxl__ao_cancellable *, int rc);
 
 /* Precondition: Partial.  Results: Idle. */
 static void spawn_cleanup(libxl__gc *gc, libxl__spawn_state *ss);
@@ -273,6 +274,7 @@ void libxl__spawn_init(libxl__spawn_state *ss)
 {
     libxl__ev_child_init(&ss->mid);
     libxl__xswait_init(&ss->xswait);
+    libxl__ao_cancellable_init(&ss->cancel);
 }
 
 int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
@@ -285,6 +287,10 @@ int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
     libxl__spawn_init(ss);
     ss->failed = ss->detaching = 0;
 
+    ss->cancel.ao = ao;
+    ss->cancel.callback = spawn_cancel;
+    rc = libxl__ao_cancellable_register(&ss->cancel);
+
     ss->xswait.ao = ao;
     ss->xswait.what = GCSPRINTF("%s startup", ss->what);
     ss->xswait.path = ss->xspath;
@@ -347,6 +353,7 @@ int libxl__spawn_spawn(libxl__egc *egc, libxl__spawn_state *ss)
 static void spawn_cleanup(libxl__gc *gc, libxl__spawn_state *ss)
 {
     assert(!libxl__ev_child_inuse(&ss->mid));
+    libxl__ao_cancellable_deregister(&ss->cancel);
     libxl__xswait_stop(gc, &ss->xswait);
 }
 
@@ -382,6 +389,16 @@ static void spawn_fail(libxl__egc *egc, libxl__spawn_state *ss)
     spawn_detach(gc, ss);
 }
 
+static void spawn_cancel(libxl__egc *egc, libxl__ao_cancellable *cancel,
+                         int rc)
+{
+    EGC_GC;
+    libxl__spawn_state *ss = CONTAINER_OF(cancel, *ss, cancel);
+
+    LOG(NOTICE, "%s: spawn cancelled", ss->what);
+    spawn_fail(egc, ss);
+}
+
 static void spawn_watch_event(libxl__egc *egc, libxl__xswait_state *xswa,
                               int rc, const char *p)
 {
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 790a489..a251b43 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1418,6 +1418,7 @@ struct libxl__spawn_state {
     int detaching; /* we are in Detaching */
     int failed; /* might be true whenever we are not Idle */
     libxl__ev_child mid; /* always in use whenever we are not Idle */
+    libxl__ao_cancellable cancel; /* in case timeout was -1 */
     libxl__xswait_state xswait;
 };
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (20 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 21/29] libxl: cancellation: Make spawns cancellable Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:56   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 23/29] libxl: cancellation: Support cancellation where we spot domain death Ian Jackson
                   ` (9 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

This is currently reported only by the bootloader code, if the domain
is destroyed while the bootloader is running.

In the future it would be nice to return it for other circumstances
where the domain existed when the operation started but subsequently
vanished.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_bootloader.c |    2 +-
 tools/libxl/libxl_types.idl    |    1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_bootloader.c b/tools/libxl/libxl_bootloader.c
index 79947d4..c3f3a1f 100644
--- a/tools/libxl/libxl_bootloader.c
+++ b/tools/libxl/libxl_bootloader.c
@@ -611,7 +611,7 @@ static void bootloader_display_copyfail(libxl__egc *egc,
 static void bootloader_domaindeath(libxl__egc *egc, libxl__domaindeathcheck *dc)
 {
     libxl__bootloader_state *bl = CONTAINER_OF(dc, *bl, deathcheck);
-    bootloader_stop(egc, bl, ERROR_FAIL);
+    bootloader_stop(egc, bl, ERROR_DOMAIN_DESTROYED);
 }
 
 static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 193f22a..d91b70d 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -66,6 +66,7 @@ libxl_error = Enumeration("error", [
     (-20, "CANCELLED"),
     (-21, "NOTFOUND"),
     (-22, "NOTIMPLEMENTED"),
+    (-23, "DOMAIN_DESTROYED"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH 23/29] libxl: cancellation: Support cancellation where we spot domain death
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (21 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 11:58   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 24/29] libxl: Introduce FILLZERO Ian Jackson
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Make an active libxl__domaindeathcheck contain an active
libxl__ao_cancellable.

Consequential changes are:
 * domaindeath callbacks now take an rc value.
 * libxl__domaindeathcheck_start takes an ao, not a gc.
 * bootloader_domaindeath plumbs the rc through to its caller.
 * libxl__domaindeathcheck_init and _stop are not quite trivial any
   more, so they change from inline functions in libxl_internal.h to
   ordinary functions defined in libxl_event.c.
 * libxl__domaindeathcheck_start is not trivial any more, and now has
   the standard error-handling pattern.

The only current user of libxl__domaindeathcheck is the bootloader.
So the result is that now it is possible to effectively cancel domain
creation while the bootloader is running.
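
The error-handling pattern mentioned above can be sketched in
isolation as follows.  This is a simplified model, not the libxl
code: the sketch_* names are hypothetical, and the fail_cancel and
fail_watch parameters exist only to simulate a failing registration
step (0 means the step succeeds).

```c
#include <assert.h>

struct sketch_check {
    int cancel_registered;
    int watch_registered;
};

void sketch_check_init(struct sketch_check *dc)
{
    dc->cancel_registered = 0;
    dc->watch_registered = 0;
}

/* Idempotent: safe on an object which is freshly initialised,
 * partially set up, or fully set up. */
void sketch_check_stop(struct sketch_check *dc)
{
    dc->cancel_registered = 0;
    dc->watch_registered = 0;
}

int sketch_check_start(struct sketch_check *dc,
                       int fail_cancel, int fail_watch)
{
    int rc;

    sketch_check_init(dc);

    rc = fail_cancel;           /* stands in for the cancellable step */
    if (rc) goto out;
    dc->cancel_registered = 1;

    rc = fail_watch;            /* stands in for the xenstore watch step */
    if (rc) goto out;
    dc->watch_registered = 1;

    return 0;

 out:
    /* One cleanup path for every failure, whatever was set up so far. */
    sketch_check_stop(dc);
    return rc;
}
```

Since the stop function tolerates any state, the failure path does
not need to know which step failed.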

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_bootloader.c |   11 +++++----
 tools/libxl/libxl_event.c      |   50 ++++++++++++++++++++++++++++++++++++----
 tools/libxl/libxl_internal.h   |   12 +++++-----
 3 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/tools/libxl/libxl_bootloader.c b/tools/libxl/libxl_bootloader.c
index c3f3a1f..21f92dc 100644
--- a/tools/libxl/libxl_bootloader.c
+++ b/tools/libxl/libxl_bootloader.c
@@ -33,7 +33,8 @@ static void bootloader_keystrokes_copyfail(libxl__egc *egc,
        libxl__datacopier_state *dc, int onwrite, int errnoval);
 static void bootloader_display_copyfail(libxl__egc *egc,
        libxl__datacopier_state *dc, int onwrite, int errnoval);
-static void bootloader_domaindeath(libxl__egc*, libxl__domaindeathcheck *dc);
+static void bootloader_domaindeath(libxl__egc*, libxl__domaindeathcheck *dc,
+                                   int rc);
 static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
                                 pid_t pid, int status);
 
@@ -496,7 +497,7 @@ static void bootloader_gotptys(libxl__egc *egc, libxl__openpty_state *op)
     bl->deathcheck.what = "stopping bootloader";
     bl->deathcheck.domid = bl->domid;
     bl->deathcheck.callback = bootloader_domaindeath;
-    rc = libxl__domaindeathcheck_start(gc, &bl->deathcheck);
+    rc = libxl__domaindeathcheck_start(ao, &bl->deathcheck);
     if (rc) goto out;
 
     if (bl->console_available)
@@ -608,10 +609,12 @@ static void bootloader_display_copyfail(libxl__egc *egc,
     bootloader_copyfail(egc, "bootloader output", bl, 1, onwrite, errnoval);
 }
 
-static void bootloader_domaindeath(libxl__egc *egc, libxl__domaindeathcheck *dc)
+static void bootloader_domaindeath(libxl__egc *egc,
+                                   libxl__domaindeathcheck *dc,
+                                   int rc)
 {
     libxl__bootloader_state *bl = CONTAINER_OF(dc, *bl, deathcheck);
-    bootloader_stop(egc, bl, ERROR_DOMAIN_DESTROYED);
+    bootloader_stop(egc, bl, rc);
 }
 
 static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index ba549f2..dd9658d 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -920,6 +920,18 @@ int libxl__ev_devstate_wait(libxl__ao *ao, libxl__ev_devstate *ds,
  * futile.
  */
 
+void libxl__domaindeathcheck_init(libxl__domaindeathcheck *dc)
+{
+    libxl__ao_cancellable_init(&dc->cancel);
+    libxl__ev_xswatch_init(&dc->watch);
+}
+
+void libxl__domaindeathcheck_stop(libxl__gc *gc, libxl__domaindeathcheck *dc)
+{
+    libxl__ao_cancellable_deregister(&dc->cancel);
+    libxl__ev_xswatch_deregister(gc,&dc->watch);
+}
+
 static void domaindeathcheck_callback(libxl__egc *egc, libxl__ev_xswatch *w,
                             const char *watch_path, const char *event_path)
 {
@@ -928,6 +940,8 @@ static void domaindeathcheck_callback(libxl__egc *egc, libxl__ev_xswatch *w,
     const char *p = libxl__xs_read(gc, XBT_NULL, watch_path);
     if (p) return;
 
+    libxl__domaindeathcheck_stop(gc,dc);
+
     if (errno!=ENOENT) {
         LIBXL__EVENT_DISASTER(egc,"failed to read xenstore"
                               " for domain detach check", errno, 0);
@@ -936,15 +950,43 @@ static void domaindeathcheck_callback(libxl__egc *egc, libxl__ev_xswatch *w,
 
     LOG(ERROR,"%s: domain %"PRIu32" removed (%s no longer in xenstore)",
         dc->what, dc->domid, watch_path);
-    dc->callback(egc, dc);
+    dc->callback(egc, dc, ERROR_DOMAIN_DESTROYED);
+}
+
+static void domaindeathcheck_cancel(libxl__egc *egc,
+                                    libxl__ao_cancellable *cancel,
+                                    int rc)
+{
+    libxl__domaindeathcheck *dc = CONTAINER_OF(cancel, *dc, cancel);
+    EGC_GC;
+
+    libxl__domaindeathcheck_stop(gc,dc);
+    dc->callback(egc, dc, rc);
 }
 
-int libxl__domaindeathcheck_start(libxl__gc *gc,
+int libxl__domaindeathcheck_start(libxl__ao *ao,
                                   libxl__domaindeathcheck *dc)
 {
+    AO_GC;
+    int rc;
     const char *path = GCSPRINTF("/local/domain/%"PRIu32, dc->domid);
-    return libxl__ev_xswatch_register(gc, &dc->watch,
-                                      domaindeathcheck_callback, path);
+
+    libxl__domaindeathcheck_init(dc);
+
+    dc->cancel.ao = ao;
+    dc->cancel.callback = domaindeathcheck_cancel;
+    rc = libxl__ao_cancellable_register(&dc->cancel);
+    if (rc) goto out;
+
+    rc = libxl__ev_xswatch_register(gc, &dc->watch,
+                                    domaindeathcheck_callback, path);
+    if (rc) goto out;
+
+    return 0;
+
+ out:
+    libxl__domaindeathcheck_stop(gc,dc);
+    return rc;
 }
 
 /*
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index a251b43..9ab3ab7 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1238,7 +1238,8 @@ _hidden int libxl__ev_devstate_wait(libxl__ao *ao, libxl__ev_devstate *ds,
 
 typedef struct libxl__domaindeathcheck libxl__domaindeathcheck;
 typedef void libxl___domaindeathcheck_callback(libxl__egc *egc,
-                                         libxl__domaindeathcheck*);
+        libxl__domaindeathcheck*,
+        int rc /* DESTROYED or CANCELLED */);
 
 struct libxl__domaindeathcheck {
     /* must be filled in by caller, and remain valid: */
@@ -1246,16 +1247,15 @@ struct libxl__domaindeathcheck {
     uint32_t domid;
     libxl___domaindeathcheck_callback *callback;
     /* private */
+    libxl__ao_cancellable cancel;
     libxl__ev_xswatch watch;
 };
 
-_hidden int libxl__domaindeathcheck_start(libxl__gc *gc,
+_hidden int libxl__domaindeathcheck_start(libxl__ao *ao,
                                           libxl__domaindeathcheck *dc);
 
-static inline void libxl__domaindeathcheck_init
- (libxl__domaindeathcheck *dc) { libxl__ev_xswatch_init(&dc->watch); }
-static inline void libxl__domaindeathcheck_stop(libxl__gc *gc,
-  libxl__domaindeathcheck *dc) { libxl__ev_xswatch_deregister(gc,&dc->watch); }
+void libxl__domaindeathcheck_init(libxl__domaindeathcheck *dc);
+void libxl__domaindeathcheck_stop(libxl__gc *gc, libxl__domaindeathcheck *dc);
 
 
 /*
-- 
1.7.10.4


* [PATCH 24/29] libxl: Introduce FILLZERO
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (22 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 23/29] libxl: cancellation: Support cancellation where we spot domain death Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 12:03   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 25/29] libxl: cancellation: Preparations for save/restore cancellation Ian Jackson
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

FILLZERO is a macro for memset(&foo,0,sizeof(foo)).  It eliminates the
possibility of making the error memset(&foo,0,sizeof(&foo)).

No callers yet, but document it in CODING_STYLE.  (In accordance with
existing libxl policy, I haven't gone through all existing possible
call sites.)
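
As a rough illustration of the point above (a sketch only, not part of
the patch): the macro below has the same shape as the one added to
libxl_utils.h, but the SKETCH_ prefix, the struct and the helper
functions are made-up names used purely for demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the macro in the patch; renamed to mark it as a sketch. */
#define SKETCH_FILLZERO(object) (memset(&(object), 0, sizeof((object))))

/* Hypothetical struct, for illustration only. */
struct sketch_state {
    char name[64];
    int  fd;
    long counters[8];
};

/* Returns 1 if every byte of *s is zero, 0 otherwise. */
static int sketch_all_zero(const struct sketch_state *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i;

    for (i = 0; i < sizeof(*s); i++)
        if (p[i]) return 0;
    return 1;
}

/* Dirties a struct, clears it with the macro, and reports the result. */
static int sketch_fillzero_demo(void)
{
    struct sketch_state st;

    /* The mistake the macro prevents: sizeof(&st) is only pointer-sized,
     * so memset(&st,0,sizeof(&st)) would leave most of st untouched. */
    assert(sizeof(st) > sizeof(&st));

    memset(&st, 0xff, sizeof(st));
    SKETCH_FILLZERO(st);          /* clears sizeof(st) bytes */
    return sketch_all_zero(&st);
}
```

Because the macro takes the object itself rather than a pointer and a
size, there is no second argument to get wrong.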

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/CODING_STYLE     |    1 +
 tools/libxl/libxl_internal.h |    3 +++
 tools/libxl/libxl_utils.h    |    3 +++
 3 files changed, 7 insertions(+)

diff --git a/tools/libxl/CODING_STYLE b/tools/libxl/CODING_STYLE
index f5b5890..a65efb3 100644
--- a/tools/libxl/CODING_STYLE
+++ b/tools/libxl/CODING_STYLE
@@ -62,6 +62,7 @@ whenever they are applicable.  For example:
   libxl__ctx_[un]lock     CTX_LOCK, CTX_UNLOCK
   gc=...; ao=...;         EGC_GC, AO_GC, STATE_AO_GC
   explicit gc creation    GC_INIT, GC_FREE
+  memset(..,0,sizeof..)   FILLZERO
 
 
 ERROR HANDLING
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9ab3ab7..cfc0706 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3226,6 +3226,9 @@ _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
     })
 
 
+#define FILLZERO LIBXL_FILLZERO
+
+
 /*
  * All of these assume (or define)
  *    libxl__gc *gc;
diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h
index acacdd9..51eac68 100644
--- a/tools/libxl/libxl_utils.h
+++ b/tools/libxl/libxl_utils.h
@@ -154,6 +154,9 @@ int libxl_cpumap_to_nodemap(libxl_ctx *ctx,
 
 void libxl_string_copy(libxl_ctx *ctx, char **dst, char **src);
 
+
+#define LIBXL_FILLZERO(object) (memset(&(object), 0, sizeof((object))))
+
 #endif
 
 /*
-- 
1.7.10.4


* [PATCH 25/29] libxl: cancellation: Preparations for save/restore cancellation
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (23 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 24/29] libxl: Introduce FILLZERO Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 12:04   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 26/29] libxl: cancellation: Handle SIGTERM in save/restore helper Ian Jackson
                   ` (6 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Two unrelated non-functional changes, broken out into a pre-patch for
easier review:

Break out a function sendsig() in libxl_save_callout.c.

Move io_fd to be a global variable in libxl_save_helper.c.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_save_callout.c |   10 +++++++---
 tools/libxl/libxl_save_helper.c  |    5 +++--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 40b25e4..1d584f1 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -237,6 +237,12 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
     libxl__carefd_close(childs_pipes[1]);
     helper_failed(egc, shs, rc);;
 }
+static void sendsig(libxl__gc *gc, libxl__save_helper_state *shs, int sig)
+{
+    int r = kill(shs->child.pid, sig);
+    if (r) LOGE(WARN, "failed to kill save/restore helper [%lu] (signal %d)",
+                (unsigned long)shs->child.pid, sig);
+}
 
 static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
                           int rc)
@@ -253,9 +259,7 @@ static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
         return;
     }
 
-    int r = kill(shs->child.pid, SIGKILL);
-    if (r) LOGE(WARN, "failed to kill save/restore helper [%lu]",
-                (unsigned long)shs->child.pid);
+    sendsig(gc, shs, SIGKILL);
 }
 
 static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 74826a1..7514b2e 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -85,6 +85,7 @@ static xentoollog_logger logger = {
     tellparent_destroy,
 };
 static xc_interface *xch;
+static int io_fd;
 
 /*----- error handling -----*/
 
@@ -211,7 +212,7 @@ int main(int argc, char **argv)
 
     if (!strcmp(mode,"--save-domain")) {
 
-        int io_fd =                atoi(NEXTARG);
+        io_fd =                    atoi(NEXTARG);
         uint32_t dom =             strtoul(NEXTARG,0,10);
         uint32_t max_iters =       strtoul(NEXTARG,0,10);
         uint32_t max_factor =      strtoul(NEXTARG,0,10);
@@ -234,7 +235,7 @@ int main(int argc, char **argv)
 
     } else if (!strcmp(mode,"--restore-domain")) {
 
-        int io_fd =                atoi(NEXTARG);
+        io_fd =                    atoi(NEXTARG);
         uint32_t dom =             strtoul(NEXTARG,0,10);
         unsigned store_evtchn =    strtoul(NEXTARG,0,10);
         domid_t store_domid =      strtoul(NEXTARG,0,10);
-- 
1.7.10.4


* [PATCH 26/29] libxl: cancellation: Handle SIGTERM in save/restore helper
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (24 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 25/29] libxl: cancellation: Preparations for save/restore cancellation Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 12:07   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 27/29] libxl: cancellation: Cancel libxc save/restore Ian Jackson
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

During startup of the save/restore helper, set the disposition of
SIGTERM appropriately.

For restore, we can simply die immediately - there is no point trying
to do any kind of cleanup on what is now going to be a trashed domain.

For save, we want to arrange that libxc's cleanup code (eg turning off
logdirty) takes place.  So our signal handler replaces the fd with one
on which writes will fail, causing libxc's own loop to fail next time
it actually tries to do a write.

Currently this has only a minor beneficial effect: we don't send the
helper a SIGTERM ourselves, and if someone else contrives to send our
helper a SIGTERM, they have probably sent one to libxl too, in which
case things are going to be a bit messy anyway.

But in the next patch libxl is going to use SIGTERM itself on ao
cancellation.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_save_helper.c |   58 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 7514b2e..0be77c9 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -40,8 +40,10 @@
 #include <unistd.h>
 #include <assert.h>
 #include <inttypes.h>
+#include <fcntl.h>
 
 #include "libxl.h"
+#include "libxl_utils.h"
 
 #include "xenctrl.h"
 #include "xenguest.h"
@@ -120,6 +122,58 @@ static void *xmalloc(size_t sz)
     return r;
 }
 
+/*----- signal handling -----*/
+
+static int unwriteable_fd;
+
+static void save_signal_handler(int num)
+{
+    /*
+     * We want to be able to interrupt save.  But the code in libxc
+     * which does the actual saving is straight-through, and we need
+     * to execute its error path to put the guest back to sanity.
+     *
+     * So what we do is this: when we get the signal, we dup2
+     * the result of open("/dev/null",O_RDONLY) onto the output fd.
+     *
+     * This is guaranteed to 1. interrupt libxc's write (causing it to
+     * return short, or maybe EINTR); 2. make the next write give
+     * EBADF, so that: 3. at latest, libxc will notice when it next
+     * tries to write data and will then go into its cleanup path.
+     *
+     * We make no effort here to sanitise the resulting errors.
+     * That's libxl's job.
+     */
+    int esave = errno;
+
+    int r = dup2(unwriteable_fd, io_fd);
+    assert(r == io_fd); /* if not we can't write an xtl message because we
+                         * might end up interleaving on our control stream */
+
+    errno = esave;
+}
+
+static void setup_signals(void (*handler)(int))
+{
+    struct sigaction sa;
+    sigset_t spmask;
+    int r;
+
+    unwriteable_fd = open("/dev/null",O_RDONLY);
+    if (unwriteable_fd < 0) fail(errno,"open /dev/null for reading");
+
+    LIBXL_FILLZERO(sa);
+    sa.sa_handler = handler;
+    sigemptyset(&sa.sa_mask);
+    r = sigaction(SIGTERM, &sa, 0);
+    if (r) fail(errno,"sigaction SIGTERM failed");
+
+    sigemptyset(&spmask);
+    sigaddset(&spmask,SIGTERM);
+    r = sigprocmask(SIG_UNBLOCK,&spmask,0);
+    if (r) fail(errno,"sigprocmask unblock SIGTERM failed");
+}
+
 /*----- helper functions called by autogenerated stubs -----*/
 
 unsigned char * helper_allocbuf(int len, void *user)
@@ -229,6 +283,8 @@ int main(int argc, char **argv)
         helper_setcallbacks_save(&helper_save_callbacks, cbflags);
 
         startup("save");
+        setup_signals(save_signal_handler);
+
         r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
                            &helper_save_callbacks, hvm);
         complete(r);
@@ -254,6 +310,8 @@ int main(int argc, char **argv)
         unsigned long console_mfn = 0;
 
         startup("restore");
+        setup_signals(SIG_DFL);
+
         r = xc_domain_restore(xch, io_fd, dom, store_evtchn, &store_mfn,
                               store_domid, console_evtchn, &console_mfn,
                               console_domid, hvm, pae, superpages,
-- 
1.7.10.4


* [PATCH 27/29] libxl: cancellation: Cancel libxc save/restore
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (25 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 26/29] libxl: cancellation: Handle SIGTERM in save/restore helper Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 12:08   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 28/29] libxl: ao: datacopier callback gets an rc Ian Jackson
                   ` (4 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

Register the save/restore helper interface with the cancellation
machinery.  When we are informed that save/restore should be
cancelled, we make a note of that in our rc variable, and send the
helper a SIGTERM.  It will die in due course.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_internal.h     |    1 +
 tools/libxl/libxl_save_callout.c |   25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index cfc0706..eb8e5f2 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2639,6 +2639,7 @@ typedef struct libxl__save_helper_state {
     int rc;
     int completed; /* retval/errnoval valid iff completed */
     int retval, errnoval; /* from xc_domain_save / xc_domain_restore */
+    libxl__ao_cancellable cancel;
     libxl__carefd *pipes[2]; /* 0 = helper's stdin, 1 = helper's stdout */
     libxl__ev_fd readable;
     libxl__ev_child child;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 1d584f1..d9fa0d2 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -32,6 +32,7 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
                        const unsigned long *argnums, int num_argnums);
 
 static void helper_failed(libxl__egc*, libxl__save_helper_state *shs, int rc);
+static void helper_cancel(libxl__egc *egc, libxl__ao_cancellable*, int rc);
 static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
                                    int fd, short events, short revents);
 static void helper_exited(libxl__egc *egc, libxl__ev_child *ch,
@@ -166,9 +167,15 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
     shs->rc = 0;
     shs->completed = 0;
     shs->pipes[0] = shs->pipes[1] = 0;
+    libxl__ao_cancellable_init(&shs->cancel);
     libxl__ev_fd_init(&shs->readable);
     libxl__ev_child_init(&shs->child);
 
+    shs->cancel.ao = shs->ao;
+    shs->cancel.callback = helper_cancel;
+    rc = libxl__ao_cancellable_register(&shs->cancel);
+    if (rc) goto out;
+
     shs->stdin_what = GCSPRINTF("domain %"PRIu32" save/restore helper"
                                 " stdin pipe", domid);
     shs->stdout_what = GCSPRINTF("domain %"PRIu32" save/restore helper"
@@ -262,6 +269,23 @@ static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs,
     sendsig(gc, shs, SIGKILL);
 }
 
+static void helper_cancel(libxl__egc *egc, libxl__ao_cancellable *cancel,
+                          int rc)
+{
+    libxl__save_helper_state *shs = CONTAINER_OF(cancel, *shs, cancel);
+    STATE_AO_GC(shs->ao);
+
+    if (!libxl__ev_child_inuse(&shs->child)) {
+        helper_failed(egc, shs, rc);
+        return;
+    }
+
+    if (!shs->rc)
+        shs->rc = rc;
+
+    sendsig(gc, shs, SIGTERM);
+}
+
 static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev,
                                    int fd, short events, short revents)
 {
@@ -332,6 +356,7 @@ static void helper_done(libxl__egc *egc, libxl__save_helper_state *shs)
 {
     STATE_AO_GC(shs->ao);
 
+    libxl__ao_cancellable_deregister(&shs->cancel);
     libxl__ev_fd_deregister(gc, &shs->readable);
     libxl__carefd_close(shs->pipes[0]);  shs->pipes[0] = 0;
     libxl__carefd_close(shs->pipes[1]);  shs->pipes[1] = 0;
-- 
1.7.10.4


* [PATCH 28/29] libxl: ao: datacopier callback gets an rc
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (26 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 27/29] libxl: cancellation: Cancel libxc save/restore Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 12:10   ` Ian Campbell
  2015-02-10 20:10 ` [PATCH 29/29] libxl: cancellation: Make datacopiers cancellable Ian Jackson
                   ` (3 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

libxl__datacopier_* now provides its caller's callback function with
an rc value.  This relieves the caller of the need to figure out an
appropriate rc value.

Arrange that the `other internal failure' cases now get a valid
positive errno value (EIO).

In a few places, assert that errno is nonzero before passing it to our
caller.

Extend the datacopier callback API to permit the dc to signal
CANCELLED.  (It doesn't actually do this yet, though.)
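
For readers new to this interface, here is one possible reading of
the resulting (rc, onwrite, errnoval) combinations, written as a small
classifier.  It is a sketch based on the table in the updated header
comment; the function name and the SKETCH_ERROR_* values are
hypothetical stand-ins, not libxl definitions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins; real code would use libxl's error codes. */
#define SKETCH_ERROR_FAIL      (-3)
#define SKETCH_ERROR_CANCELLED (-21)

/* Maps the three callback arguments to a short description. */
static const char *sketch_dc_outcome(int rc, int onwrite, int errnoval)
{
    if (rc == 0)                      return "eof, all data written";
    if (rc == SKETCH_ERROR_CANCELLED) return "cancelled";
    if (errnoval == -1)               return "pollhup";
    if (onwrite == 1)                 return "write error";
    if (onwrite == 0)                 return "read error";
    return "internal failure";        /* onwrite == -1, errnoval EIO */
}
```

A caller that only cares about success or failure can look at rc
alone; onwrite and errnoval remain available for logging.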

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_aoutils.c    |   22 ++++++++++++----------
 tools/libxl/libxl_bootloader.c |   20 +++++++++++---------
 tools/libxl/libxl_dom.c        |   10 +++-------
 tools/libxl/libxl_internal.h   |   18 +++++++++++-------
 4 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 0b6d750..ece7981 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -121,10 +121,10 @@ void libxl__datacopier_kill(libxl__datacopier_state *dc)
 }
 
 static void datacopier_callback(libxl__egc *egc, libxl__datacopier_state *dc,
-                                int onwrite, int errnoval)
+                                int rc, int onwrite, int errnoval)
 {
     libxl__datacopier_kill(dc);
-    dc->callback(egc, dc, onwrite, errnoval);
+    dc->callback(egc, dc, rc, onwrite, errnoval);
 }
 
 static void datacopier_writable(libxl__egc *egc, libxl__ev_fd *ev,
@@ -142,13 +142,13 @@ static void datacopier_check_state(libxl__egc *egc, libxl__datacopier_state *dc)
             if (rc) {
                 LOG(ERROR, "unable to establish write event on %s"
                     " during copy of %s", dc->writewhat, dc->copywhat);
-                datacopier_callback(egc, dc, -1, 0);
+                datacopier_callback(egc, dc, ERROR_FAIL, -1, EIO);
                 return;
             }
         }
     } else if (!libxl__ev_fd_isregistered(&dc->toread)) {
         /* we have had eof */
-        datacopier_callback(egc, dc, 0, 0);
+        datacopier_callback(egc, dc, 0, 0, 0);
         return;
     } else {
         /* nothing buffered, but still reading */
@@ -190,7 +190,7 @@ static int datacopier_pollhup_handled(libxl__egc *egc,
             onwrite ? dc->writewhat : dc->readwhat,
             dc->copywhat);
         libxl__datacopier_kill(dc);
-        dc->callback_pollhup(egc, dc, onwrite, -1);
+        dc->callback_pollhup(egc, dc, ERROR_FAIL, onwrite, -1);
         return 1;
     }
     return 0;
@@ -207,7 +207,7 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
     if (revents & ~POLLIN) {
         LOG(ERROR, "unexpected poll event 0x%x (should be POLLIN)"
             " on %s during copy of %s", revents, dc->readwhat, dc->copywhat);
-        datacopier_callback(egc, dc, -1, 0);
+        datacopier_callback(egc, dc, ERROR_FAIL, -1, EIO);
         return;
     }
     assert(revents & POLLIN);
@@ -234,9 +234,10 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
         if (r < 0) {
             if (errno == EINTR) continue;
             if (errno == EWOULDBLOCK) break;
+            assert(errno);
             LOGE(ERROR, "error reading %s during copy of %s",
                  dc->readwhat, dc->copywhat);
-            datacopier_callback(egc, dc, 0, errno);
+            datacopier_callback(egc, dc, ERROR_FAIL, 0, errno);
             return;
         }
         if (r == 0) {
@@ -249,7 +250,7 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
                 assert(ferror(dc->log));
                 assert(errno);
                 LOGE(ERROR, "error logging %s", dc->copywhat);
-                datacopier_callback(egc, dc, 0, errno);
+                datacopier_callback(egc, dc, ERROR_FAIL, 0, errno);
                 return;
             }
         }
@@ -271,7 +272,7 @@ static void datacopier_writable(libxl__egc *egc, libxl__ev_fd *ev,
     if (revents & ~POLLOUT) {
         LOG(ERROR, "unexpected poll event 0x%x (should be POLLOUT)"
             " on %s during copy of %s", revents, dc->writewhat, dc->copywhat);
-        datacopier_callback(egc, dc, -1, 0);
+        datacopier_callback(egc, dc, ERROR_FAIL, -1, EIO);
         return;
     }
     assert(revents & POLLOUT);
@@ -288,9 +289,10 @@ static void datacopier_writable(libxl__egc *egc, libxl__ev_fd *ev,
         if (r < 0) {
             if (errno == EINTR) continue;
             if (errno == EWOULDBLOCK) break;
+            assert(errno);
             LOGE(ERROR, "error writing to %s during copy of %s",
                  dc->writewhat, dc->copywhat);
-            datacopier_callback(egc, dc, 1, errno);
+            datacopier_callback(egc, dc, ERROR_FAIL, 1, errno);
             return;
         }
         assert(r > 0);
diff --git a/tools/libxl/libxl_bootloader.c b/tools/libxl/libxl_bootloader.c
index 21f92dc..c26f1d6 100644
--- a/tools/libxl/libxl_bootloader.c
+++ b/tools/libxl/libxl_bootloader.c
@@ -30,9 +30,9 @@
 
 static void bootloader_gotptys(libxl__egc *egc, libxl__openpty_state *op);
 static void bootloader_keystrokes_copyfail(libxl__egc *egc,
-       libxl__datacopier_state *dc, int onwrite, int errnoval);
+       libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
 static void bootloader_display_copyfail(libxl__egc *egc,
-       libxl__datacopier_state *dc, int onwrite, int errnoval);
+       libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
 static void bootloader_domaindeath(libxl__egc*, libxl__domaindeathcheck *dc,
                                    int rc);
 static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
@@ -577,10 +577,10 @@ static void bootloader_gotptys(libxl__egc *egc, libxl__openpty_state *op)
 
 /* perhaps one of these will be called, but perhaps not */
 static void bootloader_copyfail(libxl__egc *egc, const char *which,
-        libxl__bootloader_state *bl, int ondisplay, int onwrite, int errnoval)
+        libxl__bootloader_state *bl, int ondisplay,
+        int rc, int onwrite, int errnoval)
 {
     STATE_AO_GC(bl->ao);
-    int rc = ERROR_FAIL;
 
     if (errnoval==-1) {
         /* POLLHUP */
@@ -591,22 +591,24 @@ static void bootloader_copyfail(libxl__egc *egc, const char *which,
             LOG(ERROR, "unexpected POLLHUP on %s", which);
         }
     }
-    if (!onwrite && !errnoval)
+    if (!rc) {
         LOG(ERROR, "unexpected eof copying %s", which);
+        rc = ERROR_FAIL;
+    }
 
     bootloader_stop(egc, bl, rc);
 }
 static void bootloader_keystrokes_copyfail(libxl__egc *egc,
-       libxl__datacopier_state *dc, int onwrite, int errnoval)
+       libxl__datacopier_state *dc, int rc, int onwrite, int errnoval)
 {
     libxl__bootloader_state *bl = CONTAINER_OF(dc, *bl, keystrokes);
-    bootloader_copyfail(egc, "bootloader input", bl, 0, onwrite, errnoval);
+    bootloader_copyfail(egc, "bootloader input", bl, 0, rc,onwrite,errnoval);
 }
 static void bootloader_display_copyfail(libxl__egc *egc,
-       libxl__datacopier_state *dc, int onwrite, int errnoval)
+       libxl__datacopier_state *dc, int rc, int onwrite, int errnoval)
 {
     libxl__bootloader_state *bl = CONTAINER_OF(dc, *bl, display);
-    bootloader_copyfail(egc, "bootloader output", bl, 1, onwrite, errnoval);
+    bootloader_copyfail(egc, "bootloader output", bl, 1, rc,onwrite,errnoval);
 }
 
 static void bootloader_domaindeath(libxl__egc *egc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 87b5fdd..ff0f55d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1918,7 +1918,7 @@ out:
 }
 
 static void save_device_model_datacopier_done(libxl__egc *egc,
-     libxl__datacopier_state *dc, int onwrite, int errnoval);
+     libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
 
 void libxl__domain_save_device_model(libxl__egc *egc,
                                      libxl__domain_suspend_state *dss,
@@ -1977,11 +1977,11 @@ void libxl__domain_save_device_model(libxl__egc *egc,
     return;
 
  out:
-    save_device_model_datacopier_done(egc, dc, -1, 0);
+    save_device_model_datacopier_done(egc, dc, rc, -1, EIO);
 }
 
 static void save_device_model_datacopier_done(libxl__egc *egc,
-     libxl__datacopier_state *dc, int onwrite, int errnoval)
+     libxl__datacopier_state *dc, int our_rc, int onwrite, int errnoval)
 {
     libxl__domain_suspend_state *dss =
         CONTAINER_OF(dc, *dss, save_dm_datacopier);
@@ -1989,14 +1989,10 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
 
     /* Convenience aliases */
     const char *const filename = dss->dm_savefile;
-    int our_rc = 0;
     int rc;
 
     libxl__datacopier_kill(dc);
 
-    if (onwrite || errnoval)
-        our_rc = ERROR_FAIL;
-
     if (dc->readfd >= 0) {
         close(dc->readfd);
         dc->readfd = -1;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index eb8e5f2..29a612c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2562,16 +2562,20 @@ _hidden void libxl__device_disk_local_initiate_detach(libxl__egc *egc,
 typedef struct libxl__datacopier_state libxl__datacopier_state;
 typedef struct libxl__datacopier_buf libxl__datacopier_buf;
 
-/* onwrite==1 means failure happened when writing, logged, errnoval is valid
- * onwrite==0 means failure happened when reading
- *     errnoval==0 means we got eof and all data was written
- *     errnoval!=0 means we had a read error, logged
- * onwrite==-1 means some other internal failure, errnoval not valid, logged
- * If we get POLLHUP, we call callback_pollhup(..., onwrite, -1);
+/* onwrite==1 means problem happened when writing
+ *     rc==FAIL    errnoval >0    we had a write error, logged
+ * onwrite==0 means problem happened when reading
+ *     rc==0       errnoval==0    we got eof and all data was written
+ *     rc==FAIL    errnoval >0    we had a read error, logged
+ * onwrite==-1 means some other internal problem
+ *     rc==FAIL    errnoval==EIO  some other internal failure, logged
+ *     rc==CANCEL  errnoval==0    cancellation requested, not logged
+ * If we get POLLHUP, we call callback_pollhup with
+ *     rc==FAIL    errnoval==-1   POLLHUP signalled
  * or if callback_pollhup==0 this is an internal failure, as above.
  * In all cases copier is killed before calling this callback */
 typedef void libxl__datacopier_callback(libxl__egc *egc,
-     libxl__datacopier_state *dc, int onwrite, int errnoval);
+     libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
 
 struct libxl__datacopier_buf {
     /* private to datacopier */
-- 
1.7.10.4


* [PATCH 29/29] libxl: cancellation: Make datacopiers cancellable
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (27 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 28/29] libxl: ao: datacopier callback gets an rc Ian Jackson
@ 2015-02-10 20:10 ` Ian Jackson
  2015-03-24 12:11   ` Ian Campbell
  2015-02-10 20:13 ` [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (2 subsequent siblings)
  31 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:10 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Euan Harris

libxl__datacopier_* can now actually generate a callback with
rc==CANCELLED.

This provides cancellation during some corner cases, including (at
least) copying the device model data at the end of domain save.
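
The embedding pattern this relies on (a cancellable member inside the
owning state struct, recovered in the callback via a container-of
macro) might look roughly like this in isolation.  All names below
are hypothetical, and the macro takes a type name, which differs from
the form of libxl's own CONTAINER_OF.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for a cancellation hook: just a callback pointer. */
struct sketch_cancellable {
    void (*callback)(struct sketch_cancellable *c, int rc);
};

/* Stand-in for a copier state with the hook embedded in it. */
struct sketch_copier {
    int last_rc;   /* rc passed to the most recent completion */
    int running;   /* 1 while the copy is in progress */
    struct sketch_cancellable cancel;
};

/* Cancel callback: recover the owning copier and finish it with rc. */
static void sketch_copier_cancel(struct sketch_cancellable *c, int rc)
{
    struct sketch_copier *dc;

    assert(c);
    dc = SKETCH_CONTAINER_OF(c, struct sketch_copier, cancel);
    dc->running = 0;
    dc->last_rc = rc;
}

/* Start: reset state and hook up the cancel callback. */
static void sketch_copier_start(struct sketch_copier *dc)
{
    dc->last_rc = 0;
    dc->running = 1;
    dc->cancel.callback = sketch_copier_cancel;
}

/* What a cancellation request would do for one registered hook. */
static void sketch_cancel_request(struct sketch_cancellable *c, int rc)
{
    if (c->callback) c->callback(c, rc);
}
```

The canceller only ever sees the embedded hook, so it needs no
knowledge of the copier's layout.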

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
v2: New in this version of the series.
---
 tools/libxl/libxl_aoutils.c  |   16 ++++++++++++++++
 tools/libxl/libxl_internal.h |    1 +
 2 files changed, 17 insertions(+)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index ece7981..919bf12 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -103,6 +103,7 @@ static void xswait_report_error(libxl__egc *egc, libxl__xswait_state *xswa,
 void libxl__datacopier_init(libxl__datacopier_state *dc)
 {
     assert(dc->ao);
+    libxl__ao_cancellable_init(&dc->cancel);
     libxl__ev_fd_init(&dc->toread);
     libxl__ev_fd_init(&dc->towrite);
     LIBXL_TAILQ_INIT(&dc->bufs);
@@ -113,6 +114,7 @@ void libxl__datacopier_kill(libxl__datacopier_state *dc)
     STATE_AO_GC(dc->ao);
     libxl__datacopier_buf *buf, *tbuf;
 
+    libxl__ao_cancellable_deregister(&dc->cancel);
     libxl__ev_fd_deregister(gc, &dc->toread);
     libxl__ev_fd_deregister(gc, &dc->towrite);
     LIBXL_TAILQ_FOREACH_SAFE(buf, &dc->bufs, entry, tbuf)
@@ -196,6 +198,15 @@ static int datacopier_pollhup_handled(libxl__egc *egc,
     return 0;
 }
 
+static void datacopier_cancel(libxl__egc *egc, libxl__ao_cancellable *cancel,
+                              int rc)
+{
+    libxl__datacopier_state *dc = CONTAINER_OF(cancel, *dc, cancel);
+    STATE_AO_GC(dc->ao);
+
+    datacopier_callback(egc, dc, rc, -1, 0);
+}
+
 static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
                                 int fd, short events, short revents) {
     libxl__datacopier_state *dc = CONTAINER_OF(ev, *dc, toread);
@@ -312,6 +323,11 @@ int libxl__datacopier_start(libxl__datacopier_state *dc)
 
     libxl__datacopier_init(dc);
 
+    dc->cancel.ao = ao;
+    dc->cancel.callback = datacopier_cancel;
+    rc = libxl__ao_cancellable_register(&dc->cancel);
+    if (rc) goto out;
+
     rc = libxl__ev_fd_register(gc, &dc->toread, datacopier_readable,
                                dc->readfd, POLLIN);
     if (rc) goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 29a612c..9d9f376 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2594,6 +2594,7 @@ struct libxl__datacopier_state {
     libxl__datacopier_callback *callback;
     libxl__datacopier_callback *callback_pollhup;
     /* remaining fields are private to datacopier */
+    libxl__ao_cancellable cancel;
     libxl__ev_fd toread, towrite;
     ssize_t used;
     LIBXL_TAILQ_HEAD(libxl__datacopier_bufs, libxl__datacopier_buf) bufs;
-- 
1.7.10.4


* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (28 preceding siblings ...)
  2015-02-10 20:10 ` [PATCH 29/29] libxl: cancellation: Make datacopiers cancellable Ian Jackson
@ 2015-02-10 20:13 ` Ian Jackson
  2015-02-18 16:10 ` Euan Harris
  2015-03-03 12:08 ` Ian Campbell
  31 siblings, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-10 20:13 UTC (permalink / raw)
  To: xen-devel, Euan Harris

Ian Jackson writes ("[RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations"):
> This is v2 of my work-in-progress series to support cancellation of
> long-running libxl operations.
> 
> There are many improvements since v1, but the basic structure remains
> the same and the external API remains unchanged.
> 
> I have rebased this onto current staging.  I have compiled it but
> NOT EXECUTED IT AT ALL.  Euan, I thought it would be useful to give
> you something you could start to work on building against.
> 
> I wouldn't recommend testing it yet until I've at least smoke tested
> it to see that things still work if you don't cancel them.
> 
> Here's a list of the patches:

These are also here

 http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=summary
 git://xenbits.xen.org/people/iwj/xen.git

 base.ao-cancel.v2..wip.ao-cancel.v2..

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc
  2015-02-10 20:09 ` [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc Ian Jackson
@ 2015-02-11  1:04   ` Wen Congyang
  2015-02-11 12:04     ` Ian Jackson
  2015-03-31 18:09     ` Ian Jackson
  2015-03-24 11:09   ` Ian Campbell
  1 sibling, 2 replies; 98+ messages in thread
From: Wen Congyang @ 2015-02-11  1:04 UTC (permalink / raw)
  To: Ian Jackson, xen-devel; +Cc: Yang Hongyang, Euan Harris, Lai Jiangshan

On 02/11/2015 04:09 AM, Ian Jackson wrote:
> Change the timeout setup functions to take a libxl__ao, not a
> libxl__gc.  This is going to be needed for ao cancellation, because
> timeouts are going to be a main hook for ao cancellation - so the
> timeouts need to be associated with an ao.
> 
> This means that timeouts can only occur as part of a long-running
> libxl function (but this is of course correct, as libxl shouldn't have
> any global timeouts, and indeed all the call sites have an ao).
> 
> Also remove the gc parameter from libxl__async_exec_start.  It can
> just use the gc from the ao supplied in the aes.
> 
> All the callers follow the obvious patterns and therefore supply the
> ao's gc to libxl__async_exec_start and the timeout setup functions.
> There is therefore no functional change in this patch.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> CC: Lai Jiangshan <laijs@cn.fujitsu.com>

libxl__async_exec_start() related modifications look fine to me.

> ---
> v2: This patch split off from "Permit timeouts to signal cancellation".
>     Rebased; consequently, deal with libxl__async_exec_start.
>     CC'd authors of the libxl__async_exec_* functions.
> ---
>  tools/libxl/libxl_aoutils.c         |    8 +++++---
>  tools/libxl/libxl_device.c          |    4 ++--
>  tools/libxl/libxl_dom.c             |    8 ++++----
>  tools/libxl/libxl_event.c           |    6 ++++--
>  tools/libxl/libxl_internal.h        |    6 +++---
>  tools/libxl/libxl_remus_disk_drbd.c |    2 +-
>  tools/libxl/libxl_test_timedereg.c  |    9 +++++----
>  7 files changed, 24 insertions(+), 19 deletions(-)
> 
> diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> index 44dc222..754e2d1 100644
> --- a/tools/libxl/libxl_aoutils.c
> +++ b/tools/libxl/libxl_aoutils.c
> @@ -46,7 +46,7 @@ int libxl__xswait_start(libxl__gc *gc, libxl__xswait_state *xswa)
>  {
>      int rc;
>  
> -    rc = libxl__ev_time_register_rel(gc, &xswa->time_ev,
> +    rc = libxl__ev_time_register_rel(xswa->ao, &xswa->time_ev,
>                                       xswait_timeout_callback, xswa->timeout_ms);
>      if (rc) goto err;
>  
> @@ -496,16 +496,18 @@ void libxl__async_exec_init(libxl__async_exec_state *aes)
>      libxl__ev_child_init(&aes->child);
>  }
>  
> -int libxl__async_exec_start(libxl__gc *gc, libxl__async_exec_state *aes)
> +int libxl__async_exec_start(libxl__async_exec_state *aes)
>  {
>      pid_t pid;
>  
>      /* Convenience aliases */
> +    libxl__ao *ao = aes->ao;
> +    AO_GC;
>      libxl__ev_child *const child = &aes->child;
>      char ** const args = aes->args;
>  
>      /* Set execution timeout */
> -    if (libxl__ev_time_register_rel(gc, &aes->time,
> +    if (libxl__ev_time_register_rel(ao, &aes->time,
>                                      async_exec_timeout,
>                                      aes->timeout_ms)) {
>          LOG(ERROR, "unable to register timeout for executing: %s", aes->what);
> diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
> index 0455134..c80749f 100644
> --- a/tools/libxl/libxl_device.c
> +++ b/tools/libxl/libxl_device.c
> @@ -808,7 +808,7 @@ void libxl__initiate_device_remove(libxl__egc *egc,
>               * TODO: 4.2 Bodge due to QEMU, see comment on top of
>               * libxl__initiate_device_remove in libxl_internal.h
>               */
> -            rc = libxl__ev_time_register_rel(gc, &aodev->timeout,
> +            rc = libxl__ev_time_register_rel(ao, &aodev->timeout,
>                                               device_qemu_timeout,
>                                               LIBXL_QEMU_BODGE_TIMEOUT * 1000);
>              if (rc) {
> @@ -1034,7 +1034,7 @@ static void device_hotplug(libxl__egc *egc, libxl__ao_device *aodev)
>      aes->stdfds[1] = 2;
>      aes->stdfds[2] = -1;
>  
> -    rc = libxl__async_exec_start(gc, aes);
> +    rc = libxl__async_exec_start(aes);
>      if (rc)
>          goto out;
>  
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 90877d6..e292cb3 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -980,7 +980,7 @@ static void domain_suspend_switch_qemu_xen_traditional_logdirty
>                                  switch_logdirty_xswatch, lds->ret_path);
>      if (rc) goto out;
>  
> -    rc = libxl__ev_time_register_rel(gc, &lds->timeout,
> +    rc = libxl__ev_time_register_rel(ao, &lds->timeout,
>                                  switch_logdirty_timeout, 10*1000);
>      if (rc) goto out;
>  
> @@ -1260,7 +1260,7 @@ static void domain_suspend_callback_common(libxl__egc *egc,
>          rc = libxl__ev_evtchn_wait(gc, &dss->guest_evtchn);
>          if (rc) goto err;
>  
> -        rc = libxl__ev_time_register_rel(gc, &dss->guest_timeout,
> +        rc = libxl__ev_time_register_rel(ao, &dss->guest_timeout,
>                                           suspend_common_wait_guest_timeout,
>                                           60*1000);
>          if (rc) goto err;
> @@ -1391,7 +1391,7 @@ static void domain_suspend_common_wait_guest(libxl__egc *egc,
>                                      "@releaseDomain");
>      if (rc) goto err;
>  
> -    rc = libxl__ev_time_register_rel(gc, &dss->guest_timeout,
> +    rc = libxl__ev_time_register_rel(ao, &dss->guest_timeout,
>                                       suspend_common_wait_guest_timeout,
>                                       60*1000);
>      if (rc) goto err;
> @@ -1751,7 +1751,7 @@ static void remus_devices_commit_cb(libxl__egc *egc,
>       */
>  
>      /* Set checkpoint interval timeout */
> -    rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
> +    rc = libxl__ev_time_register_rel(ao, &dss->checkpoint_timeout,
>                                       remus_next_checkpoint,
>                                       dss->interval);
>  
> diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
> index fb6daeb..1a97cf8 100644
> --- a/tools/libxl/libxl_event.c
> +++ b/tools/libxl/libxl_event.c
> @@ -309,10 +309,11 @@ static void time_done_debug(libxl__gc *gc, const char *func,
>  #endif
>  }
>  
> -int libxl__ev_time_register_abs(libxl__gc *gc, libxl__ev_time *ev,
> +int libxl__ev_time_register_abs(libxl__ao *ao, libxl__ev_time *ev,
>                                  libxl__ev_time_callback *func,
>                                  struct timeval absolute)
>  {
> +    AO_GC;
>      int rc;
>  
>      CTX_LOCK;
> @@ -333,10 +334,11 @@ int libxl__ev_time_register_abs(libxl__gc *gc, libxl__ev_time *ev,
>  }
>  
>  
> -int libxl__ev_time_register_rel(libxl__gc *gc, libxl__ev_time *ev,
> +int libxl__ev_time_register_rel(libxl__ao *ao, libxl__ev_time *ev,
>                                  libxl__ev_time_callback *func,
>                                  int milliseconds /* as for poll(2) */)
>  {
> +    AO_GC;
>      struct timeval absolute;
>      int rc;
>  
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 6bb208c..b615fc5 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -770,10 +770,10 @@ static inline void libxl__ev_fd_init(libxl__ev_fd *efd)
>  static inline int libxl__ev_fd_isregistered(const libxl__ev_fd *efd)
>                      { return efd->fd >= 0; }
>  
> -_hidden int libxl__ev_time_register_rel(libxl__gc*, libxl__ev_time *ev_out,
> +_hidden int libxl__ev_time_register_rel(libxl__ao*, libxl__ev_time *ev_out,
>                                          libxl__ev_time_callback*,
>                                          int milliseconds /* as for poll(2) */);
> -_hidden int libxl__ev_time_register_abs(libxl__gc*, libxl__ev_time *ev_out,
> +_hidden int libxl__ev_time_register_abs(libxl__ao*, libxl__ev_time *ev_out,
>                                          libxl__ev_time_callback*,
>                                          struct timeval);
>  _hidden int libxl__ev_time_modify_rel(libxl__gc*, libxl__ev_time *ev,
> @@ -2108,7 +2108,7 @@ struct libxl__async_exec_state {
>  };
>  
>  void libxl__async_exec_init(libxl__async_exec_state *aes);
> -int libxl__async_exec_start(libxl__gc *gc, libxl__async_exec_state *aes);
> +int libxl__async_exec_start(libxl__async_exec_state *aes);
>  bool libxl__async_exec_inuse(const libxl__async_exec_state *aes);
>  
>  /*----- device addition/removal -----*/
> diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
> index afe9b61..5e0c9a6 100644
> --- a/tools/libxl/libxl_remus_disk_drbd.c
> +++ b/tools/libxl/libxl_remus_disk_drbd.c
> @@ -120,7 +120,7 @@ static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
>      aes->stdfds[1] = -1;
>      aes->stdfds[2] = -1;
>  
> -    rc = libxl__async_exec_start(gc, aes);
> +    rc = libxl__async_exec_start(aes);
>      if (rc)
>          goto out;
>  
> diff --git a/tools/libxl/libxl_test_timedereg.c b/tools/libxl/libxl_test_timedereg.c
> index a44639f..e2cc27d 100644
> --- a/tools/libxl/libxl_test_timedereg.c
> +++ b/tools/libxl/libxl_test_timedereg.c
> @@ -30,12 +30,13 @@ static int seq;
>  static void occurs(libxl__egc *egc, libxl__ev_time *ev,
>                     const struct timeval *requested_abs);
>  
> -static void regs(libxl__gc *gc, int j)
> +static void regs(libxl__ao *ao, int j)
>  {
> +    AO_GC;
>      int rc, i;
>      LOG(DEBUG,"regs(%d)", j);
>      for (i=0; i<NTIMES; i++) {
> -        rc = libxl__ev_time_register_rel(gc, &et[j][i], occurs, ms[j][i]);
> +        rc = libxl__ev_time_register_rel(ao, &et[j][i], occurs, ms[j][i]);
>          assert(!rc);
>      }    
>  }
> @@ -52,7 +53,7 @@ int libxl_test_timedereg(libxl_ctx *ctx, libxl_asyncop_how *ao_how)
>          libxl__ev_time_init(&et[1][i]);
>      }
>  
> -    regs(gc, 0);
> +    regs(ao, 0);
>  
>      return AO_INPROGRESS;
>  }
> @@ -71,7 +72,7 @@ static void occurs(libxl__egc *egc, libxl__ev_time *ev,
>          assert(ev == &et[0][1]);
>          libxl__ev_time_deregister(gc, &et[0][0]);
>          libxl__ev_time_deregister(gc, &et[0][2]);
> -        regs(gc, 1);
> +        regs(tao, 1);
>          libxl__ev_time_deregister(gc, &et[0][1]);
>          break;
>  
> 
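
For readers following the thread, the shape of the change in this patch can be sketched in isolation. This is only an illustration under stated assumptions: demo_gc, demo_ao, demo_ev_time, DEMO_AO_GC and demo_ev_time_register_rel are hypothetical stand-ins, not the real libxl types or functions, and DEMO_AO_GC is only a rough analogue of what AO_GC provides. The point is that a setup function which receives the ao can still reach the gc through it, while also recording which ao the timeout belongs to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libxl__gc and libxl__ao; the real structs
 * carry far more state than this. */
typedef struct demo_gc { int alloc_count; } demo_gc;
typedef struct demo_ao { demo_gc gc; int timeouts_registered; } demo_ao;

/* Rough analogue of AO_GC: derive a local `gc' from the `ao' in scope. */
#define DEMO_AO_GC demo_gc *const gc = &ao->gc

/* Hypothetical timeout record that remembers its owning ao. */
typedef struct demo_ev_time {
    demo_ao *ao;
    int ms;
    int registered;
} demo_ev_time;

/* Takes the ao rather than the gc, so the timeout is associated with
 * the operation and could later be found when that operation is
 * cancelled.  Returns 0 on success, -1 on a bad argument. */
int demo_ev_time_register_rel(demo_ao *ao, demo_ev_time *ev, int ms)
{
    DEMO_AO_GC;

    if (ms < 0) return -1;

    gc->alloc_count++;          /* the gc is still reachable via the ao */
    ev->ao = ao;                /* association that a bare gc cannot give */
    ev->ms = ms;
    ev->registered = 1;
    ao->timeouts_registered++;
    return 0;
}
```

A caller that previously passed its gc now passes its ao; nothing else at the call site changes, which matches the "no functional change" claim in the commit message.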

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 02/29] libxl: Comment cleanups
  2015-02-10 20:09 ` [PATCH 02/29] libxl: Comment cleanups Ian Jackson
@ 2015-02-11  1:46   ` Hongyang Yang
  2015-02-11 12:05     ` Ian Jackson
  2015-03-30  9:08     ` Ian Campbell
  2015-03-24 10:50   ` Ian Campbell
  1 sibling, 2 replies; 98+ messages in thread
From: Hongyang Yang @ 2015-02-11  1:46 UTC (permalink / raw)
  To: Ian Jackson, xen-devel; +Cc: Euan Harris, Lai Jiangshan, Wen Congyang



On 02/11/2015 04:09 AM, Ian Jackson wrote:
> * Add two comments in libxl_remus_disk_drbd documenting buggy handling
>    of the hotplug script exit status.
>
> * Add a section heading for async exec in libxl_aoutils.c
>
> * Mention the right function name (libxl__ev_child_fork, not
>    libxl__ev_fork) in libxl_internal.h
>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> CC: Lai Jiangshan <laijs@cn.fujitsu.com>

For async exec and Remus part,
Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>

> ---
> v2: New patch in this version of the series.
> ---
>   tools/libxl/libxl_aoutils.c         |    2 ++
>   tools/libxl/libxl_internal.h        |    2 +-
>   tools/libxl/libxl_remus_disk_drbd.c |    2 ++
>   3 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> index b10d2e1..44dc222 100644
> --- a/tools/libxl/libxl_aoutils.c
> +++ b/tools/libxl/libxl_aoutils.c
> @@ -451,6 +451,8 @@ int libxl__openptys(libxl__openpty_state *op,
>       return rc;
>   }
>
> +/*----- async exec -----*/
> +
>   static void async_exec_timeout(libxl__egc *egc,
>                                  libxl__ev_time *ev,
>                                  const struct timeval *requested_abs)
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 934465a..99db92a 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1393,7 +1393,7 @@ _hidden int libxl__spawn_record_pid(libxl__gc*, libxl__spawn_state*,
>    * This is a NOT function for waiting for ordinary child processes.
>    * If you want to run (fork/exec/wait) subprocesses from libxl:
>    *  - Make your libxl entrypoint use the ao machinery
> - *  - Use libxl__ev_fork, and use the callback programming style
> + *  - Use libxl__ev_child_fork, and use the callback programming style
>    *
>    * This function is intended for interprocess communication with a
>    * service process.  If the service process does not respond quickly,
> diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
> index 3215f93..afe9b61 100644
> --- a/tools/libxl/libxl_remus_disk_drbd.c
> +++ b/tools/libxl/libxl_remus_disk_drbd.c
> @@ -145,6 +145,8 @@ static void match_async_exec_cb(libxl__egc *egc,
>
>       if (status) {
>           rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
> +        /* BUG: seems to assume that any exit status means `no match' */
> +        /* BUG: exit status will have been logged as an error */

Yes, the error handling needs to be improved. I will prepare a patch for this.
The comments are fine for now.

>           goto out;
>       }
>
>

-- 
Thanks,
Yang.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc
  2015-02-11  1:04   ` Wen Congyang
@ 2015-02-11 12:04     ` Ian Jackson
  2015-03-31 18:09     ` Ian Jackson
  1 sibling, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-11 12:04 UTC (permalink / raw)
  To: Wen Congyang; +Cc: xen-devel, Euan Harris, Lai Jiangshan, Yang Hongyang

Wen Congyang writes ("Re: [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc"):
> On 02/11/2015 04:09 AM, Ian Jackson wrote:
> > Change the timeout setup functions to take a libxl__ao, not a
> > libxl__gc.  This is going to be needed for ao cancellation, because
> > timeouts are going to be a main hook for ao cancellation - so the
> > timeouts need to be associated with an ao.
...
> > Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> > CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> > CC: Wen Congyang <wency@cn.fujitsu.com>
> > CC: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> libxl__async_exec_start() related modifications look fine to me.

Thanks for the review.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 02/29] libxl: Comment cleanups
  2015-02-11  1:46   ` Hongyang Yang
@ 2015-02-11 12:05     ` Ian Jackson
  2015-03-30  9:08     ` Ian Campbell
  1 sibling, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-11 12:05 UTC (permalink / raw)
  To: Hongyang Yang; +Cc: xen-devel, Euan Harris, Lai Jiangshan, Wen Congyang

Hongyang Yang writes ("Re: [PATCH 02/29] libxl: Comment cleanups"):
> On 02/11/2015 04:09 AM, Ian Jackson wrote:
> > * Add two comments in libxl_remus_disk_drbd documenting buggy handling
> >    of the hotplug script exit status.
...
> > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> > CC: Wen Congyang <wency@cn.fujitsu.com>
> > CC: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> For async exec and Remus part,
> Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (29 preceding siblings ...)
  2015-02-10 20:13 ` [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
@ 2015-02-18 16:10 ` Euan Harris
  2015-02-18 16:12   ` Ian Jackson
  2015-04-07 17:08   ` Euan Harris
  2015-03-03 12:08 ` Ian Campbell
  31 siblings, 2 replies; 98+ messages in thread
From: Euan Harris @ 2015-02-18 16:10 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, james.bulpin

Hi,

On Tue, Feb 10, 2015 at 08:09:47PM +0000, Ian Jackson wrote:
> I have rebased this onto current staging.  I have compiled it but
> NOT EXECUTED IT AT ALL.  Euan, I thought it would be useful to give
> you something you could start to work on building against.
> 
> I wouldn't recommend testing it yet until I've at least smoke tested
> it to see that things still work if you don't cancel them.

We had a chat about testing these changes, and integrating them into xenopsd.
We agreed that we each had slightly different expectations of what we were 
going to do, and when.   I think we came to the following major conclusions:

  - I will start work on a simple test framework for cancellation,
    hopefully to have first results in a fortnight or so.
  - Once the test framework is available you will fix whatever bugs it
    unearths, then we will rinse and repeat.
  - You will think some more about the possibility of adding cancellation
    to the xl command line tool, but since this is tricky there is no 
    expectation of when it might happen.

In the slightly longer term, we expect:

  - More testing and integration effort from Xapi project members in March
    or April.
  - Investigation of the idea of a xenopsd-based push gate, similar to the 
    current libvirt push gate.

Have I got the main points right, or forgotten anything important?   

Thanks,
Euan

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-02-18 16:10 ` Euan Harris
@ 2015-02-18 16:12   ` Ian Jackson
  2015-04-07 17:08   ` Euan Harris
  1 sibling, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-02-18 16:12 UTC (permalink / raw)
  To: Euan Harris; +Cc: xen-devel, james.bulpin

Euan Harris writes ("Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations"):
> On Tue, Feb 10, 2015 at 08:09:47PM +0000, Ian Jackson wrote:
> > I have rebased this onto current staging.  I have compiled it but
> > NOT EXECUTED IT AT ALL.  Euan, I thought it would be useful to give
> > you something you could start to work on building against.
> > 
> > I wouldn't recommend testing it yet until I've at least smoke tested
> > it to see that things still work if you don't cancel them.
> 
> We had a chat about testing these changes, and integrating them into xenopsd.
> We agreed that we each had slightly different expectations of what we were 
> going to do, and when.   I think we came to the following major conclusions:
> 
>   - I will start work on a simple test framework for cancellation,
>     hopefully to have first results in a fortnight or so.
>   - Once the test framework is available you will fix whatever bugs it
>     unearths, then we will rinse and repeat.
>   - You will think some more about the possibility of adding cancellation
>     to the xl command line tool, but since this is tricky there is no 
>     expectation of when it might happen.
> 
> In the slightly longer term, we expect:
> 
>   - More testing and integration effort from Xapi project members in March
>     or April.
>   - Investigation of the idea of a xenopsd-based push gate, similar to the 
>     current libvirt push gate.
> 
> Have I got the main points right, or forgotten anything important?   

That seems about right, thanks.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 01/29] libxl: Further fix exit paths from libxl_device_events_handler
  2015-02-10 20:09 ` [PATCH 01/29] libxl: Further fix exit paths from libxl_device_events_handler Ian Jackson
@ 2015-02-18 16:23   ` Roger Pau Monné
  2015-03-24 10:49     ` Ian Campbell
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2015-02-18 16:23 UTC (permalink / raw)
  To: Ian Jackson, xen-devel; +Cc: Euan Harris

On 10/02/15 at 21.09, Ian Jackson wrote:
> On the success path, do not call GC_FREE explicitly.  Instead, call
> AO_INPROGRESS.
> 
> GC_FREE will free the gc underlying the long-term ao, which is then
> subsequently referenced in backend_watch_callback's call to
> libxl__nested_ao_create.  It is a miracle that this ever works at all.
> 
> Also, add an `if (rc) goto out;' after the xswatch registration.
> 
> After this, libxl_device_events_handler has the conventional and
> correct ao initiation pattern.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled
  2015-02-10 20:10 ` [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled Ian Jackson
@ 2015-02-18 16:24   ` Roger Pau Monné
  2015-03-24 11:51   ` Ian Campbell
  1 sibling, 0 replies; 98+ messages in thread
From: Roger Pau Monné @ 2015-02-18 16:24 UTC (permalink / raw)
  To: Ian Jackson, xen-devel; +Cc: Euan Harris

On 10/02/15 at 21.10, Ian Jackson wrote:
> In practice, cancelling this task will cause all subsequent actual
> backend operations to fail, but will not actually cause the
> libxl_device_events_handler operation to complete.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
                   ` (30 preceding siblings ...)
  2015-02-18 16:10 ` Euan Harris
@ 2015-03-03 12:08 ` Ian Campbell
  2015-03-20 10:39   ` Euan Harris
  2015-04-07 17:22   ` Ian Jackson
  31 siblings, 2 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-03 12:08 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> This is v2 of my work-in-progress series to support cancellation of
> long-running libxl operations.
[...]
> I wouldn't recommend testing it yet until I've at least smoke tested
> it to see that things still work if you don't cancel them.

Would review of the series be useful and/or appreciated at this stage?

Perhaps the first half dozen or so look like preparatory cleanups which
I could sensibly look at?
> 
> Here's a list of the patches:
> 
>   01/29  libxl: Further fix exit paths from libxl_device_events_handler
>   02/29  libxl: Comment cleanups
>   03/29  libxl: suspend: switch_logdirty_done takes rc
>   04/29  libxl: suspend: common suspend callbacks take rc
>   05/29  libxl: suspend: Return correct error from callbacks
>   06/29  libxl: Use libxl__xswait* in libxl__ao_device
>   07/29  libxl: xswait/devstate: Move xswait to before devstate
>   08/29  libxl: devstate: Use libxl__xswait*
>   09/29  libxl: New error codes CANCELLED etc.
>   10/29  libxl: events: Make timeout and async exec setup take an ao, not a gc
>   11/29  libxl: events: Make libxl__async_exec_* pass caller an rc
>   12/29  libxl: events: Permit timeouts to signal cancellation
>   13/29  libxl: domain create: Do not destroy on cancellation
>   14/29  libxl: ao: Record ultimate parent of a nested ao
>   15/29  libxl: ao: Count the nested progeny of an ao
>   16/29  libxl: ao: Provide manip_refcnt
>   17/29  libxl: cancellation: Provide public ao cancellation API
>   18/29  libxl: cancellation: Provide explicit internal cancel check API
>   19/29  libxl: cancellation: Make timeouts cancellable
>   20/29  libxl: cancellation: Note that driver domain task cannot be cancelled
>   21/29  libxl: cancellation: Make spawns cancellable
>   22/29  libxl: Introduce DOMAIN_DESTROYED error code
>   23/29  libxl: cancellation: Support cancellation where we spot domain death
>   24/29  libxl: Introduce FILLZERO
>   25/29  libxl: cancellation: Preparations for save/restore cancellation
>   26/29  libxl: cancellation: Handle SIGTERM in save/restore helper
>   27/29  libxl: cancellation: Cancel libxc save/restore
>   28/29  libxl: ao: datacopier callback gets an rc
>   29/29  libxl: cancellation: Make datacopiers cancellable
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-03-03 12:08 ` Ian Campbell
@ 2015-03-20 10:39   ` Euan Harris
  2015-04-07 17:22   ` Ian Jackson
  1 sibling, 0 replies; 98+ messages in thread
From: Euan Harris @ 2015-03-20 10:39 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Ian Jackson

On Tue, Mar 03, 2015 at 12:08:04PM +0000, Ian Campbell wrote:
> > I wouldn't recommend testing it yet until I've at least smoke tested
> > it to see that things still work if you don't cancel them.
> 
> Would review of the series be useful and/or appreciated at this stage?
> 
> Perhaps the first half dozen or so look like preparatory cleanups which
> I could sensibly look at?

Yes, that would be great.   I've read through the whole series fairly carefully,
and it looks sensible, but you will be better placed to see whether it fits well
with the rest of libxl.

Thanks,
Euan

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 01/29] libxl: Further fix exit paths from libxl_device_events_handler
  2015-02-18 16:23   ` Roger Pau Monné
@ 2015-03-24 10:49     ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 10:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Euan Harris, xen-devel, Ian Jackson

On Wed, 2015-02-18 at 17:23 +0100, Roger Pau Monné wrote:
> On 10/02/15 at 21.09, Ian Jackson wrote:
> > On the success path, do not call GC_FREE explicitly.  Instead, call
> > AO_INPROGRESS.
> > 
> > GC_FREE will free the gc underlying the long-term ao, which is then
> > subsequently referenced in backend_watch_callback's call to
> > libxl__nested_ao_create.  It is a miracle that this ever works at all.
> > 
> > Also, add an `if (rc) goto out;' after the xswatch registration.
> > 
> > After this, libxl_device_events_handler has the conventional and
> > correct ao initiation pattern.
> > 
> > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 02/29] libxl: Comment cleanups
  2015-02-10 20:09 ` [PATCH 02/29] libxl: Comment cleanups Ian Jackson
  2015-02-11  1:46   ` Hongyang Yang
@ 2015-03-24 10:50   ` Ian Campbell
  1 sibling, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 10:50 UTC (permalink / raw)
  To: Ian Jackson
  Cc: xen-devel, Euan Harris, Lai Jiangshan, Wen Congyang, Yang Hongyang

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> * Add two comments in libxl_remus_disk_drbd documenting buggy handling
>   of the hotplug script exit status.
> 
> * Add a section heading for async exec in libxl_aoutils.c
> 
> * Mention the right function name (libxl__ev_child_fork, not
>   libxl__ev_fork) in libxl_internal.h
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> CC: Lai Jiangshan <laijs@cn.fujitsu.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 03/29] libxl: suspend: switch_logdirty_done takes rc
  2015-02-10 20:09 ` [PATCH 03/29] libxl: suspend: switch_logdirty_done takes rc Ian Jackson
@ 2015-03-24 10:53   ` Ian Campbell
  2015-03-31 18:02     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 10:53 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> +    int broke;
> +    if (rc) {
> +        broke = -1;
> +    } else {
> +        broke = 0;
> +    }

int broke = rc ? -1 : 0;

?

But it looks like perhaps you are preparing to add other code in one or
the other case, or maybe you just prefer this for some reason. Either
way:

Acked-by: Ian Campbell <ian.campbell@citrix.com>

>      libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, broke);
>  }
>  
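
As a small illustration of the suggestion above, the two spellings can be put side by side. This is a sketch only: broke_long and broke_short are hypothetical helper names introduced here, and the rc values used with them are arbitrary, not libxl error codes.

```c
#include <assert.h>

/* Long form, shaped like the hunk quoted above. */
int broke_long(int rc)
{
    int broke;
    if (rc) {
        broke = -1;
    } else {
        broke = 0;
    }
    return broke;
}

/* Short form, as suggested in the review. */
int broke_short(int rc)
{
    return rc ? -1 : 0;
}
```

Both map any nonzero rc to -1 and zero to 0, so the choice is stylistic unless further code is later added to one of the branches.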

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 04/29] libxl: suspend: common suspend callbacks take rc
  2015-02-10 20:09 ` [PATCH 04/29] libxl: suspend: common suspend callbacks take rc Ian Jackson
@ 2015-03-24 10:55   ` Ian Campbell
  2015-03-31 18:03     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 10:55 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> Change the following functions to take a libxl error code rather than
> a boolean "ok" value, and translate that value to the boolean expected
> by libxc at the last moment:
>   domain_suspend_callback_common_done        } dss->callback_common_done
>   remus_domain_suspend_callback_common_done  }
>   domain_suspend_common_done
> 
> Also, abolish domain_suspend_common_failed as
> domain_suspend_common_done can easily do its job and the call sites
> now have to supply the right rc value anyway.
> 
> In domain_suspend_common_guest_suspended, change "ret" to "rc"
> as it contains a libxl error code.
> 
> There is no functional change in this patch: the proper rc value now
> propagates further, but is still eventually smashed to a boolean.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

There are a few new ERROR_FAILs which we might like to consider making
more specific either now or later.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 05/29] libxl: suspend: Return correct error from callbacks
  2015-02-10 20:09 ` [PATCH 05/29] libxl: suspend: Return correct error from callbacks Ian Jackson
@ 2015-03-24 10:58   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 10:58 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> If a suspend callback fails, it has a libxl error code in its hand.
> However we must return to libxc the values that libxc expects.  So we
> stash the libxl error code in dss->rc and fish it out again after
> libxc returns from the suspend call.
> 
> While we're here, abolish the now-redundant `ok' variable in
> remus_devices_postsuspend_cb.
> 
> The overall functional change is that libxl_domain_save now completes
> with the correct error code as determined when the underlying failure
> happened.  (Usually this is, still, ERROR_FAIL.)
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>
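
For readers following along, the stash-and-retrieve pattern described in
the commit message can be sketched roughly as below.  This is only an
illustration under stated assumptions: demo_save_state,
demo_suspend_callback, demo_save_finished and DEMO_ERROR_FAIL are
hypothetical names, not the functions touched by the patch, and the 1/0
return convention merely stands in for whatever libxc expects.

```c
#include <assert.h>

/* Hypothetical stand-in for a libxl error code; the value is an assumption. */
#define DEMO_ERROR_FAIL (-3)

/* Hypothetical stand-in for the save state (dss in the patch). */
struct demo_save_state {
    int rc;   /* first libxl error seen by a callback, or 0 */
};

/*
 * Called from inside the lower layer.  It must answer in that layer's
 * vocabulary (here: 1 = ok, 0 = failed), so the libxl error code is
 * stashed in the state rather than returned.
 */
static int demo_suspend_callback(struct demo_save_state *dss, int libxl_rc)
{
    assert(libxl_rc <= 0);          /* error codes are zero or negative */
    if (libxl_rc) {
        if (!dss->rc)
            dss->rc = libxl_rc;     /* sketch choice: keep the first error */
        return 0;
    }
    return 1;
}

/*
 * Called once the lower layer has returned.  A stashed code wins; if
 * the lower layer failed without any callback recording a reason, fall
 * back to the generic failure code.
 */
static int demo_save_finished(const struct demo_save_state *dss,
                              int lower_layer_failed)
{
    if (dss->rc)
        return dss->rc;
    return lower_layer_failed ? DEMO_ERROR_FAIL : 0;
}
```

Keeping the first stashed error rather than the last is a choice made
for this sketch; the patch itself may do it differently.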

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device
  2015-02-10 20:09 ` [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device Ian Jackson
@ 2015-03-24 11:04   ` Ian Campbell
  2015-03-31 18:06     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:04 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> @@ -1164,7 +1136,7 @@ static void device_hotplug_clean(libxl__gc *gc, libxl__ao_device *aodev)
>  {
>      /* Clean events and check reentrancy */
>      libxl__ev_time_deregister(gc, &aodev->timeout);

You seem to have removed the initialisation of this in a previous hunk
but not this deregistration or the field itself.

Was that deliberate, perhaps it was serving dual purpose somewhere?

> -    libxl__ev_xswatch_deregister(gc, &aodev->xs_watch);
> +    libxl__xswait_stop(gc, &aodev->xswait);
>      assert(!libxl__async_exec_inuse(&aodev->aes));
>  }
>  
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 2862c69..5a76d51 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2152,7 +2152,7 @@ struct libxl__ao_device {
>      /* Bodge for Qemu devices */
>      libxl__ev_time timeout;
>      /* xenstore watch for backend path of driver domains */
> -    libxl__ev_xswatch xs_watch;
> +    libxl__xswait_state xswait;
>      int num_exec;
>      /* for calling hotplug scripts */
>      libxl__async_exec_state aes;

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 07/29] libxl: xswait/devstate: Move xswait to before devstate
  2015-02-10 20:09 ` [PATCH 07/29] libxl: xswait/devstate: Move xswait to before devstate Ian Jackson
@ 2015-03-24 11:05   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:05 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> Pure code motion.  We are going to make devstate use xswait.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 08/29] libxl: devstate: Use libxl__xswait*
  2015-02-10 20:09 ` [PATCH 08/29] libxl: devstate: Use libxl__xswait* Ian Jackson
@ 2015-03-24 11:07   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:07 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 09/29] libxl: New error codes CANCELLED etc.
  2015-02-10 20:09 ` [PATCH 09/29] libxl: New error codes CANCELLED etc Ian Jackson
@ 2015-03-24 11:08   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:08 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> We introduce ERROR_CANCELLED now, so that we can write code to handle
> it, and decree that functions might return it, even though currently
> there is nowhere where this error is generated.
> 
> While we're here, provide ERROR_NOTFOUND and ERROR_NOTIMPLEMENTED,
> which will also be used later, but only as part of the public API.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc
  2015-02-10 20:09 ` [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc Ian Jackson
  2015-02-11  1:04   ` Wen Congyang
@ 2015-03-24 11:09   ` Ian Campbell
  1 sibling, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:09 UTC (permalink / raw)
  To: Ian Jackson
  Cc: xen-devel, Euan Harris, Lai Jiangshan, Wen Congyang, Yang Hongyang

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> Change the timeout setup functions to take a libxl__ao, not a
> libxl__gc.  This is going to be needed for ao cancellation, because
> timeouts are going to be a main hook for ao cancellation - so the
> timeouts need to be associated with an ao.
> 
> This means that timeouts can only occur as part of a long-running
> libxl function (but this is of course correct, as libxl shouldn't have
> any global timeouts, and indeed all the call sites have an ao).
> 
> Also remove the gc parameter from libxl__async_exec_start.  It can
> just use the gc from the ao supplied in the aes.
> 
> All the callers follow the obvious patterns and therefore supply the
> ao's gc to libxl__async_exec_start and the timeout setup functions.
> There is therefore no functional change in this patch.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> CC: Wen Congyang <wency@cn.fujitsu.com>
> CC: Lai Jiangshan <laijs@cn.fujitsu.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc
  2015-02-10 20:09 ` [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc Ian Jackson
@ 2015-03-24 11:20   ` Ian Campbell
  2015-03-31 18:12     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:20 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> The internal user of libxl__async_exec_start et al now gets an rc as
> well as the process's exit status.
> 
> For now this is always either 0 or ERROR_FAIL, but with ao
> cancellation this will possibly be CANCELLED or TIMEDOUT too.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> ---
> v2: New patch due to rebase; v1 had changes to device_hotplug_*
>      scripts instead.
>     Callback now gets unambiguous information about error situation:
>      previously, if the only thing that went wrong was that child died
>      badly, rc would be FAILED, which was ambiguous; now rc=0.
>     Add a comment documenting the meaning of the rc and status parameters
>      to the callback.
> ---
>  tools/libxl/libxl_aoutils.c         |    9 ++++++---
>  tools/libxl/libxl_device.c          |   13 +++++++++----
>  tools/libxl/libxl_internal.h        |   11 ++++++++++-
>  tools/libxl/libxl_netbuffer.c       |   19 ++++++++++---------
>  tools/libxl/libxl_remus_disk_drbd.c |    8 +++++---
>  5 files changed, 40 insertions(+), 20 deletions(-)
> 
> diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> index 754e2d1..891cdb8 100644
> --- a/tools/libxl/libxl_aoutils.c
> +++ b/tools/libxl/libxl_aoutils.c
> @@ -483,11 +483,12 @@ static void async_exec_done(libxl__egc *egc,
>      libxl__ev_time_deregister(gc, &aes->time);
>  
>      if (status) {
> -        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
> -                                      aes->what, pid, status);
> +        if (!aes->rc)

Could be one "if (status && !aes->rc)", unless perhaps there is more
code to come in this block?

> +                        libxl__async_exec_state *aes, int rc, int status);
> +/*
> + * Meaning of status and rc:
> + *  rc==0, status==0    all went well
> + *  rc==0, status!=0    everything OK except child exited nonzero (logged)
> + *  rc!=0               something else went wrong (status is real
> + *                       exit status, maybe reflecting SIGKILL if aes
> + *                       code killed the child).  Logged unless CANCELLED.

I'm unclear on whether status is valid in this third case or not. I
think you are saying that it is (probably?) valid but if rc!=0 the
caller likely doesn't actually care what it is?
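
To make my reading of that table concrete, here is a small sketch of how
a callback might branch on the two values.  The names (demo_classify and
the DEMO_EXEC_* enumerators) are invented for illustration and are not
part of the patch; the only thing taken from the quoted comment is the
order of the checks, with rc looked at first and status consulted only
when rc is 0.

```c
#include <assert.h>

/* Hypothetical outcome labels, one per row of the quoted table. */
enum demo_exec_outcome {
    DEMO_EXEC_OK,            /* rc==0, status==0 */
    DEMO_EXEC_CHILD_FAILED,  /* rc==0, status!=0 */
    DEMO_EXEC_OTHER_ERROR    /* rc!=0 */
};

/* Classify a callback's (rc, status) pair the way the table reads. */
static enum demo_exec_outcome demo_classify(int rc, int status)
{
    assert(rc <= 0);   /* libxl error codes are zero or negative */
    if (rc != 0)
        return DEMO_EXEC_OTHER_ERROR;   /* status deliberately not consulted */
    if (status != 0)
        return DEMO_EXEC_CHILD_FAILED;
    return DEMO_EXEC_OK;
}
```

If that is the intended usage then whether status is meaningful when
rc!=0 only matters to callers that want to log it.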

Everything else looks fine to me.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 12/29] libxl: events: Permit timeouts to signal cancellation
  2015-02-10 20:09 ` [PATCH 12/29] libxl: events: Permit timeouts to signal cancellation Ian Jackson
@ 2015-03-24 11:22   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:22 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> The callback functions provided by users must take an rc value.  This
> rc value can be ERROR_TIMEDOUT or ERROR_CANCELLED.
> 
> Users of xswait are now expected to deal correctly with
> ERROR_CANCELLED.  If they experience this, it hasn't been logged.
> And the caller won't log it either since it's not TIMEDOUT.
> Luckily this is correct, so we can just change the doc comment.
> 
> Currently nothing generates ERROR_CANCELLED; in particular the
> timeouts cannot in fact signal cancellation.
> 
> There should be no publicly visible change except that some error
> returns from libxl will change from ERROR_FAIL to ERROR_TIMEDOUT, and
> some changes to debugging messages.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 13/29] libxl: domain create: Do not destroy on cancellation
  2015-02-10 20:10 ` [PATCH 13/29] libxl: domain create: Do not destroy on cancellation Ian Jackson
@ 2015-03-24 11:24   ` Ian Campbell
  2015-03-31 18:14     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:24 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> If we cancelled the domain creation, do not try to tear it down again.
> Document this.
> 
> This is a backwards-compatible API change since old libxl users will
> never cancel any operations.
> 
> In the current code, there is no functional change, because
> ERROR_CANCELLED is never generated anywhere yet.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

I presume at some later stage in the series a suitable
LIBXL_HAVE_CANCELLATION will materialise? I mention it here because it
is on my mind.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 14/29] libxl: ao: Record ultimate parent of a nested ao
  2015-02-10 20:10 ` [PATCH 14/29] libxl: ao: Record ultimate parent of a nested ao Ian Jackson
@ 2015-03-24 11:26   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:26 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> This will be used by the cancellation machinery.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 15/29] libxl: ao: Count the nested progeny of an ao
  2015-02-10 20:10 ` [PATCH 15/29] libxl: ao: Count the nested progeny of an ao Ian Jackson
@ 2015-03-24 11:27   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:27 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> This will detect any "escaped" nested aos.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 16/29] libxl: ao: Provide manip_refcnt
  2015-02-10 20:10 ` [PATCH 16/29] libxl: ao: Provide manip_refcnt Ian Jackson
@ 2015-03-24 11:32   ` Ian Campbell
  2015-03-31 18:23     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:32 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> +/*
> + * A "manip" is a libxl public function manipulating this ao, which
> + * has a pointer to it.  We have to not destroy it while that's the
> + * case, obviously.

It might be nice to the reader to make a reference to the "An ao and its
gc may be accessed only with the ctx lock held." sentence in the overall
ao docs?
[...]
> +        libxl__ao__destroy(ctx,ao);

Nit: missing space after ",".

But other than those and including if you disagree about extending the
comment, since it's not a big deal:
        Acked-by: Ian Campbell <ian.campbell@citrix.com>
        
Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API
  2015-02-10 20:10 ` [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API Ian Jackson
@ 2015-03-24 11:45   ` Ian Campbell
  2015-03-31 18:26     ` Ian Jackson
  2015-03-24 11:48   ` Ian Campbell
  1 sibling, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:45 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> +/*
> + * It is sometimes possible to cancel an asynchronous operation.
> + *
> + * libxl_ao_cancel searches for an ongoing asynchronous operation whose
> + * ao_how is identical to *how, and tries to cancel it.

I can see that you have arranged for the pointer not to be required to
match, just the contents of the struct, which may be convenient for some
callers who haven't remembered the ao_how somewhere convenient, but is
it permissible to use the same pointer if it is convenient?

Other than wondering about that this patch looks good,
Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 18/29] libxl: cancellation: Provide explicit internal cancel check API
  2015-02-10 20:10 ` [PATCH 18/29] libxl: cancellation: Provide explicit internal cancel check API Ian Jackson
@ 2015-03-24 11:45   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:45 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> Some places in libxl which can't handle cancellation via a
> libxl__ao_cancellable callback might nevertheless benefit from being
> able to explicitly check for cancellation.
> 
> Provide the (fairly trivial) internal API function to do this.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API
  2015-02-10 20:10 ` [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API Ian Jackson
  2015-03-24 11:45   ` Ian Campbell
@ 2015-03-24 11:48   ` Ian Campbell
  2015-03-31 18:33     ` Ian Jackson
  1 sibling, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:48 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> +    /*
> +     * For nested aos:
> +     *  Semantically, cancellation affects the whole tree of aos,
> +     *    not just the parent.
> +     *  libxl__ao_cancellable.ao refers to the child, so
> +     *    that the child callback sees the right ao.  (After all,
> +     *    it was code dealing with the child that set .ao.)
> +     *  But, the cancellable is recorded on the "cancellables" list
> +     *    for the ultimate root ao, so that every possible child
> +     *    cancellation occurs as a result of the cancellation of the
> +     *    parent.
> +     *  We set ao->cancelling only in the root.
> +     */

WRT this, given a tree of ao's, which ones need to be cancellable for a
cancellation to succeed? I would assume all of them do, or else the
cancellation can only occur if/when the non-cancellable ones happen to
end?

Do we(/are we going to) take steps to stop new non-cancellable ao's being
added to the tree once the root is cancelled?

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 19/29] libxl: cancellation: Make timeouts cancellable
  2015-02-10 20:10 ` [PATCH 19/29] libxl: cancellation: Make timeouts cancellable Ian Jackson
@ 2015-03-24 11:50   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:50 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> Make libxl__ev_time* register with the cancellation machinery, so that
> libxl_ao_cancel can cancel any operation which has a timeout.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled
  2015-02-10 20:10 ` [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled Ian Jackson
  2015-02-18 16:24   ` Roger Pau Monné
@ 2015-03-24 11:51   ` Ian Campbell
  2015-03-31 18:37     ` Ian Jackson
  1 sibling, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:51 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris, Roger Pau Monne

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> In practice, cancelling this task will cause all subsequent actual
> backend operations to fail, but will not actually cause the
> libxl_device_events_handler operation to complete.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Roger Pau Monne <roger.pau@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

This is an unfortunate short-coming though, and I presume one which
could be fixed by updates to the toolstack<->driver domain protocol?

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 21/29] libxl: cancellation: Make spawns cancellable
  2015-02-10 20:10 ` [PATCH 21/29] libxl: cancellation: Make spawns cancellable Ian Jackson
@ 2015-03-24 11:53   ` Ian Campbell
  2015-03-31 18:45     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:53 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> The libxl__spawn_spawn internal API permits the caller to specify
> .timeout_ms==-1, meaning to wait forever.  Provide an explicit
> cancellable to allow spawns to be cancelled.

AIUI this also lets spawns with timeout_ms > -1 be cancelled, which I
think is desirable.

> I think there are not currently any internal callers which do use
> spawn with an infinite timeout, but this should not be left as a
> lacuna for later.
> 
> Also, this change means that in practice anything which is cancelled
> while spawning ought to be queued for two cancellation notifications:
> one from its timeout, and one from the explicit cancellation
> registration.  Hopefully this will mean that more exciting paths are
> exercised during testing.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code
  2015-02-10 20:10 ` [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code Ian Jackson
@ 2015-03-24 11:56   ` Ian Campbell
  2015-03-31 18:47     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:56 UTC (permalink / raw)
  To: Ian Jackson, Konrad Rzeszutek Wilk; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> This is currently reported only by the bootloader code, if the domain
> is destroyed while the bootloader is running.
> 
> In the future it would be nice to return it for other circumstances
> where the domain existed when the operation started but subsequently
> vanished.

Konrad has a semantically similar error code which he is adding, I think
in his recent libxl series to do with vcpu-set.

AIUI Konrad's semantics are simply "domain does not exist", which seems
to be usefully distinct from your "did exist but doesn't any more".

I just wanted to mention it in case I'd misunderstood one or both error
codes. As it stands this patch seems fine to me:

> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

I do wonder though if we ought to be better about documenting in the
code|headers|idl what error codes mean and where they should be used
(some are global, others specific to a subset of calls etc).

Ian.

> ---
> v2: New in this version of the series.
> ---
>  tools/libxl/libxl_bootloader.c |    2 +-
>  tools/libxl/libxl_types.idl    |    1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_bootloader.c b/tools/libxl/libxl_bootloader.c
> index 79947d4..c3f3a1f 100644
> --- a/tools/libxl/libxl_bootloader.c
> +++ b/tools/libxl/libxl_bootloader.c
> @@ -611,7 +611,7 @@ static void bootloader_display_copyfail(libxl__egc *egc,
>  static void bootloader_domaindeath(libxl__egc *egc, libxl__domaindeathcheck *dc)
>  {
>      libxl__bootloader_state *bl = CONTAINER_OF(dc, *bl, deathcheck);
> -    bootloader_stop(egc, bl, ERROR_FAIL);
> +    bootloader_stop(egc, bl, ERROR_DOMAIN_DESTROYED);
>  }
>  
>  static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 193f22a..d91b70d 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -66,6 +66,7 @@ libxl_error = Enumeration("error", [
>      (-20, "CANCELLED"),
>      (-21, "NOTFOUND"),
>      (-22, "NOTIMPLEMENTED"),
> +    (-23, "DOMAIN_DESTROYED"),
>      ], value_namespace = "")
>  
>  libxl_domain_type = Enumeration("domain_type", [

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 23/29] libxl: cancellation: Support cancellation where we spot domain death
  2015-02-10 20:10 ` [PATCH 23/29] libxl: cancellation: Support cancellation where we spot domain death Ian Jackson
@ 2015-03-24 11:58   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 11:58 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> Make an active libxl__domaindeathcheck contain an active
> libxl__ao_cancellable.
> 
> Consequential changes are:
>  * domaindeath callbacks now take an rc value.
>  * libxl__domaindeathcheck_start takes an ao, not a gc.
>  * bootloader_domaindeath plumbs the rc through to its caller.
>  * libxl__domaindeathcheck_init and _stop are not quite trivial any
>    more so are moved from (inline functions) in libxl_internal.h, to
>    ordinary functions defined in libxl_event.c.
>  * libxl__domaindeathcheck_start is not trivial any more, and now has
>    the standard error-handling pattern.
> 
> The only current user of libxl__domaindeathcheck is the bootloader.
> So the result is that now it is possible to effectively cancel domain
> creation while the bootloader is running.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 24/29] libxl: Introduce FILLZERO
  2015-02-10 20:10 ` [PATCH 24/29] libxl: Introduce FILLZERO Ian Jackson
@ 2015-03-24 12:03   ` Ian Campbell
  2015-03-31 18:51     ` Ian Jackson
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 12:03 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> FILLZERO is a macro for memset(&foo,0,sizeof(foo)).  It eliminates the
> possibility of making the error memset(&foo,0,sizeof(&foo)).

but not:
        foo *p = allocate_a_foo();
        memset(p, 0, sizeof(p));
although that's probably less likely to go wrong and I don't think it
can be avoided by the sorts of tricks used here.

> No callers yet, but document it in CODING_STYLE.  (In accordance with
> existing libxl policy, I haven't gone through all existing possible
> call sites.)

We don't usually expose such helpers in the public API, but I suppose
you have a good reason to do so here, could you mention it in the commit
log please.

>  
> +
> +#define LIBXL_FILLZERO(object) (memset(&(object), 0, sizeof((object))))

Evaluates object twice, so LIBXL_FILLZERO(*(p++)), would behave
surprisingly. I'm not sure if this can be resolved though, so this might
be a Don't Do That Then situation.

Only answer I can think of is to require the type be passed to the macro
and validating it matches the object via a pointer comparison.
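
A small standalone sketch of the cases discussed above may help when
checking them.  Everything here is hypothetical (demo_foo, DEMO_FILLZERO
and the helper functions are made-up names; only the shape of the macro
is copied from the patch).  As far as I know the operand of sizeof is
not evaluated in standard C unless it has variable length array type, so
the last helper measures how often the p++ side effect actually happens
rather than assuming it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_foo { unsigned char buf[64]; };

/* Same shape as the macro in the patch; the name is invented here. */
#define DEMO_FILLZERO(object) (memset(&(object), 0, sizeof((object))))

/* Count how many bytes at the start of f->buf are zero. */
static size_t demo_leading_zero_bytes(const struct demo_foo *f)
{
    size_t n = 0;
    while (n < sizeof(f->buf) && f->buf[n] == 0)
        n++;
    return n;
}

/* Intended use: the macro is given the object and clears all of it. */
static size_t demo_clear_with_macro(void)
{
    struct demo_foo f;
    memset(&f, 0xff, sizeof(f));
    DEMO_FILLZERO(f);
    return demo_leading_zero_bytes(&f);
}

/* The mistake the macro cannot prevent: using the pointer's size. */
static size_t demo_clear_with_pointer_size(void)
{
    struct demo_foo f;
    struct demo_foo *p = &f;
    size_t wrong_len = sizeof(p);   /* size of the pointer, not of *p */
    memset(&f, 0xff, sizeof(f));
    memset(p, 0, wrong_len);
    return demo_leading_zero_bytes(&f);
}

/* How far does p advance when the macro argument has a side effect? */
static ptrdiff_t demo_side_effect_count(void)
{
    struct demo_foo arr[2];
    struct demo_foo *p = arr;
    memset(arr, 0xff, sizeof(arr));
    DEMO_FILLZERO(*(p++));
    assert(demo_leading_zero_bytes(&arr[0]) == sizeof(arr[0].buf));
    return p - arr;
}
```

The first helper reports the whole buffer cleared and the second only
as many bytes as a pointer occupies.  The third is the one to look at
for the double-evaluation question.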

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 25/29] libxl: cancellation: Preparations for save/restore cancellation
  2015-02-10 20:10 ` [PATCH 25/29] libxl: cancellation: Preparations for save/restore cancellation Ian Jackson
@ 2015-03-24 12:04   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 12:04 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> Two unrelated non-functional changes, broken out into a pre-patch for
> easier review:
> 
> Break out a function sendsig() in libxl_save_callout.c.
> 
> Move io_fd to be a global variable in libxl_save_helper.c.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 26/29] libxl: cancellation: Handle SIGTERM in save/restore helper
  2015-02-10 20:10 ` [PATCH 26/29] libxl: cancellation: Handle SIGTERM in save/restore helper Ian Jackson
@ 2015-03-24 12:07   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 12:07 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> During startup of the save/restore helper, set the disposition of
> SIGTERM appropriately.
> 
> For restore, we can simply die immediately - there is no point trying
> to do any kind of cleanup on what is now going to be a trashed domain.
> 
> For save, we want to arrange that libxc's cleanup code (eg turning off
> logdirty) takes place.  So our signal handler replaces the fd with one
> on which writes will fail, causing libxc's own loop to fail next time
> it actually tries to do a write.
> 
> Currently this has only a minor beneficial effect: we don't send the
> helper a SIGTERM ourselves, and if someone else contrives to send our
> helper a SIGTERM they have probably sent one to libxl too in which
> case things are going to be a bit messy anyway.
> 
> But in the next patch libxl is going to use SIGTERM itself on ao
> cancellation.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 27/29] libxl: cancellation: Cancel libxc save/restore
  2015-02-10 20:10 ` [PATCH 27/29] libxl: cancellation: Cancel libxc save/restore Ian Jackson
@ 2015-03-24 12:08   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 12:08 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> Register the save/restore helper interface with the cancellation
> machinery.  When we are informed that save/restore should be
> cancelled, we make a note of that in our rc variable, and send the
> helper a SIGTERM.  It will die in due course.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 28/29] libxl: ao: datacopier callback gets an rc
  2015-02-10 20:10 ` [PATCH 28/29] libxl: ao: datacopier callback gets an rc Ian Jackson
@ 2015-03-24 12:10   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 12:10 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> libxl__datacopier_* now provides its caller's callback function with
> an rc value.  This relieves the caller of the need to figure out an
> appropriate rc value.
> 
> Arrange that the `other internal failure' cases now get a valid
> positive errno value (EIO).
> 
> In a few places, assert that errno is nonzero before passing it to our
> caller.
> 
> Extend the datacopier callback API to permit the dc to signal
> CANCELLED.  (It doesn't actually do this yet, though.)
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

Might be worth CC-ing the Migration-v2 folks on a 3rd version of this if
one happens.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 29/29] libxl: cancellation: Make datacopiers cancellable
  2015-02-10 20:10 ` [PATCH 29/29] libxl: cancellation: Make datacopiers cancellable Ian Jackson
@ 2015-03-24 12:11   ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-24 12:11 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> libxl__datacopier_* can now actually generate a callback with
> rc==CANCELLED.
> 
> This provides cancellation during some corner cases, including (at
> least) copying the device model data during the end of domain save.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 02/29] libxl: Comment cleanups
  2015-02-11  1:46   ` Hongyang Yang
  2015-02-11 12:05     ` Ian Jackson
@ 2015-03-30  9:08     ` Ian Campbell
  1 sibling, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-03-30  9:08 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: Euan Harris, xen-devel, Ian Jackson, Lai Jiangshan, Wen Congyang

On Wed, 2015-02-11 at 09:46 +0800, Hongyang Yang wrote:
> 
> On 02/11/2015 04:09 AM, Ian Jackson wrote:
> > * Add two comments in libxl_remus_disk_drbd documenting buggy handling
> >    of the hotplug script exit status.
> >
> > * Add a section heading for async exec in libxl_aoutils.c
> >
> > * Mention the right function name (libxl__ev_child_fork, not
> >    libxl__ev_fork) in libxl_internal.h
> >
> > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> > CC: Wen Congyang <wency@cn.fujitsu.com>
> > CC: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> For async exec and Remus part,
> Acked-by: Yang Hongyang <yanghy@cn.fujitsu.com>

I did likewise, and I have now applied.

Patch #3 looked like it was getting into the meat of starting to arrange
for cancellation, so I stopped at this patch.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 03/29] libxl: suspend: switch_logdirty_done takes rc
  2015-03-24 10:53   ` Ian Campbell
@ 2015-03-31 18:02     ` Ian Jackson
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:02 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 03/29] libxl: suspend: switch_logdirty_done takes rc"):
> On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> > +    int broke;
> > +    if (rc) {
> > +        broke = -1;
> > +    } else {
> > +        broke = 0;
> > +    }
> 
> int broke = rc ? -1 : 0;
> 
> ?
> 
> But it looks like perhaps you are preparing to add other code in one or
> the other case, or maybe you just prefer this for some reason. Either
> way:

Later we change one of the branches to set dss->rc too.

> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 04/29] libxl: suspend: common suspend callbacks take rc
  2015-03-24 10:55   ` Ian Campbell
@ 2015-03-31 18:03     ` Ian Jackson
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:03 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 04/29] libxl: suspend: common suspend callbacks take rc"):
> On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> > Change the following functions to take a libxl error code rather than
> > a boolean "ok" value, and translate that value to the boolean expected
> > by libxc at the last moment:
> >   domain_suspend_callback_common_done        } dss->callback_common_done
> >   remus_domain_suspend_callback_common_done  }
> >   domain_suspend_common_done
...
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks.

> There are a few new ERROR_FAILs which we might like to consider making
> more specific either now or later.

Indeed.  I have chosen not to try to drain the error handling swamp
myself at this stage.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device
  2015-03-24 11:04   ` Ian Campbell
@ 2015-03-31 18:06     ` Ian Jackson
  2015-04-01  9:25       ` Ian Campbell
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:06 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device"):
> On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> > @@ -1164,7 +1136,7 @@ static void device_hotplug_clean(libxl__gc *gc, libxl__ao_device *aodev)
> >  {
> >      /* Clean events and check reentrancy */
> >      libxl__ev_time_deregister(gc, &aodev->timeout);
> 
> You seem to have removed the initialisation of this in a previous hunk
> but not this deregistration or the field itself.
> 
> Was that deliberate, perhaps it was serving dual purpose somewhere?

Yes.  See "TODO: 4.2 Bodge due to QEMU, see comment on top of"...
in libxl__initiate_device_remove in libxl_device.c.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc
  2015-02-11  1:04   ` Wen Congyang
  2015-02-11 12:04     ` Ian Jackson
@ 2015-03-31 18:09     ` Ian Jackson
  1 sibling, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:09 UTC (permalink / raw)
  To: Wen Congyang; +Cc: xen-devel, Euan Harris, Lai Jiangshan, Yang Hongyang

Wen Congyang writes ("Re: [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc"):
> On 02/11/2015 04:09 AM, Ian Jackson wrote:
> > Change the timeout setup functions to take a libxl__ao, not a
> > libxl__gc.  This is going to be needed for ao cancellation, because
> > timeouts are going to be a main hook for ao cancellation - so the
> > timeouts need to be associated with an ao.
> > 
> > This means that timeouts can only occur as part of a long-running
> > libxl function (but this is of course correct, as libxl shouldn't have
> > any global timeouts, and indeed all the call sites have an ao).
> > 
> > Also remove the gc parameter from libxl__async_exec_start.  It can
> > just use the gc from the ao supplied in the aes.

> > Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> > CC: Yang Hongyang <yanghy@cn.fujitsu.com>
> > CC: Wen Congyang <wency@cn.fujitsu.com>
> > CC: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> libxl__async_exec_start() related modifications look fine to me.

Thanks for the review.  I'll take that as ack, if you don't object.

Regards,
Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc
  2015-03-24 11:20   ` Ian Campbell
@ 2015-03-31 18:12     ` Ian Jackson
  2015-04-01  9:29       ` Ian Campbell
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:12 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc"):
> On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> > diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> > index 754e2d1..891cdb8 100644
> > --- a/tools/libxl/libxl_aoutils.c
> > +++ b/tools/libxl/libxl_aoutils.c
> > @@ -483,11 +483,12 @@ static void async_exec_done(libxl__egc *egc,
> >      libxl__ev_time_deregister(gc, &aes->time);
> >  
> >      if (status) {
> > -        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
> > -                                      aes->what, pid, status);
> > +        if (!aes->rc)
> 
> Could be one "if (status && !aes->rc)", unless perhaps there is more
> code to come in this block?

No, there is no more to come.  I find it clearer this way but I don't
mind changing it.

> > +                        libxl__async_exec_state *aes, int rc, int status);
> > +/*
> > + * Meaning of status and rc:
> > + *  rc==0, status==0    all went well
> > + *  rc==0, status!=0    everything OK except child exited nonzero (logged)
> > + *  rc!=0               something else went wrong (status is real
> > + *                       exit status, maybe reflecting SIGKILL if aes
> > + *                       code killed the child).  Logged unless CANCELLED.
> 
> I'm unclear on whether status is valid in this third case or not. I
> think you are saying that it is (probably?) valid but if rc!=0 the
> caller likely doesn't actually care what it is?

status is definitely valid but maybe uninteresting, as stated in the
comment.

Would it help to add something about status to the third row of the
little table bit at the left ?
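
To make the three rows of that table concrete, here is a minimal sketch of
how a caller's callback might branch on rc and status.  The names
SKETCH_ERROR_FAIL, SKETCH_ERROR_CANCELLED, enum outcome and classify() are
invented for illustration, and the numeric values are placeholders rather
than the real libxl error codes.

```c
#include <assert.h>

/* Placeholder error values for illustration only; not the real libxl codes. */
enum { SKETCH_ERROR_FAIL = -3, SKETCH_ERROR_CANCELLED = -100 };

enum outcome {
    OUTCOME_OK,            /* rc==0, status==0 */
    OUTCOME_CHILD_FAILED,  /* rc==0, status!=0 */
    OUTCOME_CANCELLED,     /* rc is the cancellation code */
    OUTCOME_ERROR          /* any other rc!=0 */
};

/*
 * Hypothetical helper: rc is looked at first.  When rc is nonzero, status
 * is still the child's real wait status, but it does not affect the result.
 */
static enum outcome classify(int rc, int status)
{
    if (rc == SKETCH_ERROR_CANCELLED)
        return OUTCOME_CANCELLED;
    if (rc)
        return OUTCOME_ERROR;
    return status ? OUTCOME_CHILD_FAILED : OUTCOME_OK;
}
```

In this sketch, classify(SKETCH_ERROR_FAIL, 9) and
classify(SKETCH_ERROR_FAIL, 0) give the same answer, which is the sense in
which status is valid but probably uninteresting when rc!=0.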

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 13/29] libxl: domain create: Do not destroy on cancellation
  2015-03-24 11:24   ` Ian Campbell
@ 2015-03-31 18:14     ` Ian Jackson
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:14 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 13/29] libxl: domain create: Do not destroy on cancellation"):
> I presume at some later stage in the series a suitable
> LIBXL_HAVE_CANCELLATION will materialise? I mention it here because it
> is on my mind.

Yes, this is in
  libxl: cancellation: Provide public ao cancellation API
(17/29 in v2)

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 16/29] libxl: ao: Provide manip_refcnt
  2015-03-24 11:32   ` Ian Campbell
@ 2015-03-31 18:23     ` Ian Jackson
  2015-04-01  9:34       ` Ian Campbell
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:23 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 16/29] libxl: ao: Provide manip_refcnt"):
> On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > +/*
> > + * A "manip" is a libxl public function manipulating this ao, which
> > + * has a pointer to it.  We have to not destroy it while that's the
> > + * case, obviously.
> 
> It might be nice to the reader to make a reference to the "An ao and its
> gc may be accessed only with the ctx lock held." sentence in the overall
> ao docs?

This seems to have been documented already in May 2012 in exactly
those terms, at the top of struct libxl__ao, in 738b3d442522.  Am I
confused ?

I have added a comment about locking in the manip functions' doc
comment.

> [...]
> > +        libxl__ao__destroy(ctx,ao);
> 
> Nit: missing space after ",".

Fixed.

> But other than those and including if you disagree about extending the
> comment, since it's not a big deal:
>         Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API
  2015-03-24 11:45   ` Ian Campbell
@ 2015-03-31 18:26     ` Ian Jackson
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:26 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Euan Harris, xen-devel, Ian Jackson

Ian Campbell writes ("Re: [Xen-devel] [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API"):
> On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > +/*
> > + * It is sometimes possible to cancel an asynchronous operation.
> > + *
> > + * libxl_ao_cancel searches for an ongoing asynchronous operation whose
> > + * ao_how is identical to *how, and tries to cancel it.
> 
> I can see that you have arranged for the pointer not to be required to
> match, just the contents of the struct, which may be convenient for some
> callers who haven't remembered the ao_how somewhere convenient, but is
> it permissible to use the same pointer if it is convenient?

The pointer is a const libxl_asyncop_how* (both at operation
initiation and for libxl_ao_cancel), so libxl promises not to fiddle
with the struct's contents.  I think that it should therefore be
obvious that you can use the same pointer, as well as another struct
with the same contents (the latter being defined by the doc comment
saying that libxl_ao_cancel looks for an identical *how).

> Other than wondering about that this patch looks good,
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API
  2015-03-24 11:48   ` Ian Campbell
@ 2015-03-31 18:33     ` Ian Jackson
  2015-04-01  9:38       ` Ian Campbell
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:33 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API"):
> On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > +    /*
> > +     * For nested aos:
> > +     *  Semantically, cancellation affects the whole tree of aos,
> > +     *    not just the parent.
> > +     *  libxl__ao_cancellable.ao refers to the child, so
> > +     *    that the child callback sees the right ao.  (After all,
> > +     *    it was code dealing with the child that set .ao.)
> > +     *  But, the cancellable is recorded on the "cancellables" list
> > +     *    for the ultimate root ao, so that every possible child
> > +     *    cancellation occurs as a result of the cancellation of the
> > +     *    parent.
> > +     *  We set ao->cancelling only in the root.
> > +     */
> 
> WRT this, given a tree of ao's, which ones need to be cancellable for a
> cancellation to succeed? I would assume all of them do, or else the
> cancellation can only occur if/when the non-cancellable ones happen to
> end?

It depends what you mean by `succeed'.  libxl_ao_cancel reports
success if it did something.  That is, if anyone was listening.  For
this purpose, with a tree of ao's, it is sufficient for any one of the
children to have registered a cancellable.

> Do we(/are we going to) take steps to stop new non-cancellable ao's being
> added to the tree once the root is cancelled?

Yes.  Individual callback setups, and operations, can check by calling
libxl__ao_cancelling.  Normally this is done by
libxl__ao_cancellable_register: it is not possible to register a new
cancellable for an ao which we are supposed to be cancelling.

This affects, for example, libxl__ev_time_register_*.
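
A toy model of the semantics described above may help.  This is not the
libxl implementation: the toy_* names, the fixed-size array and the -1
return values are all assumptions made for the sketch.  It only shows that
cancellables are recorded on the root, that cancellation "did something" if
at least one was registered anywhere in the tree, and that registration is
refused once the root is cancelling.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CANCELLABLES 8

struct toy_ao;

struct toy_cancellable {
    struct toy_ao *ao;   /* the (possibly nested) ao that registered it */
    int fired;           /* set when the cancellation reaches it */
};

struct toy_ao {
    struct toy_ao *parent;   /* NULL for the root */
    int cancelling;          /* only ever set in the root */
    struct toy_cancellable *cancellables[TOY_MAX_CANCELLABLES];
    int ncancellables;       /* only used in the root */
};

static struct toy_ao *toy_root(struct toy_ao *ao)
{
    while (ao->parent)
        ao = ao->parent;
    return ao;
}

/* Returns 0 on success, -1 if the tree is being cancelled or the list is full. */
static int toy_register(struct toy_ao *ao, struct toy_cancellable *c)
{
    struct toy_ao *root = toy_root(ao);

    if (root->cancelling || root->ncancellables == TOY_MAX_CANCELLABLES)
        return -1;
    c->ao = ao;      /* the child, so its callback would see the right ao */
    c->fired = 0;
    root->cancellables[root->ncancellables++] = c;   /* recorded on the root */
    return 0;
}

/* Returns 0 if anyone was listening, -1 if no cancellable was registered. */
static int toy_cancel(struct toy_ao *root)
{
    int n = root->ncancellables;

    root->cancelling = 1;
    for (int i = 0; i < n; i++)
        root->cancellables[i]->fired = 1;
    root->ncancellables = 0;
    return n ? 0 : -1;
}
```

With a root and one child, a single toy_register() on the child is enough
for toy_cancel() on the root to return 0, and any later toy_register() on
either ao returns -1.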

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled
  2015-03-24 11:51   ` Ian Campbell
@ 2015-03-31 18:37     ` Ian Jackson
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:37 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris, Roger Pau Monne

Ian Campbell writes ("Re: [Xen-devel] [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled"):
> On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > In practice, cancelling this task will cause all subsequent actual
> > backend operations to fail, but will not actually cause the
> > libxl_device_events_handler operation to complete.
...
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
> This is an unfortunate short-coming though, and I presume one which
> could be fixed by updates to the toolstack<->driver domain protocol?

It could be fixed by updates to the code inside libxl for the driver
domain task.  But it is not clear that the benefit would be large, and
the memory management in the driver domain task means that it would be
tricky to implement - there would have to be more tracking of what
operations are ongoing.

The behaviour of a driver domain service task which is in the process
of being cancelled is not anything that you would want, so a
cancel-and-restart isn't a nice operation anyway.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 21/29] libxl: cancellation: Make spawns cancellable
  2015-03-24 11:53   ` Ian Campbell
@ 2015-03-31 18:45     ` Ian Jackson
  2015-04-01  9:40       ` Ian Campbell
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:45 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 21/29] libxl: cancellation: Make spawns cancellable"):
> On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > The libxl__spawn_spawn internal API permits the caller to specify
> > .timeout_ms==-1, meaning to wait forever.  Provide an explicit
> > cancellable to allow spawns to be cancelled.
> 
> AIUI this also lets spawns with timeout_ms > -1 be cancelled, which I
> think is desirable.

Yes.

Spawns with finite timeouts will be cancelled either by the explicit
cancellable in the spawn, or the one implied by the finite timeout; it
doesn't matter which - the path goes from spawn_cancel or
spawn_watch_event (respectively) to spawn_fail.

I was going to say that this patch is needed because libxl__ev_time's
cancellable is not registered when the timeout is infinite.  But
actually I see from libxl_event.c that this isn't the case.  So I
think I should drop this patch.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code
  2015-03-24 11:56   ` Ian Campbell
@ 2015-03-31 18:47     ` Ian Jackson
  2015-04-01  9:41       ` Ian Campbell
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:47 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code"):
> Konrad has a semantically similar error code which he is adding, I think
> in his recent libxl series to do with vcpu-set.

That is valuable, I think.

> AIUI Konrad's semantics are simply "domain does not exist", which seems
> to be usefully distinct from your "did exist but doesn't any more".

Indeed, although it's not clear to me without peering at the code in
detail whether my code always checks that the domain does exist before
setting up the watch which might result in its death being reported.

> I just wanted to mention it in case I'd misunderstood one or both error
> codes. As it stands this patch seems fine to me:
...
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks.

> I do wonder though if we ought to be better about documenting in the
> code|headers|idl what error codes mean and where they should be used
> (some are global, others specific to a subset of calls etc).

That would definitely be nice.  Do you think it would be helpful to
respin this patch with that information ?

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 24/29] libxl: Introduce FILLZERO
  2015-03-24 12:03   ` Ian Campbell
@ 2015-03-31 18:51     ` Ian Jackson
  2015-04-01  9:42       ` Ian Campbell
  0 siblings, 1 reply; 98+ messages in thread
From: Ian Jackson @ 2015-03-31 18:51 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [PATCH 24/29] libxl: Introduce FILLZERO"):
> On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > FILLZERO is a macro for memset(&foo,0,sizeof(foo)).  It eliminates the
> > possibility of making the error memset(&foo,0,sizeof(&foo)).
> 
> but not:
>         foo *p = allocate_a_foo();
>         memset(p, 0, sizeof(p));
> although that's probably less likely to go wrong and I don't think it
> can be avoided by the sorts of tricks used here.

Well, we could make a macro that you _have_ to pass the pointer to.
The result is that passing a non-pointer object gets you an error.
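
A minimal sketch of such a pointer-taking variant, under the assumption
that it would simply dereference its argument inside sizeof.  The name
FILLZERO_PTR, struct foo and the helper functions are invented here and are
not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: zero the object that p points to. */
#define FILLZERO_PTR(p) (memset((p), 0, sizeof(*(p))))

struct foo { int a; char b[8]; };

/* Returns 1 if all len bytes at obj are zero, 0 otherwise. */
static int all_bytes_zero(const void *obj, size_t len)
{
    const unsigned char *bytes = obj;

    for (size_t i = 0; i < len; i++)
        if (bytes[i])
            return 0;
    return 1;
}

/* Fills a struct foo with 0xff, zeroes it via a pointer, and checks it. */
static int demo_fillzero_ptr(void)
{
    struct foo f;
    struct foo *p = &f;

    memset(&f, 0xff, sizeof(f));
    FILLZERO_PTR(p);          /* sizeof(*p) is sizeof(struct foo) */
    /* FILLZERO_PTR(f) would not compile: unary '*' needs a pointer operand. */
    return all_bytes_zero(&f, sizeof(f));
}
```

One caveat with this particular sketch: an array argument decays to a
pointer to its first element, so FILLZERO_PTR(some_array) would compile and
zero only that first element.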

> > No callers yet, but document it in CODING_STYLE.  (In accordance with
> > existing libxl policy, I haven't gone through all existing possible
> > call sites.)
> 
> We don't usually expose such helpers in the public API, but I suppose
> you have a good reason to do so here, could you mention it in the commit
> log please.

It should be in the public API because xl*.c is full of uses of memset
which ought to be replaced with FILLZERO.  Do you really think I need
to mention this in the commit message ?

> > +#define LIBXL_FILLZERO(object) (memset(&(object), 0, sizeof((object))))
> 
> Evaluates object twice, so LIBXL_FILLZERO(*(p++)) would behave
> surprisingly. I'm not sure if this can be resolved though, so this might
> be a Don't Do That Then situation.

sizeof() does not evaluate its argument.
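
A small check of this, using the macro exactly as quoted above; the
function name count_increments is invented for the example.  The one
exception in standard C is an operand of variable length array type, which
sizeof does evaluate; that does not arise here.

```c
#include <assert.h>
#include <string.h>

#define LIBXL_FILLZERO(object) (memset(&(object), 0, sizeof((object))))

/*
 * Applies the macro to *(p++) and returns how far p advanced.
 * The operand appears twice in the expansion, but the copy inside
 * sizeof is not evaluated, so p is incremented only once.
 */
static int count_increments(void)
{
    int values[3] = { 7, 8, 9 };
    int *p = values;

    LIBXL_FILLZERO(*(p++));

    assert(values[0] == 0);   /* the element p originally pointed at */
    assert(values[1] == 8);   /* its neighbour is untouched */
    return (int)(p - values);
}
```

count_increments() returns 1, not 2.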

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device
  2015-03-31 18:06     ` Ian Jackson
@ 2015-04-01  9:25       ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-04-01  9:25 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-03-31 at 19:06 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device"):
> > On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> > > @@ -1164,7 +1136,7 @@ static void device_hotplug_clean(libxl__gc *gc, libxl__ao_device *aodev)
> > >  {
> > >      /* Clean events and check reentrancy */
> > >      libxl__ev_time_deregister(gc, &aodev->timeout);
> > 
> > You seem to have removed the initialisation of this in a previous hunk
> > but not this deregistration or the field itself.
> > 
> > Was that deliberate, perhaps it was serving dual purpose somewhere?
> 
> Yes.  See "TODO: 4.2 Bodge due to QEMU, see comment on top of"...
> in libxl__initiate_device_remove in libxl_device.c.

Ah, yes, thanks.
Then: Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc
  2015-03-31 18:12     ` Ian Jackson
@ 2015-04-01  9:29       ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-04-01  9:29 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-03-31 at 19:12 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc"):
> > On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> > > diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> > > index 754e2d1..891cdb8 100644
> > > --- a/tools/libxl/libxl_aoutils.c
> > > +++ b/tools/libxl/libxl_aoutils.c
> > > @@ -483,11 +483,12 @@ static void async_exec_done(libxl__egc *egc,
> > >      libxl__ev_time_deregister(gc, &aes->time);
> > >  
> > >      if (status) {
> > > -        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
> > > -                                      aes->what, pid, status);
> > > +        if (!aes->rc)
> > 
> > Could be one "if (status && !aes->rc)", unless perhaps there is more
> > code to come in this block?
> 
> No, there is no more to come.  I find it clearer this way but I don't
> mind changing it.

No it's fine if you don't want to.

> > > +                        libxl__async_exec_state *aes, int rc, int status);
> > > +/*
> > > + * Meaning of status and rc:
> > > + *  rc==0, status==0    all went well
> > > + *  rc==0, status!=0    everything OK except child exited nonzero (logged)
> > > + *  rc!=0               something else went wrong (status is real
> > > + *                       exit status, maybe reflecting SIGKILL if aes
> > > + *                       code killed the child).  Logged unless CANCELLED.
> > 
> > I'm unclear on whether status is valid in this third case or not. I
> > think you are saying that it is (probably?) valid but if rc!=0 the
> > caller likely doesn't actually care what it is?
> 
> status is definitely valid but maybe uninteresting, as stated in the
> comment.

I think I initially parsed it as "real exit status, maybe", which isn't
really what it says...

> Would it help to add something about status to the third row of the
> little table bit at the left ?

Perhaps, or perhaps:
s/ maybe//; s/child)/child, and therefore likely to be uninteresting)/
?

In any case now that I've read it correctly:
        Acked-by: Ian Campbell <ian.campbell@citrix.com>

But if you want to clarify any further please go ahead and retain the
ack.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 16/29] libxl: ao: Provide manip_refcnt
  2015-03-31 18:23     ` Ian Jackson
@ 2015-04-01  9:34       ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-04-01  9:34 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-03-31 at 19:23 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [PATCH 16/29] libxl: ao: Provide manip_refcnt"):
> > On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > > +/*
> > > + * A "manip" is a libxl public function manipulating this ao, which
> > > + * has a pointer to it.  We have to not destroy it while that's the
> > > + * case, obviously.
> > 
> > It might be nice to the reader to make a reference to the "An ao and its
> > gc may be accessed only with the ctx lock held." sentence in the overall
> > ao docs?
> 
> This seems to have been documented already in May 2012 in exactly
> those terms, at the top of struct libxl__ao, in 738b3d442522.  Am I
> confused ?

Right, my suggestion was to make a reference to that comment. One might
expect ao__manip_enter to take care of locking for you as part of the
"prepare this thing to be manipulated".

We don't seem to have an ASSERT_CTX_LOCKED construct, or I would suggest
using it in cases like these.

> I have added a comment about locking in the manip functions' doc
> comment.
> 
> > [...]
> > > +        libxl__ao__destroy(ctx,ao);
> > 
> > Nit: missing space after ",".
> 
> Fixed.
> 
> > But other than those and including if you disagree about extending the
> > comment, since it's not a big deal:
> >         Acked-by: Ian Campbell <ian.campbell@citrix.com>

This stands.

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API
  2015-03-31 18:33     ` Ian Jackson
@ 2015-04-01  9:38       ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-04-01  9:38 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-03-31 at 19:33 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API"):
> > On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > > +    /*
> > > +     * For nested aos:
> > > +     *  Semantically, cancellation affects the whole tree of aos,
> > > +     *    not just the parent.
> > > +     *  libxl__ao_cancellable.ao refers to the child, so
> > > +     *    that the child callback sees the right ao.  (After all,
> > > +     *    it was code dealing with the child that set .ao.)
> > > +     *  But, the cancellable is recorded on the "cancellables" list
> > > +     *    for the ultimate root ao, so that every possible child
> > > +     *    cancellation occurs as a result of the cancellation of the
> > > +     *    parent.
> > > +     *  We set ao->cancelling only in the root.
> > > +     */
> > 
> > WRT this, given a tree of ao's, which ones need to be cancellable for a
> > cancellation to succeed? I would assume all of them do, or else the
> > cancellation can only occur if/when the non-cancellable ones happen to
> > end?
> 
> It depends what you mean by `succeed'.  libxl_ao_cancel reports
> success if it did something.  That is, if anyone was listening.  For
> this purpose, with a tree of ao's, it is sufficient for any one of the
> children to have registered a cancellable.

Right, and this "only at least one, not all" semantic is what leads to
the "It will be cut short.  However, it may still take some time to
cancel." wording elsewhere since for the cancellation to "complete" all
of the non-cancellable things would need to finish too, but we expect
that would happen sooner if any one thing in the tree was cancelled than
if not.

> > Do we(/are we going to) take steps to stop new non-cancellable ao's being
> > added to the tree once the root is cancelled?
> 
> Yes.  Individual callback setups, and operations, can check by calling
> libxl__ao_cancelling.  Normally this is done by
> libxl__ao_cancellable_register: it is not possible to register a new
> cancellable for an ao which we are supposed to be cancelling.
> 
> This affects, for example, libxl__ev_time_register_*.

Great.

Acked-by: Ian Campbell <ian.campbell@citrix.com>

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 21/29] libxl: cancellation: Make spawns cancellable
  2015-03-31 18:45     ` Ian Jackson
@ 2015-04-01  9:40       ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-04-01  9:40 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-03-31 at 19:45 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [PATCH 21/29] libxl: cancellation: Make spawns cancellable"):
> > On Tue, 2015-02-10 at 20:10 +0000, Ian Jackson wrote:
> > > The libxl__spawn_spawn internal API permits the caller to specify
> > > .timeout_ms==-1, meaning to wait forever.  Provide an explicit
> > > cancellable to allow spawns to be cancelled.
> > 
> > AIUI this also lets spawns with timeout_ms > -1 be cancelled, which I
> > think is desirable.
> 
> Yes.
> 
> Spawns with finite timeouts will be cancelled either by the explicit
> cancellable in the spawn, or the one implied by the finite timeout; it
> doesn't matter which - the path goes from spawn_cancel or
> spawn_watch_event (respectively) to spawn_fail.
> 
> I was going to say that this patch is needed because libxl__ev_time's
> cancellable is not registered when the timeout is infinite.  But
> actually I see from libxl_event.c that this isn't the case.  So I
> think I should drop this patch.

OK.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code
  2015-03-31 18:47     ` Ian Jackson
@ 2015-04-01  9:41       ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-04-01  9:41 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-03-31 at 19:47 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code"):
> > Konrad has a semantically similar error code which he is adding, I think
> > in his recent libxl series to do with vcpu-set.
> 
> That is valuable, I think.
> 
> > AIUI Konrad's semantics are simply "domain does not exist", which seems
> > to be usefully distinct from your "did exist but doesn't any more".
> 
> Indeed, although it's not clear to me without peering at the code in
> detail whether my code always checks that the domain does exist before
> setting up the watch which might result in its death being reported.
> 
> > I just wanted to mention it in case I'd misunderstood one or both error
> > codes. As it stands this patch seems fine to me:
> ...
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
> Thanks.
> 
> > I do wonder though if we ought to be better about documenting in the
> > code|headers|idl what error codes mean and where they should be used
> > (some are global, others specific to a subset of calls etc).
> 
> That would definitely be nice.  Do you think it would be helpful to
> respin this patch with that information ?

If you can be bothered, sure. I suppose we ought to start somewhere!

Ian.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 24/29] libxl: Introduce FILLZERO
  2015-03-31 18:51     ` Ian Jackson
@ 2015-04-01  9:42       ` Ian Campbell
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Campbell @ 2015-04-01  9:42 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Euan Harris

On Tue, 2015-03-31 at 19:51 +0100, Ian Jackson wrote:

> > > No callers yet, but document it in CODING_STYLE.  (In accordance with
> > > existing libxl policy, I haven't gone through all existing possible
> > > call sites.)
> > 
> > We don't usually expose such helpers in the public API, but I suppose
> > you have a good reason to do so here, could you mention it in the commit
> > log please.
> 
> It should be in the public API because xl*.c is full of uses of memset
> which ought to be replaced with FILLZERO.  Do you really think I need
> to mention this in the commit message ?

I suppose not.

> > > +#define LIBXL_FILLZERO(object) (memset(&(object), 0, sizeof((object))))
> > 
> > > Evaluates object twice, so LIBXL_FILLZERO(*(p++)) would behave
> > surprisingly. I'm not sure if this can be resolved though, so this might
> > be a Don't Do That Then situation.
> 
> sizeof() does not evaluate its argument.

Useful!

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-02-18 16:10 ` Euan Harris
  2015-02-18 16:12   ` Ian Jackson
@ 2015-04-07 17:08   ` Euan Harris
  2015-04-07 17:19     ` Ian Jackson
  1 sibling, 1 reply; 98+ messages in thread
From: Euan Harris @ 2015-04-07 17:08 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, james.bulpin

Hi,

On Wed, Feb 18, 2015 at 04:10:35PM +0000, Euan Harris wrote:
> We had a chat about testing these changes, and integrating them into xenopsd.
> We agreed that we each had slightly different expectations of what we were 
> going to do, and when.   I think we came to the following major conclusions:
> 
>   - I will start work on a simple test framework for cancellation,
>     hopefully to have first results in a fortnight or so.
>   - Once the test framework is available you will fix whatever bugs it
>     unearths, then we will rinse and repeat.
>   - You will think some more about the possibility of adding cancellation
>     to the xl command line tool, but since this is tricky there is no 
>     expectation of when it might happen.

I think the most straightforward way to test the cancellation mechanism in
LibXL will be to adapt the way we test similar functionality in xenopsd:

   * define numbered 'cancellation points' at which cancellable operations
     can be cancelled
   * before testing a cancellable operation, pre-set the cancellation point
     at which cancellation should be attempted
   * when execution reaches the pre-set cancellation point, run the cancellation
     procedure

This approach alone will not allow us to test asynchronous cancellation in
the middle of long-running operations, such as writing a suspend image
to disk - that will require a way to synchronize the test program with
the long-running operation.

My first guess about how this might be done was:

   * add current-cancellation-point and trigger-point variables to the context
     struct
   * increment the counter and fire the cancellation logic in
     libxl__ao_cancellable_register()

In this way we could write a loop which iterated through all possible
cancellation points.   However you pointed out that we cannot call
libxl_ao_cancel() while holding the context lock, so this idea needs
some refinement.   One possibility would be to tell another thread to try
to do the cancellation immediately after we release the lock;  another
option, if we didn't want to write a multi-thread test driver,
would be to do the cancellation at the top of libxl's event loop.
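
The counter-and-trigger idea, with the cancellation deferred until the lock
has been released, could be sketched roughly as below.  This is not libxl
code: struct cancel_test_state and both functions are hypothetical names,
and the real state would presumably live in the context struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test-only state (assumption: not an existing libxl type). */
struct cancel_test_state {
    int current_point;    /* cancellation points reached so far */
    int trigger_point;    /* point at which to cancel; 0 disables */
    bool cancel_pending;  /* set under the lock, acted on after unlock */
};

/* Would be called where a cancellable is registered, with the lock held.
 * It only records the request, because the cancel call itself cannot be
 * made while the context lock is held. */
void cancel_test_point_reached(struct cancel_test_state *st)
{
    assert(st);
    st->current_point++;
    if (st->trigger_point && st->current_point == st->trigger_point)
        st->cancel_pending = true;
}

/* Would be called without the lock, e.g. at the top of the event loop or
 * from another thread.  Returns true exactly once per trigger, at which
 * point the caller performs the actual cancellation. */
bool cancel_test_take_pending(struct cancel_test_state *st)
{
    assert(st);
    if (!st->cancel_pending)
        return false;
    st->cancel_pending = false;
    return true;
}
```

A test driver could then loop over trigger_point = 1, 2, 3, ... and rerun
the operation each time, stopping once a run completes without the trigger
firing.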

I think this captures roughly what we talked about.   Please let me know
if I misunderstood or missed out any details.

Thanks,
Euan

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-04-07 17:08   ` Euan Harris
@ 2015-04-07 17:19     ` Ian Jackson
  2015-04-09 16:44       ` Euan Harris
  2015-04-14  9:43       ` Euan Harris
  0 siblings, 2 replies; 98+ messages in thread
From: Ian Jackson @ 2015-04-07 17:19 UTC (permalink / raw)
  To: Euan Harris; +Cc: xen-devel, james.bulpin

Euan Harris writes ("Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations"):
> On Wed, Feb 18, 2015 at 04:10:35PM +0000, Euan Harris wrote:
> I think the most straightforward way to test the cancellation mechanism in
> LibXL will be to adapt the way we test similar functionality in xenopsd:
> 
>    * define numbered 'cancellation points' at which cancellable operations
>      can be cancelled
>    * before testing a cancellable operation, pre-set the cancellation point
>      at which cancellation should be attempted
>    * when execution reaches the pre-set cancellation point, run the cancellation
>      procedure

This seems likely to work.

> This approach alone will not allow us to test asynchronous cancellation in
> the middle of long-running operations, such as writing a suspend image
> to disk - that will require a way to synchronize the test program with
> the long-running operation.

On the contrary, I think many long-running operations, such as suspend
and migrations, involve multiple iterations of the libxl event loop.
Actual suspend/migrate is done in a helper process; the main process
is responsible for progress report handling, coordination, etc.

> My first guess about how this might be done was:
> 
>    * add current-cancellation-point and trigger-point variables to the context
>      struct
>    * increment the counter and fire the cancellation logic in
>      libxl__ao_cancellable_register()
> 
> In this way we could write a loop which iterated through all possible
> cancellation points.   However you pointed out that we cannot call
> libxl_ao_cancel() while holding the context lock, so this idea needs
> some refinement.   One possibility would be to tell another thread to try
> to do the cancellation immediately after we release the lock;  another
> option, if we didn't want to write a multi-thread test driver,
> would be to do the cancellation at the top of libxl's event loop.

The relevant function for this latter approach is eventloop_iteration
in libxl_event.c.  This is used by libxl whenever the caller specifies
that a long-running operation is to be done synchronously (ao_how==0),
which is what xl does.

You might also consider whether to add a debug option for
afterpoll_internal to make it return after every callback (ie, after
the call to efd->func() and the call to time_occurs).  That would
allow you to inject cancellation in a slightly more fine-grained
manner.

> I think this captures roughly what we talked about.   Please let me know
> if I misunderstood or missed out any details.

I also mentioned that counting invocations of
libxl__ao_cancellable_register is less than ideal because it is very
coarse-grained.

Regards,
Ian.

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-03-03 12:08 ` Ian Campbell
  2015-03-20 10:39   ` Euan Harris
@ 2015-04-07 17:22   ` Ian Jackson
  1 sibling, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-04-07 17:22 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Euan Harris

Ian Campbell writes ("Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations"):
> On Tue, 2015-02-10 at 20:09 +0000, Ian Jackson wrote:
> > This is v2 of my work-in-progress series to support cancellation of
> > long-running libxl operations.
> [...]
> > I wouldn't recommend testing it yet until I've at least smoke tested
> > it to see that things still work if you don't cancel them.
> 
> Would review of the series be useful and/or appreciated at this stage?

Review of the APIs, and general approach, would be very much
appreciated.  That's probably best done by looking at the tip and
diffing libxl[_internal].h.

> Perhaps the first half dozen or so look like preparatory cleanups which
> I could sensibly look at?

That would also be useful.

> > Here's a list of the patches:
> > 
> >   01/29  libxl: Further fix exit paths from libxl_device_events_handler
> >   02/29  libxl: Comment cleanups
> >   03/29  libxl: suspend: switch_logdirty_done takes rc
> >   04/29  libxl: suspend: common suspend callbacks take rc
> >   05/29  libxl: suspend: Return correct error from callbacks
> >   06/29  libxl: Use libxl__xswait* in libxl__ao_device
> >   07/29  libxl: xswait/devstate: Move xswait to before devstate
> >   08/29  libxl: devstate: Use libxl__xswait*

These first eight are cleanups and could in principle go in right
away.

Thanks,
Ian.

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-04-07 17:19     ` Ian Jackson
@ 2015-04-09 16:44       ` Euan Harris
  2015-04-09 16:50         ` Ian Jackson
  2015-04-14  9:43       ` Euan Harris
  1 sibling, 1 reply; 98+ messages in thread
From: Euan Harris @ 2015-04-09 16:44 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, james.bulpin

On Tue, Apr 07, 2015 at 06:19:52PM +0100, Ian Jackson wrote:
> On the contrary, I think many long-running operations, such as suspend
> and migrations, involve multiple iterations of the libxl event loop.
> Actual suspend/migrate is done in a helper process; the main process
> is responsible for progress report handling, coordination, etc.

Yes, that would work, but an open loop approach like that can lead to
frustratingly unreliable tests.   I think it would be best to make
the test aware of the state of the helper - or even in control of it.
That would allow us to wait for the helper to reach a particular state
before killing it.

Thanks,
Euan

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-04-09 16:44       ` Euan Harris
@ 2015-04-09 16:50         ` Ian Jackson
  0 siblings, 0 replies; 98+ messages in thread
From: Ian Jackson @ 2015-04-09 16:50 UTC (permalink / raw)
  To: Euan Harris; +Cc: xen-devel, james.bulpin

Euan Harris writes ("Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations"):
> Yes, that would work, but an open loop approach like that can lead to
> frustratingly unreliable tests.   I think it would be best to make
> the test aware of the state of the helper - or even in control of it.
> That would allow us to wait for the helper to reach a particular state
> before killing it.

This is less bad than you might think because the helper's progress
messages to libxl are at fairly predictable progress points.

In any case, the helper (in general) runs concurrently with libxl, so
when libxl decides to stop the progress there will often be a race.
(Sometimes the helper has to stop and wait for libxl to confirm.)

Ian.

* Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
  2015-04-07 17:19     ` Ian Jackson
  2015-04-09 16:44       ` Euan Harris
@ 2015-04-14  9:43       ` Euan Harris
  1 sibling, 0 replies; 98+ messages in thread
From: Euan Harris @ 2015-04-14  9:43 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, james.bulpin

On Tue, Apr 07, 2015 at 06:19:52PM +0100, Ian Jackson wrote:
> I also mentioned that counting invocations of
> libxl__ao_cancellable_register is less than ideal because it is very
> coarse-grained.

Yes.  A nice feature of counting calls to libxl__ao_cancellable_register
is that it is closely tied to identifiable points in the lifetime
of the cancellable operation.   However for better resolution we could count
event-loop iterations since a registration call, or any other easily
identifiable point.
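
As a rough sketch of that finer-grained scheme: identify a cancellation
point by the pair (registration number, event-loop iterations since that
registration).  All names below are hypothetical and nothing here is
existing libxl API; the two hooks are assumed to be called from the
registration function and from the top of the event loop respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical coordinates of a cancellation point. */
struct cancel_point {
    int registration;      /* registrations seen so far */
    int iterations_since;  /* loop iterations completed since the last one */
};

struct cancel_counter {
    struct cancel_point now;     /* where execution currently is */
    struct cancel_point target;  /* where the test wants to cancel */
};

/* Hook for the registration call: start counting iterations afresh. */
void counter_on_registration(struct cancel_counter *c)
{
    assert(c);
    c->now.registration++;
    c->now.iterations_since = 0;
}

/* Hook for the top of each event-loop iteration.  Returns true when the
 * current position matches the target, then counts the iteration. */
bool counter_on_loop_iteration(struct cancel_counter *c)
{
    bool hit;

    assert(c);
    hit = c->now.registration == c->target.registration &&
          c->now.iterations_since == c->target.iterations_since;
    c->now.iterations_since++;
    return hit;
}
```

With target = {1, 2}, the hook reports a hit on the third loop iteration
after the first registration, i.e. once two iterations have completed.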

Thanks,
Euan

end of thread, other threads:[~2015-04-14  9:43 UTC | newest]

Thread overview: 98+ messages
2015-02-10 20:09 [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
2015-02-10 20:09 ` [PATCH 01/29] libxl: Further fix exit paths from libxl_device_events_handler Ian Jackson
2015-02-18 16:23   ` Roger Pau Monné
2015-03-24 10:49     ` Ian Campbell
2015-02-10 20:09 ` [PATCH 02/29] libxl: Comment cleanups Ian Jackson
2015-02-11  1:46   ` Hongyang Yang
2015-02-11 12:05     ` Ian Jackson
2015-03-30  9:08     ` Ian Campbell
2015-03-24 10:50   ` Ian Campbell
2015-02-10 20:09 ` [PATCH 03/29] libxl: suspend: switch_logdirty_done takes rc Ian Jackson
2015-03-24 10:53   ` Ian Campbell
2015-03-31 18:02     ` Ian Jackson
2015-02-10 20:09 ` [PATCH 04/29] libxl: suspend: common suspend callbacks take rc Ian Jackson
2015-03-24 10:55   ` Ian Campbell
2015-03-31 18:03     ` Ian Jackson
2015-02-10 20:09 ` [PATCH 05/29] libxl: suspend: Return correct error from callbacks Ian Jackson
2015-03-24 10:58   ` Ian Campbell
2015-02-10 20:09 ` [PATCH 06/29] libxl: Use libxl__xswait* in libxl__ao_device Ian Jackson
2015-03-24 11:04   ` Ian Campbell
2015-03-31 18:06     ` Ian Jackson
2015-04-01  9:25       ` Ian Campbell
2015-02-10 20:09 ` [PATCH 07/29] libxl: xswait/devstate: Move xswait to before devstate Ian Jackson
2015-03-24 11:05   ` Ian Campbell
2015-02-10 20:09 ` [PATCH 08/29] libxl: devstate: Use libxl__xswait* Ian Jackson
2015-03-24 11:07   ` Ian Campbell
2015-02-10 20:09 ` [PATCH 09/29] libxl: New error codes CANCELLED etc Ian Jackson
2015-03-24 11:08   ` Ian Campbell
2015-02-10 20:09 ` [PATCH 10/29] libxl: events: Make timeout and async exec setup take an ao, not a gc Ian Jackson
2015-02-11  1:04   ` Wen Congyang
2015-02-11 12:04     ` Ian Jackson
2015-03-31 18:09     ` Ian Jackson
2015-03-24 11:09   ` Ian Campbell
2015-02-10 20:09 ` [PATCH 11/29] libxl: events: Make libxl__async_exec_* pass caller an rc Ian Jackson
2015-03-24 11:20   ` Ian Campbell
2015-03-31 18:12     ` Ian Jackson
2015-04-01  9:29       ` Ian Campbell
2015-02-10 20:09 ` [PATCH 12/29] libxl: events: Permit timeouts to signal cancellation Ian Jackson
2015-03-24 11:22   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 13/29] libxl: domain create: Do not destroy on cancellation Ian Jackson
2015-03-24 11:24   ` Ian Campbell
2015-03-31 18:14     ` Ian Jackson
2015-02-10 20:10 ` [PATCH 14/29] libxl: ao: Record ultimate parent of a nested ao Ian Jackson
2015-03-24 11:26   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 15/29] libxl: ao: Count the nested progeny of an ao Ian Jackson
2015-03-24 11:27   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 16/29] libxl: ao: Provide manip_refcnt Ian Jackson
2015-03-24 11:32   ` Ian Campbell
2015-03-31 18:23     ` Ian Jackson
2015-04-01  9:34       ` Ian Campbell
2015-02-10 20:10 ` [PATCH 17/29] libxl: cancellation: Provide public ao cancellation API Ian Jackson
2015-03-24 11:45   ` Ian Campbell
2015-03-31 18:26     ` Ian Jackson
2015-03-24 11:48   ` Ian Campbell
2015-03-31 18:33     ` Ian Jackson
2015-04-01  9:38       ` Ian Campbell
2015-02-10 20:10 ` [PATCH 18/29] libxl: cancellation: Provide explicit internal cancel check API Ian Jackson
2015-03-24 11:45   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 19/29] libxl: cancellation: Make timeouts cancellable Ian Jackson
2015-03-24 11:50   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 20/29] libxl: cancellation: Note that driver domain task cannot be usefully cancelled Ian Jackson
2015-02-18 16:24   ` Roger Pau Monné
2015-03-24 11:51   ` Ian Campbell
2015-03-31 18:37     ` Ian Jackson
2015-02-10 20:10 ` [PATCH 21/29] libxl: cancellation: Make spawns cancellable Ian Jackson
2015-03-24 11:53   ` Ian Campbell
2015-03-31 18:45     ` Ian Jackson
2015-04-01  9:40       ` Ian Campbell
2015-02-10 20:10 ` [PATCH 22/29] libxl: Introduce DOMAIN_DESTROYED error code Ian Jackson
2015-03-24 11:56   ` Ian Campbell
2015-03-31 18:47     ` Ian Jackson
2015-04-01  9:41       ` Ian Campbell
2015-02-10 20:10 ` [PATCH 23/29] libxl: cancellation: Support cancellation where we spot domain death Ian Jackson
2015-03-24 11:58   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 24/29] libxl: Introduce FILLZERO Ian Jackson
2015-03-24 12:03   ` Ian Campbell
2015-03-31 18:51     ` Ian Jackson
2015-04-01  9:42       ` Ian Campbell
2015-02-10 20:10 ` [PATCH 25/29] libxl: cancellation: Preparations for save/restore cancellation Ian Jackson
2015-03-24 12:04   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 26/29] libxl: cancellation: Handle SIGTERM in save/restore helper Ian Jackson
2015-03-24 12:07   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 27/29] libxl: cancellation: Cancel libxc save/restore Ian Jackson
2015-03-24 12:08   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 28/29] libxl: ao: datacopier callback gets an rc Ian Jackson
2015-03-24 12:10   ` Ian Campbell
2015-02-10 20:10 ` [PATCH 29/29] libxl: cancellation: Make datacopiers cancellable Ian Jackson
2015-03-24 12:11   ` Ian Campbell
2015-02-10 20:13 ` [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations Ian Jackson
2015-02-18 16:10 ` Euan Harris
2015-02-18 16:12   ` Ian Jackson
2015-04-07 17:08   ` Euan Harris
2015-04-07 17:19     ` Ian Jackson
2015-04-09 16:44       ` Euan Harris
2015-04-09 16:50         ` Ian Jackson
2015-04-14  9:43       ` Euan Harris
2015-03-03 12:08 ` Ian Campbell
2015-03-20 10:39   ` Euan Harris
2015-04-07 17:22   ` Ian Jackson
