xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Suspend and resume stubdomains
@ 2022-09-27  3:03 Demi Marie Obenour
  2022-09-27  3:03 ` [PATCH 1/5] libxl: Add a utility function for domain resume Demi Marie Obenour
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Demi Marie Obenour @ 2022-09-27  3:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Demi Marie Obenour, Wei Liu, Anthony PERARD, Juergen Gross

Currently, libxl neither pauses nor suspends a stubdomain when
suspending the domain it serves.  Qubes OS has an out-of-tree patch that
just pauses the stubdomain, but that is also insufficient: sys-net (an
HVM with an attached PCI device) does not properly resume from suspend
on some systems, and the stubdomain considers the TSC clocksource to be
unstable after resume.

The first two patches add utility functions that will be used later.
The third patch is the one that actually suspends the stubdomain.  The
fourth patch makes suspending slow-to-respond domains more robust, and
the fifth patch adds extra logging.

Demi Marie Obenour (5):
  libxl: Add a utility function for domain resume
  libxl: Add utility function to check guest status
  libxl: Properly suspend stubdomains
  libxl: Fix race condition in domain suspension
  libxl: Add additional domain suspend/resume logs

 tools/libxl/libxl_dom_suspend.c | 276 +++++++++++++++++++++++++-------
 tools/libxl/libxl_domain.c      |   1 +
 tools/libxl/libxl_internal.h    |   1 +
 3 files changed, 218 insertions(+), 60 deletions(-)

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/5] libxl: Add a utility function for domain resume
  2022-09-27  3:03 [PATCH 0/5] Suspend and resume stubdomains Demi Marie Obenour
@ 2022-09-27  3:03 ` Demi Marie Obenour
  2022-09-27  3:03 ` [PATCH 2/5] libxl: Add utility function to check guest status Demi Marie Obenour
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Demi Marie Obenour @ 2022-09-27  3:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Demi Marie Obenour, Wei Liu, Anthony PERARD, Juergen Gross

It is necessary to all xs_resume_domain after any successful call to
xc_domain_resume, so that XenStore is notified of the resumption.
However, it is also very easy to forget to call this.  This took me
several days to debug.

Fix this by adding a utility function to resume a domain and then notify
XenStore of the resumption.  This function does not resume any device
model, so it is still internal to libxl, but it makes future changes to
libxl much less error-prone.  It also makes libxl itself smaller.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
 tools/libs/light/libxl_dom_suspend.c | 41 +++++++++++++++------------------
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/tools/libs/light/libxl_dom_suspend.c b/tools/libs/light/libxl_dom_suspend.c
index 4fa22bb7391049f2ea4ac32f21660212053bd4bc..fa50e8801f35d173a99ae5dd19eb941649e14019 100644
--- a/tools/libs/light/libxl_dom_suspend.c
+++ b/tools/libs/light/libxl_dom_suspend.c
@@ -451,6 +451,22 @@ int libxl__domain_resume_device_model_deprecated(libxl__gc *gc, uint32_t domid)
     return 0;
 }
 
+/* Just resumes the domain.  The device model must have been resumed already. */
+static int domain_resume_raw(libxl__gc *gc, uint32_t domid, int suspend_cancel)
+{
+    if (xc_domain_resume(CTX->xch, domid, suspend_cancel)) {
+        LOGED(ERROR, domid, "xc_domain_resume failed");
+        return ERROR_FAIL;
+    }
+
+    if (!xs_resume_domain(CTX->xsh, domid)) {
+        LOGED(ERROR, domid, "xs_resume_domain failed");
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 int libxl__domain_resume_deprecated(libxl__gc *gc, uint32_t domid, int suspend_cancel)
 {
     int rc = 0;
@@ -469,16 +485,7 @@ int libxl__domain_resume_deprecated(libxl__gc *gc, uint32_t domid, int suspend_c
         }
     }
 
-    if (xc_domain_resume(CTX->xch, domid, suspend_cancel)) {
-        LOGED(ERROR, domid, "xc_domain_resume failed");
-        rc = ERROR_FAIL;
-        goto out;
-    }
-
-    if (!xs_resume_domain(CTX->xsh, domid)) {
-        LOGED(ERROR, domid, "xs_resume_domain failed");
-        rc = ERROR_FAIL;
-    }
+    rc = domain_resume_raw(gc, domid, suspend_cancel);
 out:
     return rc;
 }
@@ -660,19 +667,9 @@ static void domain_resume_done(libxl__egc *egc,
     /* Convenience aliases */
     libxl_domid domid = dmrs->domid;
 
-    if (rc) goto out;
-
-    if (xc_domain_resume(CTX->xch, domid, dmrs->suspend_cancel)) {
-        LOGED(ERROR, domid, "xc_domain_resume failed");
-        rc = ERROR_FAIL;
-        goto out;
-    }
+    if (!rc)
+        rc = domain_resume_raw(gc, domid, dmrs->suspend_cancel);
 
-    if (!xs_resume_domain(CTX->xsh, domid)) {
-        LOGED(ERROR, domid, "xs_resume_domain failed");
-        rc = ERROR_FAIL;
-    }
-out:
     dmrs->callback(egc, dmrs, rc);
 }
 
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/5] libxl: Add utility function to check guest status
  2022-09-27  3:03 [PATCH 0/5] Suspend and resume stubdomains Demi Marie Obenour
  2022-09-27  3:03 ` [PATCH 1/5] libxl: Add a utility function for domain resume Demi Marie Obenour
@ 2022-09-27  3:03 ` Demi Marie Obenour
  2022-09-27  3:03 ` [PATCH 3/5] libxl: Properly suspend stubdomains Demi Marie Obenour
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Demi Marie Obenour @ 2022-09-27  3:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Demi Marie Obenour, Wei Liu, Anthony PERARD, Juergen Gross

This is used to check that a guest has not been destroyed and to obtain
information about it.  It will be used in subsequent patches.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
 tools/libs/light/libxl_dom_suspend.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/tools/libs/light/libxl_dom_suspend.c b/tools/libs/light/libxl_dom_suspend.c
index fa50e8801f35d173a99ae5dd19eb941649e14019..d2a88ea34efb115a8f2b861daf5884d95f39e81d 100644
--- a/tools/libs/light/libxl_dom_suspend.c
+++ b/tools/libs/light/libxl_dom_suspend.c
@@ -321,27 +321,36 @@ static void suspend_common_wait_guest_watch(libxl__egc *egc,
     suspend_common_wait_guest_check(egc, dsps);
 }
 
+static int check_guest_status(libxl__gc *gc, const uint32_t domid,
+                              xc_domaininfo_t *info, const char *what)
+{
+    int ret = xc_domain_getinfolist(CTX->xch, domid, 1, info);
+
+    if (ret < 0) {
+        LOGED(ERROR, domid, "unable to check for status of guest");
+        return ERROR_FAIL;
+    }
+
+    if (!(ret == 1 && info->domain == domid)) {
+        LOGED(ERROR, domid, "guest we were %s has been destroyed", what);
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 static void suspend_common_wait_guest_check(libxl__egc *egc,
         libxl__domain_suspend_state *dsps)
 {
     STATE_AO_GC(dsps->ao);
     xc_domaininfo_t info;
-    int ret;
     int shutdown_reason;
 
     /* Convenience aliases */
     const uint32_t domid = dsps->domid;
 
-    ret = xc_domain_getinfolist(CTX->xch, domid, 1, &info);
-    if (ret < 0) {
-        LOGED(ERROR, domid, "unable to check for status of guest");
+    if (check_guest_status(gc, domid, &info, "suspending"))
         goto err;
-    }
-
-    if (!(ret == 1 && info.domain == domid)) {
-        LOGED(ERROR, domid, "guest we were suspending has been destroyed");
-        goto err;
-    }
 
     if (!(info.flags & XEN_DOMINF_shutdown))
         /* keep waiting */
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/5] libxl: Properly suspend stubdomains
  2022-09-27  3:03 [PATCH 0/5] Suspend and resume stubdomains Demi Marie Obenour
  2022-09-27  3:03 ` [PATCH 1/5] libxl: Add a utility function for domain resume Demi Marie Obenour
  2022-09-27  3:03 ` [PATCH 2/5] libxl: Add utility function to check guest status Demi Marie Obenour
@ 2022-09-27  3:03 ` Demi Marie Obenour
  2022-09-27  3:04 ` [PATCH 4/5] libxl: Fix race condition in domain suspension Demi Marie Obenour
  2022-09-27  3:04 ` [PATCH 5/5] libxl: Add additional domain suspend/resume logs Demi Marie Obenour
  4 siblings, 0 replies; 6+ messages in thread
From: Demi Marie Obenour @ 2022-09-27  3:03 UTC (permalink / raw)
  To: xen-devel
  Cc: Demi Marie Obenour, Wei Liu, Anthony PERARD, Juergen Gross,
	Marek Marczykowski-Górecki

Currently, libxl neither pauses nor suspends a stubdomain when
suspending the domain it serves.  Qubes OS has an out-of-tree patch that
just pauses the stubdomain, but that is also insufficient: sys-net (an
HVM with an attached PCI device) does not properly resume from suspend
on some systems, and the stubdomain considers the TSC clocksource to be
unstable after resume.

This patch properly suspends the stubdomain.  Doing so requires creating
a nested libxl__domain_suspend_state structure and freeing it when
necessary.  Additionally, a new callback function is added that runs
when the stubdomain has been suspended.  libxl__qmp_suspend_save() is
called by this new callback.

Saving the state doesn't work on Qubes for two reasons:
 - save/restore consoles are not enabled (as requiring qemu in dom0)
 - avoid using QMP

Link: https://github.com/QubesOS/qubes-issues/issues/7404
Co-authored-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
 tools/libs/light/libxl_dom_suspend.c | 171 ++++++++++++++++++++++++++++----
 tools/libs/light/libxl_internal.h    |   1 +
 2 files changed, 150 insertions(+), 22 deletions(-)

diff --git a/tools/libs/light/libxl_dom_suspend.c b/tools/libs/light/libxl_dom_suspend.c
index d2a88ea34efb115a8f2b861daf5884d95f39e81d..d276b3c17e70105c19c82e9da570a24297d039f5 100644
--- a/tools/libs/light/libxl_dom_suspend.c
+++ b/tools/libs/light/libxl_dom_suspend.c
@@ -19,9 +19,9 @@
 
 /*====================== Domain suspend =======================*/
 
-int libxl__domain_suspend_init(libxl__egc *egc,
-                               libxl__domain_suspend_state *dsps,
-                               libxl_domain_type type)
+static int libxl__domain_suspend_init_inner(libxl__egc *egc,
+                                            libxl__domain_suspend_state *dsps,
+                                            libxl_domain_type type)
 {
     STATE_AO_GC(dsps->ao);
     int rc = ERROR_FAIL;
@@ -35,6 +35,7 @@ int libxl__domain_suspend_init(libxl__egc *egc,
     libxl__ev_xswatch_init(&dsps->guest_watch);
     libxl__ev_time_init(&dsps->guest_timeout);
     libxl__ev_qmp_init(&dsps->qmp);
+    dsps->dm_dsps = dsps->parent_dsps = NULL;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID) goto out;
     dsps->type = type;
@@ -67,18 +68,95 @@ out:
     return rc;
 }
 
+static void domain_suspend_device_model_domain_callback(libxl__egc *egc,
+                                       libxl__domain_suspend_state *dsps,
+                                       int rc);
+
+int libxl__domain_suspend_init(libxl__egc *egc,
+                               libxl__domain_suspend_state *dsps,
+                               libxl_domain_type type)
+{
+    STATE_AO_GC(dsps->ao);
+    uint32_t const domid = dsps->domid;
+    int rc = libxl__domain_suspend_init_inner(egc, dsps, type);
+
+    LOGD(DEBUG, domid, "Initialized suspend state");
+    if (type != LIBXL_DOMAIN_TYPE_HVM ||
+        !libxl__stubdomain_is_linux_running(gc, domid))
+        return rc;
+
+    LOGD(DEBUG, domid, "Need to suspend stubdomain too");
+    /* need to suspend the stubdomain too */
+    uint32_t const dm_domid = libxl_get_stubdom_id(CTX, domid);
+    if (rc == 0 && dm_domid != 0) {
+        libxl__domain_suspend_state *dm_dsps;
+
+        GCNEW(dm_dsps);
+        dm_dsps->domid = dm_domid;
+        dm_dsps->ao = dsps->ao;
+
+        dm_dsps->type = libxl__domain_type(gc, dm_domid);
+        if (dm_dsps->type == LIBXL_DOMAIN_TYPE_PV ||
+            dm_dsps->type == LIBXL_DOMAIN_TYPE_PVH) {
+            rc = libxl__domain_suspend_init_inner(egc, dm_dsps, dm_dsps->type);
+        } else {
+            LOGD(ERROR, domid, "Stubdomain %" PRIu32 " detected as neither PV "
+                               "nor PVH (got %d), cannot suspend", dm_domid, dm_dsps->type);
+            rc = ERROR_FAIL;
+        }
+        if (rc)
+            libxl__domain_suspend_dispose(gc, dsps);
+        else {
+            dm_dsps->callback_common_done = domain_suspend_device_model_domain_callback;
+            dsps->dm_dsps = dm_dsps;
+            dm_dsps->parent_dsps = dsps;
+        }
+    }
+    return rc;
+}
+
 void libxl__domain_suspend_dispose(libxl__gc *gc,
                                    libxl__domain_suspend_state  *dsps)
 {
-    libxl__xswait_stop(gc, &dsps->pvcontrol);
-    libxl__ev_evtchn_cancel(gc, &dsps->guest_evtchn);
-    libxl__ev_xswatch_deregister(gc, &dsps->guest_watch);
-    libxl__ev_time_deregister(gc, &dsps->guest_timeout);
-    libxl__ev_qmp_dispose(gc, &dsps->qmp);
+    for (;;) {
+        libxl__xswait_stop(gc, &dsps->pvcontrol);
+        libxl__ev_evtchn_cancel(gc, &dsps->guest_evtchn);
+        libxl__ev_xswatch_deregister(gc, &dsps->guest_watch);
+        libxl__ev_time_deregister(gc, &dsps->guest_timeout);
+        libxl__ev_qmp_dispose(gc, &dsps->qmp);
+        if (dsps->dm_dsps == NULL)
+            break;
+        assert(dsps->parent_dsps == NULL);
+        assert(dsps->dm_dsps->parent_dsps == dsps);
+        dsps = dsps->dm_dsps;
+        assert(dsps->dm_dsps == NULL);
+    }
 }
 
 /*----- callbacks, called by xc_domain_save -----*/
 
+static void domain_suspend_device_model_domain_callback(libxl__egc *egc,
+                                       libxl__domain_suspend_state *dm_dsps,
+                                       int rc)
+{
+    STATE_AO_GC(dm_dsps->ao);
+    libxl__domain_suspend_state *dsps = dm_dsps->parent_dsps;
+    assert(dm_dsps->dm_dsps == NULL);
+    assert(dsps);
+    assert(dsps->dm_dsps == dm_dsps);
+    if (rc) {
+        LOGD(ERROR, dsps->domid,
+             "failed to suspend device model (stubdom id %d), rc=%d", dm_dsps->domid, rc);
+    } else {
+        LOGD(DEBUG, dsps->domid,
+             "Successfully suspended stubdomain (stubdom id %d)", dm_dsps->domid);
+    }
+    dsps->callback_device_model_done(egc, dsps, rc); /* must be last */
+}
+
+static void domain_suspend_callback_common(libxl__egc *egc,
+                                           libxl__domain_suspend_state *dsps);
+
 void libxl__domain_suspend_device_model(libxl__egc *egc,
                                        libxl__domain_suspend_state *dsps)
 {
@@ -86,6 +164,7 @@ void libxl__domain_suspend_device_model(libxl__egc *egc,
     int rc = 0;
     uint32_t const domid = dsps->domid;
     const char *const filename = dsps->dm_savefile;
+    libxl__domain_suspend_state *dm_dsps = dsps->dm_dsps;
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
@@ -95,15 +174,24 @@ void libxl__domain_suspend_device_model(libxl__egc *egc,
         break;
     }
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        /* calls dsps->callback_device_model_done when done */
-        libxl__qmp_suspend_save(egc, dsps); /* must be last */
+        if (dm_dsps) {
+            assert(dm_dsps->type == LIBXL_DOMAIN_TYPE_PVH ||
+                   dm_dsps->type == LIBXL_DOMAIN_TYPE_PV);
+            LOGD(DEBUG, domid, "Suspending stubdomain (domid %" PRIu32 ")",
+                 dm_dsps->domid);
+            /* calls dm_dsps->callback_common_done when done */
+            domain_suspend_callback_common(egc, dm_dsps); /* must be last */
+        } else {
+            LOGD(DEBUG, domid, "Stubdomain not in use");
+            /* calls dsps->callback_device_model_done when done */
+            libxl__qmp_suspend_save(egc, dsps); /* must be last */
+        }
         return;
     default:
         rc = ERROR_INVAL;
-        goto out;
+        break;
     }
 
-out:
     if (rc)
         LOGD(ERROR, dsps->domid,
              "failed to suspend device model, rc=%d", rc);
@@ -130,8 +218,6 @@ static void domain_suspend_common_done(libxl__egc *egc,
                                        libxl__domain_suspend_state *dsps,
                                        int rc);
 
-static void domain_suspend_callback_common(libxl__egc *egc,
-                                           libxl__domain_suspend_state *dsps);
 static void domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dsps, int rc);
 
@@ -308,6 +394,7 @@ static void domain_suspend_common_wait_guest(libxl__egc *egc,
                                      suspend_common_wait_guest_timeout,
                                      60*1000);
     if (rc) goto err;
+
     return;
 
  err:
@@ -528,6 +615,7 @@ void libxl__dm_resume(libxl__egc *egc,
 {
     STATE_AO_GC(dmrs->ao);
     int rc = 0;
+    uint32_t dm_domid = libxl_get_stubdom_id(CTX, dmrs->domid);
 
     /* Convenience aliases */
     libxl_domid domid = dmrs->domid;
@@ -543,7 +631,6 @@ void libxl__dm_resume(libxl__egc *egc,
 
     switch (libxl__device_model_version_running(gc, domid)) {
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
-        uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid);
         const char *path, *state;
 
         path = DEVICE_MODEL_XS_PATH(gc, dm_domid, domid, "/state");
@@ -563,14 +650,54 @@ void libxl__dm_resume(libxl__egc *egc,
         if (rc) goto out;
         break;
     }
-    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
-        qmp->ao = dmrs->ao;
-        qmp->domid = domid;
-        qmp->callback = dm_resume_qmp_done;
-        qmp->payload_fd = -1;
-        rc = libxl__ev_qmp_send(egc, qmp, "cont", NULL);
+    case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: {
+        xc_domaininfo_t dm_info;
+
+        if (dm_domid == 0 /* || !libxl__stubdomain_is_linux_running() */) {
+            LOGD(DEBUG, domid, "Resuming dom0 device model using QMP");
+            qmp->ao = dmrs->ao;
+            qmp->domid = domid;
+            qmp->callback = dm_resume_qmp_done;
+            qmp->payload_fd = -1;
+            rc = libxl__ev_qmp_send(egc, qmp, "cont", NULL);
+            if (rc) goto out;
+            return;
+        }
+
+        LOGD(DEBUG, domid, "Resuming modern stubdomain: ID %" PRIu32, dm_domid);
+
+        rc = check_guest_status(gc, dm_domid, &dm_info, "resuming");
         if (rc) goto out;
-        break;
+
+        if ((dm_info.flags & XEN_DOMINF_paused)) {
+            rc = xc_domain_unpause(CTX->xch, dm_domid);
+            if (rc < 0) {
+                LOGED(ERROR, domid,
+                      "xc_domain_unpause failed for stubdomain %" PRIu32,
+                      dm_domid);
+                goto out;
+            }
+            LOGD(DEBUG, domid,
+                 "xc_domain_unpause succeeded for stubdomain %" PRIu32,
+                 dm_domid);
+        }
+
+        if ((dm_info.flags & XEN_DOMINF_shutdown)) {
+            int shutdown_reason =
+                (dm_info.flags >> XEN_DOMINF_shutdownshift)
+                & XEN_DOMINF_shutdownmask;
+            if (shutdown_reason != SHUTDOWN_suspend) {
+                LOGD(ERROR, domid, "stubdomain %d being resumed shut down"
+                     " with unexpected reason code %d",
+                     dm_domid, shutdown_reason);
+                rc = ERROR_FAIL;
+                goto out;
+            }
+
+            rc = domain_resume_raw(gc, dm_domid, dmrs->suspend_cancel);
+        }
+        goto out;
+    }
     default:
         rc = ERROR_INVAL;
         goto out;
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index 34063baf81c4eae81790e2f25cc37f5cf58eb196..37e5c98f63472e66100b6301d78ac0178cbf987e 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -3615,6 +3615,7 @@ struct libxl__domain_suspend_state {
                               struct libxl__domain_suspend_state*, int rc);
     void (*callback_common_done)(libxl__egc*,
                                  struct libxl__domain_suspend_state*, int ok);
+    struct libxl__domain_suspend_state *dm_dsps, *parent_dsps;
 };
 int libxl__domain_suspend_init(libxl__egc *egc,
                                libxl__domain_suspend_state *dsps,
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/5] libxl: Fix race condition in domain suspension
  2022-09-27  3:03 [PATCH 0/5] Suspend and resume stubdomains Demi Marie Obenour
                   ` (2 preceding siblings ...)
  2022-09-27  3:03 ` [PATCH 3/5] libxl: Properly suspend stubdomains Demi Marie Obenour
@ 2022-09-27  3:04 ` Demi Marie Obenour
  2022-09-27  3:04 ` [PATCH 5/5] libxl: Add additional domain suspend/resume logs Demi Marie Obenour
  4 siblings, 0 replies; 6+ messages in thread
From: Demi Marie Obenour @ 2022-09-27  3:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Demi Marie Obenour, Wei Liu, Anthony PERARD, Juergen Gross

Check if the domain has suspended after setting the XenStore watch to
prevent race conditions.  Also check if a guest has suspended when the
timeout handler is called, and do not consider this to be a timeout.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
 tools/libs/light/libxl_dom_suspend.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/libs/light/libxl_dom_suspend.c b/tools/libs/light/libxl_dom_suspend.c
index d276b3c17e70105c19c82e9da570a24297d039f5..42c0e0a152e04fab34152d711564ffe148f24a4c 100644
--- a/tools/libs/light/libxl_dom_suspend.c
+++ b/tools/libs/light/libxl_dom_suspend.c
@@ -209,7 +209,8 @@ static void domain_suspend_common_wait_guest_evtchn(libxl__egc *egc,
         libxl__ev_evtchn *evev);
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
       libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path);
-static void suspend_common_wait_guest_check(libxl__egc *egc,
+/* Returns true if a callback was called, false otherwise */
+static bool suspend_common_wait_guest_check(libxl__egc *egc,
         libxl__domain_suspend_state *dsps);
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
       libxl__ev_time *ev, const struct timeval *requested_abs, int rc);
@@ -426,7 +427,7 @@ static int check_guest_status(libxl__gc *gc, const uint32_t domid,
     return 0;
 }
 
-static void suspend_common_wait_guest_check(libxl__egc *egc,
+static bool suspend_common_wait_guest_check(libxl__egc *egc,
         libxl__domain_suspend_state *dsps)
 {
     STATE_AO_GC(dsps->ao);
@@ -441,7 +442,7 @@ static void suspend_common_wait_guest_check(libxl__egc *egc,
 
     if (!(info.flags & XEN_DOMINF_shutdown))
         /* keep waiting */
-        return;
+        return false;
 
     shutdown_reason = (info.flags >> XEN_DOMINF_shutdownshift)
         & XEN_DOMINF_shutdownmask;
@@ -452,11 +453,15 @@ static void suspend_common_wait_guest_check(libxl__egc *egc,
     }
 
     LOGD(DEBUG, domid, "guest has suspended");
+    dsps->guest_responded = 1;
+    libxl__xswait_stop(gc, &dsps->pvcontrol);
     domain_suspend_common_guest_suspended(egc, dsps);
-    return;
+    return true;
 
  err:
+    libxl__xswait_stop(gc, &dsps->pvcontrol);
     domain_suspend_common_done(egc, dsps, ERROR_FAIL);
+    return true;
 }
 
 static void suspend_common_wait_guest_timeout(libxl__egc *egc,
@@ -464,6 +469,8 @@ static void suspend_common_wait_guest_timeout(libxl__egc *egc,
 {
     libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, guest_timeout);
     STATE_AO_GC(dsps->ao);
+    if (suspend_common_wait_guest_check(egc, dsps))
+        return;
     if (rc == ERROR_TIMEDOUT) {
         LOGD(ERROR, dsps->domid, "guest did not suspend, timed out");
         rc = ERROR_GUEST_TIMEDOUT;
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 5/5] libxl: Add additional domain suspend/resume logs
  2022-09-27  3:03 [PATCH 0/5] Suspend and resume stubdomains Demi Marie Obenour
                   ` (3 preceding siblings ...)
  2022-09-27  3:04 ` [PATCH 4/5] libxl: Fix race condition in domain suspension Demi Marie Obenour
@ 2022-09-27  3:04 ` Demi Marie Obenour
  4 siblings, 0 replies; 6+ messages in thread
From: Demi Marie Obenour @ 2022-09-27  3:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Demi Marie Obenour, Wei Liu, Anthony PERARD, Juergen Gross

This was useful when debugging, but is not required.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
 tools/libs/light/libxl_dom_suspend.c | 20 ++++++++++++++++++--
 tools/libs/light/libxl_domain.c      |  1 +
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_dom_suspend.c b/tools/libs/light/libxl_dom_suspend.c
index 42c0e0a152e04fab34152d711564ffe148f24a4c..55a172a46f8703661e696971bee07dce93117411 100644
--- a/tools/libs/light/libxl_dom_suspend.c
+++ b/tools/libs/light/libxl_dom_suspend.c
@@ -321,9 +321,11 @@ static void domain_suspend_common_pvcontrol_suspending(libxl__egc *egc,
     STATE_AO_GC(dsps->ao);
     xs_transaction_t t = 0;
 
-    if (!rc && !domain_suspend_pvcontrol_acked(state))
+    if (!rc && !domain_suspend_pvcontrol_acked(state)) {
         /* keep waiting */
+        LOGD(DEBUG, dsps->domid, "PV control callback without ack");
         return;
+    }
 
     libxl__xswait_stop(gc, &dsps->pvcontrol);
 
@@ -405,7 +407,10 @@ static void domain_suspend_common_wait_guest(libxl__egc *egc,
 static void suspend_common_wait_guest_watch(libxl__egc *egc,
       libxl__ev_xswatch *xsw, const char *watch_path, const char *event_path)
 {
+    EGC_GC;
     libxl__domain_suspend_state *dsps = CONTAINER_OF(xsw, *dsps, guest_watch);
+
+    LOGD(DEBUG, dsps->domid, "@releaseDomain watch fired, checking guest status");
     suspend_common_wait_guest_check(egc, dsps);
 }
 
@@ -440,9 +445,11 @@ static bool suspend_common_wait_guest_check(libxl__egc *egc,
     if (check_guest_status(gc, domid, &info, "suspending"))
         goto err;
 
-    if (!(info.flags & XEN_DOMINF_shutdown))
+    if (!(info.flags & XEN_DOMINF_shutdown)) {
+        LOGD(DEBUG, domid, "guest we were suspending has not shut down yet");
         /* keep waiting */
         return false;
+    }
 
     shutdown_reason = (info.flags >> XEN_DOMINF_shutdownshift)
         & XEN_DOMINF_shutdownmask;
@@ -469,11 +476,14 @@ static void suspend_common_wait_guest_timeout(libxl__egc *egc,
 {
     libxl__domain_suspend_state *dsps = CONTAINER_OF(ev, *dsps, guest_timeout);
     STATE_AO_GC(dsps->ao);
+    LOGD(DEBUG, dsps->domid, "Timeout callback triggered");
     if (suspend_common_wait_guest_check(egc, dsps))
         return;
     if (rc == ERROR_TIMEDOUT) {
         LOGD(ERROR, dsps->domid, "guest did not suspend, timed out");
         rc = ERROR_GUEST_TIMEDOUT;
+    } else {
+        LOGD(ERROR, dsps->domid, "error in timeout handler (code %d)", rc);
     }
     domain_suspend_common_done(egc, dsps, rc);
 }
@@ -628,6 +638,8 @@ void libxl__dm_resume(libxl__egc *egc,
     libxl_domid domid = dmrs->domid;
     libxl__ev_qmp *qmp = &dmrs->qmp;
 
+    LOGD(DEBUG, domid, "Resuming device model");
+
     dm_resume_init(dmrs);
 
     rc = libxl__ev_time_register_rel(dmrs->ao,
@@ -640,6 +652,7 @@ void libxl__dm_resume(libxl__egc *egc,
     case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: {
         const char *path, *state;
 
+        LOGD(DEBUG, domid, "Resuming legacy device model: stubdomain ID %" PRIu32, dm_domid);
         path = DEVICE_MODEL_XS_PATH(gc, dm_domid, domid, "/state");
         rc = libxl__xs_read_checked(gc, XBT_NULL, path, &state);
         if (rc) goto out;
@@ -706,6 +719,7 @@ void libxl__dm_resume(libxl__egc *egc,
         goto out;
     }
     default:
+        LOGD(ERROR, domid, "Invalid device model type, cannot resume");
         rc = ERROR_INVAL;
         goto out;
     }
@@ -782,6 +796,8 @@ void libxl__domain_resume(libxl__egc *egc,
     int rc = 0;
     libxl_domain_type type = libxl__domain_type(gc, dmrs->domid);
 
+    LOGD(DEBUG, dmrs->domid, "Resuming domain");
+
     if (type == LIBXL_DOMAIN_TYPE_INVALID) {
         rc = ERROR_FAIL;
         goto out;
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index a6b0b509cc04379e9d596a38644e3db5963011ea..e8e0894c8617a36f6fc13af73daa1ed290a466ce 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -566,6 +566,7 @@ int libxl_domain_suspend_only(libxl_ctx *ctx, uint32_t domid,
     dsps->ao = ao;
     dsps->domid = domid;
     dsps->type = type;
+    LOGD(DEBUG, domid, "Received request to suspend domain");
     rc = libxl__domain_suspend_init(egc, dsps, type);
     if (rc < 0) goto out_err;
     dsps->callback_common_done = domain_suspend_empty_cb;
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-09-27  3:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27  3:03 [PATCH 0/5] Suspend and resume stubdomains Demi Marie Obenour
2022-09-27  3:03 ` [PATCH 1/5] libxl: Add a utility function for domain resume Demi Marie Obenour
2022-09-27  3:03 ` [PATCH 2/5] libxl: Add utility function to check guest status Demi Marie Obenour
2022-09-27  3:03 ` [PATCH 3/5] libxl: Properly suspend stubdomains Demi Marie Obenour
2022-09-27  3:04 ` [PATCH 4/5] libxl: Fix race condition in domain suspension Demi Marie Obenour
2022-09-27  3:04 ` [PATCH 5/5] libxl: Add additional domain suspend/resume logs Demi Marie Obenour

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).