xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Olaf Hering <olaf@aepfle.de>
To: xen-devel@lists.xenproject.org
Cc: Olaf Hering <olaf@aepfle.de>, Ian Jackson <iwj@xenproject.org>,
	Wei Liu <wl@xen.org>, Anthony PERARD <anthony.perard@citrix.com>
Subject: [PATCH v20210601 33/38] tools: add --abort_if_busy to libxl_domain_suspend
Date: Tue,  1 Jun 2021 18:11:13 +0200	[thread overview]
Message-ID: <20210601161118.18986-34-olaf@aepfle.de> (raw)
In-Reply-To: <20210601161118.18986-1-olaf@aepfle.de>

Provide a knob to the host admin to abort the live migration of a
running domU if the downtime during final transit will be too long
for the workload within domU.

Adjust error reporting. Add ERROR_MIGRATION_ABORTED to allow callers of
libxl_domain_suspend to distinguish between errors and the requested
constraint.

Adjust precopy_policy to simplify reporting of remaining dirty pages.
The loop in send_memory_live populates ->dirty_count in a different
place than ->iteration. Let it proceeed one more time to provide the
desired information before leaving the loop.

This patch adjusts xl(1) and the libxl API.
External users check LIBXL_HAVE_DOMAIN_SUSPEND_PROPS for the availibility
of the new .abort_if_busy property.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 docs/man/xl.1.pod.in                  |  8 +++++++
 tools/include/libxl.h                 |  1 +
 tools/libs/light/libxl_dom_save.c     |  7 ++++++-
 tools/libs/light/libxl_domain.c       |  1 +
 tools/libs/light/libxl_internal.h     |  2 ++
 tools/libs/light/libxl_stream_write.c |  9 +++++++-
 tools/libs/light/libxl_types.idl      |  1 +
 tools/xl/xl_cmdtable.c                |  6 +++++-
 tools/xl/xl_migrate.c                 | 30 ++++++++++++++++++++-------
 9 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 43609f6cdd..b258d56ab6 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -508,6 +508,14 @@ low, the guest is suspended and the domU will finally be moved to I<host>.
 This allows the host admin to control for how long the domU will likely
 be suspended during transit.
 
+=item B<--abort_if_busy>
+
+Abort migration instead of doing final suspend/move/resume if the
+guest produced more than I<min_remaining> dirty pages during th number
+of I<max_iters> iterations.
+This avoids long periods of time where the guest is suspended, which
+may confuse the workload within domU.
+
 =back
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 28d70b1078..cc056ed627 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1719,6 +1719,7 @@ typedef struct {
 } libxl_domain_suspend_props;
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
+#define LIBXL_SUSPEND_ABORT_IF_BUSY 4
 
 int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
                          libxl_domain_suspend_props *props,
diff --git a/tools/libs/light/libxl_dom_save.c b/tools/libs/light/libxl_dom_save.c
index ad5df89b2c..1999a8997f 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -383,11 +383,16 @@ static int libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
          stats.iteration, stats.dirty_count, stats.total_written);
     if (stats.dirty_count >= 0 && stats.dirty_count < dss->min_remaining)
         goto stop_copy;
-    if (stats.iteration >= dss->max_iters)
+    if (stats.dirty_count >= 0 && stats.iteration >= dss->max_iters)
         goto stop_copy;
     return XGS_POLICY_CONTINUE_PRECOPY;
 
 stop_copy:
+    if (dss->abort_if_busy)
+    {
+        dss->remaining_dirty_pages = stats.dirty_count;
+        return XGS_POLICY_ABORT;
+    }
     return XGS_POLICY_STOP_AND_COPY;
 }
 
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index ae4dc9ad01..913653bd76 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -529,6 +529,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
     dss->type = type;
     dss->max_iters = props->max_iters ?: LIBXL_XGS_POLICY_MAX_ITERATIONS;
     dss->min_remaining = props->min_remaining ?: LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT;
+    dss->abort_if_busy = props->flags & LIBXL_SUSPEND_ABORT_IF_BUSY;
     dss->live = props->flags & LIBXL_SUSPEND_LIVE;
     dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
     dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index 63028586fe..7453a3aa7b 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -3640,9 +3640,11 @@ struct libxl__domain_save_state {
     libxl_domain_type type;
     int live;
     int debug;
+    int abort_if_busy;
     int checkpointed_stream;
     uint32_t max_iters;
     uint32_t min_remaining;
+    long remaining_dirty_pages;
     const libxl_domain_remus_info *remus;
     /* private */
     int rc;
diff --git a/tools/libs/light/libxl_stream_write.c b/tools/libs/light/libxl_stream_write.c
index 634f3240d1..1ab3943f3e 100644
--- a/tools/libs/light/libxl_stream_write.c
+++ b/tools/libs/light/libxl_stream_write.c
@@ -344,11 +344,18 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
         goto err;
 
     if (retval) {
+        if (dss->remaining_dirty_pages) {
+            LOGD(NOTICE, dss->domid, "saving domain: aborted,"
+                 " %ld remaining dirty pages.", dss->remaining_dirty_pages);
+        } else {
         LOGEVD(ERROR, errnoval, dss->domid, "saving domain: %s",
               dss->dsps.guest_responded ?
               "domain responded to suspend request" :
               "domain did not respond to suspend request");
-        if (!dss->dsps.guest_responded)
+        }
+        if (dss->remaining_dirty_pages)
+           rc = ERROR_MIGRATION_ABORTED;
+        else if(!dss->dsps.guest_responded)
             rc = ERROR_GUEST_TIMEDOUT;
         else if (dss->rc)
             rc = dss->rc;
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index f45adddab0..b91769ee10 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -76,6 +76,7 @@ libxl_error = Enumeration("error", [
     (-30, "QMP_DEVICE_NOT_ACTIVE"), # a device has failed to be become active
     (-31, "QMP_DEVICE_NOT_FOUND"), # the requested device has not been found
     (-32, "QEMU_API"), # QEMU's replies don't contains expected members
+    (-33, "MIGRATION_ABORTED"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index acb84e3486..6c9de3bdec 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -176,7 +176,11 @@ const struct cmd_spec cmd_table[] = {
       "-p                Do not unpause domain after migrating it.\n"
       "-D                Preserve the domain id\n"
       "--max_iters N     Number of copy iterations before final stop+move\n"
-      "--min_remaining N Number of remaining dirty pages before final stop+move"
+      "--min_remaining N Number of remaining dirty pages before final stop+move\n"
+      "--abort_if_busy   Abort migration instead of doing final stop+move,\n"
+      "                  if the number of dirty pages is higher than <min_remaining>\n"
+      "                  after <max_iters> iterations. Otherwise the amount of memory\n"
+      "                  to be transfered would exceed maximum allowed domU downtime."
     },
     { "restore",
       &main_restore, 0, 1,
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 14feb2b7ec..f523746e5b 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -177,7 +177,7 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
 }
 
 static void migrate_domain(uint32_t domid, int preserve_domid,
-                           const char *rune, int debug,
+                           const char *rune, int debug, int abort_if_busy,
                            uint32_t max_iters,
                            uint32_t min_remaining,
                            const char *override_config_file)
@@ -213,14 +213,20 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
 
     if (debug)
         props.flags |= LIBXL_SUSPEND_DEBUG;
+    if (abort_if_busy)
+        props.flags |= LIBXL_SUSPEND_ABORT_IF_BUSY;
     rc = libxl_domain_suspend(ctx, domid, send_fd, &props, NULL);
     if (rc) {
         fprintf(stderr, "migration sender: libxl_domain_suspend failed"
                 " (rc=%d)\n", rc);
-        if (rc == ERROR_GUEST_TIMEDOUT)
-            goto failed_suspend;
-        else
-            goto failed_resume;
+        switch (rc) {
+            case ERROR_GUEST_TIMEDOUT:
+                goto failed_suspend;
+            case ERROR_MIGRATION_ABORTED:
+                goto failed_busy;
+            default:
+                goto failed_resume;
+        }
     }
 
     //fprintf(stderr, "migration sender: Transfer complete.\n");
@@ -302,6 +308,12 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
     fprintf(stderr, "Migration failed, failed to suspend at sender.\n");
     exit(EXIT_FAILURE);
 
+ failed_busy:
+    close(send_fd);
+    migration_child_report(recv_fd);
+    fprintf(stderr, "Migration aborted as requested, domain is too busy.\n");
+    exit(EXIT_FAILURE);
+
  failed_resume:
     close(send_fd);
     migration_child_report(recv_fd);
@@ -545,13 +557,14 @@ int main_migrate(int argc, char **argv)
     char *rune = NULL;
     char *host;
     int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
-    int preserve_domid = 0;
+    int preserve_domid = 0, abort_if_busy = 0;
     uint32_t max_iters = 0;
     uint32_t min_remaining = 0;
     static struct option opts[] = {
         {"debug", 0, 0, 0x100},
         {"max_iters", 1, 0, 0x101},
         {"min_remaining", 1, 0, 0x102},
+        {"abort_if_busy", 0, 0, 0x103},
         {"live", 0, 0, 0x200},
         COMMON_LONG_OPTS
     };
@@ -585,6 +598,9 @@ int main_migrate(int argc, char **argv)
     case 0x102: /* --min_remaining */
         min_remaining = atoi(optarg);
         break;
+    case 0x103: /* --abort_if_busy */
+        abort_if_busy = 1;
+        break;
     case 0x200: /* --live */
         /* ignored for compatibility with xm */
         break;
@@ -619,7 +635,7 @@ int main_migrate(int argc, char **argv)
                   pause_after_migration ? " -p" : "");
     }
 
-    migrate_domain(domid, preserve_domid, rune, debug,
+    migrate_domain(domid, preserve_domid, rune, debug, abort_if_busy,
                    max_iters, min_remaining, config_filename);
     return EXIT_SUCCESS;
 }


  parent reply	other threads:[~2021-06-01 16:14 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-01 16:10 [PATCH v20210601 00/38] leftover from 2020 Olaf Hering
2021-06-01 16:10 ` [PATCH v20210601 01/38] tools: add API to work with sevaral bits at once Olaf Hering
2021-06-02  6:19   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 02/38] xl: fix description of migrate --debug Olaf Hering
2021-06-02  6:09   ` Juergen Gross
2021-06-02 10:43     ` Olaf Hering
2021-06-02 11:43       ` Juergen Gross
2021-06-02 12:32   ` [PATCH v20210602 " Olaf Hering
2021-06-02 13:48     ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 03/38] tools: create libxensaverestore Olaf Hering
2021-06-01 16:10 ` [PATCH v20210601 04/38] tools: add readv_exact to libxenctrl Olaf Hering
2021-06-02  6:30   ` Juergen Gross
2021-06-02 10:57     ` Olaf Hering
2021-06-02 11:05       ` Olaf Hering
2021-06-02 11:41       ` Juergen Gross
2021-06-07  9:46         ` Olaf Hering
2021-06-07 11:31         ` Olaf Hering
2021-06-01 16:10 ` [PATCH v20210601 05/38] tools: add xc_is_known_page_type " Olaf Hering
2021-06-02  6:51   ` Juergen Gross
2021-06-02 11:10     ` Olaf Hering
2021-06-02 11:48       ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 06/38] tools: use xc_is_known_page_type Olaf Hering
2021-06-02  6:53   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 07/38] tools: unify type checking for data pfns in migration stream Olaf Hering
2021-06-02  6:59   ` Juergen Gross
2021-06-02 11:21     ` Olaf Hering
2021-06-02 12:03       ` Juergen Gross
2021-06-07 10:12         ` Olaf Hering
2021-06-07 10:22           ` Juergen Gross
2021-06-18 12:25     ` Olaf Hering
2021-06-01 16:10 ` [PATCH v20210601 08/38] tools: show migration transfer rate in send_dirty_pages Olaf Hering
2021-06-02  7:10   ` Juergen Gross
2021-06-08  8:58     ` Olaf Hering
2021-06-08 10:07       ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 09/38] tools/guest: prepare to allocate arrays once Olaf Hering
2021-06-02  7:29   ` Juergen Gross
2021-06-02 12:03     ` Olaf Hering
2021-06-02 12:09       ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 10/38] tools/guest: save: move batch_pfns Olaf Hering
2021-06-02  7:31   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 11/38] tools/guest: save: move mfns array Olaf Hering
2021-06-02  7:32   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 12/38] tools/guest: save: move types array Olaf Hering
2021-06-02  7:32   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 13/38] tools/guest: save: move errors array Olaf Hering
2021-06-02  7:33   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 14/38] tools/guest: save: move iov array Olaf Hering
2021-06-02  7:34   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 15/38] tools/guest: save: move rec_pfns array Olaf Hering
2021-06-02  7:35   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 16/38] tools/guest: save: move guest_data array Olaf Hering
2021-06-02  7:39   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 17/38] tools/guest: save: move local_pages array Olaf Hering
2021-06-02  7:47   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 18/38] tools/guest: restore: move pfns array Olaf Hering
2021-06-02  7:55   ` Juergen Gross
2021-06-01 16:10 ` [PATCH v20210601 19/38] tools/guest: restore: move types array Olaf Hering
2021-06-02  7:56   ` Juergen Gross
2021-06-01 16:11 ` [PATCH v20210601 20/38] tools/guest: restore: move mfns array Olaf Hering
2021-06-02  7:57   ` Juergen Gross
2021-06-01 16:11 ` [PATCH v20210601 21/38] tools/guest: restore: move map_errs array Olaf Hering
2021-06-02  7:58   ` Juergen Gross
2021-06-01 16:11 ` [PATCH v20210601 22/38] tools/guest: restore: move mfns array in populate_pfns Olaf Hering
2021-06-02  7:59   ` Juergen Gross
2021-06-01 16:11 ` [PATCH v20210601 23/38] tools/guest: restore: move pfns " Olaf Hering
2021-06-02  7:59   ` Juergen Gross
2021-06-01 16:11 ` [PATCH v20210601 24/38] tools/guest: restore: split record processing Olaf Hering
2021-06-02  9:57   ` Juergen Gross
2021-06-01 16:11 ` [PATCH v20210601 25/38] tools/guest: restore: split handle_page_data Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 26/38] tools/guest: restore: write data directly into guest Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 27/38] tools: recognize LIBXL_API_VERSION for 4.16 Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 28/38] tools: adjust libxl_domain_suspend to receive a struct props Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 29/38] tools: change struct precopy_stats to precopy_stats_t Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 30/38] tools: add callback to libxl for precopy_policy and precopy_stats_t Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 31/38] tools: add --max_iters to libxl_domain_suspend Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 32/38] tools: add --min_remaining " Olaf Hering
2021-06-01 16:11 ` Olaf Hering [this message]
2021-06-01 16:11 ` [PATCH v20210601 34/38] tools: add API for expandable bitmaps Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 35/38] tools: use xg_sr_bitmap for populated_pfns Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 36/38] tools: use superpages during restore of HVM guest Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 37/38] tools: remove migration stream verify code Olaf Hering
2021-06-01 16:11 ` [PATCH v20210601 38/38] hotplug/Linux: fix starting of xenstored with restarting systemd Olaf Hering
2021-06-02  6:10 ` [PATCH v20210601 00/38] leftover from 2020 Juergen Gross
2021-06-02  6:54   ` Olaf Hering
2021-06-02  7:00     ` Juergen Gross
2021-06-02 12:07       ` Olaf Hering
2021-06-02 12:15         ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210601161118.18986-34-olaf@aepfle.de \
    --to=olaf@aepfle.de \
    --cc=anthony.perard@citrix.com \
    --cc=iwj@xenproject.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).