[PATCH v3 0/9] tests/qtest: make migration-test massively faster

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/9] tests/qtest: make migration-test massively faster
@ 2023-05-31 13:23 Daniel P. Berrangé
  2023-05-31 13:23 ` [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants Daniel P. Berrangé
                   ` (8 more replies)
  0 siblings, 9 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

This makes migration-test faster by observing that most of the pre-copy
tests don't need to be doing a live migration. They get sufficient code
coverage with the guest CPUs paused.

On my machine this cuts the overall execution time of migration-test
from 13 minutes, down to 8 minutes, without sacrificing any noticeable
code coverage.

Of the tests which do still run in live mode, some need to guarantee
a certain number of iterions. This is achieved by running the 1
iteration with an incredibly small bandwidth and max downtime to
prevent convergance, and watching query-migrate for the reported
iteration to increment. This guarantees that all the tests take at
least 30 seconds to run per iteration required.

Watching for the iteration counter to flip is inefficient and not
actually needed, except on the final iteration before starting
convergance. On this final iteration we merely need to prove that
some amount of already transferred data has been made dirty again.
This in turn will guarantee that a further iteration is required
beyond the current one. This proof is easy to achieve by monitoring
the values at two distinct addresses in guest RAM, and can cut the
30 second duration down to 1 second for one of the iterations.

After this this second optimization the runtime is reduced from
8 minutes, down to 1 minute 40 seconds, which is pretty decent given
the amount of coverage we're getting.

Daniel P. Berrangé (9):
  tests/qtest: add various qtest_qmp_assert_success() variants
  tests/qtest: add support for callback to receive QMP events
  tests/qtest: get rid of 'qmp_command' helper in migration test
  tests/qtest: get rid of some 'qtest_qmp' usage in migration test
  tests/qtest: switch to using event callbacks for STOP event
  tests/qtest: replace wait_command() with qtest_qmp_assert_success
  tests/qtest: capture RESUME events during migration
  tests/qtest: make more migration pre-copy scenarios run non-live
  tests/qtest: massively speed up migration-test

 tests/qtest/libqtest.c          | 119 ++++++++-
 tests/qtest/libqtest.h          | 152 ++++++++++-
 tests/qtest/migration-helpers.c | 101 ++-----
 tests/qtest/migration-helpers.h |  16 +-
 tests/qtest/migration-test.c    | 458 ++++++++++++++++++++------------
 5 files changed, 577 insertions(+), 269 deletions(-)

-- 
2.40.1

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
@ 2023-05-31 13:23 ` Daniel P. Berrangé
  2023-06-01  9:23   ` Thomas Huth
                     ` (2 more replies)
  2023-05-31 13:23 ` [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events Daniel P. Berrangé
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

Add several counterparts of qtest_qmp_assert_success() that can

 * Use va_list instead of ...
 * Accept a list of FDs to send
 * Return the response data

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/libqtest.c |  99 +++++++++++++++++++++++++++++++++--
 tests/qtest/libqtest.h | 115 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 209 insertions(+), 5 deletions(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index c3a0ef5bb4..603c26d955 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -1229,14 +1229,23 @@ void qtest_memset(QTestState *s, uint64_t addr, uint8_t pattern, size_t size)
     qtest_rsp(s);
 }
 
-void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
+void qtest_vqmp_assert_success(QTestState *qts,
+                               const char *fmt, va_list args)
 {
-    va_list ap;
     QDict *response;
 
-    va_start(ap, fmt);
-    response = qtest_vqmp(qts, fmt, ap);
-    va_end(ap);
+    response = qtest_vqmp_assert_success_ref(qts, fmt, args);
+
+    qobject_unref(response);
+}
+
+QDict *qtest_vqmp_assert_success_ref(QTestState *qts,
+                                     const char *fmt, va_list args)
+{
+    QDict *response;
+    QDict *ret;
+
+    response = qtest_vqmp(qts, fmt, args);
 
     g_assert(response);
     if (!qdict_haskey(response, "return")) {
@@ -1245,8 +1254,88 @@ void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
         g_string_free(s, true);
     }
     g_assert(qdict_haskey(response, "return"));
+    ret = qdict_get_qdict(response, "return");
+    qobject_ref(ret);
+    qobject_unref(response);
+
+    return ret;
+}
+
+#ifndef _WIN32
+QDict *qtest_vqmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
+                                         const char *fmt, va_list args)
+{
+    QDict *response;
+    QDict *ret;
+
+    response = qtest_vqmp_fds(qts, fds, nfds, fmt, args);
+
+    g_assert(response);
+    if (!qdict_haskey(response, "return")) {
+        GString *s = qobject_to_json_pretty(QOBJECT(response), true);
+        g_test_message("%s", s->str);
+        g_string_free(s, true);
+    }
+    g_assert(qdict_haskey(response, "return"));
+    ret = qdict_get_qdict(response, "return");
+    qobject_ref(ret);
+    qobject_unref(response);
+
+    return ret;
+}
+
+void qtest_vqmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
+                                   const char *fmt, va_list args)
+{
+    QDict *response;
+    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, args);
     qobject_unref(response);
 }
+#endif /* !_WIN32 */
+
+void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
+{
+    QDict *response;
+    va_list ap;
+    va_start(ap, fmt);
+    response = qtest_vqmp_assert_success_ref(qts, fmt, ap);
+    va_end(ap);
+    qobject_unref(response);
+}
+
+QDict *qtest_qmp_assert_success_ref(QTestState *qts, const char *fmt, ...)
+{
+    QDict *response;
+    va_list ap;
+    va_start(ap, fmt);
+    response = qtest_vqmp_assert_success_ref(qts, fmt, ap);
+    va_end(ap);
+    return response;
+}
+
+#ifndef _WIN32
+void qtest_qmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
+                                  const char *fmt, ...)
+{
+    QDict *response;
+    va_list ap;
+    va_start(ap, fmt);
+    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, ap);
+    va_end(ap);
+    qobject_unref(response);
+}
+
+QDict *qtest_qmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
+                                        const char *fmt, ...)
+{
+    QDict *response;
+    va_list ap;
+    va_start(ap, fmt);
+    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, ap);
+    va_end(ap);
+    return response;
+}
+#endif /* !_WIN32 */
 
 bool qtest_big_endian(QTestState *s)
 {
diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index 8d7d450963..41bc6633bd 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -693,6 +693,73 @@ void qtest_add_abrt_handler(GHookFunc fn, const void *data);
  */
 void qtest_remove_abrt_handler(void *data);
 
+/**
+ * qtest_vqmp_assert_success:
+ * @qts: QTestState instance to operate on
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ * @args: variable arguments for @fmt
+ *
+ * Sends a QMP message to QEMU and asserts that a 'return' key is present in
+ * the response.
+ */
+void qtest_vqmp_assert_success(QTestState *qts,
+                               const char *fmt, va_list args)
+    G_GNUC_PRINTF(2, 0);
+
+/**
+ * qtest_vqmp_assert_success_ref:
+ * @qts: QTestState instance to operate on
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ * @args: variable arguments for @fmt
+ *
+ * Sends a QMP message to QEMU, asserts that a 'return' key is present in
+ * the response, and returns the response.
+ */
+QDict *qtest_vqmp_assert_success_ref(QTestState *qts,
+                                     const char *fmt, va_list args)
+    G_GNUC_PRINTF(2, 0);
+
+#ifndef _WIN32
+/**
+ * qtest_vqmp_fds_assert_success:
+ * @qts: QTestState instance to operate on
+ * @fds: the file descriptors to send
+ * @nfds: number of @fds to send
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ * @args: variable arguments for @fmt
+ *
+ * Sends a QMP message with file descriptors to QEMU and
+ * asserts that a 'return' key is present in the response.
+ */
+void qtest_vqmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
+                                   const char *fmt, va_list args)
+    G_GNUC_PRINTF(4, 0);
+
+/**
+ * qtest_vqmp_fds_assert_success_ref:
+ * @qts: QTestState instance to operate on
+ * @fds: the file descriptors to send
+ * @nfds: number of @fds to send
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ * @args: variable arguments for @fmt
+ *
+ * Sends a QMP message with file descriptors to QEMU,
+ * asserts that a 'return' key is present in the response,
+ * and returns the response.
+ */
+QDict *qtest_vqmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
+                                         const char *fmt, va_list args)
+    G_GNUC_PRINTF(4, 0);
+#endif /* !_WIN32 */
+
 /**
  * qtest_qmp_assert_success:
  * @qts: QTestState instance to operate on
@@ -706,6 +773,54 @@ void qtest_remove_abrt_handler(void *data);
 void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
     G_GNUC_PRINTF(2, 3);
 
+/**
+ * qtest_qmp_assert_success_ref:
+ * @qts: QTestState instance to operate on
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ *
+ * Sends a QMP message to QEMU, asserts that a 'return' key is present in
+ * the response, and returns the response.
+ */
+QDict *qtest_qmp_assert_success_ref(QTestState *qts, const char *fmt, ...)
+    G_GNUC_PRINTF(2, 3);
+
+#ifndef _WIN32
+/**
+ * qtest_qmp_fd_assert_success:
+ * @qts: QTestState instance to operate on
+ * @fds: the file descriptors to send
+ * @nfds: number of @fds to send
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ *
+ * Sends a QMP message with file descriptors to QEMU and
+ * asserts that a 'return' key is present in the response.
+ */
+void qtest_qmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
+                                  const char *fmt, ...)
+    G_GNUC_PRINTF(4, 5);
+
+/**
+ * qtest_qmp_fd_assert_success_ref:
+ * @qts: QTestState instance to operate on
+ * @fds: the file descriptors to send
+ * @nfds: number of @fds to send
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ *
+ * Sends a QMP message with file descriptors to QEMU,
+ * asserts that a 'return' key is present in the response,
+ * and returns the response.
+ */
+QDict *qtest_qmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
+                                        const char *fmt, ...)
+    G_GNUC_PRINTF(4, 5);
+#endif /* !_WIN32 */
+
 /**
  * qtest_cb_for_every_machine:
  * @cb: Pointer to the callback function
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
  2023-05-31 13:23 ` [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants Daniel P. Berrangé
@ 2023-05-31 13:23 ` Daniel P. Berrangé
  2023-05-31 14:57   ` Thomas Huth
  2023-06-01 12:14   ` Juan Quintela
  2023-05-31 13:23 ` [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test Daniel P. Berrangé
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

Currently code must call one of the qtest_qmp_event* functions to
fetch events. These are only usable if the immediate caller knows
the particular event they want to capture, and are only interested
in one specific event type. Adding ability to register an event
callback lets the caller capture a range of events over any period
of time.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/libqtest.c | 20 ++++++++++++++++++--
 tests/qtest/libqtest.h | 37 +++++++++++++++++++++++++++++++++++--
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 603c26d955..1534177fc1 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -82,6 +82,8 @@ struct QTestState
     GString *rx;
     QTestTransportOps ops;
     GList *pending_events;
+    QTestQMPEventCallback eventCB;
+    void *eventData;
 };
 
 static GHookList abrt_hooks;
@@ -703,8 +705,14 @@ QDict *qtest_qmp_receive(QTestState *s)
         if (!qdict_get_try_str(response, "event")) {
             return response;
         }
-        /* Stash the event for a later consumption */
-        s->pending_events = g_list_append(s->pending_events, response);
+
+        if (s->eventCB) {
+            s->eventCB(s, qdict_get_str(response, "event"),
+                       response, s->eventData);
+        } else {
+            /* Stash the event for a later consumption */
+            s->pending_events = g_list_append(s->pending_events, response);
+        }
     }
 }
 
@@ -808,6 +816,14 @@ void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
     va_end(ap);
 }
 
+void qtest_qmp_set_event_callback(QTestState *s,
+                                  QTestQMPEventCallback cb, void *opaque)
+{
+    s->eventCB = cb;
+    s->eventData = opaque;
+}
+
+
 QDict *qtest_qmp_event_ref(QTestState *s, const char *event)
 {
     while (s->pending_events) {
diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index 41bc6633bd..67ec4db43e 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -238,17 +238,46 @@ QDict *qtest_qmp_receive_dict(QTestState *s);
  * @s: #QTestState instance to operate on.
  *
  * Reads a QMP message from QEMU and returns the response.
- * Buffers all the events received meanwhile, until a
- * call to qtest_qmp_eventwait
+ *
+ * If a callback is registered with qtest_qmp_set_event_callback,
+ * it will be invoked for every event seen, otherwise events
+ * will be buffered until a call to one of the qtest_qmp_eventwait
+ * family of functions.
  */
 QDict *qtest_qmp_receive(QTestState *s);
 
+/*
+ * QTestQMPEventCallback:
+ * @s: #QTestState instance event was received on
+ * @name: name of the event type
+ * @event: #QDict for the event details
+ * @opaque: opaque data from time of callback registration
+ *
+ * This callback will be invoked whenever an event is received
+ */
+typedef bool (*QTestQMPEventCallback)(QTestState *s, const char *name,
+                                      QDict *event, void *opaque);
+
+/**
+ * qtest_qmp_set_event_callback:
+ * @s: #QTestSTate instance to operate on
+ * @cb: callback to invoke for events
+ * @opaque: data to pass to @cb
+ *
+ * Register a callback to be invoked whenever an event arrives
+ */
+void qtest_qmp_set_event_callback(QTestState *s,
+                                  QTestQMPEventCallback cb, void *opaque);
+
 /**
  * qtest_qmp_eventwait:
  * @s: #QTestState instance to operate on.
  * @event: event to wait for.
  *
  * Continuously polls for QMP responses until it receives the desired event.
+ *
+ * Any callback registered with qtest_qmp_set_event_callback will
+ * be invoked for every event seen.
  */
 void qtest_qmp_eventwait(QTestState *s, const char *event);
 
@@ -258,6 +287,10 @@ void qtest_qmp_eventwait(QTestState *s, const char *event);
  * @event: event to wait for.
  *
  * Continuously polls for QMP responses until it receives the desired event.
+ *
+ * Any callback registered with qtest_qmp_set_event_callback will
+ * be invoked for every event seen.
+ *
  * Returns a copy of the event for further investigation.
  */
 QDict *qtest_qmp_eventwait_ref(QTestState *s, const char *event);
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
  2023-05-31 13:23 ` [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants Daniel P. Berrangé
  2023-05-31 13:23 ` [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events Daniel P. Berrangé
@ 2023-05-31 13:23 ` Daniel P. Berrangé
  2023-06-01  9:26   ` Thomas Huth
  2023-06-01 12:17   ` Juan Quintela
  2023-05-31 13:23 ` [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage " Daniel P. Berrangé
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

This function duplicates logic of qtest_qmp_assert_success_ref

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-helpers.c | 22 ----------------------
 tests/qtest/migration-helpers.h |  3 ---
 tests/qtest/migration-test.c    | 29 +++++++++++++++--------------
 3 files changed, 15 insertions(+), 39 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index f6f3c6680f..bddf3f8d4d 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -85,28 +85,6 @@ QDict *wait_command(QTestState *who, const char *command, ...)
     return ret;
 }
 
-/*
- * Execute the qmp command only
- */
-QDict *qmp_command(QTestState *who, const char *command, ...)
-{
-    va_list ap;
-    QDict *resp, *ret;
-
-    va_start(ap, command);
-    resp = qtest_vqmp(who, command, ap);
-    va_end(ap);
-
-    g_assert(!qdict_haskey(resp, "error"));
-    g_assert(qdict_haskey(resp, "return"));
-
-    ret = qdict_get_qdict(resp, "return");
-    qobject_ref(ret);
-    qobject_unref(resp);
-
-    return ret;
-}
-
 /*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index a188b62787..2e51a6e195 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -25,9 +25,6 @@ QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...);
 G_GNUC_PRINTF(2, 3)
 QDict *wait_command(QTestState *who, const char *command, ...);
 
-G_GNUC_PRINTF(2, 3)
-QDict *qmp_command(QTestState *who, const char *command, ...);
-
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b99b49a314..9ce27f89ec 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2322,32 +2322,33 @@ static void test_multifd_tcp_cancel(void)
 
 static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
 {
-    qobject_unref(qmp_command(who,
-                  "{ 'execute': 'calc-dirty-rate',"
-                  "'arguments': { "
-                  "'calc-time': %" PRIu64 ","
-                  "'mode': 'dirty-ring' }}",
-                  calc_time));
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'calc-dirty-rate',"
+                             "'arguments': { "
+                             "'calc-time': %" PRIu64 ","
+                             "'mode': 'dirty-ring' }}",
+                             calc_time);
 }
 
 static QDict *query_dirty_rate(QTestState *who)
 {
-    return qmp_command(who, "{ 'execute': 'query-dirty-rate' }");
+    return qtest_qmp_assert_success_ref(who,
+                                        "{ 'execute': 'query-dirty-rate' }");
 }
 
 static void dirtylimit_set_all(QTestState *who, uint64_t dirtyrate)
 {
-    qobject_unref(qmp_command(who,
-                  "{ 'execute': 'set-vcpu-dirty-limit',"
-                  "'arguments': { "
-                  "'dirty-rate': %" PRIu64 " } }",
-                  dirtyrate));
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'set-vcpu-dirty-limit',"
+                             "'arguments': { "
+                             "'dirty-rate': %" PRIu64 " } }",
+                             dirtyrate);
 }
 
 static void cancel_vcpu_dirty_limit(QTestState *who)
 {
-    qobject_unref(qmp_command(who,
-                  "{ 'execute': 'cancel-vcpu-dirty-limit' }"));
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'cancel-vcpu-dirty-limit' }");
 }
 
 static QDict *query_vcpu_dirty_limit(QTestState *who)
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage in migration test
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (2 preceding siblings ...)
  2023-05-31 13:23 ` [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test Daniel P. Berrangé
@ 2023-05-31 13:23 ` Daniel P. Berrangé
  2023-06-01  9:28   ` Thomas Huth
  2023-06-01 12:10   ` Juan Quintela
  2023-05-31 13:23 ` [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event Daniel P. Berrangé
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

Some of the usage is just a verbose way of re-inventing the
qtest_qmp_assert_success(_ref) methods.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-helpers.c |  8 ++---
 tests/qtest/migration-test.c    | 52 ++++++++++++---------------------
 2 files changed, 21 insertions(+), 39 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index bddf3f8d4d..e26fdcb132 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -93,7 +93,7 @@ QDict *wait_command(QTestState *who, const char *command, ...)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...)
 {
     va_list ap;
-    QDict *args, *rsp;
+    QDict *args;
 
     va_start(ap, fmt);
     args = qdict_from_vjsonf_nofail(fmt, ap);
@@ -102,10 +102,8 @@ void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...)
     g_assert(!qdict_haskey(args, "uri"));
     qdict_put_str(args, "uri", uri);
 
-    rsp = qtest_qmp(who, "{ 'execute': 'migrate', 'arguments': %p}", args);
-
-    g_assert(qdict_haskey(rsp, "return"));
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'migrate', 'arguments': %p}", args);
 }
 
 /*
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 9ce27f89ec..822516286d 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -359,14 +359,10 @@ static void migrate_check_parameter_int(QTestState *who, const char *parameter,
 static void migrate_set_parameter_int(QTestState *who, const char *parameter,
                                       long long value)
 {
-    QDict *rsp;
-
-    rsp = qtest_qmp(who,
-                    "{ 'execute': 'migrate-set-parameters',"
-                    "'arguments': { %s: %lld } }",
-                    parameter, value);
-    g_assert(qdict_haskey(rsp, "return"));
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'migrate-set-parameters',"
+                             "'arguments': { %s: %lld } }",
+                             parameter, value);
     migrate_check_parameter_int(who, parameter, value);
 }
 
@@ -392,14 +388,10 @@ static void migrate_check_parameter_str(QTestState *who, const char *parameter,
 static void migrate_set_parameter_str(QTestState *who, const char *parameter,
                                       const char *value)
 {
-    QDict *rsp;
-
-    rsp = qtest_qmp(who,
-                    "{ 'execute': 'migrate-set-parameters',"
-                    "'arguments': { %s: %s } }",
-                    parameter, value);
-    g_assert(qdict_haskey(rsp, "return"));
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'migrate-set-parameters',"
+                             "'arguments': { %s: %s } }",
+                             parameter, value);
     migrate_check_parameter_str(who, parameter, value);
 }
 
@@ -427,14 +419,10 @@ static void migrate_check_parameter_bool(QTestState *who, const char *parameter,
 static void migrate_set_parameter_bool(QTestState *who, const char *parameter,
                                       int value)
 {
-    QDict *rsp;
-
-    rsp = qtest_qmp(who,
-                    "{ 'execute': 'migrate-set-parameters',"
-                    "'arguments': { %s: %i } }",
-                    parameter, value);
-    g_assert(qdict_haskey(rsp, "return"));
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'migrate-set-parameters',"
+                             "'arguments': { %s: %i } }",
+                             parameter, value);
     migrate_check_parameter_bool(who, parameter, value);
 }
 
@@ -494,16 +482,12 @@ static void migrate_cancel(QTestState *who)
 static void migrate_set_capability(QTestState *who, const char *capability,
                                    bool value)
 {
-    QDict *rsp;
-
-    rsp = qtest_qmp(who,
-                    "{ 'execute': 'migrate-set-capabilities',"
-                    "'arguments': { "
-                    "'capabilities': [ { "
-                    "'capability': %s, 'state': %i } ] } }",
-                    capability, value);
-    g_assert(qdict_haskey(rsp, "return"));
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'migrate-set-capabilities',"
+                             "'arguments': { "
+                             "'capabilities': [ { "
+                             "'capability': %s, 'state': %i } ] } }",
+                             capability, value);
 }
 
 static void migrate_postcopy_start(QTestState *from, QTestState *to)
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (3 preceding siblings ...)
  2023-05-31 13:23 ` [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage " Daniel P. Berrangé
@ 2023-05-31 13:23 ` Daniel P. Berrangé
  2023-06-01  9:31   ` Thomas Huth
  2023-06-01 12:23   ` Juan Quintela
  2023-05-31 13:23 ` [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success Daniel P. Berrangé
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

Change the migration test to use the new qtest event callback to watch
for the stop event. This ensures that we only watch for the STOP event
on the source QEMU. The previous code would set the single 'got_stop'
flag when either source or dest QEMU got the STOP event.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-helpers.c | 18 ++++++++----------
 tests/qtest/migration-helpers.h |  3 ++-
 tests/qtest/migration-test.c    |  4 ++++
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index e26fdcb132..936a27a944 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -23,15 +23,16 @@
  */
 #define MIGRATION_STATUS_WAIT_TIMEOUT 120
 
-bool got_stop;
-
-static void check_stop_event(QTestState *who)
+bool migrate_watch_for_stop(QTestState *who, const char *name,
+                            QDict *event, void *opaque)
 {
-    QDict *event = qtest_qmp_event_ref(who, "STOP");
-    if (event) {
-        got_stop = true;
-        qobject_unref(event);
+    bool *seen = opaque;
+
+    if (g_str_equal(name, "STOP")) {
+        *seen = true;
     }
+
+    return false;
 }
 
 #ifndef _WIN32
@@ -48,7 +49,6 @@ QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...)
     va_end(ap);
 
     resp = qtest_qmp_receive(who);
-    check_stop_event(who);
 
     g_assert(!qdict_haskey(resp, "error"));
     g_assert(qdict_haskey(resp, "return"));
@@ -73,8 +73,6 @@ QDict *wait_command(QTestState *who, const char *command, ...)
     resp = qtest_vqmp(who, command, ap);
     va_end(ap);
 
-    check_stop_event(who);
-
     g_assert(!qdict_haskey(resp, "error"));
     g_assert(qdict_haskey(resp, "return"));
 
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 2e51a6e195..fa69d1780a 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -15,7 +15,8 @@
 
 #include "libqtest.h"
 
-extern bool got_stop;
+bool migrate_watch_for_stop(QTestState *who, const char *name,
+                            QDict *event, void *opaque);
 
 #ifndef _WIN32
 G_GNUC_PRINTF(3, 4)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 822516286d..0af72c37c2 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -43,6 +43,7 @@
 unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
+static bool got_stop;
 
 /*
  * Dirtylimit stop working if dirty page rate error
@@ -703,6 +704,9 @@ static int test_migrate_start(QTestState **from, QTestState **to,
                                  ignore_stderr);
     if (!args->only_target) {
         *from = qtest_init(cmd_source);
+        qtest_qmp_set_event_callback(*from,
+                                     migrate_watch_for_stop,
+                                     &got_stop);
     }
 
     cmd_target = g_strdup_printf("-accel kvm%s -accel tcg%s%s "
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (4 preceding siblings ...)
  2023-05-31 13:23 ` [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event Daniel P. Berrangé
@ 2023-05-31 13:23 ` Daniel P. Berrangé
  2023-06-01  9:37   ` Thomas Huth
  2023-06-01 12:27   ` Juan Quintela
  2023-05-31 13:23 ` [PATCH v3 7/9] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

Most usage of wait_command() is followed by qobject_unref(), which
is just a verbose re-implementation of qtest_qmp_assert_success().

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-helpers.c |  53 +---------
 tests/qtest/migration-helpers.h |   8 --
 tests/qtest/migration-test.c    | 170 +++++++++++++-------------------
 3 files changed, 74 insertions(+), 157 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 936a27a944..884d8a2e07 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -35,54 +35,6 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
     return false;
 }
 
-#ifndef _WIN32
-/*
- * Events can get in the way of responses we are actually waiting for.
- */
-QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...)
-{
-    va_list ap;
-    QDict *resp, *ret;
-
-    va_start(ap, command);
-    qtest_qmp_vsend_fds(who, &fd, 1, command, ap);
-    va_end(ap);
-
-    resp = qtest_qmp_receive(who);
-
-    g_assert(!qdict_haskey(resp, "error"));
-    g_assert(qdict_haskey(resp, "return"));
-
-    ret = qdict_get_qdict(resp, "return");
-    qobject_ref(ret);
-    qobject_unref(resp);
-
-    return ret;
-}
-#endif
-
-/*
- * Events can get in the way of responses we are actually waiting for.
- */
-QDict *wait_command(QTestState *who, const char *command, ...)
-{
-    va_list ap;
-    QDict *resp, *ret;
-
-    va_start(ap, command);
-    resp = qtest_vqmp(who, command, ap);
-    va_end(ap);
-
-    g_assert(!qdict_haskey(resp, "error"));
-    g_assert(qdict_haskey(resp, "return"));
-
-    ret = qdict_get_qdict(resp, "return");
-    qobject_ref(ret);
-    qobject_unref(resp);
-
-    return ret;
-}
-
 /*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
@@ -110,7 +62,7 @@ void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...)
  */
 QDict *migrate_query(QTestState *who)
 {
-    return wait_command(who, "{ 'execute': 'query-migrate' }");
+    return qtest_qmp_assert_success_ref(who, "{ 'execute': 'query-migrate' }");
 }
 
 QDict *migrate_query_not_failed(QTestState *who)
@@ -208,7 +160,8 @@ void wait_for_migration_fail(QTestState *from, bool allow_active)
     } while (!failed);
 
     /* Is the machine currently running? */
-    rsp_return = wait_command(from, "{ 'execute': 'query-status' }");
+    rsp_return = qtest_qmp_assert_success_ref(from,
+                                              "{ 'execute': 'query-status' }");
     g_assert(qdict_haskey(rsp_return, "running"));
     g_assert(qdict_get_bool(rsp_return, "running"));
     qobject_unref(rsp_return);
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index fa69d1780a..aab0745cfe 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -18,14 +18,6 @@
 bool migrate_watch_for_stop(QTestState *who, const char *name,
                             QDict *event, void *opaque);
 
-#ifndef _WIN32
-G_GNUC_PRINTF(3, 4)
-QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...);
-#endif
-
-G_GNUC_PRINTF(2, 3)
-QDict *wait_command(QTestState *who, const char *command, ...);
-
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0af72c37c2..d8b4282abc 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -342,7 +342,8 @@ static long long migrate_get_parameter_int(QTestState *who,
     QDict *rsp;
     long long result;
 
-    rsp = wait_command(who, "{ 'execute': 'query-migrate-parameters' }");
+    rsp = qtest_qmp_assert_success_ref(
+        who, "{ 'execute': 'query-migrate-parameters' }");
     result = qdict_get_int(rsp, parameter);
     qobject_unref(rsp);
     return result;
@@ -373,7 +374,8 @@ static char *migrate_get_parameter_str(QTestState *who,
     QDict *rsp;
     char *result;
 
-    rsp = wait_command(who, "{ 'execute': 'query-migrate-parameters' }");
+    rsp = qtest_qmp_assert_success_ref(
+        who, "{ 'execute': 'query-migrate-parameters' }");
     result = g_strdup(qdict_get_str(rsp, parameter));
     qobject_unref(rsp);
     return result;
@@ -402,7 +404,8 @@ static long long migrate_get_parameter_bool(QTestState *who,
     QDict *rsp;
     int result;
 
-    rsp = wait_command(who, "{ 'execute': 'query-migrate-parameters' }");
+    rsp = qtest_qmp_assert_success_ref(
+        who, "{ 'execute': 'query-migrate-parameters' }");
     result = qdict_get_bool(rsp, parameter);
     qobject_unref(rsp);
     return !!result;
@@ -443,41 +446,29 @@ static void migrate_ensure_converge(QTestState *who)
 
 static void migrate_pause(QTestState *who)
 {
-    QDict *rsp;
-
-    rsp = wait_command(who, "{ 'execute': 'migrate-pause' }");
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who, "{ 'execute': 'migrate-pause' }");
 }
 
 static void migrate_continue(QTestState *who, const char *state)
 {
-    QDict *rsp;
-
-    rsp = wait_command(who,
-                       "{ 'execute': 'migrate-continue',"
-                       "  'arguments': { 'state': %s } }",
-                       state);
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'migrate-continue',"
+                             "  'arguments': { 'state': %s } }",
+                             state);
 }
 
 static void migrate_recover(QTestState *who, const char *uri)
 {
-    QDict *rsp;
-
-    rsp = wait_command(who,
-                       "{ 'execute': 'migrate-recover', "
-                       "  'id': 'recover-cmd', "
-                       "  'arguments': { 'uri': %s } }",
-                       uri);
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'migrate-recover', "
+                             "  'id': 'recover-cmd', "
+                             "  'arguments': { 'uri': %s } }",
+                             uri);
 }
 
 static void migrate_cancel(QTestState *who)
 {
-    QDict *rsp;
-
-    rsp = wait_command(who, "{ 'execute': 'migrate_cancel' }");
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(who, "{ 'execute': 'migrate_cancel' }");
 }
 
 static void migrate_set_capability(QTestState *who, const char *capability,
@@ -493,10 +484,7 @@ static void migrate_set_capability(QTestState *who, const char *capability,
 
 static void migrate_postcopy_start(QTestState *from, QTestState *to)
 {
-    QDict *rsp;
-
-    rsp = wait_command(from, "{ 'execute': 'migrate-start-postcopy' }");
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }");
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -785,7 +773,6 @@ test_migrate_tls_psk_start_common(QTestState *from,
 {
     struct TestMigrateTLSPSKData *data =
         g_new0(struct TestMigrateTLSPSKData, 1);
-    QDict *rsp;
 
     data->workdir = g_strdup_printf("%s/tlscredspsk0", tmpfs);
     data->pskfile = g_strdup_printf("%s/%s", data->workdir,
@@ -801,24 +788,22 @@ test_migrate_tls_psk_start_common(QTestState *from,
         test_tls_psk_init_alt(data->pskfilealt);
     }
 
-    rsp = wait_command(from,
-                       "{ 'execute': 'object-add',"
-                       "  'arguments': { 'qom-type': 'tls-creds-psk',"
-                       "                 'id': 'tlscredspsk0',"
-                       "                 'endpoint': 'client',"
-                       "                 'dir': %s,"
-                       "                 'username': 'qemu'} }",
-                       data->workdir);
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(from,
+                             "{ 'execute': 'object-add',"
+                             "  'arguments': { 'qom-type': 'tls-creds-psk',"
+                             "                 'id': 'tlscredspsk0',"
+                             "                 'endpoint': 'client',"
+                             "                 'dir': %s,"
+                             "                 'username': 'qemu'} }",
+                             data->workdir);
 
-    rsp = wait_command(to,
-                       "{ 'execute': 'object-add',"
-                       "  'arguments': { 'qom-type': 'tls-creds-psk',"
-                       "                 'id': 'tlscredspsk0',"
-                       "                 'endpoint': 'server',"
-                       "                 'dir': %s } }",
-                       mismatch ? data->workdiralt : data->workdir);
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(to,
+                             "{ 'execute': 'object-add',"
+                             "  'arguments': { 'qom-type': 'tls-creds-psk',"
+                             "                 'id': 'tlscredspsk0',"
+                             "                 'endpoint': 'server',"
+                             "                 'dir': %s } }",
+                             mismatch ? data->workdiralt : data->workdir);
 
     migrate_set_parameter_str(from, "tls-creds", "tlscredspsk0");
     migrate_set_parameter_str(to, "tls-creds", "tlscredspsk0");
@@ -889,7 +874,6 @@ test_migrate_tls_x509_start_common(QTestState *from,
                                    TestMigrateTLSX509 *args)
 {
     TestMigrateTLSX509Data *data = g_new0(TestMigrateTLSX509Data, 1);
-    QDict *rsp;
 
     data->workdir = g_strdup_printf("%s/tlscredsx5090", tmpfs);
     data->keyfile = g_strdup_printf("%s/key.pem", data->workdir);
@@ -932,40 +916,38 @@ test_migrate_tls_x509_start_common(QTestState *from,
                                args->certhostname,
                                args->certipaddr);
 
-    rsp = wait_command(from,
-                       "{ 'execute': 'object-add',"
-                       "  'arguments': { 'qom-type': 'tls-creds-x509',"
-                       "                 'id': 'tlscredsx509client0',"
-                       "                 'endpoint': 'client',"
-                       "                 'dir': %s,"
-                       "                 'sanity-check': true,"
-                       "                 'verify-peer': true} }",
-                       data->workdir);
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(from,
+                             "{ 'execute': 'object-add',"
+                             "  'arguments': { 'qom-type': 'tls-creds-x509',"
+                             "                 'id': 'tlscredsx509client0',"
+                             "                 'endpoint': 'client',"
+                             "                 'dir': %s,"
+                             "                 'sanity-check': true,"
+                             "                 'verify-peer': true} }",
+                             data->workdir);
     migrate_set_parameter_str(from, "tls-creds", "tlscredsx509client0");
     if (args->certhostname) {
         migrate_set_parameter_str(from, "tls-hostname", args->certhostname);
     }
 
-    rsp = wait_command(to,
-                       "{ 'execute': 'object-add',"
-                       "  'arguments': { 'qom-type': 'tls-creds-x509',"
-                       "                 'id': 'tlscredsx509server0',"
-                       "                 'endpoint': 'server',"
-                       "                 'dir': %s,"
-                       "                 'sanity-check': true,"
-                       "                 'verify-peer': %i} }",
-                       data->workdir, args->verifyclient);
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(to,
+                             "{ 'execute': 'object-add',"
+                             "  'arguments': { 'qom-type': 'tls-creds-x509',"
+                             "                 'id': 'tlscredsx509server0',"
+                             "                 'endpoint': 'server',"
+                             "                 'dir': %s,"
+                             "                 'sanity-check': true,"
+                             "                 'verify-peer': %i} }",
+                             data->workdir, args->verifyclient);
     migrate_set_parameter_str(to, "tls-creds", "tlscredsx509server0");
 
     if (args->authzclient) {
-        rsp = wait_command(to,
-                           "{ 'execute': 'object-add',"
-                           "  'arguments': { 'qom-type': 'authz-simple',"
-                           "                 'id': 'tlsauthz0',"
-                           "                 'identity': %s} }",
-                           "CN=" QCRYPTO_TLS_TEST_CLIENT_NAME);
+        qtest_qmp_assert_success(to,
+                                 "{ 'execute': 'object-add',"
+                                 "  'arguments': { 'qom-type': 'authz-simple',"
+                                 "                 'id': 'tlsauthz0',"
+                                 "                 'identity': %s} }",
+                                 "CN=" QCRYPTO_TLS_TEST_CLIENT_NAME);
         migrate_set_parameter_str(to, "tls-authz", "tlsauthz0");
     }
 
@@ -1759,7 +1741,6 @@ static void test_precopy_tcp_tls_x509_reject_anon_client(void)
 static void *test_migrate_fd_start_hook(QTestState *from,
                                         QTestState *to)
 {
-    QDict *rsp;
     int ret;
     int pair[2];
 
@@ -1768,22 +1749,19 @@ static void *test_migrate_fd_start_hook(QTestState *from,
     g_assert_cmpint(ret, ==, 0);
 
     /* Send the 1st socket to the target */
-    rsp = wait_command_fd(to, pair[0],
-                          "{ 'execute': 'getfd',"
-                          "  'arguments': { 'fdname': 'fd-mig' }}");
-    qobject_unref(rsp);
+    qtest_qmp_fds_assert_success(to, &(pair[0]), 1,
+                                 "{ 'execute': 'getfd',"
+                                 "  'arguments': { 'fdname': 'fd-mig' }}");
     close(pair[0]);
 
     /* Start incoming migration from the 1st socket */
-    rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
-                           "  'arguments': { 'uri': 'fd:fd-mig' }}");
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
+                             "  'arguments': { 'uri': 'fd:fd-mig' }}");
 
     /* Send the 2nd socket to the target */
-    rsp = wait_command_fd(from, pair[1],
-                          "{ 'execute': 'getfd',"
-                          "  'arguments': { 'fdname': 'fd-mig' }}");
-    qobject_unref(rsp);
+    qtest_qmp_fds_assert_success(from, &(pair[1]), 1,
+                                 "{ 'execute': 'getfd',"
+                                 "  'arguments': { 'fdname': 'fd-mig' }}");
     close(pair[1]);
 
     return NULL;
@@ -1990,8 +1968,6 @@ test_migrate_precopy_tcp_multifd_start_common(QTestState *from,
                                               QTestState *to,
                                               const char *method)
 {
-    QDict *rsp;
-
     migrate_set_parameter_int(from, "multifd-channels", 16);
     migrate_set_parameter_int(to, "multifd-channels", 16);
 
@@ -2002,9 +1978,8 @@ test_migrate_precopy_tcp_multifd_start_common(QTestState *from,
     migrate_set_capability(to, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
-                           "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
+                             "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
 
     return NULL;
 }
@@ -2235,7 +2210,6 @@ static void test_multifd_tcp_cancel(void)
         .hide_stderr = true,
     };
     QTestState *from, *to, *to2;
-    QDict *rsp;
     g_autofree char *uri = NULL;
 
     if (test_migrate_start(&from, &to, "defer", &args)) {
@@ -2251,9 +2225,8 @@ static void test_multifd_tcp_cancel(void)
     migrate_set_capability(to, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
-                           "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
+                             "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
@@ -2283,9 +2256,8 @@ static void test_multifd_tcp_cancel(void)
     migrate_set_capability(to2, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    rsp = wait_command(to2, "{ 'execute': 'migrate-incoming',"
-                            "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
-    qobject_unref(rsp);
+    qtest_qmp_assert_success(to2, "{ 'execute': 'migrate-incoming',"
+                             "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
 
     g_free(uri);
     uri = migrate_get_socket_address(to2, "socket-address");
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 7/9] tests/qtest: capture RESUME events during migration
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (5 preceding siblings ...)
  2023-05-31 13:23 ` [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success Daniel P. Berrangé
@ 2023-05-31 13:23 ` Daniel P. Berrangé
  2023-06-01  9:38   ` Thomas Huth
  2023-06-01 12:31   ` Juan Quintela
  2023-05-31 13:23 ` [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
  2023-05-31 13:24 ` [PATCH v3 9/9] tests/qtest: massively speed up migration-test Daniel P. Berrangé
  8 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

When running migration tests we monitor for a STOP event so we can skip
redundant waits. This will be needed for the RESUME event too shortly.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-helpers.c | 12 ++++++++++++
 tests/qtest/migration-helpers.h |  2 ++
 tests/qtest/migration-test.c    |  5 +++++
 3 files changed, 19 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 884d8a2e07..d50b565967 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -35,6 +35,18 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
     return false;
 }
 
+bool migrate_watch_for_resume(QTestState *who, const char *name,
+                              QDict *event, void *opaque)
+{
+    bool *seen = opaque;
+
+    if (g_str_equal(name, "RESUME")) {
+        *seen = true;
+    }
+
+    return false;
+}
+
 /*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index aab0745cfe..009e250e90 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -17,6 +17,8 @@
 
 bool migrate_watch_for_stop(QTestState *who, const char *name,
                             QDict *event, void *opaque);
+bool migrate_watch_for_resume(QTestState *who, const char *name,
+                              QDict *event, void *opaque);
 
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d8b4282abc..aee25e1c4f 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -44,6 +44,7 @@ unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
 static bool got_stop;
+static bool got_resume;
 
 /*
  * Dirtylimit stop working if dirty page rate error
@@ -607,6 +608,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
     }
 
     got_stop = false;
+    got_resume = false;
     bootpath = g_strdup_printf("%s/bootsect", tmpfs);
     if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
         /* the assembled x86 boot sector should be exactly one sector large */
@@ -712,6 +714,9 @@ static int test_migrate_start(QTestState **from, QTestState **to,
                                  args->opts_target ? args->opts_target : "",
                                  ignore_stderr);
     *to = qtest_init(cmd_target);
+    qtest_qmp_set_event_callback(*to,
+                                 migrate_watch_for_resume,
+                                 &got_resume);
 
     /*
      * Remove shmem file immediately to avoid memory leak in test failed case.
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (6 preceding siblings ...)
  2023-05-31 13:23 ` [PATCH v3 7/9] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
@ 2023-05-31 13:23 ` Daniel P. Berrangé
  2023-06-01  9:47   ` Thomas Huth
                     ` (2 more replies)
  2023-05-31 13:24 ` [PATCH v3 9/9] tests/qtest: massively speed up migration-test Daniel P. Berrangé
  8 siblings, 3 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

There are 27 pre-copy live migration scenarios being tested. In all of
these we force non-convergance and run for one iteration, then let it
converge and wait for completion during the second (or following)
iterations. At 3 mbps bandwidth limit the first iteration takes a very
long time (~30 seconds).

While it is important to test the migration passes and convergance
logic, it is overkill to do this for all 27 pre-copy scenarios. The
TLS migration scenarios in particular are merely exercising different
code paths during connection establishment.

To optimize time taken, switch most of the test scenarios to run
non-live (ie guest CPUs paused) with no bandwidth limits. This gives
a massive speed up for most of the test scenarios.

For test coverage the following scenarios are unchanged

 * Precopy with UNIX sockets
 * Precopy with UNIX sockets and dirty ring tracking
 * Precopy with XBZRLE
 * Precopy with UNIX compress
 * Precopy with UNIX compress (nowait)
 * Precopy with multifd

On a test machine this reduces execution time from 13 minutes to
8 minutes.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-test.c | 81 +++++++++++++++++++++++++++++-------
 1 file changed, 66 insertions(+), 15 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index aee25e1c4f..e033d9f0c1 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -577,9 +577,12 @@ typedef struct {
         MIG_TEST_FAIL_DEST_QUIT_ERR,
     } result;
 
-    /* Optional: set number of migration passes to wait for */
+    /* Optional: set number of migration passes to wait for, if live==true */
     unsigned int iterations;
 
+    /* Optional: whether the guest CPUs should be running during migration */
+    bool live;
+
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
@@ -1385,8 +1388,6 @@ static void test_precopy_common(MigrateCommon *args)
         return;
     }
 
-    migrate_ensure_non_converge(from);
-
     if (args->start_hook) {
         data_hook = args->start_hook(from, to);
     }
@@ -1396,6 +1397,31 @@ static void test_precopy_common(MigrateCommon *args)
         wait_for_serial("src_serial");
     }
 
+    if (args->live) {
+        /*
+         * Testing live migration, we want to ensure that some
+         * memory is re-dirtied after being transferred, so that
+         * we exercise logic for dirty page handling. We achieve
+         * this with a ridiculosly low bandwidth that guarantees
+         * non-convergance.
+         */
+        migrate_ensure_non_converge(from);
+    } else {
+        /*
+         * Testing non-live migration, we allow it to run at
+         * full speed to ensure short test case duration.
+         * For tests expected to fail, we don't need to
+         * change anything.
+         */
+        if (args->result == MIG_TEST_SUCCEED) {
+            qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
+            if (!got_stop) {
+                qtest_qmp_eventwait(from, "STOP");
+            }
+            migrate_ensure_converge(from);
+        }
+    }
+
     if (!args->connect_uri) {
         g_autofree char *local_connect_uri =
             migrate_get_socket_address(to, "socket-address");
@@ -1413,25 +1439,41 @@ static void test_precopy_common(MigrateCommon *args)
             qtest_set_expected_status(to, EXIT_FAILURE);
         }
     } else {
-        if (args->iterations) {
-            while (args->iterations--) {
+        if (args->live) {
+            if (args->iterations) {
+                while (args->iterations--) {
+                    wait_for_migration_pass(from);
+                }
+            } else {
                 wait_for_migration_pass(from);
             }
-        } else {
-            wait_for_migration_pass(from);
-        }
 
-        migrate_ensure_converge(from);
+            migrate_ensure_converge(from);
 
-        /* We do this first, as it has a timeout to stop us
-         * hanging forever if migration didn't converge */
-        wait_for_migration_complete(from);
+            /*
+             * We do this first, as it has a timeout to stop us
+             * hanging forever if migration didn't converge
+             */
+            wait_for_migration_complete(from);
 
-        if (!got_stop) {
-            qtest_qmp_eventwait(from, "STOP");
+            if (!got_stop) {
+                qtest_qmp_eventwait(from, "STOP");
+            }
+        } else {
+            wait_for_migration_complete(from);
+            /*
+             * Must wait for dst to finish reading all incoming
+             * data on the socket before issuing 'cont' otherwise
+             * it'll be ignored
+             */
+            wait_for_migration_complete(to);
+
+            qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
         }
 
-        qtest_qmp_eventwait(to, "RESUME");
+        if (!got_resume) {
+            qtest_qmp_eventwait(to, "RESUME");
+        }
 
         wait_for_serial("dest_serial");
     }
@@ -1449,6 +1491,8 @@ static void test_precopy_unix_plain(void)
     MigrateCommon args = {
         .listen_uri = uri,
         .connect_uri = uri,
+
+        .live = true,
     };
 
     test_precopy_common(&args);
@@ -1464,6 +1508,8 @@ static void test_precopy_unix_dirty_ring(void)
         },
         .listen_uri = uri,
         .connect_uri = uri,
+
+        .live = true,
     };
 
     test_precopy_common(&args);
@@ -1575,6 +1621,7 @@ static void test_precopy_unix_xbzrle(void)
         .start_hook = test_migrate_xbzrle_start,
 
         .iterations = 2,
+        .live = true,
     };
 
     test_precopy_common(&args);
@@ -1592,6 +1639,7 @@ static void test_precopy_unix_compress(void)
          * the previous iteration.
          */
         .iterations = 2,
+        .live = true,
     };
 
     test_precopy_common(&args);
@@ -1609,6 +1657,7 @@ static void test_precopy_unix_compress_nowait(void)
          * the previous iteration.
          */
         .iterations = 2,
+        .live = true,
     };
 
     test_precopy_common(&args);
@@ -2017,6 +2066,8 @@ static void test_multifd_tcp_none(void)
     MigrateCommon args = {
         .listen_uri = "defer",
         .start_hook = test_migrate_precopy_tcp_multifd_start,
+
+        .live = true,
     };
     test_precopy_common(&args);
 }
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (7 preceding siblings ...)
  2023-05-31 13:23 ` [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
@ 2023-05-31 13:24 ` Daniel P. Berrangé
  2023-06-01 10:04   ` Thomas Huth
  2023-06-01 15:46   ` Peter Xu
  8 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 13:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini, Daniel P. Berrangé

The migration test cases that actually exercise live migration want to
ensure there is a minimum of two iterations of pre-copy, in order to
exercise the dirty tracking code.

Historically we've queried the migration status, looking for the
'dirty-sync-count' value to increment to track iterations. This was
not entirely reliable because often all the data would get transferred
quickly enough that the migration would finish before we wanted it
to. So we massively dropped the bandwidth and max downtime to
guarantee non-convergance. This had the unfortunate side effect
that every migration took at least 30 seconds to run (100 MB of
dirty pages / 3 MB/sec).

This optimization takes a different approach to ensuring that a
mimimum of two iterations. Rather than waiting for dirty-sync-count
to increment, directly look for an indication that the source VM
has dirtied RAM that has already been transferred.

On the source VM a magic marker is written just after the 3 MB
offset. The destination VM is now montiored to detect when the
magic marker is transferred. This gives a guarantee that the
first 3 MB of memory have been transferred. Now the source VM
memory is monitored at exactly the 3MB offset until we observe
a flip in its value. This gives us a guaranteed that the guest
workload has dirtied a byte that has already been transferred.

Since we're looking at a place that is only 3 MB from the start
of memory, with the 3 MB/sec bandwidth, this test should complete
in 1 second, instead of 30 seconds.

Once we've proved there is some dirty memory, migration can be
set back to full speed for the remainder of the 1st iteration,
and the entire of the second iteration at which point migration
should be complete.

On a test machine this further reduces the migration test time
from 8 minutes to 1 minute 40.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-test.c | 143 ++++++++++++++++++++++++++++++-----
 1 file changed, 125 insertions(+), 18 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e033d9f0c1..6ea9011423 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -46,6 +46,20 @@ static bool uffd_feature_thread_id;
 static bool got_stop;
 static bool got_resume;
 
+/*
+ * An initial 3 MB offset is used as that corresponds
+ * to ~1 sec of data transfer with our bandwidth setting.
+ */
+#define MAGIC_OFFSET_BASE (3 * 1024 * 1024)
+/*
+ * A further 1k is added to ensure we're not a multiple
+ * of TEST_MEM_PAGE_SIZE, thus avoid clash with writes
+ * from the migration guest workload.
+ */
+#define MAGIC_OFFSET_SHUFFLE 1024
+#define MAGIC_OFFSET (MAGIC_OFFSET_BASE + MAGIC_OFFSET_SHUFFLE)
+#define MAGIC_MARKER 0xFEED12345678CAFEULL
+
 /*
  * Dirtylimit stop working if dirty page rate error
  * value less than DIRTYLIMIT_TOLERANCE_RANGE
@@ -445,6 +459,91 @@ static void migrate_ensure_converge(QTestState *who)
     migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
 }
 
+/*
+ * Our goal is to ensure that we run a single full migration
+ * iteration, and also dirty memory, ensuring that at least
+ * one further iteration is required.
+ *
+ * We can't directly synchronize with the start of a migration
+ * so we have to apply some tricks monitoring memory that is
+ * transferred.
+ *
+ * Initially we set the migration bandwidth to an insanely
+ * low value, with tiny max downtime too. This basically
+ * guarantees migration will never complete.
+ *
+ * This will result in a test that is unacceptably slow though,
+ * so we can't let the entire migration pass run at this speed.
+ * Our intent is to let it run just long enough that we can
+ * prove data prior to the marker has been transferred *AND*
+ * also prove this transferred data is dirty again.
+ *
+ * Before migration starts, we write a 64-bit magic marker
+ * into a fixed location in the src VM RAM.
+ *
+ * Then watch dst memory until the marker appears. This is
+ * proof that start_address -> MAGIC_OFFSET_BASE has been
+ * transferred.
+ *
+ * Finally we go back to the source and read a byte just
+ * before the marker untill we see it flip in value. This
+ * is proof that start_address -> MAGIC_OFFSET_BASE
+ * is now dirty again.
+ *
+ * IOW, we're guaranteed at least a 2nd migration pass
+ * at this point.
+ *
+ * We can now let migration run at full speed to finish
+ * the test
+ */
+static void migrate_prepare_for_dirty_mem(QTestState *from)
+{
+    /*
+     * The guest workflow iterates from start_address to
+     * end_address, writing 1 byte every TEST_MEM_PAGE_SIZE
+     * bytes.
+     *
+     * IOW, if we write to mem at a point which is NOT
+     * a multiple of TEST_MEM_PAGE_SIZE, our write won't
+     * conflict with the migration workflow.
+     *
+     * We put in a marker here, that we'll use to determine
+     * when the data has been transferred to the dst.
+     */
+    qtest_writeq(from, start_address + MAGIC_OFFSET, MAGIC_MARKER);
+}
+
+static void migrate_wait_for_dirty_mem(QTestState *from,
+                                       QTestState *to)
+{
+    uint64_t watch_address = start_address + MAGIC_OFFSET_BASE;
+    uint64_t marker_address = start_address + MAGIC_OFFSET;
+    uint8_t watch_byte;
+
+    /*
+     * Wait for the MAGIC_MARKER to get transferred, as an
+     * indicator that a migration pass has made some known
+     * amount of progress.
+     */
+    do {
+        usleep(1000 * 10);
+    } while (qtest_readq(to, marker_address) != MAGIC_MARKER);
+
+    /*
+     * Now ensure that already transferred bytes are
+     * dirty again from the guest workload. Note the
+     * guest byte value will wrap around and by chance
+     * match the original watch_byte. This is harmless
+     * as we'll eventually see a different value if we
+     * keep watching
+     */
+    watch_byte = qtest_readb(from, watch_address);
+    do {
+        usleep(1000 * 10);
+    } while (qtest_readb(from, watch_address) == watch_byte);
+}
+
+
 static void migrate_pause(QTestState *who)
 {
     qtest_qmp_assert_success(who, "{ 'execute': 'migrate-pause' }");
@@ -577,7 +676,10 @@ typedef struct {
         MIG_TEST_FAIL_DEST_QUIT_ERR,
     } result;
 
-    /* Optional: set number of migration passes to wait for, if live==true */
+    /*
+     * Optional: set number of migration passes to wait for, if live==true.
+     * If zero, then merely wait for a few MB of dirty data
+     */
     unsigned int iterations;
 
     /* Optional: whether the guest CPUs should be running during migration */
@@ -1158,12 +1260,14 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
     migrate_ensure_non_converge(from);
 
+    migrate_prepare_for_dirty_mem(from);
+
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     *from_ptr = from;
     *to_ptr = to;
@@ -1398,14 +1502,8 @@ static void test_precopy_common(MigrateCommon *args)
     }
 
     if (args->live) {
-        /*
-         * Testing live migration, we want to ensure that some
-         * memory is re-dirtied after being transferred, so that
-         * we exercise logic for dirty page handling. We achieve
-         * this with a ridiculosly low bandwidth that guarantees
-         * non-convergance.
-         */
         migrate_ensure_non_converge(from);
+        migrate_prepare_for_dirty_mem(from);
     } else {
         /*
          * Testing non-live migration, we allow it to run at
@@ -1440,13 +1538,16 @@ static void test_precopy_common(MigrateCommon *args)
         }
     } else {
         if (args->live) {
-            if (args->iterations) {
-                while (args->iterations--) {
-                    wait_for_migration_pass(from);
-                }
-            } else {
+            /*
+             * For initial iteration(s) we must do a full pass,
+             * but for the final iteration, we need only wait
+             * for some dirty mem before switching to converge
+             */
+            while (args->iterations > 1) {
                 wait_for_migration_pass(from);
+                args->iterations--;
             }
+            migrate_wait_for_dirty_mem(from, to);
 
             migrate_ensure_converge(from);
 
@@ -1573,6 +1674,9 @@ static void test_ignore_shared(void)
         return;
     }
 
+    migrate_ensure_non_converge(from);
+    migrate_prepare_for_dirty_mem(from);
+
     migrate_set_capability(from, "x-ignore-shared", true);
     migrate_set_capability(to, "x-ignore-shared", true);
 
@@ -1581,7 +1685,7 @@ static void test_ignore_shared(void)
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -2273,6 +2377,7 @@ static void test_multifd_tcp_cancel(void)
     }
 
     migrate_ensure_non_converge(from);
+    migrate_prepare_for_dirty_mem(from);
 
     migrate_set_parameter_int(from, "multifd-channels", 16);
     migrate_set_parameter_int(to, "multifd-channels", 16);
@@ -2291,7 +2396,7 @@ static void test_multifd_tcp_cancel(void)
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     migrate_cancel(from);
 
@@ -2320,11 +2425,13 @@ static void test_multifd_tcp_cancel(void)
 
     wait_for_migration_status(from, "cancelled", NULL);
 
-    migrate_ensure_converge(from);
+    migrate_ensure_non_converge(from);
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
+
+    migrate_ensure_converge(from);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events
  2023-05-31 13:23 ` [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events Daniel P. Berrangé
@ 2023-05-31 14:57   ` Thomas Huth
  2023-06-01 12:14   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Huth @ 2023-05-31 14:57 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> Currently code must call one of the qtest_qmp_event* functions to
> fetch events. These are only usable if the immediate caller knows
> the particular event they want to capture, and are only interested
> in one specific event type. Adding ability to register an event
> callback lets the caller capture a range of events over any period
> of time.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/libqtest.c | 20 ++++++++++++++++++--
>   tests/qtest/libqtest.h | 37 +++++++++++++++++++++++++++++++++++--
>   2 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 603c26d955..1534177fc1 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -82,6 +82,8 @@ struct QTestState
>       GString *rx;
>       QTestTransportOps ops;
>       GList *pending_events;
> +    QTestQMPEventCallback eventCB;
> +    void *eventData;
>   };
>   
>   static GHookList abrt_hooks;
> @@ -703,8 +705,14 @@ QDict *qtest_qmp_receive(QTestState *s)
>           if (!qdict_get_try_str(response, "event")) {
>               return response;
>           }
> -        /* Stash the event for a later consumption */
> -        s->pending_events = g_list_append(s->pending_events, response);
> +
> +        if (s->eventCB) {
> +            s->eventCB(s, qdict_get_str(response, "event"),
> +                       response, s->eventData);
> +        } else {
> +            /* Stash the event for a later consumption */
> +            s->pending_events = g_list_append(s->pending_events, response);
> +        }
>       }
>   }
>   
> @@ -808,6 +816,14 @@ void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
>       va_end(ap);
>   }
>   
> +void qtest_qmp_set_event_callback(QTestState *s,
> +                                  QTestQMPEventCallback cb, void *opaque)
> +{
> +    s->eventCB = cb;
> +    s->eventData = opaque;
> +}
> +
> +

Nit: Use one empty line only instead of two.

Apart from that:
Reviewed-by: Thomas Huth <thuth@redhat.com>




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants
  2023-05-31 13:23 ` [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants Daniel P. Berrangé
@ 2023-06-01  9:23   ` Thomas Huth
  2023-06-01 12:48     ` Daniel P. Berrangé
  2023-06-01 12:04   ` Juan Quintela
  2023-06-01 12:20   ` Juan Quintela
  2 siblings, 1 reply; 57+ messages in thread
From: Thomas Huth @ 2023-06-01  9:23 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> Add several counterparts of qtest_qmp_assert_success() that can
> 
>   * Use va_list instead of ...
>   * Accept a list of FDs to send
>   * Return the response data
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/libqtest.c |  99 +++++++++++++++++++++++++++++++++--
>   tests/qtest/libqtest.h | 115 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 209 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index c3a0ef5bb4..603c26d955 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -1229,14 +1229,23 @@ void qtest_memset(QTestState *s, uint64_t addr, uint8_t pattern, size_t size)
>       qtest_rsp(s);
>   }
>   
> -void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
> +void qtest_vqmp_assert_success(QTestState *qts,
> +                               const char *fmt, va_list args)
>   {
> -    va_list ap;
>       QDict *response;
>   
> -    va_start(ap, fmt);
> -    response = qtest_vqmp(qts, fmt, ap);
> -    va_end(ap);
> +    response = qtest_vqmp_assert_success_ref(qts, fmt, args);
> +
> +    qobject_unref(response);
> +}
> +
> +QDict *qtest_vqmp_assert_success_ref(QTestState *qts,
> +                                     const char *fmt, va_list args)
> +{
> +    QDict *response;
> +    QDict *ret;
> +
> +    response = qtest_vqmp(qts, fmt, args);
>   
>       g_assert(response);
>       if (!qdict_haskey(response, "return")) {
> @@ -1245,8 +1254,88 @@ void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
>           g_string_free(s, true);
>       }
>       g_assert(qdict_haskey(response, "return"));
> +    ret = qdict_get_qdict(response, "return");
> +    qobject_ref(ret);
> +    qobject_unref(response);
> +
> +    return ret;
> +}
> +
> +#ifndef _WIN32
> +QDict *qtest_vqmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
> +                                         const char *fmt, va_list args)
> +{
> +    QDict *response;
> +    QDict *ret;
> +
> +    response = qtest_vqmp_fds(qts, fds, nfds, fmt, args);
> +
> +    g_assert(response);
> +    if (!qdict_haskey(response, "return")) {
> +        GString *s = qobject_to_json_pretty(QOBJECT(response), true);
> +        g_test_message("%s", s->str);
> +        g_string_free(s, true);
> +    }
> +    g_assert(qdict_haskey(response, "return"));
> +    ret = qdict_get_qdict(response, "return");
> +    qobject_ref(ret);
> +    qobject_unref(response);
> +
> +    return ret;
> +}
> +
> +void qtest_vqmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
> +                                   const char *fmt, va_list args)
> +{
> +    QDict *response;
> +    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, args);
>       qobject_unref(response);
>   }

<bikeshedding>The ordering is a little bit inconsistent ... for some pairs, 
you do the _success() function first, and for some others you do the 
_success_ref() function first. IMHO it would be nicer to have the same 
ordering everywhere, preferably with the _success_ref() function first 
(since that's the one that is called from the other).</bikeshedding>

> +#endif /* !_WIN32 */
> +
> +void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
> +{
> +    QDict *response;
> +    va_list ap;
> +    va_start(ap, fmt);
> +    response = qtest_vqmp_assert_success_ref(qts, fmt, ap);

You could use qtest_vqmp_assert_success() instead, I think, and dro the 
qobject_unref() below.

> +    va_end(ap);
> +    qobject_unref(response);
> +}
> +
> +QDict *qtest_qmp_assert_success_ref(QTestState *qts, const char *fmt, ...)
> +{
> +    QDict *response;
> +    va_list ap;
> +    va_start(ap, fmt);
> +    response = qtest_vqmp_assert_success_ref(qts, fmt, ap);
> +    va_end(ap);
> +    return response;
> +}
> +
> +#ifndef _WIN32
> +void qtest_qmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
> +                                  const char *fmt, ...)
> +{
> +    QDict *response;
> +    va_list ap;
> +    va_start(ap, fmt);
> +    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, ap);

dito, could use qtest_vqmp_fds_assert_success() instead.

> +    va_end(ap);
> +    qobject_unref(response);
> +}
> +
> +QDict *qtest_qmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
> +                                        const char *fmt, ...)
> +{
> +    QDict *response;
> +    va_list ap;
> +    va_start(ap, fmt);
> +    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, ap);
> +    va_end(ap);
> +    return response;
> +}
> +#endif /* !_WIN32 */

Just nits, so anyway:
Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test
  2023-05-31 13:23 ` [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test Daniel P. Berrangé
@ 2023-06-01  9:26   ` Thomas Huth
  2023-06-01  9:32     ` Daniel P. Berrangé
  2023-06-01 12:17   ` Juan Quintela
  1 sibling, 1 reply; 57+ messages in thread
From: Thomas Huth @ 2023-06-01  9:26 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> This function duplicates logic of qtest_qmp_assert_success_ref
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/migration-helpers.c | 22 ----------------------
>   tests/qtest/migration-helpers.h |  3 ---
>   tests/qtest/migration-test.c    | 29 +++++++++++++++--------------
>   3 files changed, 15 insertions(+), 39 deletions(-)
> 
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index f6f3c6680f..bddf3f8d4d 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -85,28 +85,6 @@ QDict *wait_command(QTestState *who, const char *command, ...)
>       return ret;
>   }
>   
> -/*
> - * Execute the qmp command only
> - */
> -QDict *qmp_command(QTestState *who, const char *command, ...)
> -{
> -    va_list ap;
> -    QDict *resp, *ret;
> -
> -    va_start(ap, command);
> -    resp = qtest_vqmp(who, command, ap);
> -    va_end(ap);
> -
> -    g_assert(!qdict_haskey(resp, "error"));

What about this g_assert(!qdict_haskey(resp, "error")) ? 
qtest_qmp_assert_success_ref() does not have this assert... do we still need 
it somewhere? If not, please add a comment to the patch description why it 
can be ignored now.

  Thomas




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage in migration test
  2023-05-31 13:23 ` [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage " Daniel P. Berrangé
@ 2023-06-01  9:28   ` Thomas Huth
  2023-06-01 12:10   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Huth @ 2023-06-01  9:28 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> Some of the usage is just a verbose way of re-inventing the
> qtest_qmp_assert_success(_ref) methods.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/migration-helpers.c |  8 ++---
>   tests/qtest/migration-test.c    | 52 ++++++++++++---------------------
>   2 files changed, 21 insertions(+), 39 deletions(-)


Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event
  2023-05-31 13:23 ` [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event Daniel P. Berrangé
@ 2023-06-01  9:31   ` Thomas Huth
  2023-06-01 12:23   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Huth @ 2023-06-01  9:31 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> Change the migration test to use the new qtest event callback to watch
> for the stop event. This ensures that we only watch for the STOP event
> on the source QEMU. The previous code would set the single 'got_stop'
> flag when either source or dest QEMU got the STOP event.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/migration-helpers.c | 18 ++++++++----------
>   tests/qtest/migration-helpers.h |  3 ++-
>   tests/qtest/migration-test.c    |  4 ++++
>   3 files changed, 14 insertions(+), 11 deletions(-)


Acked-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test
  2023-06-01  9:26   ` Thomas Huth
@ 2023-06-01  9:32     ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01  9:32 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Juan Quintela,
	Leonardo Bras, Paolo Bonzini

On Thu, Jun 01, 2023 at 11:26:46AM +0200, Thomas Huth wrote:
> On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> > This function duplicates logic of qtest_qmp_assert_success_ref
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >   tests/qtest/migration-helpers.c | 22 ----------------------
> >   tests/qtest/migration-helpers.h |  3 ---
> >   tests/qtest/migration-test.c    | 29 +++++++++++++++--------------
> >   3 files changed, 15 insertions(+), 39 deletions(-)
> > 
> > diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> > index f6f3c6680f..bddf3f8d4d 100644
> > --- a/tests/qtest/migration-helpers.c
> > +++ b/tests/qtest/migration-helpers.c
> > @@ -85,28 +85,6 @@ QDict *wait_command(QTestState *who, const char *command, ...)
> >       return ret;
> >   }
> > -/*
> > - * Execute the qmp command only
> > - */
> > -QDict *qmp_command(QTestState *who, const char *command, ...)
> > -{
> > -    va_list ap;
> > -    QDict *resp, *ret;
> > -
> > -    va_start(ap, command);
> > -    resp = qtest_vqmp(who, command, ap);
> > -    va_end(ap);
> > -
> > -    g_assert(!qdict_haskey(resp, "error"));
> 
> What about this g_assert(!qdict_haskey(resp, "error")) ?
> qtest_qmp_assert_success_ref() does not have this assert... do we still need
> it somewhere? If not, please add a comment to the patch description why it
> can be ignored now.

The caller just wants the 'return' field data. If that is missing,
qtest_qmp_assert_success_ref() will issue the diagnostic message
printing the entire QMP resposne, which is way more debuggable
than just asserting on 'error' without printing the error contents

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success
  2023-05-31 13:23 ` [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success Daniel P. Berrangé
@ 2023-06-01  9:37   ` Thomas Huth
  2023-06-01 12:27   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Huth @ 2023-06-01  9:37 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> Most usage of wait_command() is followed by qobject_unref(), which
> is just a verbose re-implementation of qtest_qmp_assert_success().
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/migration-helpers.c |  53 +---------
>   tests/qtest/migration-helpers.h |   8 --
>   tests/qtest/migration-test.c    | 170 +++++++++++++-------------------
>   3 files changed, 74 insertions(+), 157 deletions(-)
...
> @@ -1759,7 +1741,6 @@ static void test_precopy_tcp_tls_x509_reject_anon_client(void)
>   static void *test_migrate_fd_start_hook(QTestState *from,
>                                           QTestState *to)
>   {
> -    QDict *rsp;
>       int ret;
>       int pair[2];
>   
> @@ -1768,22 +1749,19 @@ static void *test_migrate_fd_start_hook(QTestState *from,
>       g_assert_cmpint(ret, ==, 0);
>   
>       /* Send the 1st socket to the target */
> -    rsp = wait_command_fd(to, pair[0],
> -                          "{ 'execute': 'getfd',"
> -                          "  'arguments': { 'fdname': 'fd-mig' }}");
> -    qobject_unref(rsp);
> +    qtest_qmp_fds_assert_success(to, &(pair[0]), 1,

Matter of taste, but I think I'd prefer &pair[0] without the parentheses.

> +                                 "{ 'execute': 'getfd',"
> +                                 "  'arguments': { 'fdname': 'fd-mig' }}");
>       close(pair[0]);
>   
>       /* Start incoming migration from the 1st socket */
> -    rsp = wait_command(to, "{ 'execute': 'migrate-incoming',"
> -                           "  'arguments': { 'uri': 'fd:fd-mig' }}");
> -    qobject_unref(rsp);
> +    qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
> +                             "  'arguments': { 'uri': 'fd:fd-mig' }}");
>   
>       /* Send the 2nd socket to the target */
> -    rsp = wait_command_fd(from, pair[1],
> -                          "{ 'execute': 'getfd',"
> -                          "  'arguments': { 'fdname': 'fd-mig' }}");
> -    qobject_unref(rsp);
> +    qtest_qmp_fds_assert_success(from, &(pair[1]), 1,

&pair[1] ?

Anyway,
Reviewed-by: Thomas Huth <thuth@redhat.com>




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 7/9] tests/qtest: capture RESUME events during migration
  2023-05-31 13:23 ` [PATCH v3 7/9] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
@ 2023-06-01  9:38   ` Thomas Huth
  2023-06-01 12:31   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Huth @ 2023-06-01  9:38 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> When running migration tests we monitor for a STOP event so we can skip
> redundant waits. This will be needed for the RESUME event too shortly.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/migration-helpers.c | 12 ++++++++++++
>   tests/qtest/migration-helpers.h |  2 ++
>   tests/qtest/migration-test.c    |  5 +++++
>   3 files changed, 19 insertions(+)


Acked-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-05-31 13:23 ` [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
@ 2023-06-01  9:47   ` Thomas Huth
  2023-06-01 12:33   ` Juan Quintela
  2023-06-01 15:30   ` Peter Xu
  2 siblings, 0 replies; 57+ messages in thread
From: Thomas Huth @ 2023-06-01  9:47 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> There are 27 pre-copy live migration scenarios being tested. In all of
> these we force non-convergance and run for one iteration, then let it
> converge and wait for completion during the second (or following)
> iterations. At 3 mbps bandwidth limit the first iteration takes a very
> long time (~30 seconds).
> 
> While it is important to test the migration passes and convergance

s/convergance/convergence/ (also in the first paragraph)

> logic, it is overkill to do this for all 27 pre-copy scenarios. The
> TLS migration scenarios in particular are merely exercising different
> code paths during connection establishment.
> 
> To optimize time taken, switch most of the test scenarios to run
> non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> a massive speed up for most of the test scenarios.
> 
> For test coverage the following scenarios are unchanged
> 
>   * Precopy with UNIX sockets
>   * Precopy with UNIX sockets and dirty ring tracking
>   * Precopy with XBZRLE
>   * Precopy with UNIX compress
>   * Precopy with UNIX compress (nowait)
>   * Precopy with multifd
> 
> On a test machine this reduces execution time from 13 minutes to
> 8 minutes.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/migration-test.c | 81 +++++++++++++++++++++++++++++-------
>   1 file changed, 66 insertions(+), 15 deletions(-)

Tested-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-05-31 13:24 ` [PATCH v3 9/9] tests/qtest: massively speed up migration-test Daniel P. Berrangé
@ 2023-06-01 10:04   ` Thomas Huth
  2023-06-01 15:46   ` Peter Xu
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Huth @ 2023-06-01 10:04 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Juan Quintela, Leonardo Bras, Paolo Bonzini

On 31/05/2023 15.24, Daniel P. Berrangé wrote:
> The migration test cases that actually exercise live migration want to
> ensure there is a minimum of two iterations of pre-copy, in order to
> exercise the dirty tracking code.
> 
> Historically we've queried the migration status, looking for the
> 'dirty-sync-count' value to increment to track iterations. This was
> not entirely reliable because often all the data would get transferred
> quickly enough that the migration would finish before we wanted it
> to. So we massively dropped the bandwidth and max downtime to
> guarantee non-convergance. This had the unfortunate side effect

convergence

> that every migration took at least 30 seconds to run (100 MB of
> dirty pages / 3 MB/sec).
> 
> This optimization takes a different approach to ensuring that a
> mimimum of two iterations. Rather than waiting for dirty-sync-count

minimum

> to increment, directly look for an indication that the source VM
> has dirtied RAM that has already been transferred.
> 
> On the source VM a magic marker is written just after the 3 MB
> offset. The destination VM is now montiored to detect when the

monitored

...
> @@ -445,6 +459,91 @@ static void migrate_ensure_converge(QTestState *who)
>       migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
>   }
>   
> +/*
> + * Our goal is to ensure that we run a single full migration
> + * iteration, and also dirty memory, ensuring that at least
> + * one further iteration is required.
> + *
> + * We can't directly synchronize with the start of a migration
> + * so we have to apply some tricks monitoring memory that is
> + * transferred.
> + *
> + * Initially we set the migration bandwidth to an insanely
> + * low value, with tiny max downtime too. This basically
> + * guarantees migration will never complete.
> + *
> + * This will result in a test that is unacceptably slow though,
> + * so we can't let the entire migration pass run at this speed.
> + * Our intent is to let it run just long enough that we can
> + * prove data prior to the marker has been transferred *AND*
> + * also prove this transferred data is dirty again.
> + *
> + * Before migration starts, we write a 64-bit magic marker
> + * into a fixed location in the src VM RAM.
> + *
> + * Then watch dst memory until the marker appears. This is
> + * proof that start_address -> MAGIC_OFFSET_BASE has been
> + * transferred.
> + *
> + * Finally we go back to the source and read a byte just
> + * before the marker untill we see it flip in value. This

until

It's indeed much faster now, thank you very much for tackling this!

Tested-by: Thomas Huth <thuth@redhat.com>




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants
  2023-05-31 13:23 ` [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants Daniel P. Berrangé
  2023-06-01  9:23   ` Thomas Huth
@ 2023-06-01 12:04   ` Juan Quintela
  2023-06-01 12:20   ` Juan Quintela
  2 siblings, 0 replies; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:04 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> Add several counterparts of qtest_qmp_assert_success() that can
>
>  * Use va_list instead of ...
>  * Accept a list of FDs to send
>  * Return the response data
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  tests/qtest/libqtest.c |  99 +++++++++++++++++++++++++++++++++--
>  tests/qtest/libqtest.h | 115 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 209 insertions(+), 5 deletions(-)
>
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index c3a0ef5bb4..603c26d955 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -1229,14 +1229,23 @@ void qtest_memset(QTestState *s, uint64_t addr, uint8_t pattern, size_t size)
>      qtest_rsp(s);
>  }
>  
> -void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
> +void qtest_vqmp_assert_success(QTestState *qts,
> +                               const char *fmt, va_list args)
>  {
> -    va_list ap;
>      QDict *response;
>  
> -    va_start(ap, fmt);
> -    response = qtest_vqmp(qts, fmt, ap);
> -    va_end(ap);
> +    response = qtest_vqmp_assert_success_ref(qts, fmt, args);
> +
> +    qobject_unref(response);
> +}
> +
> +QDict *qtest_vqmp_assert_success_ref(QTestState *qts,
> +                                     const char *fmt, va_list args)
> +{

I was about to chime in that this two functions should be declared in
reverse and see that Thomas had the same ide.

Later, Juan



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage in migration test
  2023-05-31 13:23 ` [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage " Daniel P. Berrangé
  2023-06-01  9:28   ` Thomas Huth
@ 2023-06-01 12:10   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:10 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> Some of the usage is just a verbose way of re-inventing the
> qtest_qmp_assert_success(_ref) methods.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events
  2023-05-31 13:23 ` [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events Daniel P. Berrangé
  2023-05-31 14:57   ` Thomas Huth
@ 2023-06-01 12:14   ` Juan Quintela
  2023-06-01 12:56     ` Daniel P. Berrangé
  1 sibling, 1 reply; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:14 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> Currently code must call one of the qtest_qmp_event* functions to
> fetch events. These are only usable if the immediate caller knows
> the particular event they want to capture, and are only interested
> in one specific event type. Adding ability to register an event
> callback lets the caller capture a range of events over any period
> of time.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

First of all, I *love* the idea of the patch, but ...

>  static GHookList abrt_hooks;
> @@ -703,8 +705,14 @@ QDict *qtest_qmp_receive(QTestState *s)
>          if (!qdict_get_try_str(response, "event")) {
>              return response;
>          }
> -        /* Stash the event for a later consumption */
> -        s->pending_events = g_list_append(s->pending_events, response);
> +
> +        if (s->eventCB) {
> +            s->eventCB(s, qdict_get_str(response, "event"),
> +                       response, s->eventData);
> +        } else {
> +            /* Stash the event for a later consumption */
> +            s->pending_events = g_list_append(s->pending_events, response);
> +        }
>      }

s->eventCB returns a bool that you are not using.  I think this part of
the code would be more usefule if:

        if (!s->eventCB || !s->eventCB(s, qdict_get_str(response, "event"),
                                       response, s->eventData)) {
            /* Stash the event for a later consumption */
            s->pending_events = g_list_append(s->pending_events, response);
        }

So if we are not handling the event, we put it on the pending_events
list.

What do you think?

Later, Juan.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test
  2023-05-31 13:23 ` [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test Daniel P. Berrangé
  2023-06-01  9:26   ` Thomas Huth
@ 2023-06-01 12:17   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:17 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> This function duplicates logic of qtest_qmp_assert_success_ref
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Much better that the current code.  And as you answer to Thomas, better
messages in case of errors.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants
  2023-05-31 13:23 ` [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants Daniel P. Berrangé
  2023-06-01  9:23   ` Thomas Huth
  2023-06-01 12:04   ` Juan Quintela
@ 2023-06-01 12:20   ` Juan Quintela
  2023-06-01 12:51     ` Daniel P. Berrangé
  2 siblings, 1 reply; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:20 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> Add several counterparts of qtest_qmp_assert_success() that can
>
>  * Use va_list instead of ...
>  * Accept a list of FDs to send
>  * Return the response data
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  tests/qtest/libqtest.c |  99 +++++++++++++++++++++++++++++++++--
>  tests/qtest/libqtest.h | 115 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 209 insertions(+), 5 deletions(-)
>
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index c3a0ef5bb4..603c26d955 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -1229,14 +1229,23 @@ void qtest_memset(QTestState *s, uint64_t addr, uint8_t pattern, size_t size)
>      qtest_rsp(s);
>  }
>  
> -void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
> +void qtest_vqmp_assert_success(QTestState *qts,
> +                               const char *fmt, va_list args)
>  {
> -    va_list ap;
>      QDict *response;
>  
> -    va_start(ap, fmt);
> -    response = qtest_vqmp(qts, fmt, ap);
> -    va_end(ap);
> +    response = qtest_vqmp_assert_success_ref(qts, fmt, args);
> +
> +    qobject_unref(response);
> +}
> +
> +QDict *qtest_vqmp_assert_success_ref(QTestState *qts,
> +                                     const char *fmt, va_list args)
> +{
> +    QDict *response;
> +    QDict *ret;
> +
> +    response = qtest_vqmp(qts, fmt, args);
>  
>      g_assert(response);
>      if (!qdict_haskey(response, "return")) {
> @@ -1245,8 +1254,88 @@ void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
>          g_string_free(s, true);
>      }
>      g_assert(qdict_haskey(response, "return"));
> +    ret = qdict_get_qdict(response, "return");
> +    qobject_ref(ret);
> +    qobject_unref(response);
> +
> +    return ret;
> +}
> +
> +#ifndef _WIN32
> +QDict *qtest_vqmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
> +                                         const char *fmt, va_list args)
> +{
> +    QDict *response;
> +    QDict *ret;
> +
> +    response = qtest_vqmp_fds(qts, fds, nfds, fmt, args);
> +
> +    g_assert(response);
> +    if (!qdict_haskey(response, "return")) {
> +        GString *s = qobject_to_json_pretty(QOBJECT(response), true);
> +        g_test_message("%s", s->str);
> +        g_string_free(s, true);

I know we are not consistent ot this file, but what about using autoptr here?

        g_autoptr(GString) *s = qobject_to_json_pretty(QOBJECT(response), true);
        g_test_message("%s", s->str);

??

> +    }
> +    g_assert(qdict_haskey(response, "return"));
> +    ret = qdict_get_qdict(response, "return");
> +    qobject_ref(ret);
> +    qobject_unref(response);
> +
> +    return ret;
> +}
> +
> +void qtest_vqmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
> +                                   const char *fmt, va_list args)
> +{
> +    QDict *response;
> +    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, args);
>      qobject_unref(response);
>  }
> +#endif /* !_WIN32 */
> +
> +void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
> +{
> +    QDict *response;
> +    va_list ap;
> +    va_start(ap, fmt);
> +    response = qtest_vqmp_assert_success_ref(qts, fmt, ap);
> +    va_end(ap);
> +    qobject_unref(response);
> +}
> +
> +QDict *qtest_qmp_assert_success_ref(QTestState *qts, const char *fmt, ...)
> +{
> +    QDict *response;
> +    va_list ap;
> +    va_start(ap, fmt);
> +    response = qtest_vqmp_assert_success_ref(qts, fmt, ap);
> +    va_end(ap);
> +    return response;
> +}
> +
> +#ifndef _WIN32
> +void qtest_qmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
> +                                  const char *fmt, ...)
> +{
> +    QDict *response;
> +    va_list ap;
> +    va_start(ap, fmt);
> +    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, ap);
> +    va_end(ap);
> +    qobject_unref(response);
> +}
> +
> +QDict *qtest_qmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
> +                                        const char *fmt, ...)
> +{
> +    QDict *response;
> +    va_list ap;
> +    va_start(ap, fmt);
> +    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, ap);
> +    va_end(ap);
> +    return response;
> +}
> +#endif /* !_WIN32 */
>  
>  bool qtest_big_endian(QTestState *s)
>  {
> diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
> index 8d7d450963..41bc6633bd 100644
> --- a/tests/qtest/libqtest.h
> +++ b/tests/qtest/libqtest.h
> @@ -693,6 +693,73 @@ void qtest_add_abrt_handler(GHookFunc fn, const void *data);
>   */
>  void qtest_remove_abrt_handler(void *data);
>  
> +/**
> + * qtest_vqmp_assert_success:
> + * @qts: QTestState instance to operate on
> + * @fmt: QMP message to send to qemu, formatted like
> + * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
> + * supported after '%'.
> + * @args: variable arguments for @fmt
> + *
> + * Sends a QMP message to QEMU and asserts that a 'return' key is present in
> + * the response.
> + */
> +void qtest_vqmp_assert_success(QTestState *qts,
> +                               const char *fmt, va_list args)
> +    G_GNUC_PRINTF(2, 0);
> +
> +/**
> + * qtest_vqmp_assert_success_ref:
> + * @qts: QTestState instance to operate on
> + * @fmt: QMP message to send to qemu, formatted like
> + * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
> + * supported after '%'.
> + * @args: variable arguments for @fmt
> + *
> + * Sends a QMP message to QEMU, asserts that a 'return' key is present in
> + * the response, and returns the response.
> + */
> +QDict *qtest_vqmp_assert_success_ref(QTestState *qts,
> +                                     const char *fmt, va_list args)
> +    G_GNUC_PRINTF(2, 0);
> +
> +#ifndef _WIN32
> +/**
> + * qtest_vqmp_fds_assert_success:
> + * @qts: QTestState instance to operate on
> + * @fds: the file descriptors to send
> + * @nfds: number of @fds to send
> + * @fmt: QMP message to send to qemu, formatted like
> + * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
> + * supported after '%'.
> + * @args: variable arguments for @fmt
> + *
> + * Sends a QMP message with file descriptors to QEMU and
> + * asserts that a 'return' key is present in the response.
> + */
> +void qtest_vqmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
> +                                   const char *fmt, va_list args)
> +    G_GNUC_PRINTF(4, 0);
> +
> +/**
> + * qtest_vqmp_fds_assert_success_ref:
> + * @qts: QTestState instance to operate on
> + * @fds: the file descriptors to send
> + * @nfds: number of @fds to send
> + * @fmt: QMP message to send to qemu, formatted like
> + * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
> + * supported after '%'.
> + * @args: variable arguments for @fmt
> + *
> + * Sends a QMP message with file descriptors to QEMU,
> + * asserts that a 'return' key is present in the response,
> + * and returns the response.
> + */
> +QDict *qtest_vqmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
> +                                         const char *fmt, va_list args)
> +    G_GNUC_PRINTF(4, 0);
> +#endif /* !_WIN32 */
> +
>  /**
>   * qtest_qmp_assert_success:
>   * @qts: QTestState instance to operate on
> @@ -706,6 +773,54 @@ void qtest_remove_abrt_handler(void *data);
>  void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
>      G_GNUC_PRINTF(2, 3);
>  
> +/**
> + * qtest_qmp_assert_success_ref:
> + * @qts: QTestState instance to operate on
> + * @fmt: QMP message to send to qemu, formatted like
> + * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
> + * supported after '%'.
> + *
> + * Sends a QMP message to QEMU, asserts that a 'return' key is present in
> + * the response, and returns the response.
> + */
> +QDict *qtest_qmp_assert_success_ref(QTestState *qts, const char *fmt, ...)
> +    G_GNUC_PRINTF(2, 3);
> +
> +#ifndef _WIN32
> +/**
> + * qtest_qmp_fd_assert_success:
> + * @qts: QTestState instance to operate on
> + * @fds: the file descriptors to send
> + * @nfds: number of @fds to send
> + * @fmt: QMP message to send to qemu, formatted like
> + * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
> + * supported after '%'.
> + *
> + * Sends a QMP message with file descriptors to QEMU and
> + * asserts that a 'return' key is present in the response.
> + */
> +void qtest_qmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
> +                                  const char *fmt, ...)
> +    G_GNUC_PRINTF(4, 5);
> +
> +/**
> + * qtest_qmp_fd_assert_success_ref:
> + * @qts: QTestState instance to operate on
> + * @fds: the file descriptors to send
> + * @nfds: number of @fds to send
> + * @fmt: QMP message to send to qemu, formatted like
> + * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
> + * supported after '%'.
> + *
> + * Sends a QMP message with file descriptors to QEMU,
> + * asserts that a 'return' key is present in the response,
> + * and returns the response.
> + */
> +QDict *qtest_qmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
> +                                        const char *fmt, ...)
> +    G_GNUC_PRINTF(4, 5);
> +#endif /* !_WIN32 */
> +
>  /**
>   * qtest_cb_for_every_machine:
>   * @cb: Pointer to the callback function



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event
  2023-05-31 13:23 ` [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event Daniel P. Berrangé
  2023-06-01  9:31   ` Thomas Huth
@ 2023-06-01 12:23   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:23 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> Change the migration test to use the new qtest event callback to watch
> for the stop event. This ensures that we only watch for the STOP event
> on the source QEMU. The previous code would set the single 'got_stop'
> flag when either source or dest QEMU got the STOP event.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

If you agreed with my proposed change to patch 1

> -bool got_stop;
> -
> -static void check_stop_event(QTestState *who)
> +bool migrate_watch_for_stop(QTestState *who, const char *name,
> +                            QDict *event, void *opaque)
>  {
> -    QDict *event = qtest_qmp_event_ref(who, "STOP");
> -    if (event) {
> -        got_stop = true;
> -        qobject_unref(event);
> +    bool *seen = opaque;
> +
> +    if (g_str_equal(name, "STOP")) {
> +        *seen = true;

You should
           return true;

here.

Later, Juan.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success
  2023-05-31 13:23 ` [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success Daniel P. Berrangé
  2023-06-01  9:37   ` Thomas Huth
@ 2023-06-01 12:27   ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:27 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> Most usage of wait_command() is followed by qobject_unref(), which
> is just a verbose re-implementation of qtest_qmp_assert_success().
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>



> -    qobject_unref(rsp);
> +    qtest_qmp_fds_assert_success(to, &(pair[0]), 1,

I was wondering about the parens, and then saw Thomas message O:-)

Later, Juan.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 7/9] tests/qtest: capture RESUME events during migration
  2023-05-31 13:23 ` [PATCH v3 7/9] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
  2023-06-01  9:38   ` Thomas Huth
@ 2023-06-01 12:31   ` Juan Quintela
  2023-06-01 12:34     ` Daniel P. Berrangé
  1 sibling, 1 reply; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:31 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> When running migration tests we monitor for a STOP event so we can skip
> redundant waits. This will be needed for the RESUME event too shortly.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  tests/qtest/migration-helpers.c | 12 ++++++++++++
>  tests/qtest/migration-helpers.h |  2 ++
>  tests/qtest/migration-test.c    |  5 +++++
>  3 files changed, 19 insertions(+)
>
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index 884d8a2e07..d50b565967 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -35,6 +35,18 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
>      return false;
>  }
>  
> +bool migrate_watch_for_resume(QTestState *who, const char *name,
> +                              QDict *event, void *opaque)
> +{
> +    bool *seen = opaque;
> +
> +    if (g_str_equal(name, "RESUME")) {
> +        *seen = true;
> +    }
> +
> +    return false;
> +}
> +

I think I am not understanding this.

Can we wait for both RESUME and STOP events?

Or do you want an implementation that can only look for one event?

Later, Juan.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-05-31 13:23 ` [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
  2023-06-01  9:47   ` Thomas Huth
@ 2023-06-01 12:33   ` Juan Quintela
  2023-06-01 12:38     ` Daniel P. Berrangé
  2023-06-01 16:09     ` Thomas Huth
  2023-06-01 15:30   ` Peter Xu
  2 siblings, 2 replies; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:33 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> There are 27 pre-copy live migration scenarios being tested. In all of
> these we force non-convergance and run for one iteration, then let it
> converge and wait for completion during the second (or following)
> iterations. At 3 mbps bandwidth limit the first iteration takes a very
> long time (~30 seconds).
>
> While it is important to test the migration passes and convergance
> logic, it is overkill to do this for all 27 pre-copy scenarios. The
> TLS migration scenarios in particular are merely exercising different
> code paths during connection establishment.
>
> To optimize time taken, switch most of the test scenarios to run
> non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> a massive speed up for most of the test scenarios.
>
> For test coverage the following scenarios are unchanged
>
>  * Precopy with UNIX sockets
>  * Precopy with UNIX sockets and dirty ring tracking
>  * Precopy with XBZRLE
>  * Precopy with UNIX compress
>  * Precopy with UNIX compress (nowait)
>  * Precopy with multifd
>
> On a test machine this reduces execution time from 13 minutes to
> 8 minutes.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Hi

I have a different idea to get the migration tests faster.  Namely, just
stop letting enter the completation stage until we are ready to use it.

I need that functionality also for vfio migration, so as I have to
create the patches, can we put on hold the last two patches and give me
a couple of days?

Thomas, do you care if I get the whole sets of patches through the
migration tree?  Almost everything is migration related, and I am
changing the same files that Daniel, so it is easier if I can get them
into my tree.

Later, Juan.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 7/9] tests/qtest: capture RESUME events during migration
  2023-06-01 12:31   ` Juan Quintela
@ 2023-06-01 12:34     ` Daniel P. Berrangé
  2023-06-01 12:37       ` Juan Quintela
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 12:34 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

On Thu, Jun 01, 2023 at 02:31:13PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > When running migration tests we monitor for a STOP event so we can skip
> > redundant waits. This will be needed for the RESUME event too shortly.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  tests/qtest/migration-helpers.c | 12 ++++++++++++
> >  tests/qtest/migration-helpers.h |  2 ++
> >  tests/qtest/migration-test.c    |  5 +++++
> >  3 files changed, 19 insertions(+)
> >
> > diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> > index 884d8a2e07..d50b565967 100644
> > --- a/tests/qtest/migration-helpers.c
> > +++ b/tests/qtest/migration-helpers.c
> > @@ -35,6 +35,18 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
> >      return false;
> >  }
> >  
> > +bool migrate_watch_for_resume(QTestState *who, const char *name,
> > +                              QDict *event, void *opaque)
> > +{
> > +    bool *seen = opaque;
> > +
> > +    if (g_str_equal(name, "RESUME")) {
> > +        *seen = true;
> > +    }
> > +
> > +    return false;
> > +}
> > +
> 
> I think I am not understanding this.
> 
> Can we wait for both RESUME and STOP events?
> 
> Or do you want an implementation that can only look for one event?

For the purpose of the migration test we only need the STOP event
on the src and the RESUME event on the dst. While I could have made
it track both STOP and RESUME events on both src & dst, I figured
it was overkill.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 7/9] tests/qtest: capture RESUME events during migration
  2023-06-01 12:34     ` Daniel P. Berrangé
@ 2023-06-01 12:37       ` Juan Quintela
  2023-06-01 12:44         ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 12:37 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Thu, Jun 01, 2023 at 02:31:13PM +0200, Juan Quintela wrote:
>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>> > When running migration tests we monitor for a STOP event so we can skip
>> > redundant waits. This will be needed for the RESUME event too shortly.
>> >
>> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>> > ---
>> >  tests/qtest/migration-helpers.c | 12 ++++++++++++
>> >  tests/qtest/migration-helpers.h |  2 ++
>> >  tests/qtest/migration-test.c    |  5 +++++
>> >  3 files changed, 19 insertions(+)
>> >
>> > diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
>> > index 884d8a2e07..d50b565967 100644
>> > --- a/tests/qtest/migration-helpers.c
>> > +++ b/tests/qtest/migration-helpers.c
>> > @@ -35,6 +35,18 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
>> >      return false;
>> >  }
>> >  
>> > +bool migrate_watch_for_resume(QTestState *who, const char *name,
>> > +                              QDict *event, void *opaque)
>> > +{
>> > +    bool *seen = opaque;
>> > +
>> > +    if (g_str_equal(name, "RESUME")) {
>> > +        *seen = true;
>> > +    }
>> > +
>> > +    return false;
>> > +}
>> > +
>> 
>> I think I am not understanding this.
>> 
>> Can we wait for both RESUME and STOP events?
>> 
>> Or do you want an implementation that can only look for one event?
>
> For the purpose of the migration test we only need the STOP event
> on the src and the RESUME event on the dst. While I could have made
> it track both STOP and RESUME events on both src & dst, I figured
> it was overkill.

Fair enough.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 12:33   ` Juan Quintela
@ 2023-06-01 12:38     ` Daniel P. Berrangé
  2023-06-01 16:09     ` Thomas Huth
  1 sibling, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 12:38 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

On Thu, Jun 01, 2023 at 02:33:56PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > There are 27 pre-copy live migration scenarios being tested. In all of
> > these we force non-convergance and run for one iteration, then let it
> > converge and wait for completion during the second (or following)
> > iterations. At 3 mbps bandwidth limit the first iteration takes a very
> > long time (~30 seconds).
> >
> > While it is important to test the migration passes and convergance
> > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > TLS migration scenarios in particular are merely exercising different
> > code paths during connection establishment.
> >
> > To optimize time taken, switch most of the test scenarios to run
> > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > a massive speed up for most of the test scenarios.
> >
> > For test coverage the following scenarios are unchanged
> >
> >  * Precopy with UNIX sockets
> >  * Precopy with UNIX sockets and dirty ring tracking
> >  * Precopy with XBZRLE
> >  * Precopy with UNIX compress
> >  * Precopy with UNIX compress (nowait)
> >  * Precopy with multifd
> >
> > On a test machine this reduces execution time from 13 minutes to
> > 8 minutes.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> Hi
> 
> I have a different idea to get the migration tests faster.  Namely, just
> stop letting enter the completation stage until we are ready to use it.

That is still going to be slower than running the migration non-live,
because the guest will be dirtying memory that needs to be transferred.
Even if we have the elevated bandwidth limit that's going to have a
CPU cost which will hurt running time in CI where alot is running in
parallel.

> I need that functionality also for vfio migration, so as I have to
> create the patches, can we put on hold the last two patches and give me
> a couple of days?

IMHO this patch is beneficial regardless.

The next patch might be redundant though.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 7/9] tests/qtest: capture RESUME events during migration
  2023-06-01 12:37       ` Juan Quintela
@ 2023-06-01 12:44         ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 12:44 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

On Thu, Jun 01, 2023 at 02:37:37PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Thu, Jun 01, 2023 at 02:31:13PM +0200, Juan Quintela wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> >> > When running migration tests we monitor for a STOP event so we can skip
> >> > redundant waits. This will be needed for the RESUME event too shortly.
> >> >
> >> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> >> > ---
> >> >  tests/qtest/migration-helpers.c | 12 ++++++++++++
> >> >  tests/qtest/migration-helpers.h |  2 ++
> >> >  tests/qtest/migration-test.c    |  5 +++++
> >> >  3 files changed, 19 insertions(+)
> >> >
> >> > diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> >> > index 884d8a2e07..d50b565967 100644
> >> > --- a/tests/qtest/migration-helpers.c
> >> > +++ b/tests/qtest/migration-helpers.c
> >> > @@ -35,6 +35,18 @@ bool migrate_watch_for_stop(QTestState *who, const char *name,
> >> >      return false;
> >> >  }
> >> >  
> >> > +bool migrate_watch_for_resume(QTestState *who, const char *name,
> >> > +                              QDict *event, void *opaque)
> >> > +{
> >> > +    bool *seen = opaque;
> >> > +
> >> > +    if (g_str_equal(name, "RESUME")) {
> >> > +        *seen = true;
> >> > +    }
> >> > +
> >> > +    return false;
> >> > +}
> >> > +
> >> 
> >> I think I am not understanding this.
> >> 
> >> Can we wait for both RESUME and STOP events?
> >> 
> >> Or do you want an implementation that can only look for one event?
> >
> > For the purpose of the migration test we only need the STOP event
> > on the src and the RESUME event on the dst. While I could have made
> > it track both STOP and RESUME events on both src & dst, I figured
> > it was overkill.
> 
> Fair enough.

I think I'll rename 'bool got_stop' to  'bool got_src_stop' and have
'bool got_dst_resume'  to make it explicit which QEMU they're referring
to.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants
  2023-06-01  9:23   ` Thomas Huth
@ 2023-06-01 12:48     ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 12:48 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Juan Quintela,
	Leonardo Bras, Paolo Bonzini

On Thu, Jun 01, 2023 at 11:23:01AM +0200, Thomas Huth wrote:
> On 31/05/2023 15.23, Daniel P. Berrangé wrote:
> > Add several counterparts of qtest_qmp_assert_success() that can
> > 
> >   * Use va_list instead of ...
> >   * Accept a list of FDs to send
> >   * Return the response data
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >   tests/qtest/libqtest.c |  99 +++++++++++++++++++++++++++++++++--
> >   tests/qtest/libqtest.h | 115 +++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 209 insertions(+), 5 deletions(-)


> > +#ifndef _WIN32
> > +QDict *qtest_vqmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
> > +                                         const char *fmt, va_list args)
> > +{
> > +    QDict *response;
> > +    QDict *ret;
> > +
> > +    response = qtest_vqmp_fds(qts, fds, nfds, fmt, args);
> > +
> > +    g_assert(response);
> > +    if (!qdict_haskey(response, "return")) {
> > +        GString *s = qobject_to_json_pretty(QOBJECT(response), true);
> > +        g_test_message("%s", s->str);
> > +        g_string_free(s, true);
> > +    }
> > +    g_assert(qdict_haskey(response, "return"));
> > +    ret = qdict_get_qdict(response, "return");
> > +    qobject_ref(ret);
> > +    qobject_unref(response);
> > +
> > +    return ret;
> > +}
> > +
> > +void qtest_vqmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
> > +                                   const char *fmt, va_list args)
> > +{
> > +    QDict *response;
> > +    response = qtest_vqmp_fds_assert_success_ref(qts, fds, nfds, fmt, args);
> >       qobject_unref(response);
> >   }
> 
> <bikeshedding>The ordering is a little bit inconsistent ... for some pairs,
> you do the _success() function first, and for some others you do the
> _success_ref() function first. IMHO it would be nicer to have the same
> ordering everywhere, preferably with the _success_ref() function first
> (since that's the one that is called from the other).</bikeshedding>

Sure, will do.

> 
> > +#endif /* !_WIN32 */
> > +
> > +void qtest_qmp_assert_success(QTestState *qts, const char *fmt, ...)
> > +{
> > +    QDict *response;
> > +    va_list ap;
> > +    va_start(ap, fmt);
> > +    response = qtest_vqmp_assert_success_ref(qts, fmt, ap);
> 
> You could use qtest_vqmp_assert_success() instead, I think, and dro the
> qobject_unref() below.

Agreed


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants
  2023-06-01 12:20   ` Juan Quintela
@ 2023-06-01 12:51     ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 12:51 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

On Thu, Jun 01, 2023 at 02:20:34PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > Add several counterparts of qtest_qmp_assert_success() that can
> >
> >  * Use va_list instead of ...
> >  * Accept a list of FDs to send
> >  * Return the response data
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  tests/qtest/libqtest.c |  99 +++++++++++++++++++++++++++++++++--
> >  tests/qtest/libqtest.h | 115 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 209 insertions(+), 5 deletions(-)
> >
> > diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> > index c3a0ef5bb4..603c26d955 100644
> > --- a/tests/qtest/libqtest.c
> > +++ b/tests/qtest/libqtest.c


> > +#ifndef _WIN32
> > +QDict *qtest_vqmp_fds_assert_success_ref(QTestState *qts, int *fds, size_t nfds,
> > +                                         const char *fmt, va_list args)
> > +{
> > +    QDict *response;
> > +    QDict *ret;
> > +
> > +    response = qtest_vqmp_fds(qts, fds, nfds, fmt, args);
> > +
> > +    g_assert(response);
> > +    if (!qdict_haskey(response, "return")) {
> > +        GString *s = qobject_to_json_pretty(QOBJECT(response), true);
> > +        g_test_message("%s", s->str);
> > +        g_string_free(s, true);
> 
> I know we are not consistent ot this file, but what about using autoptr here?
> 
>         g_autoptr(GString) *s = qobject_to_json_pretty(QOBJECT(response), true);
>         g_test_message("%s", s->str);
> 
> ??

Sure.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events
  2023-06-01 12:14   ` Juan Quintela
@ 2023-06-01 12:56     ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 12:56 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

On Thu, Jun 01, 2023 at 02:14:08PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > Currently code must call one of the qtest_qmp_event* functions to
> > fetch events. These are only usable if the immediate caller knows
> > the particular event they want to capture, and are only interested
> > in one specific event type. Adding ability to register an event
> > callback lets the caller capture a range of events over any period
> > of time.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> First of all, I *love* the idea of the patch, but ...
> 
> >  static GHookList abrt_hooks;
> > @@ -703,8 +705,14 @@ QDict *qtest_qmp_receive(QTestState *s)
> >          if (!qdict_get_try_str(response, "event")) {
> >              return response;
> >          }
> > -        /* Stash the event for a later consumption */
> > -        s->pending_events = g_list_append(s->pending_events, response);
> > +
> > +        if (s->eventCB) {
> > +            s->eventCB(s, qdict_get_str(response, "event"),
> > +                       response, s->eventData);
> > +        } else {
> > +            /* Stash the event for a later consumption */
> > +            s->pending_events = g_list_append(s->pending_events, response);
> > +        }
> >      }
> 
> s->eventCB returns a bool that you are not using.  I think this part of
> the code would be more usefule if:
> 
>         if (!s->eventCB || !s->eventCB(s, qdict_get_str(response, "event"),
>                                        response, s->eventData)) {
>             /* Stash the event for a later consumption */
>             s->pending_events = g_list_append(s->pending_events, response);
>         }
> 
> So if we are not handling the event, we put it on the pending_events
> list.
> 
> What do you think?

Sure, it is easy enough to do.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-05-31 13:23 ` [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
  2023-06-01  9:47   ` Thomas Huth
  2023-06-01 12:33   ` Juan Quintela
@ 2023-06-01 15:30   ` Peter Xu
  2023-06-01 15:39     ` Daniel P. Berrangé
  2 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2023-06-01 15:30 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

Thanks for looking into this.. definitely worthwhile.

On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
> There are 27 pre-copy live migration scenarios being tested. In all of
> these we force non-convergance and run for one iteration, then let it
> converge and wait for completion during the second (or following)
> iterations. At 3 mbps bandwidth limit the first iteration takes a very
> long time (~30 seconds).
> 
> While it is important to test the migration passes and convergance
> logic, it is overkill to do this for all 27 pre-copy scenarios. The
> TLS migration scenarios in particular are merely exercising different
> code paths during connection establishment.
> 
> To optimize time taken, switch most of the test scenarios to run
> non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> a massive speed up for most of the test scenarios.
> 
> For test coverage the following scenarios are unchanged

Curious how are below chosen?  I assume..

> 
>  * Precopy with UNIX sockets

this one verifies dirty log.

>  * Precopy with UNIX sockets and dirty ring tracking

... dirty ring...

>  * Precopy with XBZRLE

... xbzrle I think needs a diff on old/new, makes sense.

>  * Precopy with UNIX compress
>  * Precopy with UNIX compress (nowait)
>  * Precopy with multifd

What about the rest three?  Especially for two compression tests.

> 
> On a test machine this reduces execution time from 13 minutes to
> 8 minutes.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 15:30   ` Peter Xu
@ 2023-06-01 15:39     ` Daniel P. Berrangé
  2023-06-01 15:53       ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 15:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 11:30:10AM -0400, Peter Xu wrote:
> Thanks for looking into this.. definitely worthwhile.
> 
> On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
> > There are 27 pre-copy live migration scenarios being tested. In all of
> > these we force non-convergance and run for one iteration, then let it
> > converge and wait for completion during the second (or following)
> > iterations. At 3 mbps bandwidth limit the first iteration takes a very
> > long time (~30 seconds).
> > 
> > While it is important to test the migration passes and convergance
> > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > TLS migration scenarios in particular are merely exercising different
> > code paths during connection establishment.
> > 
> > To optimize time taken, switch most of the test scenarios to run
> > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > a massive speed up for most of the test scenarios.
> > 
> > For test coverage the following scenarios are unchanged
> 
> Curious how are below chosen?  I assume..

Chosen based on whether they exercise code paths that are unique
and interesting during the RAM transfer phase.

Essentially the goal is that if we have N% code coverage before this
patch, then we should still have the same N% code coverage after this
patch.

The TLS tests exercise code paths that are unique during the migration
establishment phase. Once establishd they don't exercise anything
"interesting" during RAM transfer phase. Thus we don't loose code coverage
by runing TLS tests non-live.

> 
> > 
> >  * Precopy with UNIX sockets
> 
> this one verifies dirty log.
> 
> >  * Precopy with UNIX sockets and dirty ring tracking
> 
> ... dirty ring...
> 
> >  * Precopy with XBZRLE
> 
> ... xbzrle I think needs a diff on old/new, makes sense.
> 
> >  * Precopy with UNIX compress
> >  * Precopy with UNIX compress (nowait)
> >  * Precopy with multifd
> 
> What about the rest three?  Especially for two compression tests.

The compress thread logic is unique/interesting during RAM transfer
so benefits from running live. The wait vs non-wait scenario tests
a distinct codepath/logic.

multifd has been a massive source of bugs and has very different
codepaths from non-multifd, so is needed for codeage.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-05-31 13:24 ` [PATCH v3 9/9] tests/qtest: massively speed up migration-test Daniel P. Berrangé
  2023-06-01 10:04   ` Thomas Huth
@ 2023-06-01 15:46   ` Peter Xu
  2023-06-01 16:05     ` Daniel P. Berrangé
  2023-06-01 23:00     ` Juan Quintela
  1 sibling, 2 replies; 57+ messages in thread
From: Peter Xu @ 2023-06-01 15:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
> The migration test cases that actually exercise live migration want to
> ensure there is a minimum of two iterations of pre-copy, in order to
> exercise the dirty tracking code.
> 
> Historically we've queried the migration status, looking for the
> 'dirty-sync-count' value to increment to track iterations. This was
> not entirely reliable because often all the data would get transferred
> quickly enough that the migration would finish before we wanted it
> to. So we massively dropped the bandwidth and max downtime to
> guarantee non-convergance. This had the unfortunate side effect
> that every migration took at least 30 seconds to run (100 MB of
> dirty pages / 3 MB/sec).
> 
> This optimization takes a different approach to ensuring that a
> mimimum of two iterations. Rather than waiting for dirty-sync-count
> to increment, directly look for an indication that the source VM
> has dirtied RAM that has already been transferred.
> 
> On the source VM a magic marker is written just after the 3 MB
> offset. The destination VM is now montiored to detect when the
> magic marker is transferred. This gives a guarantee that the
> first 3 MB of memory have been transferred. Now the source VM
> memory is monitored at exactly the 3MB offset until we observe
> a flip in its value. This gives us a guaranteed that the guest
> workload has dirtied a byte that has already been transferred.
> 
> Since we're looking at a place that is only 3 MB from the start
> of memory, with the 3 MB/sec bandwidth, this test should complete
> in 1 second, instead of 30 seconds.
> 
> Once we've proved there is some dirty memory, migration can be
> set back to full speed for the remainder of the 1st iteration,
> and the entire of the second iteration at which point migration
> should be complete.
> 
> On a test machine this further reduces the migration test time
> from 8 minutes to 1 minute 40.

The outcome is definitely nice, but it does looks slightly hacky to me and
make the test slightly more complicated.

If it's all about making sure we finish the 1st iteration, can we simply
add a src qemu parameter "switchover-hold"?  If it's set, src never
switchover to dst but keeps the iterations.

Then migrate_ensure_non_converge() will be as simple as setting
switchover-hold to true.

I am even thinking whether there can even be real-life use case for that,
e.g., where a user might want to have a pre-heat of a migration of some VM,
and trigger it immediately when the admin really wants (the pre-heats moved
most of the pages and keep doing so).

It'll be also similar to what Avihai proposed here on switchover-ack, just
an ack mechanism on the src side:

https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 15:39     ` Daniel P. Berrangé
@ 2023-06-01 15:53       ` Peter Xu
  2023-06-01 15:55         ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2023-06-01 15:53 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 04:39:48PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 01, 2023 at 11:30:10AM -0400, Peter Xu wrote:
> > Thanks for looking into this.. definitely worthwhile.
> > 
> > On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
> > > There are 27 pre-copy live migration scenarios being tested. In all of
> > > these we force non-convergance and run for one iteration, then let it
> > > converge and wait for completion during the second (or following)
> > > iterations. At 3 mbps bandwidth limit the first iteration takes a very
> > > long time (~30 seconds).
> > > 
> > > While it is important to test the migration passes and convergance
> > > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > > TLS migration scenarios in particular are merely exercising different
> > > code paths during connection establishment.
> > > 
> > > To optimize time taken, switch most of the test scenarios to run
> > > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > > a massive speed up for most of the test scenarios.
> > > 
> > > For test coverage the following scenarios are unchanged
> > 
> > Curious how are below chosen?  I assume..
> 
> Chosen based on whether they exercise code paths that are unique
> and interesting during the RAM transfer phase.
> 
> Essentially the goal is that if we have N% code coverage before this
> patch, then we should still have the same N% code coverage after this
> patch.
> 
> The TLS tests exercise code paths that are unique during the migration
> establishment phase. Once establishd they don't exercise anything
> "interesting" during RAM transfer phase. Thus we don't loose code coverage
> by runing TLS tests non-live.
> 
> > 
> > > 
> > >  * Precopy with UNIX sockets
> > 
> > this one verifies dirty log.
> > 
> > >  * Precopy with UNIX sockets and dirty ring tracking
> > 
> > ... dirty ring...
> > 
> > >  * Precopy with XBZRLE
> > 
> > ... xbzrle I think needs a diff on old/new, makes sense.
> > 
> > >  * Precopy with UNIX compress
> > >  * Precopy with UNIX compress (nowait)
> > >  * Precopy with multifd
> > 
> > What about the rest three?  Especially for two compression tests.
> 
> The compress thread logic is unique/interesting during RAM transfer
> so benefits from running live. The wait vs non-wait scenario tests
> a distinct codepath/logic.

I assume you mean e.g. when compressing with guest page being modified and
we should survive that rather than crashing the compressor?

Okay to me.

> 
> multifd has been a massive source of bugs and has very different
> codepaths from non-multifd, so is needed for codeage.

:(

I think it makes sense to keep multifd in that, especially if we want to
some day make it more or less default.

Would you mind add some comment into each of .live=true case right above?
Then it'll be fairly clear that's not the default and any new test case
should justify them to be live=true.

Feel free to add:

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 15:53       ` Peter Xu
@ 2023-06-01 15:55         ` Daniel P. Berrangé
  2023-06-01 16:17           ` Peter Xu
  2023-06-01 22:55           ` Juan Quintela
  0 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 15:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 11:53:17AM -0400, Peter Xu wrote:
> On Thu, Jun 01, 2023 at 04:39:48PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Jun 01, 2023 at 11:30:10AM -0400, Peter Xu wrote:
> > > Thanks for looking into this.. definitely worthwhile.
> > > 
> > > On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
> > > > There are 27 pre-copy live migration scenarios being tested. In all of
> > > > these we force non-convergance and run for one iteration, then let it
> > > > converge and wait for completion during the second (or following)
> > > > iterations. At 3 mbps bandwidth limit the first iteration takes a very
> > > > long time (~30 seconds).
> > > > 
> > > > While it is important to test the migration passes and convergance
> > > > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > > > TLS migration scenarios in particular are merely exercising different
> > > > code paths during connection establishment.
> > > > 
> > > > To optimize time taken, switch most of the test scenarios to run
> > > > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > > > a massive speed up for most of the test scenarios.
> > > > 
> > > > For test coverage the following scenarios are unchanged
> > > 
> > > Curious how are below chosen?  I assume..
> > 
> > Chosen based on whether they exercise code paths that are unique
> > and interesting during the RAM transfer phase.
> > 
> > Essentially the goal is that if we have N% code coverage before this
> > patch, then we should still have the same N% code coverage after this
> > patch.
> > 
> > The TLS tests exercise code paths that are unique during the migration
> > establishment phase. Once establishd they don't exercise anything
> > "interesting" during RAM transfer phase. Thus we don't loose code coverage
> > by runing TLS tests non-live.
> > 
> > > 
> > > > 
> > > >  * Precopy with UNIX sockets
> > > 
> > > this one verifies dirty log.
> > > 
> > > >  * Precopy with UNIX sockets and dirty ring tracking
> > > 
> > > ... dirty ring...
> > > 
> > > >  * Precopy with XBZRLE
> > > 
> > > ... xbzrle I think needs a diff on old/new, makes sense.
> > > 
> > > >  * Precopy with UNIX compress
> > > >  * Precopy with UNIX compress (nowait)
> > > >  * Precopy with multifd
> > > 
> > > What about the rest three?  Especially for two compression tests.
> > 
> > The compress thread logic is unique/interesting during RAM transfer
> > so benefits from running live. The wait vs non-wait scenario tests
> > a distinct codepath/logic.
> 
> I assume you mean e.g. when compressing with guest page being modified and
> we should survive that rather than crashing the compressor?

No, i mean the compression code has a significant behaviour difference
between its two tests, because they toggle:

 @compress-wait-thread: Controls behavior when all compression
     threads are currently busy.  If true (default), wait for a free
     compression thread to become available; otherwise, send the page
     uncompressed.  (Since 3.1)

so we need to exercise the code path that falls back to sending
uncompressed, as well as the code path that waits for free threads.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-06-01 15:46   ` Peter Xu
@ 2023-06-01 16:05     ` Daniel P. Berrangé
  2023-06-01 16:22       ` Peter Xu
  2023-06-01 23:00     ` Juan Quintela
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 11:46:01AM -0400, Peter Xu wrote:
> On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
> > The migration test cases that actually exercise live migration want to
> > ensure there is a minimum of two iterations of pre-copy, in order to
> > exercise the dirty tracking code.
> > 
> > Historically we've queried the migration status, looking for the
> > 'dirty-sync-count' value to increment to track iterations. This was
> > not entirely reliable because often all the data would get transferred
> > quickly enough that the migration would finish before we wanted it
> > to. So we massively dropped the bandwidth and max downtime to
> > guarantee non-convergance. This had the unfortunate side effect
> > that every migration took at least 30 seconds to run (100 MB of
> > dirty pages / 3 MB/sec).
> > 
> > This optimization takes a different approach to ensuring that a
> > mimimum of two iterations. Rather than waiting for dirty-sync-count
> > to increment, directly look for an indication that the source VM
> > has dirtied RAM that has already been transferred.
> > 
> > On the source VM a magic marker is written just after the 3 MB
> > offset. The destination VM is now montiored to detect when the
> > magic marker is transferred. This gives a guarantee that the
> > first 3 MB of memory have been transferred. Now the source VM
> > memory is monitored at exactly the 3MB offset until we observe
> > a flip in its value. This gives us a guaranteed that the guest
> > workload has dirtied a byte that has already been transferred.
> > 
> > Since we're looking at a place that is only 3 MB from the start
> > of memory, with the 3 MB/sec bandwidth, this test should complete
> > in 1 second, instead of 30 seconds.
> > 
> > Once we've proved there is some dirty memory, migration can be
> > set back to full speed for the remainder of the 1st iteration,
> > and the entire of the second iteration at which point migration
> > should be complete.
> > 
> > On a test machine this further reduces the migration test time
> > from 8 minutes to 1 minute 40.
> 
> The outcome is definitely nice, but it does looks slightly hacky to me and
> make the test slightly more complicated.
> 
> If it's all about making sure we finish the 1st iteration, can we simply
> add a src qemu parameter "switchover-hold"?  If it's set, src never
> switchover to dst but keeps the iterations.

For *most* of the tests, we want to ensure there are a minimum
of 2 iterations. For the XBZRLE test we want to ensure there are
a minimum of 3 iterations, so the cache gets  workout.

> Then migrate_ensure_non_converge() will be as simple as setting
> switchover-hold to true.
> 
> I am even thinking whether there can even be real-life use case for that,
> e.g., where a user might want to have a pre-heat of a migration of some VM,
> and trigger it immediately when the admin really wants (the pre-heats moved
> most of the pages and keep doing so).
> 
> It'll be also similar to what Avihai proposed here on switchover-ack, just
> an ack mechanism on the src side:
> 
> https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com

In general I strongly wanted to avoid adding special logic to the
migration code that makes it directly synchronize with the  test
suite, because once we do that I don't think the test suite is a
providing coverage of the real world usage scenario.

IOW, if we add a switchover-ack feature, we should certainly have
*a* test that uses it, but we should not modify other tests because
we want to exercise the logic as it would run with apps that don't
rely on switchover-ack.

Also, this slow migration test is incredibly painful for people right
now, so I'd like to see us get a speed up committed to git quickly.
I don't want to block it on a feature proposal that might yet take
months to merge.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 12:33   ` Juan Quintela
  2023-06-01 12:38     ` Daniel P. Berrangé
@ 2023-06-01 16:09     ` Thomas Huth
  2023-06-01 16:17       ` Daniel P. Berrangé
  1 sibling, 1 reply; 57+ messages in thread
From: Thomas Huth @ 2023-06-01 16:09 UTC (permalink / raw)
  To: quintela, Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras, Paolo Bonzini

On 01/06/2023 14.33, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
...
>> On a test machine this reduces execution time from 13 minutes to
>> 8 minutes.
>>
>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> Hi
> 
> I have a different idea to get the migration tests faster.  Namely, just
> stop letting enter the completation stage until we are ready to use it.
> 
> I need that functionality also for vfio migration, so as I have to
> create the patches, can we put on hold the last two patches and give me
> a couple of days?

A couple of days are certainly OK ... but could we please avoid being stuck 
in the current situation for many more weeks? The slowness of the migration 
tests are really slowing down my turnaround times, and I think they 
contribute to the timeouts of the Travis CI tests that I'm currently facing, 
and likely also contributed to the issue that the QEMU project ran out of 
Gitlab CI minutes again last month (which is thankfully mostly hidden by the 
new private Linux runners, but some things like the Windows runners were not 
available anymore). So I'd appreciate if we'd rather get a speed up here 
merged rather sooner than later.

> Thomas, do you care if I get the whole sets of patches through the
> migration tree? 

That's fine, please take them through your tree!

  Thomas

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 16:09     ` Thomas Huth
@ 2023-06-01 16:17       ` Daniel P. Berrangé
  2023-06-01 16:26         ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:17 UTC (permalink / raw)
  To: Thomas Huth
  Cc: quintela, qemu-devel, Laurent Vivier, Peter Xu, Leonardo Bras,
	Paolo Bonzini

On Thu, Jun 01, 2023 at 06:09:44PM +0200, Thomas Huth wrote:
> On 01/06/2023 14.33, Juan Quintela wrote:
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> ...
> > > On a test machine this reduces execution time from 13 minutes to
> > > 8 minutes.
> > > 
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > 
> > Hi
> > 
> > I have a different idea to get the migration tests faster.  Namely, just
> > stop letting enter the completation stage until we are ready to use it.
> > 
> > I need that functionality also for vfio migration, so as I have to
> > create the patches, can we put on hold the last two patches and give me
> > a couple of days?
> 
> A couple of days are certainly OK ... but could we please avoid being stuck
> in the current situation for many more weeks?

Also as I mentioned in my reply to Peter, I'm wary of an approach that
puts in a synchronization between the test suite and the migrate code,
because that's making migration run in a manner which doesn't actually
match how it'll run in the real world. So I think if there was a feature
to add a sync before the final iteration, we'd still want to test without
using that feature so we get proper coverage.

Could we merge this series as-is, and simply re-visit the last patch
afterwards, once we have time to debate the feature about completion
phase synchronization.

>                                               The slowness of the migration
> tests are really slowing down my turnaround times, and I think they
> contribute to the timeouts of the Travis CI tests that I'm currently facing,
> and likely also contributed to the issue that the QEMU project ran out of
> Gitlab CI minutes again last month (which is thankfully mostly hidden by the
> new private Linux runners, but some things like the Windows runners were not
> available anymore). So I'd appreciate if we'd rather get a speed up here
> merged rather sooner than later.
> 
> > Thomas, do you care if I get the whole sets of patches through the
> > migration tree?
> 
> That's fine, please take them through your tree!

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 15:55         ` Daniel P. Berrangé
@ 2023-06-01 16:17           ` Peter Xu
  2023-06-01 16:35             ` Daniel P. Berrangé
  2023-06-01 22:58             ` Juan Quintela
  2023-06-01 22:55           ` Juan Quintela
  1 sibling, 2 replies; 57+ messages in thread
From: Peter Xu @ 2023-06-01 16:17 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 04:55:25PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 01, 2023 at 11:53:17AM -0400, Peter Xu wrote:
> > On Thu, Jun 01, 2023 at 04:39:48PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Jun 01, 2023 at 11:30:10AM -0400, Peter Xu wrote:
> > > > Thanks for looking into this.. definitely worthwhile.
> > > > 
> > > > On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
> > > > > There are 27 pre-copy live migration scenarios being tested. In all of
> > > > > these we force non-convergance and run for one iteration, then let it
> > > > > converge and wait for completion during the second (or following)
> > > > > iterations. At 3 mbps bandwidth limit the first iteration takes a very
> > > > > long time (~30 seconds).
> > > > > 
> > > > > While it is important to test the migration passes and convergance
> > > > > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > > > > TLS migration scenarios in particular are merely exercising different
> > > > > code paths during connection establishment.
> > > > > 
> > > > > To optimize time taken, switch most of the test scenarios to run
> > > > > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > > > > a massive speed up for most of the test scenarios.
> > > > > 
> > > > > For test coverage the following scenarios are unchanged
> > > > 
> > > > Curious how are below chosen?  I assume..
> > > 
> > > Chosen based on whether they exercise code paths that are unique
> > > and interesting during the RAM transfer phase.
> > > 
> > > Essentially the goal is that if we have N% code coverage before this
> > > patch, then we should still have the same N% code coverage after this
> > > patch.
> > > 
> > > The TLS tests exercise code paths that are unique during the migration
> > > establishment phase. Once establishd they don't exercise anything
> > > "interesting" during RAM transfer phase. Thus we don't loose code coverage
> > > by runing TLS tests non-live.
> > > 
> > > > 
> > > > > 
> > > > >  * Precopy with UNIX sockets
> > > > 
> > > > this one verifies dirty log.
> > > > 
> > > > >  * Precopy with UNIX sockets and dirty ring tracking
> > > > 
> > > > ... dirty ring...
> > > > 
> > > > >  * Precopy with XBZRLE
> > > > 
> > > > ... xbzrle I think needs a diff on old/new, makes sense.
> > > > 
> > > > >  * Precopy with UNIX compress
> > > > >  * Precopy with UNIX compress (nowait)
> > > > >  * Precopy with multifd
> > > > 
> > > > What about the rest three?  Especially for two compression tests.
> > > 
> > > The compress thread logic is unique/interesting during RAM transfer
> > > so benefits from running live. The wait vs non-wait scenario tests
> > > a distinct codepath/logic.
> > 
> > I assume you mean e.g. when compressing with guest page being modified and
> > we should survive that rather than crashing the compressor?
> 
> No, i mean the compression code has a significant behaviour difference
> between its two tests, because they toggle:
> 
>  @compress-wait-thread: Controls behavior when all compression
>      threads are currently busy.  If true (default), wait for a free
>      compression thread to become available; otherwise, send the page
>      uncompressed.  (Since 3.1)
> 
> so we need to exercise the code path that falls back to sending
> uncompressed, as well as the code path that waits for free threads.

But then the question is why live is needed?

IIUC whether the wait thing triggers have nothing directly related to VM is
live or not, but whether all compress thread busy.  IOW, IIUC all compress
paths will be tested even if non-live as long as we feed enough pages to
the compressor threads.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-06-01 16:05     ` Daniel P. Berrangé
@ 2023-06-01 16:22       ` Peter Xu
  2023-06-01 16:36         ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2023-06-01 16:22 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 05:05:23PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 01, 2023 at 11:46:01AM -0400, Peter Xu wrote:
> > On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
> > > The migration test cases that actually exercise live migration want to
> > > ensure there is a minimum of two iterations of pre-copy, in order to
> > > exercise the dirty tracking code.
> > > 
> > > Historically we've queried the migration status, looking for the
> > > 'dirty-sync-count' value to increment to track iterations. This was
> > > not entirely reliable because often all the data would get transferred
> > > quickly enough that the migration would finish before we wanted it
> > > to. So we massively dropped the bandwidth and max downtime to
> > > guarantee non-convergance. This had the unfortunate side effect
> > > that every migration took at least 30 seconds to run (100 MB of
> > > dirty pages / 3 MB/sec).
> > > 
> > > This optimization takes a different approach to ensuring that a
> > > mimimum of two iterations. Rather than waiting for dirty-sync-count
> > > to increment, directly look for an indication that the source VM
> > > has dirtied RAM that has already been transferred.
> > > 
> > > On the source VM a magic marker is written just after the 3 MB
> > > offset. The destination VM is now montiored to detect when the
> > > magic marker is transferred. This gives a guarantee that the
> > > first 3 MB of memory have been transferred. Now the source VM
> > > memory is monitored at exactly the 3MB offset until we observe
> > > a flip in its value. This gives us a guaranteed that the guest
> > > workload has dirtied a byte that has already been transferred.
> > > 
> > > Since we're looking at a place that is only 3 MB from the start
> > > of memory, with the 3 MB/sec bandwidth, this test should complete
> > > in 1 second, instead of 30 seconds.
> > > 
> > > Once we've proved there is some dirty memory, migration can be
> > > set back to full speed for the remainder of the 1st iteration,
> > > and the entire of the second iteration at which point migration
> > > should be complete.
> > > 
> > > On a test machine this further reduces the migration test time
> > > from 8 minutes to 1 minute 40.
> > 
> > The outcome is definitely nice, but it does looks slightly hacky to me and
> > make the test slightly more complicated.
> > 
> > If it's all about making sure we finish the 1st iteration, can we simply
> > add a src qemu parameter "switchover-hold"?  If it's set, src never
> > switchover to dst but keeps the iterations.
> 
> For *most* of the tests, we want to ensure there are a minimum
> of 2 iterations. For the XBZRLE test we want to ensure there are
> a minimum of 3 iterations, so the cache gets  workout.
> 
> > Then migrate_ensure_non_converge() will be as simple as setting
> > switchover-hold to true.
> > 
> > I am even thinking whether there can even be real-life use case for that,
> > e.g., where a user might want to have a pre-heat of a migration of some VM,
> > and trigger it immediately when the admin really wants (the pre-heats moved
> > most of the pages and keep doing so).
> > 
> > It'll be also similar to what Avihai proposed here on switchover-ack, just
> > an ack mechanism on the src side:
> > 
> > https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com
> 
> In general I strongly wanted to avoid adding special logic to the
> migration code that makes it directly synchronize with the  test
> suite, because once we do that I don't think the test suite is a
> providing coverage of the real world usage scenario.

The problem is non-live is already not real world usage in most cases.  It
seems we all agreed that it's the code paths to cover not real world usages
in the tests, or maybe not?

> 
> IOW, if we add a switchover-ack feature, we should certainly have
> *a* test that uses it, but we should not modify other tests because
> we want to exercise the logic as it would run with apps that don't
> rely on switchover-ack.
> 
> Also, this slow migration test is incredibly painful for people right
> now, so I'd like to see us get a speed up committed to git quickly.
> I don't want to block it on a feature proposal that might yet take
> months to merge.

That'll be trivial, afaict.

I just worry that this patch will bring complexity to the test cases,
that's another burden we need to carry besides QEMU itself.

If you want, I can try to prepare such a patch before this weekend, and if
it's complicated enough and take more than next week to review feel free to
go ahead with this one.

I understand the migration test issue was there for a long time.  But still
to me it's important on which may be cleaner for the long term too.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 16:17       ` Daniel P. Berrangé
@ 2023-06-01 16:26         ` Peter Xu
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Xu @ 2023-06-01 16:26 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Thomas Huth, quintela, qemu-devel, Laurent Vivier, Leonardo Bras,
	Paolo Bonzini

On Thu, Jun 01, 2023 at 05:17:11PM +0100, Daniel P. Berrangé wrote:
> Could we merge this series as-is, and simply re-visit the last patch
> afterwards, once we have time to debate the feature about completion
> phase synchronization.

This sounds good to me too.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 16:17           ` Peter Xu
@ 2023-06-01 16:35             ` Daniel P. Berrangé
  2023-06-01 16:59               ` Peter Xu
  2023-06-01 22:58             ` Juan Quintela
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 12:17:38PM -0400, Peter Xu wrote:
> On Thu, Jun 01, 2023 at 04:55:25PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Jun 01, 2023 at 11:53:17AM -0400, Peter Xu wrote:
> > > On Thu, Jun 01, 2023 at 04:39:48PM +0100, Daniel P. Berrangé wrote:
> > > > On Thu, Jun 01, 2023 at 11:30:10AM -0400, Peter Xu wrote:
> > > > > Thanks for looking into this.. definitely worthwhile.
> > > > > 
> > > > > On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
> > > > > > There are 27 pre-copy live migration scenarios being tested. In all of
> > > > > > these we force non-convergance and run for one iteration, then let it
> > > > > > converge and wait for completion during the second (or following)
> > > > > > iterations. At 3 mbps bandwidth limit the first iteration takes a very
> > > > > > long time (~30 seconds).
> > > > > > 
> > > > > > While it is important to test the migration passes and convergance
> > > > > > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > > > > > TLS migration scenarios in particular are merely exercising different
> > > > > > code paths during connection establishment.
> > > > > > 
> > > > > > To optimize time taken, switch most of the test scenarios to run
> > > > > > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > > > > > a massive speed up for most of the test scenarios.
> > > > > > 
> > > > > > For test coverage the following scenarios are unchanged
> > > > > 
> > > > > Curious how are below chosen?  I assume..
> > > > 
> > > > Chosen based on whether they exercise code paths that are unique
> > > > and interesting during the RAM transfer phase.
> > > > 
> > > > Essentially the goal is that if we have N% code coverage before this
> > > > patch, then we should still have the same N% code coverage after this
> > > > patch.
> > > > 
> > > > The TLS tests exercise code paths that are unique during the migration
> > > > establishment phase. Once establishd they don't exercise anything
> > > > "interesting" during RAM transfer phase. Thus we don't loose code coverage
> > > > by runing TLS tests non-live.
> > > > 
> > > > > 
> > > > > > 
> > > > > >  * Precopy with UNIX sockets
> > > > > 
> > > > > this one verifies dirty log.
> > > > > 
> > > > > >  * Precopy with UNIX sockets and dirty ring tracking
> > > > > 
> > > > > ... dirty ring...
> > > > > 
> > > > > >  * Precopy with XBZRLE
> > > > > 
> > > > > ... xbzrle I think needs a diff on old/new, makes sense.
> > > > > 
> > > > > >  * Precopy with UNIX compress
> > > > > >  * Precopy with UNIX compress (nowait)
> > > > > >  * Precopy with multifd
> > > > > 
> > > > > What about the rest three?  Especially for two compression tests.
> > > > 
> > > > The compress thread logic is unique/interesting during RAM transfer
> > > > so benefits from running live. The wait vs non-wait scenario tests
> > > > a distinct codepath/logic.
> > > 
> > > I assume you mean e.g. when compressing with guest page being modified and
> > > we should survive that rather than crashing the compressor?
> > 
> > No, i mean the compression code has a significant behaviour difference
> > between its two tests, because they toggle:
> > 
> >  @compress-wait-thread: Controls behavior when all compression
> >      threads are currently busy.  If true (default), wait for a free
> >      compression thread to become available; otherwise, send the page
> >      uncompressed.  (Since 3.1)
> > 
> > so we need to exercise the code path that falls back to sending
> > uncompressed, as well as the code path that waits for free threads.
> 
> But then the question is why live is needed?
>
> IIUC whether the wait thing triggers have nothing directly related to VM is
> live or not, but whether all compress thread busy.  IOW, IIUC all compress
> paths will be tested even if non-live as long as we feed enough pages to
> the compressor threads.

If non-live the guest won't have dirtied any pages, so I wasn't
confident there would be sufficient amount of dirty ram to send
to trigger this. Possibly that's being too paranoid though

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-06-01 16:22       ` Peter Xu
@ 2023-06-01 16:36         ` Daniel P. Berrangé
  2023-06-01 17:04           ` Peter Xu
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-06-01 16:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 12:22:36PM -0400, Peter Xu wrote:
> On Thu, Jun 01, 2023 at 05:05:23PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Jun 01, 2023 at 11:46:01AM -0400, Peter Xu wrote:
> > > On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
> > > > The migration test cases that actually exercise live migration want to
> > > > ensure there is a minimum of two iterations of pre-copy, in order to
> > > > exercise the dirty tracking code.
> > > > 
> > > > Historically we've queried the migration status, looking for the
> > > > 'dirty-sync-count' value to increment to track iterations. This was
> > > > not entirely reliable because often all the data would get transferred
> > > > quickly enough that the migration would finish before we wanted it
> > > > to. So we massively dropped the bandwidth and max downtime to
> > > > guarantee non-convergance. This had the unfortunate side effect
> > > > that every migration took at least 30 seconds to run (100 MB of
> > > > dirty pages / 3 MB/sec).
> > > > 
> > > > This optimization takes a different approach to ensuring that a
> > > > mimimum of two iterations. Rather than waiting for dirty-sync-count
> > > > to increment, directly look for an indication that the source VM
> > > > has dirtied RAM that has already been transferred.
> > > > 
> > > > On the source VM a magic marker is written just after the 3 MB
> > > > offset. The destination VM is now montiored to detect when the
> > > > magic marker is transferred. This gives a guarantee that the
> > > > first 3 MB of memory have been transferred. Now the source VM
> > > > memory is monitored at exactly the 3MB offset until we observe
> > > > a flip in its value. This gives us a guaranteed that the guest
> > > > workload has dirtied a byte that has already been transferred.
> > > > 
> > > > Since we're looking at a place that is only 3 MB from the start
> > > > of memory, with the 3 MB/sec bandwidth, this test should complete
> > > > in 1 second, instead of 30 seconds.
> > > > 
> > > > Once we've proved there is some dirty memory, migration can be
> > > > set back to full speed for the remainder of the 1st iteration,
> > > > and the entire of the second iteration at which point migration
> > > > should be complete.
> > > > 
> > > > On a test machine this further reduces the migration test time
> > > > from 8 minutes to 1 minute 40.
> > > 
> > > The outcome is definitely nice, but it does looks slightly hacky to me and
> > > make the test slightly more complicated.
> > > 
> > > If it's all about making sure we finish the 1st iteration, can we simply
> > > add a src qemu parameter "switchover-hold"?  If it's set, src never
> > > switchover to dst but keeps the iterations.
> > 
> > For *most* of the tests, we want to ensure there are a minimum
> > of 2 iterations. For the XBZRLE test we want to ensure there are
> > a minimum of 3 iterations, so the cache gets  workout.
> > 
> > > Then migrate_ensure_non_converge() will be as simple as setting
> > > switchover-hold to true.
> > > 
> > > I am even thinking whether there can even be real-life use case for that,
> > > e.g., where a user might want to have a pre-heat of a migration of some VM,
> > > and trigger it immediately when the admin really wants (the pre-heats moved
> > > most of the pages and keep doing so).
> > > 
> > > It'll be also similar to what Avihai proposed here on switchover-ack, just
> > > an ack mechanism on the src side:
> > > 
> > > https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com
> > 
> > In general I strongly wanted to avoid adding special logic to the
> > migration code that makes it directly synchronize with the  test
> > suite, because once we do that I don't think the test suite is a
> > providing coverage of the real world usage scenario.
> 
> The problem is non-live is already not real world usage in most cases.  It
> seems we all agreed that it's the code paths to cover not real world usages
> in the tests, or maybe not?

The cases that i made run non-live are exercising code paths that
are only relevant during migration setup, so the non-live vs live
status is irrelevant to the code coverage attained for those tests.
Other remaining live tests give sufficient coverage for the live
scenario
 
> > IOW, if we add a switchover-ack feature, we should certainly have
> > *a* test that uses it, but we should not modify other tests because
> > we want to exercise the logic as it would run with apps that don't
> > rely on switchover-ack.
> > 
> > Also, this slow migration test is incredibly painful for people right
> > now, so I'd like to see us get a speed up committed to git quickly.
> > I don't want to block it on a feature proposal that might yet take
> > months to merge.
> 
> That'll be trivial, afaict.
> 
> I just worry that this patch will bring complexity to the test cases,
> that's another burden we need to carry besides QEMU itself.
> 
> If you want, I can try to prepare such a patch before this weekend, and if
> it's complicated enough and take more than next week to review feel free to
> go ahead with this one.
> 
> I understand the migration test issue was there for a long time.  But still
> to me it's important on which may be cleaner for the long term too.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 16:35             ` Daniel P. Berrangé
@ 2023-06-01 16:59               ` Peter Xu
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Xu @ 2023-06-01 16:59 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 05:35:07PM +0100, Daniel P. Berrangé wrote:
> If non-live the guest won't have dirtied any pages, so I wasn't
> confident there would be sufficient amount of dirty ram to send
> to trigger this. Possibly that's being too paranoid though

I think it makes sense to keep compression tests in white list, as I
mentioned the compressor does have issue with buffer being modified during
compressing.  IIRC we used to hit that, so live test makes sense even for
code path coverage.  See comment in do_compress_ram_page(), and also
34ab9e9743 ("migration: detect compression and decompression errors").

If we want to start doing this, imho we should make it strict and any live
use case needs to be very well justifed or otherwise should be non-live in
the qest, so we don't easily go back to square 1.

I see that in the latest post that part is still missing.  I can do on top
to prepare a patch to document why each use cases need live, and emphasize
that.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-06-01 16:36         ` Daniel P. Berrangé
@ 2023-06-01 17:04           ` Peter Xu
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Xu @ 2023-06-01 17:04 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Laurent Vivier, Juan Quintela, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 05:36:42PM +0100, Daniel P. Berrangé wrote:
> > The problem is non-live is already not real world usage in most cases.  It
> > seems we all agreed that it's the code paths to cover not real world usages
> > in the tests, or maybe not?
> 
> The cases that i made run non-live are exercising code paths that
> are only relevant during migration setup, so the non-live vs live
> status is irrelevant to the code coverage attained for those tests.
> Other remaining live tests give sufficient coverage for the live
> scenario

IMHO actually I think it's good we discuss about the goal of target of what
qtest plays a role here, hitting the issue of over-used CI and test running
too long.

For migration tests I'm not sure how Juan sees this I'd say it's good
enough to make it "just to cover code paths as much as possible".

We're never covering real use cases anyway, IMHO, if we start with 128M VM
running sequential dirty workload only.  We still need to rely on real
migration tests carried out elsewhere I believe.

So I'd say we're fine to leverage a new parameter especially if the
parameter is simple enough (for logical change I only expect a few LOCs
with switchover-hold) to help us achieve it.

Let's keep that on top anyway, no matter what.  Thanks.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 15:55         ` Daniel P. Berrangé
  2023-06-01 16:17           ` Peter Xu
@ 2023-06-01 22:55           ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 22:55 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Xu, qemu-devel, Laurent Vivier, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Thu, Jun 01, 2023 at 11:53:17AM -0400, Peter Xu wrote:
>> On Thu, Jun 01, 2023 at 04:39:48PM +0100, Daniel P. Berrangé wrote:
>> > On Thu, Jun 01, 2023 at 11:30:10AM -0400, Peter Xu wrote:
>> > > Thanks for looking into this.. definitely worthwhile.
>> > > 
>> > > On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
>> > > > There are 27 pre-copy live migration scenarios being tested. In all of
>> > > > these we force non-convergance and run for one iteration, then let it
>> > > > converge and wait for completion during the second (or following)
>> > > > iterations. At 3 mbps bandwidth limit the first iteration takes a very
>> > > > long time (~30 seconds).
>> > > > 
>> > > > While it is important to test the migration passes and convergance
>> > > > logic, it is overkill to do this for all 27 pre-copy scenarios. The
>> > > > TLS migration scenarios in particular are merely exercising different
>> > > > code paths during connection establishment.
>> > > > 
>> > > > To optimize time taken, switch most of the test scenarios to run
>> > > > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
>> > > > a massive speed up for most of the test scenarios.
>> > > > 
>> > > > For test coverage the following scenarios are unchanged
>> > > 
>> > > Curious how are below chosen?  I assume..
>> > 
>> > Chosen based on whether they exercise code paths that are unique
>> > and interesting during the RAM transfer phase.
>> > 
>> > Essentially the goal is that if we have N% code coverage before this
>> > patch, then we should still have the same N% code coverage after this
>> > patch.
>> > 
>> > The TLS tests exercise code paths that are unique during the migration
>> > establishment phase. Once establishd they don't exercise anything
>> > "interesting" during RAM transfer phase. Thus we don't loose code coverage
>> > by runing TLS tests non-live.
>> > 
>> > > 
>> > > > 
>> > > >  * Precopy with UNIX sockets
>> > > 
>> > > this one verifies dirty log.
>> > > 
>> > > >  * Precopy with UNIX sockets and dirty ring tracking
>> > > 
>> > > ... dirty ring...
>> > > 
>> > > >  * Precopy with XBZRLE
>> > > 
>> > > ... xbzrle I think needs a diff on old/new, makes sense.
>> > > 
>> > > >  * Precopy with UNIX compress
>> > > >  * Precopy with UNIX compress (nowait)
>> > > >  * Precopy with multifd
>> > > 
>> > > What about the rest three?  Especially for two compression tests.
>> > 
>> > The compress thread logic is unique/interesting during RAM transfer
>> > so benefits from running live. The wait vs non-wait scenario tests
>> > a distinct codepath/logic.
>> 
>> I assume you mean e.g. when compressing with guest page being modified and
>> we should survive that rather than crashing the compressor?
>
> No, i mean the compression code has a significant behaviour difference
> between its two tests, because they toggle:
>
>  @compress-wait-thread: Controls behavior when all compression
>      threads are currently busy.  If true (default), wait for a free
>      compression thread to become available; otherwise, send the page
>      uncompressed.  (Since 3.1)
>
> so we need to exercise the code path that falls back to sending
> uncompressed, as well as the code path that waits for free threads.

It don't work.
I think that I am going to just drop it for this iteration.

I tried 2 or 3 years ago to get a test to run to compression -> was not
able to get it to work.

Moved compression on top of multifd, much, much faster and much cleaner
(each compression method is around 50 lines of code).

Lukas tried this time and he was not able to get it working either.

So I have no hope at all for this code.

To add insult to injury, it copies things so many times that is just not
worthy.

Later, Juan.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-06-01 16:17           ` Peter Xu
  2023-06-01 16:35             ` Daniel P. Berrangé
@ 2023-06-01 22:58             ` Juan Quintela
  1 sibling, 0 replies; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 22:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	qemu-devel, Laurent Vivier, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Peter Xu <peterx@redhat.com> wrote:
> On Thu, Jun 01, 2023 at 04:55:25PM +0100, Daniel P. Berrangé wrote:
>> On Thu, Jun 01, 2023 at 11:53:17AM -0400, Peter Xu wrote:
>> > On Thu, Jun 01, 2023 at 04:39:48PM +0100, Daniel P. Berrangé wrote:
>> > > On Thu, Jun 01, 2023 at 11:30:10AM -0400, Peter Xu wrote:
>> > > > Thanks for looking into this.. definitely worthwhile.
>> > > > 
>> > > > On Wed, May 31, 2023 at 02:23:59PM +0100, Daniel P. Berrangé wrote:
>> > > > > There are 27 pre-copy live migration scenarios being tested. In all of
>> > > > > these we force non-convergance and run for one iteration, then let it
>> > > > > converge and wait for completion during the second (or following)
>> > > > > iterations. At 3 mbps bandwidth limit the first iteration takes a very
>> > > > > long time (~30 seconds).
>> > > > > 
>> > > > > While it is important to test the migration passes and convergance
>> > > > > logic, it is overkill to do this for all 27 pre-copy scenarios. The
>> > > > > TLS migration scenarios in particular are merely exercising different
>> > > > > code paths during connection establishment.
>> > > > > 
>> > > > > To optimize time taken, switch most of the test scenarios to run
>> > > > > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
>> > > > > a massive speed up for most of the test scenarios.
>> > > > > 
>> > > > > For test coverage the following scenarios are unchanged
>> > > > 
>> > > > Curious how are below chosen?  I assume..
>> > > 
>> > > Chosen based on whether they exercise code paths that are unique
>> > > and interesting during the RAM transfer phase.
>> > > 
>> > > Essentially the goal is that if we have N% code coverage before this
>> > > patch, then we should still have the same N% code coverage after this
>> > > patch.
>> > > 
>> > > The TLS tests exercise code paths that are unique during the migration
>> > > establishment phase. Once establishd they don't exercise anything
>> > > "interesting" during RAM transfer phase. Thus we don't loose code coverage
>> > > by runing TLS tests non-live.
>> > > 
>> > > > 
>> > > > > 
>> > > > >  * Precopy with UNIX sockets
>> > > > 
>> > > > this one verifies dirty log.
>> > > > 
>> > > > >  * Precopy with UNIX sockets and dirty ring tracking
>> > > > 
>> > > > ... dirty ring...
>> > > > 
>> > > > >  * Precopy with XBZRLE
>> > > > 
>> > > > ... xbzrle I think needs a diff on old/new, makes sense.
>> > > > 
>> > > > >  * Precopy with UNIX compress
>> > > > >  * Precopy with UNIX compress (nowait)
>> > > > >  * Precopy with multifd
>> > > > 
>> > > > What about the rest three?  Especially for two compression tests.
>> > > 
>> > > The compress thread logic is unique/interesting during RAM transfer
>> > > so benefits from running live. The wait vs non-wait scenario tests
>> > > a distinct codepath/logic.
>> > 
>> > I assume you mean e.g. when compressing with guest page being modified and
>> > we should survive that rather than crashing the compressor?
>> 
>> No, i mean the compression code has a significant behaviour difference
>> between its two tests, because they toggle:
>> 
>>  @compress-wait-thread: Controls behavior when all compression
>>      threads are currently busy.  If true (default), wait for a free
>>      compression thread to become available; otherwise, send the page
>>      uncompressed.  (Since 3.1)
>> 
>> so we need to exercise the code path that falls back to sending
>> uncompressed, as well as the code path that waits for free threads.
>
> But then the question is why live is needed?
>
> IIUC whether the wait thing triggers have nothing directly related to VM is
> live or not, but whether all compress thread busy.  IOW, IIUC all compress
> paths will be tested even if non-live as long as we feed enough pages to
> the compressor threads.

It is even wrong.

We didn't fix this for compression:

commit 007e179ef0e97eafda4c9ff2a9d665a1947c7c6d
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Tue Jul 5 22:35:59 2022 +0200

    multifd: Copy pages before compressing them with zlib
    
    zlib_send_prepare() compresses pages of a running VM. zlib does not
    make any thread-safety guarantees with respect to changing deflate()
    input concurrently with deflate() [1].


Not that anyone is going to use any accelerator to run zlib when we are
compression just 4k.

Intel AT engine had to also move to 64 pages at a time to make it a
difference.  As said, I can't think of a single scenary where
compression is a good option.

Later, Juan.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-06-01 15:46   ` Peter Xu
  2023-06-01 16:05     ` Daniel P. Berrangé
@ 2023-06-01 23:00     ` Juan Quintela
  2023-06-01 23:43       ` Peter Xu
  1 sibling, 1 reply; 57+ messages in thread
From: Juan Quintela @ 2023-06-01 23:00 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé,
	qemu-devel, Laurent Vivier, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

Peter Xu <peterx@redhat.com> wrote:
> On Wed, May 31, 2023 at 02:24:00PM +0100, Daniel P. Berrangé wrote:
>> The migration test cases that actually exercise live migration want to
>> ensure there is a minimum of two iterations of pre-copy, in order to
>> exercise the dirty tracking code.
>> 
>> Historically we've queried the migration status, looking for the
>> 'dirty-sync-count' value to increment to track iterations. This was
>> not entirely reliable because often all the data would get transferred
>> quickly enough that the migration would finish before we wanted it
>> to. So we massively dropped the bandwidth and max downtime to
>> guarantee non-convergance. This had the unfortunate side effect
>> that every migration took at least 30 seconds to run (100 MB of
>> dirty pages / 3 MB/sec).
>> 
>> This optimization takes a different approach to ensuring that a
>> mimimum of two iterations. Rather than waiting for dirty-sync-count
>> to increment, directly look for an indication that the source VM
>> has dirtied RAM that has already been transferred.
>> 
>> On the source VM a magic marker is written just after the 3 MB
>> offset. The destination VM is now montiored to detect when the
>> magic marker is transferred. This gives a guarantee that the
>> first 3 MB of memory have been transferred. Now the source VM
>> memory is monitored at exactly the 3MB offset until we observe
>> a flip in its value. This gives us a guaranteed that the guest
>> workload has dirtied a byte that has already been transferred.
>> 
>> Since we're looking at a place that is only 3 MB from the start
>> of memory, with the 3 MB/sec bandwidth, this test should complete
>> in 1 second, instead of 30 seconds.
>> 
>> Once we've proved there is some dirty memory, migration can be
>> set back to full speed for the remainder of the 1st iteration,
>> and the entire of the second iteration at which point migration
>> should be complete.
>> 
>> On a test machine this further reduces the migration test time
>> from 8 minutes to 1 minute 40.
>
> The outcome is definitely nice, but it does looks slightly hacky to me and
> make the test slightly more complicated.
>
> If it's all about making sure we finish the 1st iteration, can we simply
> add a src qemu parameter "switchover-hold"?  If it's set, src never
> switchover to dst but keeps the iterations.
>
> Then migrate_ensure_non_converge() will be as simple as setting
> switchover-hold to true.
>
> I am even thinking whether there can even be real-life use case for that,
> e.g., where a user might want to have a pre-heat of a migration of some VM,
> and trigger it immediately when the admin really wants (the pre-heats moved
> most of the pages and keep doing so).
>
> It'll be also similar to what Avihai proposed here on switchover-ack, just
> an ack mechanism on the src side:
>
> https://lore.kernel.org/r/20230530144821.1557-3-avihaih@nvidia.com

That was basically my idea and that is why I am holding the last two
patches and see if I can came with something in the next couple of days.

Later, Juan.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-06-01 23:00     ` Juan Quintela
@ 2023-06-01 23:43       ` Peter Xu
  2023-07-10  9:35         ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Xu @ 2023-06-01 23:43 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Daniel P. Berrangé,
	qemu-devel, Laurent Vivier, Leonardo Bras, Thomas Huth,
	Paolo Bonzini

On Fri, Jun 02, 2023 at 01:00:10AM +0200, Juan Quintela wrote:
> That was basically my idea and that is why I am holding the last two
> patches and see if I can came with something in the next couple of days.

Ah! ...

If you haven't started, please hold off for one day.  I'll see whether I
can post mine tomorrow.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-06-01 23:43       ` Peter Xu
@ 2023-07-10  9:35         ` Daniel P. Berrangé
  2023-07-10  9:40           ` Thomas Huth
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-07-10  9:35 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, qemu-devel, Laurent Vivier, Leonardo Bras,
	Thomas Huth, Paolo Bonzini

On Thu, Jun 01, 2023 at 07:43:43PM -0400, Peter Xu wrote:
> On Fri, Jun 02, 2023 at 01:00:10AM +0200, Juan Quintela wrote:
> > That was basically my idea and that is why I am holding the last two
> > patches and see if I can came with something in the next couple of days.
> 
> Ah! ...
> 
> If you haven't started, please hold off for one day.  I'll see whether I
> can post mine tomorrow.

5 weeks later, and we've still not got a fix merged, and it is soft
freeze tomorrow.

Since this patch works, could we please consider merging this now. It is
a simple revert at a later once when Peter's desired better solution is
finally ready, unless Peter's series can be submitted for soft freeze
merge today ?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 9/9] tests/qtest: massively speed up migration-test
  2023-07-10  9:35         ` Daniel P. Berrangé
@ 2023-07-10  9:40           ` Thomas Huth
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Huth @ 2023-07-10  9:40 UTC (permalink / raw)
  To: Daniel P. Berrangé, Peter Xu
  Cc: Juan Quintela, qemu-devel, Laurent Vivier, Leonardo Bras, Paolo Bonzini

On 10/07/2023 11.35, Daniel P. Berrangé wrote:
> On Thu, Jun 01, 2023 at 07:43:43PM -0400, Peter Xu wrote:
>> On Fri, Jun 02, 2023 at 01:00:10AM +0200, Juan Quintela wrote:
>>> That was basically my idea and that is why I am holding the last two
>>> patches and see if I can came with something in the next couple of days.
>>
>> Ah! ...
>>
>> If you haven't started, please hold off for one day.  I'll see whether I
>> can post mine tomorrow.
> 
> 5 weeks later, and we've still not got a fix merged, and it is soft
> freeze tomorrow.
> 
> Since this patch works, could we please consider merging this now. It is
> a simple revert at a later once when Peter's desired better solution is
> finally ready, unless Peter's series can be submitted for soft freeze
> merge today ?

I agree, I'll queue this qtest patch here for my PR-before-softfreeze now - 
if Peter's work is ready to get merged via the migration tree, this patch 
here can be easily reverted again.

  Thomas



^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2023-07-10  9:41 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-31 13:23 [PATCH v3 0/9] tests/qtest: make migration-test massively faster Daniel P. Berrangé
2023-05-31 13:23 ` [PATCH v3 1/9] tests/qtest: add various qtest_qmp_assert_success() variants Daniel P. Berrangé
2023-06-01  9:23   ` Thomas Huth
2023-06-01 12:48     ` Daniel P. Berrangé
2023-06-01 12:04   ` Juan Quintela
2023-06-01 12:20   ` Juan Quintela
2023-06-01 12:51     ` Daniel P. Berrangé
2023-05-31 13:23 ` [PATCH v3 2/9] tests/qtest: add support for callback to receive QMP events Daniel P. Berrangé
2023-05-31 14:57   ` Thomas Huth
2023-06-01 12:14   ` Juan Quintela
2023-06-01 12:56     ` Daniel P. Berrangé
2023-05-31 13:23 ` [PATCH v3 3/9] tests/qtest: get rid of 'qmp_command' helper in migration test Daniel P. Berrangé
2023-06-01  9:26   ` Thomas Huth
2023-06-01  9:32     ` Daniel P. Berrangé
2023-06-01 12:17   ` Juan Quintela
2023-05-31 13:23 ` [PATCH v3 4/9] tests/qtest: get rid of some 'qtest_qmp' usage " Daniel P. Berrangé
2023-06-01  9:28   ` Thomas Huth
2023-06-01 12:10   ` Juan Quintela
2023-05-31 13:23 ` [PATCH v3 5/9] tests/qtest: switch to using event callbacks for STOP event Daniel P. Berrangé
2023-06-01  9:31   ` Thomas Huth
2023-06-01 12:23   ` Juan Quintela
2023-05-31 13:23 ` [PATCH v3 6/9] tests/qtest: replace wait_command() with qtest_qmp_assert_success Daniel P. Berrangé
2023-06-01  9:37   ` Thomas Huth
2023-06-01 12:27   ` Juan Quintela
2023-05-31 13:23 ` [PATCH v3 7/9] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
2023-06-01  9:38   ` Thomas Huth
2023-06-01 12:31   ` Juan Quintela
2023-06-01 12:34     ` Daniel P. Berrangé
2023-06-01 12:37       ` Juan Quintela
2023-06-01 12:44         ` Daniel P. Berrangé
2023-05-31 13:23 ` [PATCH v3 8/9] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
2023-06-01  9:47   ` Thomas Huth
2023-06-01 12:33   ` Juan Quintela
2023-06-01 12:38     ` Daniel P. Berrangé
2023-06-01 16:09     ` Thomas Huth
2023-06-01 16:17       ` Daniel P. Berrangé
2023-06-01 16:26         ` Peter Xu
2023-06-01 15:30   ` Peter Xu
2023-06-01 15:39     ` Daniel P. Berrangé
2023-06-01 15:53       ` Peter Xu
2023-06-01 15:55         ` Daniel P. Berrangé
2023-06-01 16:17           ` Peter Xu
2023-06-01 16:35             ` Daniel P. Berrangé
2023-06-01 16:59               ` Peter Xu
2023-06-01 22:58             ` Juan Quintela
2023-06-01 22:55           ` Juan Quintela
2023-05-31 13:24 ` [PATCH v3 9/9] tests/qtest: massively speed up migration-test Daniel P. Berrangé
2023-06-01 10:04   ` Thomas Huth
2023-06-01 15:46   ` Peter Xu
2023-06-01 16:05     ` Daniel P. Berrangé
2023-06-01 16:22       ` Peter Xu
2023-06-01 16:36         ` Daniel P. Berrangé
2023-06-01 17:04           ` Peter Xu
2023-06-01 23:00     ` Juan Quintela
2023-06-01 23:43       ` Peter Xu
2023-07-10  9:35         ` Daniel P. Berrangé
2023-07-10  9:40           ` Thomas Huth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.