* [PULL 00/18] migration queue
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:

  Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging (2022-03-02 12:38:46 +0000)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b

for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:

  migration: Remove load_state_old and minimum_version_id_old (2022-03-02 18:20:45 +0000)

----------------------------------------------------------------
Migration/HMP/Virtio pull 2022-03-02

A bit of a mix this time:
  * Minor fixes from myself, Hanna, and Jack
  * VNC password rework by Stefan and Fabian
  * Postcopy changes from Peter X that are
    the start of a larger series to come
  * Removing the prehistoric load_state_old
    code from Peter M

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

----------------------------------------------------------------
Dr. David Alan Gilbert (1):
      clock-vmstate: Add missing END_OF_LIST

Hanna Reitz (1):
      virtiofsd: Let meson check for statx.stx_mnt_id

Jack Wang (1):
      migration/rdma: set the REUSEADDR option for destination

Peter Maydell (1):
      migration: Remove load_state_old and minimum_version_id_old

Peter Xu (11):
      migration: Dump sub-cmd name in loadvm_process_command tp
      migration: Finer grained tracepoints for POSTCOPY_LISTEN
      migration: Tracepoint change in postcopy-run bottom half
      migration: Introduce postcopy channels on dest node
      migration: Dump ramblock and offset too when non-same-page detected
      migration: Add postcopy_thread_create()
      migration: Move static var in ram_block_from_stream() into global
      migration: Enlarge postcopy recovery to capture !-EIO too
      migration: postcopy_pause_fault_thread() never fails
      migration: Add migration_incoming_transport_cleanup()
      tests: Pass in MigrateStart** into test_migrate_start()

Stefan Reiter (3):
      monitor/hmp: add support for flag argument with value
      qapi/monitor: refactor set/expire_password with enums
      qapi/monitor: allow VNC display id in set/expire_password

 docs/devel/migration.rst         |  12 +---
 hmp-commands.hx                  |  24 ++++----
 hw/core/clock-vmstate.c          |   1 +
 hw/ssi/xlnx-versal-ospi.c        |   1 -
 include/migration/vmstate.h      |   2 -
 meson.build                      |  13 +++++
 migration/migration.c            |  26 +++++----
 migration/migration.h            |  48 ++++++++++++++--
 migration/postcopy-ram.c         | 108 ++++++++++++++++++++++-------------
 migration/postcopy-ram.h         |   4 ++
 migration/ram.c                  |  64 +++++++++++----------
 migration/rdma.c                 |   7 +++
 migration/savevm.c               |  46 ++++++++++-----
 migration/trace-events           |   7 +--
 migration/vmstate.c              |   6 --
 monitor/hmp-cmds.c               |  47 ++++++++++++++-
 monitor/hmp.c                    |  19 ++++++-
 monitor/monitor-internal.h       |   3 +-
 monitor/qmp-cmds.c               |  49 +++++-----------
 qapi/ui.json                     | 120 +++++++++++++++++++++++++++++++++------
 tests/qtest/migration-test.c     |  27 +++++----
 tools/virtiofsd/passthrough_ll.c |   2 +-
 22 files changed, 435 insertions(+), 201 deletions(-)




* [PULL 01/18] clock-vmstate: Add missing END_OF_LIST
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add the missing VMSTATE_END_OF_LIST to vmstate_muldiv

Fixes: 99abcbc7600 ("clock: Provide builtin multiplier/divider")
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220111101934.115028-1-dgilbert@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luc Michel <luc@lmichel.fr>
Cc: qemu-stable@nongnu.org
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/core/clock-vmstate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/core/clock-vmstate.c b/hw/core/clock-vmstate.c
index 9d9174ffbd..7eccb6d4ea 100644
--- a/hw/core/clock-vmstate.c
+++ b/hw/core/clock-vmstate.c
@@ -44,6 +44,7 @@ const VMStateDescription vmstate_muldiv = {
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(multiplier, Clock),
         VMSTATE_UINT32(divider, Clock),
+        VMSTATE_END_OF_LIST()
     },
 };
 
-- 
2.35.1
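
For readers unfamiliar with why the terminator matters, here is a minimal
sketch of a sentinel-terminated field table. DemoField, DEMO_END_OF_LIST and
demo_count_fields are hypothetical names made up for illustration; they are
not QEMU's VMStateField machinery. The sketch only assumes that a table-driven
walker stops at the first entry whose name is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a field descriptor; not QEMU's VMStateField. */
typedef struct {
    const char *name;   /* NULL marks the end of the list */
    size_t size;        /* size of the field in bytes */
} DemoField;

#define DEMO_END_OF_LIST() { NULL, 0 }

const DemoField demo_muldiv_fields[] = {
    { "multiplier", 4 },
    { "divider", 4 },
    DEMO_END_OF_LIST()  /* without this, the walk below runs off the array */
};

/* Count the fields by walking the table until the sentinel entry. */
size_t demo_count_fields(const DemoField *f)
{
    size_t n = 0;

    assert(f != NULL);
    while (f->name != NULL) {
        n++;
        f++;
    }
    return n;
}
```

Calling demo_count_fields(demo_muldiv_fields) visits the two real entries and
stops at the sentinel. If the sentinel were left out, the loop would keep
reading whatever memory follows the array, which is undefined behaviour.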




* [PULL 02/18] virtiofsd: Let meson check for statx.stx_mnt_id
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Hanna Reitz <hreitz@redhat.com>

In virtiofsd, we assume that the presence of the STATX_MNT_ID macro
implies existence of the statx.stx_mnt_id field.  Unfortunately, that is
not necessarily the case: glibc has introduced the macro in its commit
88a2cf6c4bab6e94a65e9c0db8813709372e9180, but the statx.stx_mnt_id field
is still missing from its own headers.

Let meson.build actually check for both STATX_MNT_ID and
statx.stx_mnt_id, and set CONFIG_STATX_MNT_ID if both are present.
Then, use this config macro in virtiofsd.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/882
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220223092340.9043-1-hreitz@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 meson.build                      | 13 +++++++++++++
 tools/virtiofsd/passthrough_ll.c |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 8df40bfac4..a5b63e62cd 100644
--- a/meson.build
+++ b/meson.build
@@ -1306,6 +1306,18 @@ statx_test = gnu_source_prefix + '''
 
 has_statx = cc.links(statx_test)
 
+# Check whether statx() provides mount ID information
+
+statx_mnt_id_test = gnu_source_prefix + '''
+  #include <sys/stat.h>
+  int main(void) {
+    struct statx statxbuf;
+    statx(0, "", 0, STATX_BASIC_STATS | STATX_MNT_ID, &statxbuf);
+    return statxbuf.stx_mnt_id;
+  }'''
+
+has_statx_mnt_id = cc.links(statx_mnt_id_test)
+
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
   .require(targetos == 'linux',
            error_message: 'vhost_user_blk_server requires linux') \
@@ -1553,6 +1565,7 @@ config_host_data.set('CONFIG_NETTLE', nettle.found())
 config_host_data.set('CONFIG_QEMU_PRIVATE_XTS', xts == 'private')
 config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim)
 config_host_data.set('CONFIG_STATX', has_statx)
+config_host_data.set('CONFIG_STATX_MNT_ID', has_statx_mnt_id)
 config_host_data.set('CONFIG_ZSTD', zstd.found())
 config_host_data.set('CONFIG_FUSE', fuse.found())
 config_host_data.set('CONFIG_FUSE_LSEEK', fuse_lseek.found())
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index dfa2fc250d..028dacdd8f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1039,7 +1039,7 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
 {
     int res;
 
-#if defined(CONFIG_STATX) && defined(STATX_MNT_ID)
+#if defined(CONFIG_STATX) && defined(CONFIG_STATX_MNT_ID)
     if (lo->use_statx) {
         struct statx statxbuf;
 
-- 
2.35.1




* [PULL 03/18] monitor/hmp: add support for flag argument with value
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Stefan Reiter <s.reiter@proxmox.com>

Adds support for the "-xs" parameter type, where "-x" denotes a flag
name and the "s" suffix indicates that this flag is supposed to take
an arbitrary string parameter.

These parameters are always optional; the entry in the qdict is
omitted if the flag is not given.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[FE: fixed typo pointed out by Eric Blake
     use s instead of V to indicate string parameter]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Message-Id: <20220225084949.35746-2-f.ebner@proxmox.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 monitor/hmp.c              | 19 ++++++++++++++++++-
 monitor/monitor-internal.h |  3 ++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/monitor/hmp.c b/monitor/hmp.c
index b20737e63c..569066036d 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -981,6 +981,7 @@ static QDict *monitor_parse_arguments(Monitor *mon,
             {
                 const char *tmp = p;
                 int skip_key = 0;
+                int ret;
                 /* option */
 
                 c = *typestr++;
@@ -1003,11 +1004,27 @@ static QDict *monitor_parse_arguments(Monitor *mon,
                     }
                     if (skip_key) {
                         p = tmp;
+                    } else if (*typestr == 's') {
+                        /* has option with string value */
+                        typestr++;
+                        tmp = p++;
+                        while (qemu_isspace(*p)) {
+                            p++;
+                        }
+                        ret = get_str(buf, sizeof(buf), &p);
+                        if (ret < 0) {
+                            monitor_printf(mon, "%s: value expected for -%c\n",
+                                           cmd->name, *tmp);
+                            goto fail;
+                        }
+                        qdict_put_str(qdict, key, buf);
                     } else {
-                        /* has option */
+                        /* has boolean option */
                         p++;
                         qdict_put_bool(qdict, key, true);
                     }
+                } else if (*typestr == 's') {
+                    typestr++;
                 }
             }
             break;
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index 3da3f86c6a..caa2e90ef2 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -63,7 +63,8 @@
  * '.'          other form of optional type (for 'i' and 'l')
  * 'b'          boolean
  *              user mode accepts "on" or "off"
- * '-'          optional parameter (eg. '-f')
+ * '-'          optional parameter (eg. '-f'); if followed by a 's', it
+ *              specifies an optional string param (e.g. '-fs' allows '-f foo')
  *
  */
 
-- 
2.35.1
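
The behaviour described in the commit message (an optional "-x" flag that,
when present, must be followed by a string value) can be sketched in
isolation. demo_parse_flag_value is a hypothetical helper written for this
note; it is not the code in monitor_parse_arguments() and it ignores the
quoting rules that QEMU's get_str() handles. The return convention (1, 0, -1)
is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU's monitor_parse_arguments().
 * Looks at *pp for "-<flag> <value>".
 * Returns 1 and fills buf if the flag and a value are present,
 * 0 if the flag is absent (input left untouched),
 * -1 if the flag is given without a value or the value does not fit.
 */
int demo_parse_flag_value(const char **pp, char flag, char *buf, size_t len)
{
    const char *p;
    size_t n = 0;

    assert(pp != NULL && *pp != NULL && buf != NULL);
    p = *pp;

    while (isspace((unsigned char)*p)) {
        p++;
    }
    /* The flag must be exactly "-<flag>" followed by a space or the end. */
    if (p[0] != '-' || p[1] != flag ||
        (p[2] != '\0' && !isspace((unsigned char)p[2]))) {
        return 0;   /* optional flag not given */
    }
    p += 2;
    while (isspace((unsigned char)*p)) {
        p++;
    }
    if (*p == '\0') {
        return -1;  /* value expected after the flag */
    }
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (n + 1 >= len) {
            return -1;  /* value too long for buf */
        }
        buf[n++] = *p++;
    }
    buf[n] = '\0';
    *pp = p;        /* consume the flag and its value */
    return 1;
}
```

A caller would store buf under the flag's key only when the function
returns 1, which matches the "entry is omitted if the flag is not given"
rule from the commit message.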




* [PULL 04/18] qapi/monitor: refactor set/expire_password with enums
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Stefan Reiter <s.reiter@proxmox.com>

'protocol' and 'connected' are better suited as enums than as strings;
make use of that. No functional change intended.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[FE: update "Since: " from 6.2 to 7.0
     put 'keep' first in enum to ease use as a default]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Message-Id: <20220225084949.35746-3-f.ebner@proxmox.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 monitor/hmp-cmds.c | 29 +++++++++++++++++++++++++++--
 monitor/qmp-cmds.c | 37 ++++++++++++-------------------------
 qapi/ui.json       | 36 ++++++++++++++++++++++++++++++++++--
 3 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 8c384dc1b2..ff78741b75 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1398,8 +1398,24 @@ void hmp_set_password(Monitor *mon, const QDict *qdict)
     const char *password  = qdict_get_str(qdict, "password");
     const char *connected = qdict_get_try_str(qdict, "connected");
     Error *err = NULL;
+    DisplayProtocol proto;
+    SetPasswordAction conn;
 
-    qmp_set_password(protocol, password, !!connected, connected, &err);
+    proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
+                            DISPLAY_PROTOCOL_VNC, &err);
+    if (err) {
+        goto out;
+    }
+
+    conn = qapi_enum_parse(&SetPasswordAction_lookup, connected,
+                           SET_PASSWORD_ACTION_KEEP, &err);
+    if (err) {
+        goto out;
+    }
+
+    qmp_set_password(proto, password, !!connected, conn, &err);
+
+out:
     hmp_handle_error(mon, err);
 }
 
@@ -1408,8 +1424,17 @@ void hmp_expire_password(Monitor *mon, const QDict *qdict)
     const char *protocol  = qdict_get_str(qdict, "protocol");
     const char *whenstr = qdict_get_str(qdict, "time");
     Error *err = NULL;
+    DisplayProtocol proto;
+
+    proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
+                            DISPLAY_PROTOCOL_VNC, &err);
+    if (err) {
+        goto out;
+    }
 
-    qmp_expire_password(protocol, whenstr, &err);
+    qmp_expire_password(proto, whenstr, &err);
+
+out:
     hmp_handle_error(mon, err);
 }
 
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index db4d186448..b6e8b57fcc 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -168,33 +168,27 @@ void qmp_system_wakeup(Error **errp)
     qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, errp);
 }
 
-void qmp_set_password(const char *protocol, const char *password,
-                      bool has_connected, const char *connected, Error **errp)
+void qmp_set_password(DisplayProtocol protocol, const char *password,
+                      bool has_connected, SetPasswordAction connected,
+                      Error **errp)
 {
     int disconnect_if_connected = 0;
     int fail_if_connected = 0;
     int rc;
 
     if (has_connected) {
-        if (strcmp(connected, "fail") == 0) {
-            fail_if_connected = 1;
-        } else if (strcmp(connected, "disconnect") == 0) {
-            disconnect_if_connected = 1;
-        } else if (strcmp(connected, "keep") == 0) {
-            /* nothing */
-        } else {
-            error_setg(errp, QERR_INVALID_PARAMETER, "connected");
-            return;
-        }
+        fail_if_connected = connected == SET_PASSWORD_ACTION_FAIL;
+        disconnect_if_connected = connected == SET_PASSWORD_ACTION_DISCONNECT;
     }
 
-    if (strcmp(protocol, "spice") == 0) {
+    if (protocol == DISPLAY_PROTOCOL_SPICE) {
         if (!qemu_using_spice(errp)) {
             return;
         }
         rc = qemu_spice.set_passwd(password, fail_if_connected,
                                    disconnect_if_connected);
-    } else if (strcmp(protocol, "vnc") == 0) {
+    } else {
+        assert(protocol == DISPLAY_PROTOCOL_VNC);
         if (fail_if_connected || disconnect_if_connected) {
             /* vnc supports "connected=keep" only */
             error_setg(errp, QERR_INVALID_PARAMETER, "connected");
@@ -203,10 +197,6 @@ void qmp_set_password(const char *protocol, const char *password,
         /* Note that setting an empty password will not disable login through
          * this interface. */
         rc = vnc_display_password(NULL, password);
-    } else {
-        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "protocol",
-                   "'vnc' or 'spice'");
-        return;
     }
 
     if (rc != 0) {
@@ -214,7 +204,7 @@ void qmp_set_password(const char *protocol, const char *password,
     }
 }
 
-void qmp_expire_password(const char *protocol, const char *whenstr,
+void qmp_expire_password(DisplayProtocol protocol, const char *whenstr,
                          Error **errp)
 {
     time_t when;
@@ -230,17 +220,14 @@ void qmp_expire_password(const char *protocol, const char *whenstr,
         when = strtoull(whenstr, NULL, 10);
     }
 
-    if (strcmp(protocol, "spice") == 0) {
+    if (protocol == DISPLAY_PROTOCOL_SPICE) {
         if (!qemu_using_spice(errp)) {
             return;
         }
         rc = qemu_spice.set_pw_expire(when);
-    } else if (strcmp(protocol, "vnc") == 0) {
-        rc = vnc_display_pw_expire(NULL, when);
     } else {
-        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "protocol",
-                   "'vnc' or 'spice'");
-        return;
+        assert(protocol == DISPLAY_PROTOCOL_VNC);
+        rc = vnc_display_pw_expire(NULL, when);
     }
 
     if (rc != 0) {
diff --git a/qapi/ui.json b/qapi/ui.json
index 9354f4c467..e112409211 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -9,6 +9,34 @@
 { 'include': 'common.json' }
 { 'include': 'sockets.json' }
 
+##
+# @DisplayProtocol:
+#
+# Display protocols which support changing password options.
+#
+# Since: 7.0
+#
+##
+{ 'enum': 'DisplayProtocol',
+  'data': [ 'vnc', 'spice' ] }
+
+##
+# @SetPasswordAction:
+#
+# An action to take on changing a password on a connection with active clients.
+#
+# @keep: maintain existing clients
+#
+# @fail: fail the command if clients are connected
+#
+# @disconnect: disconnect existing clients
+#
+# Since: 7.0
+#
+##
+{ 'enum': 'SetPasswordAction',
+  'data': [ 'keep', 'fail', 'disconnect' ] }
+
 ##
 # @set_password:
 #
@@ -38,7 +66,9 @@
 #
 ##
 { 'command': 'set_password',
-  'data': {'protocol': 'str', 'password': 'str', '*connected': 'str'} }
+  'data': { 'protocol': 'DisplayProtocol',
+            'password': 'str',
+            '*connected': 'SetPasswordAction' } }
 
 ##
 # @expire_password:
@@ -71,7 +101,9 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'expire_password', 'data': {'protocol': 'str', 'time': 'str'} }
+{ 'command': 'expire_password',
+  'data': { 'protocol': 'DisplayProtocol',
+            'time': 'str' } }
 
 ##
 # @screendump:
-- 
2.35.1
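
As an illustration of the string-to-enum conversion this patch moves into the
HMP handlers, here is a rough standalone sketch. DemoAction,
demo_action_names and demo_action_parse are hypothetical and only loosely
modelled on the qapi_enum_parse() calls in the diff. In particular, the
sketch returns -1 for an unknown name, whereas the real code reports the
problem through an Error object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum mirroring the values named in the patch. */
typedef enum {
    DEMO_ACTION_KEEP,
    DEMO_ACTION_FAIL,
    DEMO_ACTION_DISCONNECT,
    DEMO_ACTION__MAX
} DemoAction;

const char *const demo_action_names[DEMO_ACTION__MAX] = {
    "keep", "fail", "disconnect"
};

/*
 * Returns def when s is NULL (argument omitted), the matching enum
 * value when s is a known name, and -1 for an unknown name.
 */
int demo_action_parse(const char *s, DemoAction def)
{
    assert((int)def < DEMO_ACTION__MAX);

    if (s == NULL) {
        return def;
    }
    for (int i = 0; i < DEMO_ACTION__MAX; i++) {
        if (strcmp(s, demo_action_names[i]) == 0) {
            return i;
        }
    }
    return -1;
}
```

Doing the lookup once at the edge means the command implementation can
compare enum values directly instead of repeating strcmp() chains, and
invalid names are rejected before it runs.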




* [PULL 05/18] qapi/monitor: allow VNC display id in set/expire_password
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Stefan Reiter <s.reiter@proxmox.com>

It is possible to specify more than one VNC server on the command line,
either with an explicit ID or the auto-generated ones à la "default",
"vnc2", "vnc3", ...

It is not possible to change the password on one of these extra VNC
displays though. Fix this by adding a "display" parameter to the
"set_password" and "expire_password" QMP and HMP commands.

For HMP, the display is specified using the "-d" value flag.

For QMP, the schema is updated to explicitly express the supported
variants of the commands with protocol-discriminated unions.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[FE: update "Since: " from 6.2 to 7.0
     make @connected a common member of @SetPasswordOptions]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Message-Id: <20220225084949.35746-4-f.ebner@proxmox.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hmp-commands.hx    | 24 ++++++------
 monitor/hmp-cmds.c | 40 +++++++++++++------
 monitor/qmp-cmds.c | 34 +++++++---------
 qapi/ui.json       | 96 +++++++++++++++++++++++++++++++++++-----------
 4 files changed, 129 insertions(+), 65 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 70a9136ac2..8476277aa9 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1514,33 +1514,35 @@ ERST
 
     {
         .name       = "set_password",
-        .args_type  = "protocol:s,password:s,connected:s?",
-        .params     = "protocol password action-if-connected",
+        .args_type  = "protocol:s,password:s,display:-ds,connected:s?",
+        .params     = "protocol password [-d display] [action-if-connected]",
         .help       = "set spice/vnc password",
         .cmd        = hmp_set_password,
     },
 
 SRST
-``set_password [ vnc | spice ] password [ action-if-connected ]``
-  Change spice/vnc password.  *action-if-connected* specifies what
-  should happen in case a connection is established: *fail* makes the
-  password change fail.  *disconnect* changes the password and
+``set_password [ vnc | spice ] password [ -d display ] [ action-if-connected ]``
+  Change spice/vnc password.  *display* can be used with 'vnc' to specify
+  which display to set the password on.  *action-if-connected* specifies
+  what should happen in case a connection is established: *fail* makes
+  the password change fail.  *disconnect* changes the password and
   disconnects the client.  *keep* changes the password and keeps the
   connection up.  *keep* is the default.
 ERST
 
     {
         .name       = "expire_password",
-        .args_type  = "protocol:s,time:s",
-        .params     = "protocol time",
+        .args_type  = "protocol:s,time:s,display:-ds",
+        .params     = "protocol time [-d display]",
         .help       = "set spice/vnc password expire-time",
         .cmd        = hmp_expire_password,
     },
 
 SRST
-``expire_password [ vnc | spice ]`` *expire-time*
-  Specify when a password for spice/vnc becomes
-  invalid. *expire-time* accepts:
+``expire_password [ vnc | spice ] expire-time [ -d display ]``
+  Specify when a password for spice/vnc becomes invalid.
+  *display* behaves the same as in ``set_password``.
+  *expire-time* accepts:
 
   ``now``
     Invalidate password instantly.
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ff78741b75..634968498b 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1396,24 +1396,33 @@ void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
     const char *password  = qdict_get_str(qdict, "password");
+    const char *display = qdict_get_try_str(qdict, "display");
     const char *connected = qdict_get_try_str(qdict, "connected");
     Error *err = NULL;
-    DisplayProtocol proto;
-    SetPasswordAction conn;
 
-    proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
-                            DISPLAY_PROTOCOL_VNC, &err);
+    SetPasswordOptions opts = {
+        .password = (char *)password,
+        .has_connected = !!connected,
+    };
+
+    opts.connected = qapi_enum_parse(&SetPasswordAction_lookup, connected,
+                                     SET_PASSWORD_ACTION_KEEP, &err);
     if (err) {
         goto out;
     }
 
-    conn = qapi_enum_parse(&SetPasswordAction_lookup, connected,
-                           SET_PASSWORD_ACTION_KEEP, &err);
+    opts.protocol = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
+                                    DISPLAY_PROTOCOL_VNC, &err);
     if (err) {
         goto out;
     }
 
-    qmp_set_password(proto, password, !!connected, conn, &err);
+    if (opts.protocol == DISPLAY_PROTOCOL_VNC) {
+        opts.u.vnc.has_display = !!display;
+        opts.u.vnc.display = (char *)display;
+    }
+
+    qmp_set_password(&opts, &err);
 
 out:
     hmp_handle_error(mon, err);
@@ -1423,16 +1432,25 @@ void hmp_expire_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
     const char *whenstr = qdict_get_str(qdict, "time");
+    const char *display = qdict_get_try_str(qdict, "display");
     Error *err = NULL;
-    DisplayProtocol proto;
 
-    proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
-                            DISPLAY_PROTOCOL_VNC, &err);
+    ExpirePasswordOptions opts = {
+        .time = (char *)whenstr,
+    };
+
+    opts.protocol = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
+                                    DISPLAY_PROTOCOL_VNC, &err);
     if (err) {
         goto out;
     }
 
-    qmp_expire_password(proto, whenstr, &err);
+    if (opts.protocol == DISPLAY_PROTOCOL_VNC) {
+        opts.u.vnc.has_display = !!display;
+        opts.u.vnc.display = (char *)display;
+    }
+
+    qmp_expire_password(&opts, &err);
 
 out:
     hmp_handle_error(mon, err);
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index b6e8b57fcc..df97582dd4 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -168,35 +168,27 @@ void qmp_system_wakeup(Error **errp)
     qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, errp);
 }
 
-void qmp_set_password(DisplayProtocol protocol, const char *password,
-                      bool has_connected, SetPasswordAction connected,
-                      Error **errp)
+void qmp_set_password(SetPasswordOptions *opts, Error **errp)
 {
-    int disconnect_if_connected = 0;
-    int fail_if_connected = 0;
     int rc;
 
-    if (has_connected) {
-        fail_if_connected = connected == SET_PASSWORD_ACTION_FAIL;
-        disconnect_if_connected = connected == SET_PASSWORD_ACTION_DISCONNECT;
-    }
-
-    if (protocol == DISPLAY_PROTOCOL_SPICE) {
+    if (opts->protocol == DISPLAY_PROTOCOL_SPICE) {
         if (!qemu_using_spice(errp)) {
             return;
         }
-        rc = qemu_spice.set_passwd(password, fail_if_connected,
-                                   disconnect_if_connected);
+        rc = qemu_spice.set_passwd(opts->password,
+                opts->connected == SET_PASSWORD_ACTION_FAIL,
+                opts->connected == SET_PASSWORD_ACTION_DISCONNECT);
     } else {
-        assert(protocol == DISPLAY_PROTOCOL_VNC);
-        if (fail_if_connected || disconnect_if_connected) {
+        assert(opts->protocol == DISPLAY_PROTOCOL_VNC);
+        if (opts->connected != SET_PASSWORD_ACTION_KEEP) {
             /* vnc supports "connected=keep" only */
             error_setg(errp, QERR_INVALID_PARAMETER, "connected");
             return;
         }
         /* Note that setting an empty password will not disable login through
          * this interface. */
-        rc = vnc_display_password(NULL, password);
+        rc = vnc_display_password(opts->u.vnc.display, opts->password);
     }
 
     if (rc != 0) {
@@ -204,11 +196,11 @@ void qmp_set_password(DisplayProtocol protocol, const char *password,
     }
 }
 
-void qmp_expire_password(DisplayProtocol protocol, const char *whenstr,
-                         Error **errp)
+void qmp_expire_password(ExpirePasswordOptions *opts, Error **errp)
 {
     time_t when;
     int rc;
+    const char *whenstr = opts->time;
 
     if (strcmp(whenstr, "now") == 0) {
         when = 0;
@@ -220,14 +212,14 @@ void qmp_expire_password(DisplayProtocol protocol, const char *whenstr,
         when = strtoull(whenstr, NULL, 10);
     }
 
-    if (protocol == DISPLAY_PROTOCOL_SPICE) {
+    if (opts->protocol == DISPLAY_PROTOCOL_SPICE) {
         if (!qemu_using_spice(errp)) {
             return;
         }
         rc = qemu_spice.set_pw_expire(when);
     } else {
-        assert(protocol == DISPLAY_PROTOCOL_VNC);
-        rc = vnc_display_pw_expire(NULL, when);
+        assert(opts->protocol == DISPLAY_PROTOCOL_VNC);
+        rc = vnc_display_pw_expire(opts->u.vnc.display, when);
     }
 
     if (rc != 0) {
diff --git a/qapi/ui.json b/qapi/ui.json
index e112409211..4a13f883a3 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -38,20 +38,47 @@
   'data': [ 'keep', 'fail', 'disconnect' ] }
 
 ##
-# @set_password:
+# @SetPasswordOptions:
 #
-# Sets the password of a remote display session.
+# Options for set_password.
 #
 # @protocol: - 'vnc' to modify the VNC server password
 #            - 'spice' to modify the Spice server password
 #
 # @password: the new password
 #
-# @connected: how to handle existing clients when changing the
-#             password.  If nothing is specified, defaults to 'keep'
-#             'fail' to fail the command if clients are connected
-#             'disconnect' to disconnect existing clients
-#             'keep' to maintain existing clients
+# @connected: How to handle existing clients when changing the
+#             password. If nothing is specified, defaults to 'keep'.
+#             For VNC, only 'keep' is currently implemented.
+#
+# Since: 7.0
+#
+##
+{ 'union': 'SetPasswordOptions',
+  'base': { 'protocol': 'DisplayProtocol',
+            'password': 'str',
+            '*connected': 'SetPasswordAction' },
+  'discriminator': 'protocol',
+  'data': { 'vnc': 'SetPasswordOptionsVnc' } }
+
+##
+# @SetPasswordOptionsVnc:
+#
+# Options for set_password specific to the VNC protocol.
+#
+# @display: The id of the display where the password should be changed.
+#           Defaults to the first.
+#
+# Since: 7.0
+#
+##
+{ 'struct': 'SetPasswordOptionsVnc',
+  'data': { '*display': 'str' } }
+
+##
+# @set_password:
+#
+# Set the password of a remote display server.
 #
 # Returns: - Nothing on success
 #          - If Spice is not enabled, DeviceNotFound
@@ -65,17 +92,15 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'set_password',
-  'data': { 'protocol': 'DisplayProtocol',
-            'password': 'str',
-            '*connected': 'SetPasswordAction' } }
+{ 'command': 'set_password', 'boxed': true, 'data': 'SetPasswordOptions' }
 
 ##
-# @expire_password:
+# @ExpirePasswordOptions:
 #
-# Expire the password of a remote display server.
+# General options for expire_password.
 #
-# @protocol: the name of the remote display protocol 'vnc' or 'spice'
+# @protocol: - 'vnc' to modify the VNC server expiration
+#            - 'spice' to modify the Spice server expiration
 #
 # @time: when to expire the password.
 #
@@ -84,16 +109,45 @@
 #        - '+INT' where INT is the number of seconds from now (integer)
 #        - 'INT' where INT is the absolute time in seconds
 #
-# Returns: - Nothing on success
-#          - If @protocol is 'spice' and Spice is not active, DeviceNotFound
-#
-# Since: 0.14
-#
 # Notes: Time is relative to the server and currently there is no way to
 #        coordinate server time with client time.  It is not recommended to
 #        use the absolute time version of the @time parameter unless you're
 #        sure you are on the same machine as the QEMU instance.
 #
+# Since: 7.0
+#
+##
+{ 'union': 'ExpirePasswordOptions',
+  'base': { 'protocol': 'DisplayProtocol',
+            'time': 'str' },
+  'discriminator': 'protocol',
+  'data': { 'vnc': 'ExpirePasswordOptionsVnc' } }
+
+##
+# @ExpirePasswordOptionsVnc:
+#
+# Options for expire_password specific to the VNC procotol.
+#
+# @display: The id of the display where the expiration should be changed.
+#           Defaults to the first.
+#
+# Since: 7.0
+#
+##
+
+{ 'struct': 'ExpirePasswordOptionsVnc',
+  'data': { '*display': 'str' } }
+
+##
+# @expire_password:
+#
+# Expire the password of a remote display server.
+#
+# Returns: - Nothing on success
+#          - If @protocol is 'spice' and Spice is not active, DeviceNotFound
+#
+# Since: 0.14
+#
 # Example:
 #
 # -> { "execute": "expire_password", "arguments": { "protocol": "vnc",
@@ -101,9 +155,7 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'expire_password',
-  'data': { 'protocol': 'DisplayProtocol',
-            'time': 'str' } }
+{ 'command': 'expire_password', 'boxed': true, 'data': 'ExpirePasswordOptions' }
 
 ##
 # @screendump:
-- 
2.35.1




* [PULL 06/18] migration/rdma: set the REUSEADDR option for destination
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 05/18] qapi/monitor: allow VNC display id in set/expire_password Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 07/18] migration: Dump sub-cmd name in loadvm_process_command tp Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Jack Wang <jinpu.wang@ionos.com>

We hit the following error while testing the RDMA transport: after a
migration error, the mgmt daemon picks a migration port and gets
incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr

It then tries another port, -incoming rdma:[::]:8103; sometimes that works,
sometimes it needs another try with yet another port number.

Set the REUSEADDR option on the destination.  This allows the address to
be reused, which avoids the rdma_bind_addr error.
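
For readers more familiar with plain sockets, the idea is similar to
setting SO_REUSEADDR before bind().  The sketch below is only an analogy
under that assumption: make_reusable_socket() and reuseaddr_is_set() are
hypothetical helpers, not QEMU or librdmacm code, and the real change uses
rdma_set_option() as shown in the diff.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of QEMU): create a TCP socket and set
 * SO_REUSEADDR on it before any bind(), so that a recently used address
 * can be bound again.  Returns the fd, or -1 on failure.
 */
static int make_reusable_socket(void)
{
    int reuse = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof reuse) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
static int reuseaddr_is_set(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0) {
        return -1;
    }
    return val != 0;
}
```

As in the patch, the option is set before the address is bound; setting it
afterwards would be too late to influence the bind.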

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Message-Id: <20220208085640.19702-1-jinpu.wang@ionos.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixed up some tabs
---
 migration/rdma.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index c7c7a38487..ef1e65ec36 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     char ip[40] = "unknown";
     struct rdma_addrinfo *res, *e;
     char port_str[16];
+    int reuse = 1;
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         rdma->wr_data[idx].control_len = 0;
@@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
         goto err_dest_init_bind_addr;
     }
 
+    ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
+                          &reuse, sizeof reuse);
+    if (ret) {
+        ERROR(errp, "Error: could not set REUSEADDR option");
+        goto err_dest_init_bind_addr;
+    }
     for (e = res; e != NULL; e = e->ai_next) {
         inet_ntop(e->ai_family,
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
-- 
2.35.1




* [PULL 07/18] migration: Dump sub-cmd name in loadvm_process_command tp
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 06/18] migration/rdma: set the REUSEADDR option for destination Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 08/18] migration: Finer grained tracepoints for POSTCOPY_LISTEN Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

When debugging, the name of the sub-cmd is easier to read than its index.
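
Looking a name up by index is only safe once the index has been
validated, which is presumably why the tracepoint moves below the range
check in the diff.  A minimal sketch of that ordering follows; the enum,
table and cmd_name() are made up for illustration and are not QEMU's
mig_cmd_args.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command set, standing in for the MIG_CMD_* values. */
enum { CMD_INVALID = 0, CMD_OPEN, CMD_PING, CMD_MAX };

/* Name table indexed by command number. */
static const char *const cmd_names[CMD_MAX] = {
    [CMD_INVALID] = "INVALID",
    [CMD_OPEN]    = "OPEN",
    [CMD_PING]    = "PING",
};

/*
 * Return the printable name of a command, or NULL if the value is out
 * of range or is the reserved invalid command.  The range check comes
 * first so the table is never indexed with an untrusted value.
 */
static const char *cmd_name(unsigned int cmd)
{
    if (cmd >= CMD_MAX || cmd == CMD_INVALID) {
        return NULL;
    }
    return cmd_names[cmd];
}
```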

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/savevm.c     | 3 ++-
 migration/trace-events | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 1599b02fbc..7bb65e1d61 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2273,12 +2273,13 @@ static int loadvm_process_command(QEMUFile *f)
         return qemu_file_get_error(f);
     }
 
-    trace_loadvm_process_command(cmd, len);
     if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
         error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
         return -EINVAL;
     }
 
+    trace_loadvm_process_command(mig_cmd_args[cmd].name, len);
+
     if (mig_cmd_args[cmd].len != -1 && mig_cmd_args[cmd].len != len) {
         error_report("%s received with bad length - expecting %zu, got %d",
                      mig_cmd_args[cmd].name,
diff --git a/migration/trace-events b/migration/trace-events
index 48aa7b10ee..123cfe79d7 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -22,7 +22,7 @@ loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
-loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
+loadvm_process_command(const char *s, uint16_t len) "com=%s len=%d"
 loadvm_process_command_ping(uint32_t val) "0x%x"
 postcopy_ram_listen_thread_exit(void) ""
 postcopy_ram_listen_thread_start(void) ""
-- 
2.35.1




* [PULL 08/18] migration: Finer grained tracepoints for POSTCOPY_LISTEN
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 07/18] migration: Dump sub-cmd name in loadvm_process_command tp Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 09/18] migration: Tracepoint change in postcopy-run bottom half Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

Enabling postcopy listening takes a few steps; add a few tracepoints so that
they are ready for some basic measurements of those steps.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-3-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/savevm.c     | 9 ++++++++-
 migration/trace-events | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 7bb65e1d61..190cc5fc42 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1948,9 +1948,10 @@ static void *postcopy_ram_listen_thread(void *opaque)
 static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 {
     PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
-    trace_loadvm_postcopy_handle_listen();
     Error *local_err = NULL;
 
+    trace_loadvm_postcopy_handle_listen("enter");
+
     if (ps != POSTCOPY_INCOMING_ADVISE && ps != POSTCOPY_INCOMING_DISCARD) {
         error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
         return -1;
@@ -1965,6 +1966,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         }
     }
 
+    trace_loadvm_postcopy_handle_listen("after discard");
+
     /*
      * Sensitise RAM - can now generate requests for blocks that don't exist
      * However, at this point the CPU shouldn't be running, and the IO
@@ -1977,6 +1980,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         }
     }
 
+    trace_loadvm_postcopy_handle_listen("after uffd");
+
     if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, &local_err)) {
         error_report_err(local_err);
         return -1;
@@ -1991,6 +1996,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     qemu_sem_wait(&mis->listen_thread_sem);
     qemu_sem_destroy(&mis->listen_thread_sem);
 
+    trace_loadvm_postcopy_handle_listen("return");
+
     return 0;
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index 123cfe79d7..92596c00d8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -14,7 +14,7 @@ loadvm_handle_cmd_packaged_main(int ret) "%d"
 loadvm_handle_cmd_packaged_received(int ret) "%d"
 loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
-loadvm_postcopy_handle_listen(void) ""
+loadvm_postcopy_handle_listen(const char *str) "%s"
 loadvm_postcopy_handle_run(void) ""
 loadvm_postcopy_handle_run_cpu_sync(void) ""
 loadvm_postcopy_handle_run_vmstart(void) ""
-- 
2.35.1




* [PULL 09/18] migration: Tracepoint change in postcopy-run bottom half
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 08/18] migration: Finer grained tracepoints for POSTCOPY_LISTEN Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 10/18] migration: Introduce postcopy channels on dest node Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

Remove the two old tracepoints, which were right next to each other:

    trace_loadvm_postcopy_handle_run_cpu_sync()
    trace_loadvm_postcopy_handle_run_vmstart()

Add trace_loadvm_postcopy_handle_run_bh() for finer-grained tracing.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-4-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/savevm.c     | 12 +++++++++---
 migration/trace-events |  3 +--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 190cc5fc42..41e3238798 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2006,13 +2006,19 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
     Error *local_err = NULL;
     MigrationIncomingState *mis = opaque;
 
+    trace_loadvm_postcopy_handle_run_bh("enter");
+
     /* TODO we should move all of this lot into postcopy_ram.c or a shared code
      * in migration.c
      */
     cpu_synchronize_all_post_init();
 
+    trace_loadvm_postcopy_handle_run_bh("after cpu sync");
+
     qemu_announce_self(&mis->announce_timer, migrate_announce_params());
 
+    trace_loadvm_postcopy_handle_run_bh("after announce");
+
     /* Make sure all file formats flush their mutable metadata.
      * If we get an error here, just don't restart the VM yet. */
     bdrv_invalidate_cache_all(&local_err);
@@ -2022,9 +2028,7 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
         autostart = false;
     }
 
-    trace_loadvm_postcopy_handle_run_cpu_sync();
-
-    trace_loadvm_postcopy_handle_run_vmstart();
+    trace_loadvm_postcopy_handle_run_bh("after invalidate cache");
 
     dirty_bitmap_mig_before_vm_start();
 
@@ -2037,6 +2041,8 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
     }
 
     qemu_bh_delete(mis->bh);
+
+    trace_loadvm_postcopy_handle_run_bh("return");
 }
 
 /* After all discards we can start running and asking for pages */
diff --git a/migration/trace-events b/migration/trace-events
index 92596c00d8..1aec580e92 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -16,8 +16,7 @@ loadvm_handle_recv_bitmap(char *s) "%s"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(const char *str) "%s"
 loadvm_postcopy_handle_run(void) ""
-loadvm_postcopy_handle_run_cpu_sync(void) ""
-loadvm_postcopy_handle_run_vmstart(void) ""
+loadvm_postcopy_handle_run_bh(const char *str) "%s"
 loadvm_postcopy_handle_resume(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
-- 
2.35.1




* [PULL 10/18] migration: Introduce postcopy channels on dest node
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 09/18] migration: Tracepoint change in postcopy-run bottom half Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 11/18] migration: Dump ramblock and offset too when non-same-page detected Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

Postcopy handles huge pages in a special way, such that currently we can only
have one "channel" to transfer the page.

That's because when we install pages using UFFDIO_COPY, we need to have the
whole huge page ready; it also means we need a temp huge page while receiving
the whole content of the page.

Currently all maintenance around this tmp page is global: first we allocate a
temp huge page, then we maintain its status mostly within
ram_load_postcopy().

To enable multiple channels for postcopy, the first thing we need to do is to
prepare N temp huge pages as caches, one for each channel.

Meanwhile we need to maintain the tmp huge page status per-channel too.

As an example, these local variables in ram_load_postcopy() are responsible
for maintaining the temp huge page status:

  - all_zero:     this keeps whether this huge page contains all zeros
  - target_pages: this counts how many target pages have been copied
  - host_page:    this keeps the host ptr for the page to install

Move all these fields to be together with the temp huge pages to form a new
structure called PostcopyTmpPage.  Then for each (future) postcopy channel, we
need one structure to keep the state around.

For vanilla postcopy, obviously there's only one channel.  It contains both
precopy and postcopy pages.

This patch makes the dest migration node aware of the possible number of
postcopy channels by introducing the "postcopy_channels" variable.  Its value
is calculated when setting up postcopy on the dest node (during the
POSTCOPY_LISTEN phase).

Vanilla postcopy will have channels=1, but when postcopy-preempt capability is
enabled (in the future), we will boost it to 2 because even during partial
sending of a precopy huge page we still want to preempt it and start sending
the postcopy requested page right away (so we start to keep two temp huge
pages; more if we want to enable multifd).  In this patch there's a TODO marked
for that; so far the number of channels is always set to 1.

We need to send one "host huge page" on one channel only and we cannot split
it, because otherwise the data of a single huge page could be spread over more
than one channel, which would need more complicated logic to manage.  One temp
host huge page for each channel will be enough for us for now.

Postcopy will still always use the index=0 huge page even after this patch.
However, it prepares for later patches, where it can start to use multiple
channels (which needs src intervention, because only src knows which channel we
should use).
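
The per-channel state described above can be sketched outside of QEMU as
follows.  This is a simplified illustration, not the code in the patch:
TmpPage, tmp_page_reset(), tmp_pages_setup() and tmp_pages_cleanup() are
hypothetical names, and calloc() stands in for the mmap()/g_malloc0_n()
calls the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for the PostcopyTmpPage added by this patch. */
typedef struct {
    void *tmp_huge_page;        /* staging buffer for one huge page */
    void *host_addr;            /* where the finished page will be placed */
    unsigned int target_pages;  /* small pages received so far */
    bool all_zero;              /* true until a non-zero page arrives */
} TmpPage;

/* Put one channel's state back to "nothing received yet". */
static void tmp_page_reset(TmpPage *p)
{
    p->target_pages = 0;
    p->host_addr = NULL;
    p->all_zero = true;
}

/* Free every staging buffer, then the array itself. */
static void tmp_pages_cleanup(TmpPage *pages, unsigned int channels)
{
    unsigned int i;

    for (i = 0; i < channels; i++) {
        free(pages[i].tmp_huge_page);   /* free(NULL) is a no-op */
    }
    free(pages);
}

/* Allocate one staging buffer per channel; returns NULL on failure. */
static TmpPage *tmp_pages_setup(unsigned int channels, size_t page_size)
{
    TmpPage *pages = calloc(channels, sizeof(*pages));
    unsigned int i;

    if (!pages) {
        return NULL;
    }
    for (i = 0; i < channels; i++) {
        pages[i].tmp_huge_page = calloc(1, page_size);
        if (!pages[i].tmp_huge_page) {
            tmp_pages_cleanup(pages, channels);
            return NULL;
        }
        tmp_page_reset(&pages[i]);
    }
    return pages;
}
```

Keeping the counters next to the buffer means that each channel can be
reset on its own, for instance after a page has been placed or when the
stream is interrupted.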

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-5-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixed up long line
---
 migration/migration.h    | 36 +++++++++++++++++++++++-
 migration/postcopy-ram.c | 60 ++++++++++++++++++++++++++++++----------
 migration/ram.c          | 45 ++++++++++++++----------------
 migration/savevm.c       | 12 ++++++++
 4 files changed, 114 insertions(+), 39 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 8130b703eb..42c7395094 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -45,6 +45,24 @@ struct PostcopyBlocktimeContext;
  */
 #define CLEAR_BITMAP_SHIFT_MAX            31
 
+/* This is an abstraction of a "temp huge page" for postcopy's purpose */
+typedef struct {
+    /*
+     * This points to a temporary huge page as a buffer for UFFDIO_COPY.  It's
+     * mmap()ed and needs to be freed when cleanup.
+     */
+    void *tmp_huge_page;
+    /*
+     * This points to the host page we're going to install for this temp page.
+     * It tells us after we've received the whole page, where we should put it.
+     */
+    void *host_addr;
+    /* Number of small pages copied (in size of TARGET_PAGE_SIZE) */
+    unsigned int target_pages;
+    /* Whether this page contains all zeros */
+    bool all_zero;
+} PostcopyTmpPage;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
@@ -81,7 +99,22 @@ struct MigrationIncomingState {
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     /* RAMBlock of last request sent to source */
     RAMBlock *last_rb;
-    void     *postcopy_tmp_page;
+    /*
+     * Number of postcopy channels including the default precopy channel, so
+     * vanilla postcopy will only contain one channel which contain both
+     * precopy and postcopy streams.
+     *
+     * This is calculated when the src requests to enable postcopy but before
+     * it starts.  Its value can depend on e.g. whether postcopy preemption is
+     * enabled.
+     */
+    unsigned int postcopy_channels;
+    /*
+     * An array of temp host huge pages to be used, one for each postcopy
+     * channel.
+     */
+    PostcopyTmpPage *postcopy_tmp_pages;
+    /* This is shared for all postcopy channels */
     void     *postcopy_tmp_zero_page;
     /* PostCopyFD's for external userfaultfds & handlers of shared memory */
     GArray   *postcopy_remote_fds;
@@ -391,5 +424,6 @@ bool migration_rate_limit(void);
 void migration_cancel(const Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
+void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 2a2cc5faf8..30c3508f44 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -526,9 +526,18 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis)
 
 static void postcopy_temp_pages_cleanup(MigrationIncomingState *mis)
 {
-    if (mis->postcopy_tmp_page) {
-        munmap(mis->postcopy_tmp_page, mis->largest_page_size);
-        mis->postcopy_tmp_page = NULL;
+    int i;
+
+    if (mis->postcopy_tmp_pages) {
+        for (i = 0; i < mis->postcopy_channels; i++) {
+            if (mis->postcopy_tmp_pages[i].tmp_huge_page) {
+                munmap(mis->postcopy_tmp_pages[i].tmp_huge_page,
+                       mis->largest_page_size);
+                mis->postcopy_tmp_pages[i].tmp_huge_page = NULL;
+            }
+        }
+        g_free(mis->postcopy_tmp_pages);
+        mis->postcopy_tmp_pages = NULL;
     }
 
     if (mis->postcopy_tmp_zero_page) {
@@ -1092,17 +1101,30 @@ retry:
 
 static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
 {
-    int err;
-
-    mis->postcopy_tmp_page = mmap(NULL, mis->largest_page_size,
-                                  PROT_READ | PROT_WRITE,
-                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-    if (mis->postcopy_tmp_page == MAP_FAILED) {
-        err = errno;
-        mis->postcopy_tmp_page = NULL;
-        error_report("%s: Failed to map postcopy_tmp_page %s",
-                     __func__, strerror(err));
-        return -err;
+    PostcopyTmpPage *tmp_page;
+    int err, i, channels;
+    void *temp_page;
+
+    /* TODO: will be boosted when enable postcopy preemption */
+    mis->postcopy_channels = 1;
+
+    channels = mis->postcopy_channels;
+    mis->postcopy_tmp_pages = g_malloc0_n(sizeof(PostcopyTmpPage), channels);
+
+    for (i = 0; i < channels; i++) {
+        tmp_page = &mis->postcopy_tmp_pages[i];
+        temp_page = mmap(NULL, mis->largest_page_size, PROT_READ | PROT_WRITE,
+                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+        if (temp_page == MAP_FAILED) {
+            err = errno;
+            error_report("%s: Failed to map postcopy_tmp_pages[%d]: %s",
+                         __func__, i, strerror(err));
+            /* Clean up will be done later */
+            return -err;
+        }
+        tmp_page->tmp_huge_page = temp_page;
+        /* Initialize default states for each tmp page */
+        postcopy_temp_page_reset(tmp_page);
     }
 
     /*
@@ -1352,6 +1374,16 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
 #endif
 
 /* ------------------------------------------------------------------------- */
+void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page)
+{
+    tmp_page->target_pages = 0;
+    tmp_page->host_addr = NULL;
+    /*
+     * This is set to true when reset, and cleared as long as we received any
+     * of the non-zero small page within this huge page.
+     */
+    tmp_page->all_zero = true;
+}
 
 void postcopy_fault_thread_notify(MigrationIncomingState *mis)
 {
diff --git a/migration/ram.c b/migration/ram.c
index 781f0745dc..fe3de84856 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3641,11 +3641,8 @@ static int ram_load_postcopy(QEMUFile *f)
     bool place_needed = false;
     bool matches_target_page_size = false;
     MigrationIncomingState *mis = migration_incoming_get_current();
-    /* Temporary page that is later 'placed' */
-    void *postcopy_host_page = mis->postcopy_tmp_page;
-    void *host_page = NULL;
-    bool all_zero = true;
-    int target_pages = 0;
+    /* Currently we only use channel 0.  TODO: use all the channels */
+    PostcopyTmpPage *tmp_page = &mis->postcopy_tmp_pages[0];
 
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr;
@@ -3689,7 +3686,7 @@ static int ram_load_postcopy(QEMUFile *f)
                 ret = -EINVAL;
                 break;
             }
-            target_pages++;
+            tmp_page->target_pages++;
             matches_target_page_size = block->page_size == TARGET_PAGE_SIZE;
             /*
              * Postcopy requires that we place whole host pages atomically;
@@ -3701,15 +3698,16 @@ static int ram_load_postcopy(QEMUFile *f)
              * however the source ensures it always sends all the components
              * of a host page in one chunk.
              */
-            page_buffer = postcopy_host_page +
+            page_buffer = tmp_page->tmp_huge_page +
                           host_page_offset_from_ram_block_offset(block, addr);
             /* If all TP are zero then we can optimise the place */
-            if (target_pages == 1) {
-                host_page = host_page_from_ram_block_offset(block, addr);
-            } else if (host_page != host_page_from_ram_block_offset(block,
-                                                                    addr)) {
+            if (tmp_page->target_pages == 1) {
+                tmp_page->host_addr =
+                    host_page_from_ram_block_offset(block, addr);
+            } else if (tmp_page->host_addr !=
+                       host_page_from_ram_block_offset(block, addr)) {
                 /* not the 1st TP within the HP */
-                error_report("Non-same host page %p/%p", host_page,
+                error_report("Non-same host page %p/%p", tmp_page->host_addr,
                              host_page_from_ram_block_offset(block, addr));
                 ret = -EINVAL;
                 break;
@@ -3719,10 +3717,11 @@ static int ram_load_postcopy(QEMUFile *f)
              * If it's the last part of a host page then we place the host
              * page
              */
-            if (target_pages == (block->page_size / TARGET_PAGE_SIZE)) {
+            if (tmp_page->target_pages ==
+                (block->page_size / TARGET_PAGE_SIZE)) {
                 place_needed = true;
             }
-            place_source = postcopy_host_page;
+            place_source = tmp_page->tmp_huge_page;
         }
 
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
@@ -3736,12 +3735,12 @@ static int ram_load_postcopy(QEMUFile *f)
                 memset(page_buffer, ch, TARGET_PAGE_SIZE);
             }
             if (ch) {
-                all_zero = false;
+                tmp_page->all_zero = false;
             }
             break;
 
         case RAM_SAVE_FLAG_PAGE:
-            all_zero = false;
+            tmp_page->all_zero = false;
             if (!matches_target_page_size) {
                 /* For huge pages, we always use temporary buffer */
                 qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
@@ -3759,7 +3758,7 @@ static int ram_load_postcopy(QEMUFile *f)
             }
             break;
         case RAM_SAVE_FLAG_COMPRESS_PAGE:
-            all_zero = false;
+            tmp_page->all_zero = false;
             len = qemu_get_be32(f);
             if (len < 0 || len > compressBound(TARGET_PAGE_SIZE)) {
                 error_report("Invalid compressed data length: %d", len);
@@ -3791,16 +3790,14 @@ static int ram_load_postcopy(QEMUFile *f)
         }
 
         if (!ret && place_needed) {
-            if (all_zero) {
-                ret = postcopy_place_page_zero(mis, host_page, block);
+            if (tmp_page->all_zero) {
+                ret = postcopy_place_page_zero(mis, tmp_page->host_addr, block);
             } else {
-                ret = postcopy_place_page(mis, host_page, place_source,
-                                          block);
+                ret = postcopy_place_page(mis, tmp_page->host_addr,
+                                          place_source, block);
             }
             place_needed = false;
-            target_pages = 0;
-            /* Assume we have a zero page until we detect something different */
-            all_zero = true;
+            postcopy_temp_page_reset(tmp_page);
         }
     }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 41e3238798..0ccd7e5e3f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2579,6 +2579,18 @@ void qemu_loadvm_state_cleanup(void)
 /* Return true if we should continue the migration, or false. */
 static bool postcopy_pause_incoming(MigrationIncomingState *mis)
 {
+    int i;
+
+    /*
+     * If network is interrupted, any temp page we received will be useless
+     * because we didn't mark them as "received" in receivedmap.  After a
+     * proper recovery later (which will sync src dirty bitmap with receivedmap
+     * on dest) these cached small pages will be resent again.
+     */
+    for (i = 0; i < mis->postcopy_channels; i++) {
+        postcopy_temp_page_reset(&mis->postcopy_tmp_pages[i]);
+    }
+
     trace_postcopy_pause_incoming();
 
     assert(migrate_postcopy_ram());
-- 
2.35.1




* [PULL 11/18] migration: Dump ramblock and offset too when non-same-page detected
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 10/18] migration: Introduce postcopy channels on dest node Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 12/18] migration: Add postcopy_thread_create() Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

In ram_load_postcopy() we try to detect the non-same-page case and dump an
error.  This error is very helpful for debugging, so add the ramblock and
offset to the error log too.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-6-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fix up long line
---
 migration/ram.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index fe3de84856..a9d0d100bd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3707,8 +3707,12 @@ static int ram_load_postcopy(QEMUFile *f)
             } else if (tmp_page->host_addr !=
                        host_page_from_ram_block_offset(block, addr)) {
                 /* not the 1st TP within the HP */
-                error_report("Non-same host page %p/%p", tmp_page->host_addr,
-                             host_page_from_ram_block_offset(block, addr));
+                error_report("Non-same host page detected.  "
+                             "Target host page %p, received host page %p "
+                             "(rb %s offset 0x"RAM_ADDR_FMT" target_pages %d)",
+                             tmp_page->host_addr,
+                             host_page_from_ram_block_offset(block, addr),
+                             block->idstr, addr, tmp_page->target_pages);
                 ret = -EINVAL;
                 break;
             }
-- 
2.35.1




* [PULL 12/18] migration: Add postcopy_thread_create()
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 11/18] migration: Dump ramblock and offset too when non-same-page detected Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 13/18] migration: Move static var in ram_block_from_stream() into global Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

Postcopy creates threads.  A common pattern is to init a semaphore and use it
to sync with the new thread.  Namely, we have fault_thread_sem and
listen_thread_sem, and they're only used for this.

Turn this into shared infrastructure so it's easier to create yet another
thread.
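
The pattern can be sketched outside QEMU with POSIX threads and semaphores.
This is only an illustration under assumptions: IncomingState,
sync_thread_create() and worker() are hypothetical names, and the actual
patch uses QemuSemaphore and qemu_thread_create() instead.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for MigrationIncomingState. */
typedef struct {
    sem_t thread_sync_sem;   /* shared by every thread creation */
    bool worker_ready;       /* set by the worker before it posts */
} IncomingState;

/*
 * Create a thread and block until it has posted the shared semaphore.
 * Not safe to call concurrently: there is only one semaphore.
 * Returns 0 on success, or the pthread_create() error code.
 */
static int sync_thread_create(IncomingState *st, pthread_t *thread,
                              void *(*fn)(void *))
{
    int ret;

    sem_init(&st->thread_sync_sem, 0, 0);
    ret = pthread_create(thread, NULL, fn, st);
    if (ret == 0) {
        /* Retry if a signal interrupts the wait. */
        while (sem_wait(&st->thread_sync_sem) == -1 && errno == EINTR) {
            continue;
        }
    }
    sem_destroy(&st->thread_sync_sem);
    return ret;
}

/* Example worker: finish its setup, then tell the creator to go on. */
static void *worker(void *opaque)
{
    IncomingState *st = opaque;

    st->worker_ready = true;
    sem_post(&st->thread_sync_sem);
    return NULL;
}
```

Because a single semaphore is reused, two such creations must not run in
parallel, which matches the note in the patch.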

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-7-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.h    |  8 +++++---
 migration/postcopy-ram.c | 23 +++++++++++++++++------
 migration/postcopy-ram.h |  4 ++++
 migration/savevm.c       | 12 +++---------
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 42c7395094..8445e1d14a 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -70,7 +70,11 @@ struct MigrationIncomingState {
     /* A hook to allow cleanup at the end of incoming migration */
     void *transport_data;
     void (*transport_cleanup)(void *data);
-
+    /*
+     * Used to sync thread creations.  Note that we can't create threads in
+     * parallel with this sem.
+     */
+    QemuSemaphore  thread_sync_sem;
     /*
      * Free at the start of the main state load, set as the main thread finishes
      * loading state.
@@ -83,13 +87,11 @@ struct MigrationIncomingState {
     size_t         largest_page_size;
     bool           have_fault_thread;
     QemuThread     fault_thread;
-    QemuSemaphore  fault_thread_sem;
     /* Set this when we want the fault thread to quit */
     bool           fault_thread_quit;
 
     bool           have_listen_thread;
     QemuThread     listen_thread;
-    QemuSemaphore  listen_thread_sem;
 
     /* For the kernel to send us notifications */
     int       userfault_fd;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 30c3508f44..d08d396c63 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -78,6 +78,20 @@ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
                                             &pnd);
 }
 
+/*
+ * NOTE: this routine is not thread safe, we can't call it concurrently. But it
+ * should be good enough for migration's purposes.
+ */
+void postcopy_thread_create(MigrationIncomingState *mis,
+                            QemuThread *thread, const char *name,
+                            void *(*fn)(void *), int joinable)
+{
+    qemu_sem_init(&mis->thread_sync_sem, 0);
+    qemu_thread_create(thread, name, fn, mis, joinable);
+    qemu_sem_wait(&mis->thread_sync_sem);
+    qemu_sem_destroy(&mis->thread_sync_sem);
+}
+
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
  * across, and efficiently map new pages in, the techniques for doing this
  * are target OS specific.
@@ -902,7 +916,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
     trace_postcopy_ram_fault_thread_entry();
     rcu_register_thread();
     mis->last_rb = NULL; /* last RAMBlock we sent part of */
-    qemu_sem_post(&mis->fault_thread_sem);
+    qemu_sem_post(&mis->thread_sync_sem);
 
     struct pollfd *pfd;
     size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
@@ -1173,11 +1187,8 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
         return -1;
     }
 
-    qemu_sem_init(&mis->fault_thread_sem, 0);
-    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
-                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
-    qemu_sem_wait(&mis->fault_thread_sem);
-    qemu_sem_destroy(&mis->fault_thread_sem);
+    postcopy_thread_create(mis, &mis->fault_thread, "postcopy/fault",
+                           postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
     mis->have_fault_thread = true;
 
     /* Mark so that we get notified of accesses to unwritten areas */
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 6d2b3cf124..07684c0e1d 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -135,6 +135,10 @@ void postcopy_remove_notifier(NotifierWithReturn *n);
 /* Call the notifier list set by postcopy_add_start_notifier */
 int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
 
+void postcopy_thread_create(MigrationIncomingState *mis,
+                            QemuThread *thread, const char *name,
+                            void *(*fn)(void *), int joinable);
+
 struct PostCopyFD;
 
 /* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ccd7e5e3f..967ff80547 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1863,7 +1863,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                                    MIGRATION_STATUS_POSTCOPY_ACTIVE);
-    qemu_sem_post(&mis->listen_thread_sem);
+    qemu_sem_post(&mis->thread_sync_sem);
     trace_postcopy_ram_listen_thread_start();
 
     rcu_register_thread();
@@ -1988,14 +1988,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     }
 
     mis->have_listen_thread = true;
-    /* Start up the listening thread and wait for it to signal ready */
-    qemu_sem_init(&mis->listen_thread_sem, 0);
-    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
-                       postcopy_ram_listen_thread, NULL,
-                       QEMU_THREAD_DETACHED);
-    qemu_sem_wait(&mis->listen_thread_sem);
-    qemu_sem_destroy(&mis->listen_thread_sem);
-
+    postcopy_thread_create(mis, &mis->listen_thread, "postcopy/listen",
+                           postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
     trace_loadvm_postcopy_handle_listen("return");
 
     return 0;
-- 
2.35.1




* [PULL 13/18] migration: Move static var in ram_block_from_stream() into global
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 12/18] migration: Add postcopy_thread_create() Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 14/18] migration: Enlarge postcopy recovery to capture !-EIO too Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

The static variable in ram_block_from_stream() is very unfriendly to
threading.  Move it into MigrationIncomingState.

Pass the incoming state pointer to ram_block_from_stream() at both call
sites.
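
A minimal sketch of the idea, with hypothetical names (Block, StreamState,
FLAG_CONTINUE, block_from_stream()) that are not QEMU's: the "last block"
lives in a caller-supplied state struct rather than in a function-local
static, so two independent streams no longer share it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FLAG_CONTINUE 0x1   /* hypothetical "same block as last time" flag */

typedef struct {
    const char *id;
} Block;

/* Hypothetical per-stream state; replaces a function-local static. */
typedef struct {
    Block *last_recv_block;
} StreamState;

static Block *block_from_stream(StreamState *st, Block *blocks, size_t n,
                                const char *id, int flags)
{
    Block *block = st->last_recv_block;

    if (flags & FLAG_CONTINUE) {
        /* NULL here means a continuation arrived before any block. */
        return block;
    }

    block = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(blocks[i].id, id) == 0) {
            block = &blocks[i];
            break;
        }
    }
    if (!block) {
        return NULL;
    }

    /* Remember the block for later continuations on this stream only. */
    st->last_recv_block = block;
    return block;
}
```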

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-8-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.h |  3 ++-
 migration/ram.c       | 13 +++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 8445e1d14a..d8b9850eae 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -66,7 +66,8 @@ typedef struct {
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *from_src_file;
-
+    /* Previously received RAM's RAMBlock pointer */
+    RAMBlock *last_recv_block;
     /* A hook to allow cleanup at the end of incoming migration */
     void *transport_data;
     void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index a9d0d100bd..170e522a1f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3185,12 +3185,14 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
  *
  * Returns a pointer from within the RCU-protected ram_list.
  *
+ * @mis: the migration incoming state pointer
  * @f: QEMUFile where to read the data from
  * @flags: Page flags (mostly to see if it's a continuation of previous block)
  */
-static inline RAMBlock *ram_block_from_stream(QEMUFile *f, int flags)
+static inline RAMBlock *ram_block_from_stream(MigrationIncomingState *mis,
+                                              QEMUFile *f, int flags)
 {
-    static RAMBlock *block;
+    RAMBlock *block = mis->last_recv_block;
     char id[256];
     uint8_t len;
 
@@ -3217,6 +3219,8 @@ static inline RAMBlock *ram_block_from_stream(QEMUFile *f, int flags)
         return NULL;
     }
 
+    mis->last_recv_block = block;
+
     return block;
 }
 
@@ -3669,7 +3673,7 @@ static int ram_load_postcopy(QEMUFile *f)
         trace_ram_load_postcopy_loop((uint64_t)addr, flags);
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE)) {
-            block = ram_block_from_stream(f, flags);
+            block = ram_block_from_stream(mis, f, flags);
             if (!block) {
                 ret = -EINVAL;
                 break;
@@ -3881,6 +3885,7 @@ void colo_flush_ram_cache(void)
  */
 static int ram_load_precopy(QEMUFile *f)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
     /* ADVISE is earlier, it shows the source has the postcopy capability on */
     bool postcopy_advised = postcopy_is_advised();
@@ -3919,7 +3924,7 @@ static int ram_load_precopy(QEMUFile *f)
 
         if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-            RAMBlock *block = ram_block_from_stream(f, flags);
+            RAMBlock *block = ram_block_from_stream(mis, f, flags);
 
             host = host_from_ram_block_offset(block, addr);
             /*
-- 
2.35.1




* [PULL 14/18] migration: Enlarge postcopy recovery to capture !-EIO too
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 13/18] migration: Move static var in ram_block_from_stream() into global Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 15/18] migration: postcopy_pause_fault_thread() never fails Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

We used to have quite a few places that checked for -EIO, and that was the
only way to trigger postcopy recovery.  That was based on the assumption that
we only return -EIO for channel issues.

It works in 99.99% of cases, but logically it does not cover some corner
cases.  For example, ram_block_from_stream() could fail because of an
interrupted network, and then -EINVAL is returned instead of -EIO.

I remember Dave Gilbert pointing that out before, but somehow it was
overlooked.  I have not encountered anything other than -EIO myself either.

However, we'd better fix that before it triggers a rare VM data loss during
live migration.

To cover as many of those cases as possible, remove the -EIO restriction on
triggering postcopy recovery.  Even if the error is not a channel failure,
we can't do anything better than halting QEMU anyway: the paused process can
still be used by a skilled person to dig out useful memory regions, or the
admin can simply kill it later on.
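
The change in the trigger condition can be summarised with two small
predicates.  These are hypothetical helpers written for illustration, not
functions from the QEMU tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Old rule: only a channel error (-EIO) pauses for recovery. */
static bool should_pause_old(bool in_postcopy, int err)
{
    return in_postcopy && err == -EIO;
}

/* New rule: any error during postcopy pauses for recovery. */
static bool should_pause_new(bool in_postcopy, int err)
{
    return in_postcopy && err != 0;
}
```

With the old rule, an -EINVAL caused by a broken stream would skip recovery;
with the new rule it pauses like any other postcopy error.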

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-11-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c    | 4 ++--
 migration/postcopy-ram.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index bcc385b94b..306e2ac60e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2865,7 +2865,7 @@ retry:
 out:
     res = qemu_file_get_error(rp);
     if (res) {
-        if (res == -EIO && migration_in_postcopy()) {
+        if (res && migration_in_postcopy()) {
             /*
              * Maybe there is something we can do: it looks like a
              * network down issue, and we pause for a recovery.
@@ -3466,7 +3466,7 @@ static MigThrError migration_detect_error(MigrationState *s)
         error_free(local_error);
     }
 
-    if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
+    if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
         /*
          * For postcopy, we allow the network to be down for a
          * while. After that, it can be continued by a
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d08d396c63..b0d12d5053 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1039,7 +1039,7 @@ retry:
                                         msg.arg.pagefault.address);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
-                if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
+                if (postcopy_pause_fault_thread(mis)) {
                     /* We got reconnected somehow, try to continue */
                     goto retry;
                 } else {
-- 
2.35.1




* [PULL 15/18] migration: postcopy_pause_fault_thread() never fails
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 14/18] migration: Enlarge postcopy recovery to capture !-EIO too Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 16/18] migration: Add migration_incoming_transport_cleanup() Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

Per the title, remove the return code and simplify the callers as the errors
will never be triggered.  No functional change intended.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-12-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/postcopy-ram.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b0d12d5053..32c52f4b1d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -891,15 +891,11 @@ static void mark_postcopy_blocktime_end(uintptr_t addr)
                                       affected_cpu);
 }
 
-static bool postcopy_pause_fault_thread(MigrationIncomingState *mis)
+static void postcopy_pause_fault_thread(MigrationIncomingState *mis)
 {
     trace_postcopy_pause_fault_thread();
-
     qemu_sem_wait(&mis->postcopy_pause_sem_fault);
-
     trace_postcopy_pause_fault_thread_continued();
-
-    return true;
 }
 
 /*
@@ -959,13 +955,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
              * broken already using the event. We should hold until
              * the channel is rebuilt.
              */
-            if (postcopy_pause_fault_thread(mis)) {
-                /* Continue to read the userfaultfd */
-            } else {
-                error_report("%s: paused but don't allow to continue",
-                             __func__);
-                break;
-            }
+            postcopy_pause_fault_thread(mis);
         }
 
         if (pfd[1].revents) {
@@ -1039,15 +1029,8 @@ retry:
                                         msg.arg.pagefault.address);
             if (ret) {
                 /* May be network failure, try to wait for recovery */
-                if (postcopy_pause_fault_thread(mis)) {
-                    /* We got reconnected somehow, try to continue */
-                    goto retry;
-                } else {
-                    /* This is a unavoidable fault */
-                    error_report("%s: postcopy_request_page() get %d",
-                                 __func__, ret);
-                    break;
-                }
+                postcopy_pause_fault_thread(mis);
+                goto retry;
             }
         }
 
-- 
2.35.1




* [PULL 16/18] migration: Add migration_incoming_transport_cleanup()
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 15/18] migration: postcopy_pause_fault_thread() never fails Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 17/18] tests: Pass in MigrateStart** into test_migrate_start() Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

Add a helper to clean up the transport listener.

When doing so, also NULL out the cleanup hook and its data, so that it is
safe to call the helper multiple times.

Move the socket_address_list cleanup into the helper as well, because that
list mirrors the listener channels and exists only for query-migrate.  Hence
whoever wants to clean up the listener transport always wants to clean up the
socket list too.

No functional change intended.
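
The "safe to call multiple times" property comes from clearing the hook after
it runs.  A minimal sketch, assuming a hypothetical Incoming struct that
carries only the two transport fields:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for MigrationIncomingState. */
typedef struct {
    void *transport_data;
    void (*transport_cleanup)(void *data);
} Incoming;

/* Run the cleanup hook at most once, then forget hook and data. */
static void incoming_transport_cleanup(Incoming *in)
{
    if (in->transport_cleanup) {
        in->transport_cleanup(in->transport_data);
        in->transport_data = NULL;
        in->transport_cleanup = NULL;
    }
}

/* Example hook that counts how often it is invoked. */
static void count_cleanup(void *data)
{
    int *calls = data;

    (*calls)++;
}
```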

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-15-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 22 ++++++++++++++--------
 migration/migration.h |  1 +
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 306e2ac60e..9cc344514b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -267,6 +267,19 @@ MigrationIncomingState *migration_incoming_get_current(void)
     return current_incoming;
 }
 
+void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
+{
+    if (mis->socket_address_list) {
+        qapi_free_SocketAddressList(mis->socket_address_list);
+        mis->socket_address_list = NULL;
+    }
+
+    if (mis->transport_cleanup) {
+        mis->transport_cleanup(mis->transport_data);
+        mis->transport_data = mis->transport_cleanup = NULL;
+    }
+}
+
 void migration_incoming_state_destroy(void)
 {
     struct MigrationIncomingState *mis = migration_incoming_get_current();
@@ -287,10 +300,8 @@ void migration_incoming_state_destroy(void)
         g_array_free(mis->postcopy_remote_fds, TRUE);
         mis->postcopy_remote_fds = NULL;
     }
-    if (mis->transport_cleanup) {
-        mis->transport_cleanup(mis->transport_data);
-    }
 
+    migration_incoming_transport_cleanup(mis);
     qemu_event_reset(&mis->main_thread_load_event);
 
     if (mis->page_requested) {
@@ -298,11 +309,6 @@ void migration_incoming_state_destroy(void)
         mis->page_requested = NULL;
     }
 
-    if (mis->socket_address_list) {
-        qapi_free_SocketAddressList(mis->socket_address_list);
-        mis->socket_address_list = NULL;
-    }
-
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index d8b9850eae..2de861df01 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -166,6 +166,7 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
 /*
  * Functions to work with blocktime context
  */
-- 
2.35.1




* [PULL 17/18] tests: Pass in MigrateStart** into test_migrate_start()
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 16/18] migration: Add migration_incoming_transport_cleanup() Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-02 18:29 ` [PULL 18/18] migration: Remove load_state_old and minimum_version_id_old Dr. David Alan Gilbert (git)
  2022-03-03 14:46 ` [PULL 00/18] migration queue Peter Maydell
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Xu <peterx@redhat.com>

test_migrate_start() releases the MigrateStart structure that is passed
in, but that is not obvious to the caller, because after the call returns
the caller can still reference the pointer.  That can easily become a
source of use-after-free.

Pass in a double pointer instead, so that we can safely clear the caller's
pointer after the struct is released.
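
A reduced sketch of the double-pointer idiom, using hypothetical names
(StartArgs, start_and_consume()) rather than the test suite's own:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MigrateStart. */
typedef struct {
    int hide_stderr;
} StartArgs;

/*
 * Consume the arguments: free them and clear the caller's pointer so a
 * later use shows up as a NULL pointer instead of a use-after-free.
 */
static int start_and_consume(StartArgs **pargs)
{
    StartArgs *args = *pargs;
    int ret = args ? 0 : -1;

    free(args);          /* free(NULL) is a no-op */
    *pargs = NULL;       /* tell the caller the structure is gone */
    return ret;
}
```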

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-26-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixup apply since I didn't take 24/25
---
 tests/qtest/migration-test.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7b42f6fd90..0870656d82 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -495,7 +495,7 @@ static void migrate_start_destroy(MigrateStart *args)
 }
 
 static int test_migrate_start(QTestState **from, QTestState **to,
-                              const char *uri, MigrateStart *args)
+                              const char *uri, MigrateStart **pargs)
 {
     g_autofree gchar *arch_source = NULL;
     g_autofree gchar *arch_target = NULL;
@@ -507,6 +507,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
     g_autofree char *shmem_path = NULL;
     const char *arch = qtest_get_arch();
     const char *machine_opts = NULL;
+    MigrateStart *args = *pargs;
     const char *memory_size;
     int ret = 0;
 
@@ -621,6 +622,8 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 
 out:
     migrate_start_destroy(args);
+    /* This tells the caller that this structure is gone */
+    *pargs = NULL;
     return ret;
 }
 
@@ -665,7 +668,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
     QTestState *from, *to;
 
-    if (test_migrate_start(&from, &to, uri, args)) {
+    if (test_migrate_start(&from, &to, uri, &args)) {
         return -1;
     }
 
@@ -788,7 +791,7 @@ static void test_baddest(void)
 
     args->hide_stderr = true;
 
-    if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", args)) {
+    if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
         return;
     }
     migrate_qmp(from, "tcp:127.0.0.1:0", "{}");
@@ -804,7 +807,7 @@ static void test_precopy_unix_common(bool dirty_ring)
 
     args->use_dirty_ring = dirty_ring;
 
-    if (test_migrate_start(&from, &to, uri, args)) {
+    if (test_migrate_start(&from, &to, uri, &args)) {
         return;
     }
 
@@ -892,7 +895,7 @@ static void test_xbzrle(const char *uri)
     MigrateStart *args = migrate_start_new();
     QTestState *from, *to;
 
-    if (test_migrate_start(&from, &to, uri, args)) {
+    if (test_migrate_start(&from, &to, uri, &args)) {
         return;
     }
 
@@ -946,7 +949,7 @@ static void test_precopy_tcp(void)
     g_autofree char *uri = NULL;
     QTestState *from, *to;
 
-    if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", args)) {
+    if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
         return;
     }
 
@@ -991,7 +994,7 @@ static void test_migrate_fd_proto(void)
     QDict *rsp;
     const char *error_desc;
 
-    if (test_migrate_start(&from, &to, "defer", args)) {
+    if (test_migrate_start(&from, &to, "defer", &args)) {
         return;
     }
 
@@ -1071,7 +1074,7 @@ static void do_test_validate_uuid(MigrateStart *args, bool should_fail)
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
     QTestState *from, *to;
 
-    if (test_migrate_start(&from, &to, uri, args)) {
+    if (test_migrate_start(&from, &to, uri, &args)) {
         return;
     }
 
@@ -1163,7 +1166,7 @@ static void test_migrate_auto_converge(void)
      */
     const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000;
 
-    if (test_migrate_start(&from, &to, uri, args)) {
+    if (test_migrate_start(&from, &to, uri, &args)) {
         return;
     }
 
@@ -1232,7 +1235,7 @@ static void test_multifd_tcp(const char *method)
     QDict *rsp;
     g_autofree char *uri = NULL;
 
-    if (test_migrate_start(&from, &to, "defer", args)) {
+    if (test_migrate_start(&from, &to, "defer", &args)) {
         return;
     }
 
@@ -1318,7 +1321,7 @@ static void test_multifd_tcp_cancel(void)
 
     args->hide_stderr = true;
 
-    if (test_migrate_start(&from, &to, "defer", args)) {
+    if (test_migrate_start(&from, &to, "defer", &args)) {
         return;
     }
 
@@ -1357,7 +1360,7 @@ static void test_multifd_tcp_cancel(void)
     args = migrate_start_new();
     args->only_target = true;
 
-    if (test_migrate_start(&from, &to2, "defer", args)) {
+    if (test_migrate_start(&from, &to2, "defer", &args)) {
         return;
     }
 
-- 
2.35.1




* [PULL 18/18] migration: Remove load_state_old and minimum_version_id_old
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 17/18] tests: Pass in MigrateStart** into test_migrate_start() Dr. David Alan Gilbert (git)
@ 2022-03-02 18:29 ` Dr. David Alan Gilbert (git)
  2022-03-03 14:46 ` [PULL 00/18] migration queue Peter Maydell
  18 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-03-02 18:29 UTC (permalink / raw)
  To: qemu-devel, f.ebner, hreitz, jinpu.wang, peter.maydell, peterx, s.reiter
  Cc: quintela

From: Peter Maydell <peter.maydell@linaro.org>

There are no longer any VMStateDescription structs in the tree which
use the load_state_old support for custom handling of incoming
migration from very old QEMU.  Remove the mechanism entirely.

This includes removing one stray useless setting of
minimum_version_id_old in a VMStateDescription with no load_state_old
function, which crept in after the global weeding-out of them in
commit 17e313406126.
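
With the old-version hook gone, the version check reduces to a simple range
test.  The sketch below is an illustration with hypothetical names (Desc,
check_version()); it assumes that both out-of-range cases fail with -EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, reduced stand-in for VMStateDescription. */
typedef struct {
    int version_id;           /* newest version we can load */
    int minimum_version_id;   /* oldest version we can load */
} Desc;

/* Accept only versions in [minimum_version_id, version_id]. */
static int check_version(const Desc *d, int incoming)
{
    if (incoming > d->version_id) {
        return -EINVAL;       /* too new */
    }
    if (incoming < d->minimum_version_id) {
        return -EINVAL;       /* too old, and no fallback handler any more */
    }
    return 0;
}
```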

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220215175705.3846411-1-peter.maydell@linaro.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 docs/devel/migration.rst    | 12 +++---------
 hw/ssi/xlnx-versal-ospi.c   |  1 -
 include/migration/vmstate.h |  2 --
 migration/vmstate.c         |  6 ------
 4 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 2401253482..3e9656d8e0 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -389,19 +389,13 @@ Each version is associated with a series of fields saved.  The ``save_state`` al
 the state as the newer version.  But ``load_state`` sometimes is able to
 load state from an older version.
 
-You can see that there are several version fields:
+You can see that there are two version fields:
 
 - ``version_id``: the maximum version_id supported by VMState for that device.
 - ``minimum_version_id``: the minimum version_id that VMState is able to understand
   for that device.
-- ``minimum_version_id_old``: For devices that were not able to port to vmstate, we can
-  assign a function that knows how to read this old state. This field is
-  ignored if there is no ``load_state_old`` handler.
-
-VMState is able to read versions from minimum_version_id to
-version_id.  And the function ``load_state_old()`` (if present) is able to
-load state from minimum_version_id_old to minimum_version_id.  This
-function is deprecated and will be removed when no more users are left.
+
+VMState is able to read versions from minimum_version_id to version_id.
 
 There are *_V* forms of many ``VMSTATE_`` macros to load fields for version dependent fields,
 e.g.
diff --git a/hw/ssi/xlnx-versal-ospi.c b/hw/ssi/xlnx-versal-ospi.c
index 7ecd148fdf..c762e0b367 100644
--- a/hw/ssi/xlnx-versal-ospi.c
+++ b/hw/ssi/xlnx-versal-ospi.c
@@ -1800,7 +1800,6 @@ static const VMStateDescription vmstate_xlnx_versal_ospi = {
     .name = TYPE_XILINX_VERSAL_OSPI,
     .version_id = 1,
     .minimum_version_id = 1,
-    .minimum_version_id_old = 1,
     .fields = (VMStateField[]) {
         VMSTATE_FIFO8(rx_fifo, XlnxVersalOspi),
         VMSTATE_FIFO8(tx_fifo, XlnxVersalOspi),
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 017c03675c..ad24aa1934 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -181,9 +181,7 @@ struct VMStateDescription {
     int unmigratable;
     int version_id;
     int minimum_version_id;
-    int minimum_version_id_old;
     MigrationPriority priority;
-    LoadStateHandler *load_state_old;
     int (*pre_load)(void *opaque);
     int (*post_load)(void *opaque, int version_id);
     int (*pre_save)(void *opaque);
diff --git a/migration/vmstate.c b/migration/vmstate.c
index 05f87cdddc..36ae8b9e19 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -90,12 +90,6 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
         return -EINVAL;
     }
     if  (version_id < vmsd->minimum_version_id) {
-        if (vmsd->load_state_old &&
-            version_id >= vmsd->minimum_version_id_old) {
-            ret = vmsd->load_state_old(f, opaque, version_id);
-            trace_vmstate_load_state_end(vmsd->name, "old path", ret);
-            return ret;
-        }
         error_report("%s: incoming version_id %d is too old "
                      "for local minimum version_id  %d",
                      vmsd->name, version_id, vmsd->minimum_version_id);
-- 
2.35.1
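
As a rough illustration of what the last hunk leaves behind, the sketch below models the version-range check once the load_state_old fallback is gone. It is not the QEMU code: demo_vmsd and demo_check_version are hypothetical names, and both the upper-bound check and the -EINVAL return for the too-old case are assumptions, since only part of the function is visible in the hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for VMStateDescription. */
struct demo_vmsd {
    const char *name;
    int version_id;          /* newest version this build can load */
    int minimum_version_id;  /* oldest version this build can load */
};

/*
 * Return 0 when incoming_version lies in
 * [minimum_version_id, version_id], -EINVAL otherwise.
 */
static int demo_check_version(const struct demo_vmsd *vmsd,
                              int incoming_version)
{
    assert(vmsd != NULL);

    /* Assumed upper-bound check; only its return is visible in the hunk. */
    if (incoming_version > vmsd->version_id) {
        fprintf(stderr, "%s: incoming version_id %d is too new "
                "for local version_id %d\n",
                vmsd->name, incoming_version, vmsd->version_id);
        return -EINVAL;
    }

    /* With load_state_old gone there is no fallback for older streams. */
    if (incoming_version < vmsd->minimum_version_id) {
        fprintf(stderr, "%s: incoming version_id %d is too old "
                "for local minimum version_id %d\n",
                vmsd->name, incoming_version, vmsd->minimum_version_id);
        return -EINVAL;
    }

    return 0;
}
```

With version_id = 3 and minimum_version_id = 2, incoming versions 2 and 3 are accepted, while 1 and 4 are rejected.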




* Re: [PULL 00/18] migration queue
  2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2022-03-02 18:29 ` [PULL 18/18] migration: Remove load_state_old and minimum_version_id_old Dr. David Alan Gilbert (git)
@ 2022-03-03 14:46 ` Peter Maydell
  2022-03-08 18:36   ` Philippe Mathieu-Daudé
  18 siblings, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2022-03-03 14:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: quintela, s.reiter, qemu-devel, peterx, hreitz, f.ebner, jinpu.wang

On Wed, 2 Mar 2022 at 18:32, Dr. David Alan Gilbert (git)
<dgilbert@redhat.com> wrote:
>
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:
>
>   Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging (2022-03-02 12:38:46 +0000)
>
> are available in the Git repository at:
>
>   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b
>
> for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:
>
>   migration: Remove load_state_old and minimum_version_id_old (2022-03-02 18:20:45 +0000)
>
> ----------------------------------------------------------------
> Migration/HMP/Virtio pull 2022-03-02
>
> A bit of a mix this time:
>   * Minor fixes from myself, Hanna, and Jack
>   * VNC password rework by Stefan and Fabian
>   * Postcopy changes from Peter X that are
>     the start of a larger series to come
>   * Removing the prehistoric load_state_old
>     code from Peter M
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



* Re: [PULL 00/18] migration queue
  2022-03-03 14:46 ` [PULL 00/18] migration queue Peter Maydell
@ 2022-03-08 18:36   ` Philippe Mathieu-Daudé
  2022-03-08 18:47     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 47+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-03-08 18:36 UTC (permalink / raw)
  To: Peter Maydell, Dr. David Alan Gilbert (git)
  Cc: quintela, s.reiter, hreitz, peterx, qemu-devel,
	open list:S390 general arch...,
	f.ebner, jinpu.wang

On 3/3/22 15:46, Peter Maydell wrote:
> On Wed, 2 Mar 2022 at 18:32, Dr. David Alan Gilbert (git)
> <dgilbert@redhat.com> wrote:
>>
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:
>>
>>    Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging (2022-03-02 12:38:46 +0000)
>>
>> are available in the Git repository at:
>>
>>    https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b
>>
>> for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:
>>
>>    migration: Remove load_state_old and minimum_version_id_old (2022-03-02 18:20:45 +0000)
>>
>> ----------------------------------------------------------------
>> Migration/HMP/Virtio pull 2022-03-02
>>
>> A bit of a mix this time:
>>    * Minor fixes from myself, Hanna, and Jack
>>    * VNC password rework by Stefan and Fabian
>>    * Postcopy changes from Peter X that are
>>      the start of a larger series to come
>>    * Removing the prehistoric load_state_old
>>      code from Peter M

I'm seeing an error on the s390x runner:

▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram: 
assertion failed: (bad == 0) ERROR

  26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test 
            ERROR          78.87s   killed by signal 6 SIGABRT

https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848



* Re: [PULL 00/18] migration queue
  2022-03-08 18:36   ` Philippe Mathieu-Daudé
@ 2022-03-08 18:47     ` Dr. David Alan Gilbert
  2022-03-14 16:56       ` Peter Maydell
  2022-03-15 14:53       ` [PULL 00/18] migration queue Christian Borntraeger
  0 siblings, 2 replies; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-08 18:47 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, thuth
  Cc: Peter Maydell, quintela, s.reiter, qemu-devel, peterx,
	open list:S390 general arch...,
	hreitz, f.ebner, jinpu.wang

* Philippe Mathieu-Daudé (philippe.mathieu.daude@gmail.com) wrote:
> On 3/3/22 15:46, Peter Maydell wrote:
> > On Wed, 2 Mar 2022 at 18:32, Dr. David Alan Gilbert (git)
> > <dgilbert@redhat.com> wrote:
> > > 
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:
> > > 
> > >    Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging (2022-03-02 12:38:46 +0000)
> > > 
> > > are available in the Git repository at:
> > > 
> > >    https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b
> > > 
> > > for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:
> > > 
> > >    migration: Remove load_state_old and minimum_version_id_old (2022-03-02 18:20:45 +0000)
> > > 
> > > ----------------------------------------------------------------
> > > Migration/HMP/Virtio pull 2022-03-02
> > > 
> > > A bit of a mix this time:
> > >    * Minor fixes from myself, Hanna, and Jack
> > >    * VNC password rework by Stefan and Fabian
> > >    * Postcopy changes from Peter X that are
> > >      the start of a larger series to come
> > >    * Removing the prehistoric load_state_old
> > >      code from Peter M
> 
> I'm seeing an error on the s390x runner:
> 
> ▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram:
> assertion failed: (bad == 0) ERROR
> 
>  26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test            ERROR
> 78.87s   killed by signal 6 SIGABRT
> 
> https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848

Yeh, thuth mentioned that, it seems to only be s390 which is odd.
I'm not seeing anything obviously architecture dependent in that set, or
for that matter that plays with the ram migration stream much.
Is this reliable enough that someone with a tame s390 could bisect?

Dave

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PULL 00/18] migration queue
  2022-03-08 18:47     ` Dr. David Alan Gilbert
@ 2022-03-14 16:56       ` Peter Maydell
  2022-03-14 17:07         ` Daniel P. Berrangé
  2022-03-15 14:53       ` [PULL 00/18] migration queue Christian Borntraeger
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2022-03-14 16:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: thuth, quintela, s.reiter, qemu-devel, peterx,
	open list:S390 general arch..., Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Tue, 8 Mar 2022 at 18:47, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>
> * Philippe Mathieu-Daudé (philippe.mathieu.daude@gmail.com) wrote:
> > I'm seeing an error on the s390x runner:
> >
> > ▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram:
> > assertion failed: (bad == 0) ERROR
> >
> >  26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test            ERROR
> > 78.87s   killed by signal 6 SIGABRT
> >
> > https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848
>
> Yeh, thuth mentioned that, it seems to only be s390 which is odd.
> I'm not seeing anything obviously architecture dependent in that set, or
> for that matter that plays with the ram migration stream much.
> Is this reliable enough that someone with a tame s390 could bisect?

Didn't see a SIGABRT, but here's a gdb backtrace of a hang
in the migration test on s390 host. I have also observed the
migration test hanging on macos host, so I don't think this is
s390-specific.

Process tree:
migration-test(455775)-+-qemu-system-i38(456194)
                       |-qemu-system-i38(456200)
                       `-qemu-system-i38(456266)
===========================================================
PROCESS: 455775
linux1    455775  312266  5 14:36 pts/0    00:07:19
./tests/qtest/migration-test -tap -k
[New LWP 455776]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
__libc_read (nbytes=1, buf=0x3ffe69fd816, fd=4) at
../sysdeps/unix/sysv/linux/read.c:26
26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.

Thread 2 (Thread 0x3ff9c7ff900 (LWP 455776)):
#0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1  0x000002aa1e8d24ca in qemu_futex_wait (f=0x2aa1e920cbc
<rcu_call_ready_event>, val=4294967295) at
/home/linux1/qemu/include/qemu/futex.h:29
#2  0x000002aa1e8d276e in qemu_event_wait (ev=0x2aa1e920cbc
<rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:481
#3  0x000002aa1e8e6ce2 in call_rcu_thread (opaque=0x0) at ../../util/rcu.c:261
#4  0x000002aa1e8d2998 in qemu_thread_start (args=0x2aa200e51e0) at
../../util/qemu-thread-posix.c:556
#5  0x000003ff9ca87e66 in start_thread (arg=0x3ff9c7ff900) at
pthread_create.c:477
#6  0x000003ff9c97cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ff9ce75430 (LWP 455775)):
#0  __libc_read (nbytes=1, buf=0x3ffe69fd816, fd=4) at
../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=<optimized out>, buf=0x3ffe69fd816, nbytes=1) at
../sysdeps/unix/sysv/linux/read.c:24
#2  0x000002aa1e894652 in qmp_fd_receive (fd=4) at
../../tests/qtest/libqtest.c:613
#3  0x000002aa1e894816 in qtest_qmp_receive_dict (s=0x2aa200f6a20) at
../../tests/qtest/libqtest.c:648
#4  0x000002aa1e894782 in qtest_qmp_receive (s=0x2aa200f6a20) at
../../tests/qtest/libqtest.c:636
#5  0x000002aa1e894ede in qtest_vqmp (s=0x2aa200f6a20,
fmt=0x2aa1e8f140c "{ 'execute': 'query-migrate' }", ap=0x3ffe69fdb80)
at ../../tests/qtest/libqtest.c:749
#6  0x000002aa1e891ac0 in wait_command (who=0x2aa200f6a20,
command=0x2aa1e8f140c "{ 'execute': 'query-migrate' }") at
../../tests/qtest/migration-helpers.c:63
#7  0x000002aa1e891de8 in migrate_query (who=0x2aa200f6a20) at
../../tests/qtest/migration-helpers.c:107
#8  0x000002aa1e891e1a in migrate_query_status (who=0x2aa200f6a20) at
../../tests/qtest/migration-helpers.c:116
#9  0x000002aa1e891ef6 in check_migration_status (who=0x2aa200f6a20,
goal=0x2aa1e8f0f0e "cancelled", ungoals=0x0) at
../../tests/qtest/migration-helpers.c:132
#10 0x000002aa1e892150 in wait_for_migration_status
(who=0x2aa200f6a20, goal=0x2aa1e8f0f0e "cancelled", ungoals=0x0) at
../../tests/qtest/migration-helpers.c:156
#11 0x000002aa1e8910fa in test_multifd_tcp_cancel () at
../../tests/qtest/migration-test.c:1379
#12 0x000003ff9cc7e608 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#13 0x000003ff9cc7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#14 0x000003ff9cc7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#15 0x000003ff9cc7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#16 0x000003ff9cc7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#17 0x000003ff9cc7eada in g_test_run_suite () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#18 0x000003ff9cc7eb10 in g_test_run () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#19 0x000002aa1e891578 in main (argc=2, argv=0x3ffe69fece8) at
../../tests/qtest/migration-test.c:1491
[Inferior 1 (process 455775) detached]

===========================================================
PROCESS: 456194
linux1    456194  455775 85 14:39 pts/0    01:54:06 ./qemu-system-i386
-qtest unix:/tmp/qtest-455775.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-455775.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
source,debug-threads=on -m 150M -serial
file:/tmp/migration-test-dmqzpM/src_serial -drive
file=/tmp/migration-test-dmqzpM/bootsect,format=raw -accel qtest
[New LWP 456196]
[New LWP 456197]
[New LWP 456198]
[New LWP 456229]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
0x000003ff9a071c9c in __ppoll (fds=0x2aa2c46c2f0, nfds=5,
timeout=<optimized out>, sigmask=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:44
44      ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 5 (Thread 0x3fee0ff9900 (LWP 456229)):
#0  futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0,
expected=0, futex_word=0x2aa2c46e7e4) at
../sysdeps/nptl/futex-internal.h:320
#1  do_futex_wait (sem=sem@entry=0x2aa2c46e7e0, abstime=0x0,
clockid=0) at sem_waitcommon.c:112
#2  0x000003ff9a191870 in __new_sem_wait_slow (sem=0x2aa2c46e7e0,
abstime=0x0, clockid=0) at sem_waitcommon.c:184
#3  0x000003ff9a19190e in __new_sem_wait (sem=<optimized out>) at sem_wait.c:42
#4  0x000002aa2923da1e in qemu_sem_wait (sem=0x2aa2c46e7e0) at
../../util/qemu-thread-posix.c:358
#5  0x000002aa289483cc in multifd_send_sync_main (f=0x2aa2b5f92d0) at
../../migration/multifd.c:610
#6  0x000002aa28dfa30c in ram_save_iterate (f=0x2aa2b5f92d0,
opaque=0x2aa29bf75d0 <ram_state>) at ../../migration/ram.c:3049
#7  0x000002aa28958fee in qemu_savevm_state_iterate (f=0x2aa2b5f92d0,
postcopy=false) at ../../migration/savevm.c:1296
#8  0x000002aa28942d40 in migration_iteration_run (s=0x2aa2b3f9800) at
../../migration/migration.c:3607
#9  0x000002aa289434da in migration_thread (opaque=0x2aa2b3f9800) at
../../migration/migration.c:3838
#10 0x000002aa2923e020 in qemu_thread_start (args=0x2aa2b8b29e0) at
../../util/qemu-thread-posix.c:556
#11 0x000003ff9a187e66 in start_thread (arg=0x3fee0ff9900) at
pthread_create.c:477
#12 0x000003ff9a07cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 4 (Thread 0x3ff89f2f900 (LWP 456198)):
#0  env_neg (env=0x2aa2b5f5030) at /home/linux1/qemu/include/exec/cpu-all.h:478
#1  0x000002aa28f5376a in env_tlb (env=0x2aa2b5f5030) at
/home/linux1/qemu/include/exec/cpu-all.h:502
#2  0x000002aa28f538a8 in tlb_index (env=0x2aa2b5f5030, mmu_idx=2,
addr=73265152) at /home/linux1/qemu/include/exec/cpu_ldst.h:366
#3  0x000002aa28f574bc in tlb_set_page_with_attrs (cpu=0x2aa2b5ec750,
vaddr=73265152, paddr=73265152, attrs=..., prot=7, mmu_idx=2,
size=4096) at ../../accel/tcg/cputlb.c:1194
#4  0x000002aa28cdfd3e in handle_mmu_fault (cs=0x2aa2b5ec750,
addr=73265152, size=1, is_write1=0, mmu_idx=2) at
../../target/i386/tcg/sysemu/excp_helper.c:442
#5  0x000002aa28cdfe90 in x86_cpu_tlb_fill (cs=0x2aa2b5ec750,
addr=73265152, size=1, access_type=MMU_DATA_LOAD, mmu_idx=2,
probe=false, retaddr=4393820748608) at
../../target/i386/tcg/sysemu/excp_helper.c:468
#6  0x000002aa28f5794e in tlb_fill (cpu=0x2aa2b5ec750, addr=73265152,
size=1, access_type=MMU_DATA_LOAD, mmu_idx=2, retaddr=4393820748608)
at ../../accel/tcg/cputlb.c:1313
#7  0x000002aa28f59982 in load_helper (env=0x2aa2b5f5030,
addr=73265152, oi=3586, retaddr=4393820748608, op=MO_8,
code_read=false, full_load=0x2aa28f59db0 <full_ldub_mmu>) at
../../accel/tcg/cputlb.c:1934
#8  0x000002aa28f59e2e in full_ldub_mmu (env=0x2aa2b5f5030,
addr=73265152, oi=3586, retaddr=4393820748608) at
../../accel/tcg/cputlb.c:2025
#9  0x000002aa28f59e94 in helper_ret_ldub_mmu (env=0x2aa2b5f5030,
addr=73265152, oi=3586, retaddr=4393820748608) at
../../accel/tcg/cputlb.c:2031
#10 0x000003ff041ffbfa in code_gen_buffer ()
#11 0x000002aa28f3cfba in cpu_tb_exec (cpu=0x2aa2b5ec750,
itb=0x3ff441ffa00, tb_exit=0x3ff89f2af44) at
../../accel/tcg/cpu-exec.c:357
#12 0x000002aa28f3e47e in cpu_loop_exec_tb (cpu=0x2aa2b5ec750,
tb=0x3ff441ffa00, last_tb=0x3ff89f2af58, tb_exit=0x3ff89f2af44) at
../../accel/tcg/cpu-exec.c:847
#13 0x000002aa28f3e970 in cpu_exec (cpu=0x2aa2b5ec750) at
../../accel/tcg/cpu-exec.c:1006
#14 0x000002aa28f71a1e in tcg_cpus_exec (cpu=0x2aa2b5ec750) at
../../accel/tcg/tcg-accel-ops.c:68
#15 0x000002aa28f71efe in mttcg_cpu_thread_fn (arg=0x2aa2b5ec750) at
../../accel/tcg/tcg-accel-ops-mttcg.c:96
#16 0x000002aa2923e020 in qemu_thread_start (args=0x2aa2b60bb00) at
../../util/qemu-thread-posix.c:556
#17 0x000003ff9a187e66 in start_thread (arg=0x3ff89f2f900) at
pthread_create.c:477
#18 0x000003ff9a07cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 3 (Thread 0x3ff8a821900 (LWP 456197)):
#0  0x000003ff9a071b42 in __GI___poll (fds=0x3fefc003280, nfds=3,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x000003ff9c7d4386 in  () at /lib/s390x-linux-gnu/libglib-2.0.so.0
#2  0x000003ff9c7d4790 in g_main_loop_run () at
/lib/s390x-linux-gnu/libglib-2.0.so.0
#3  0x000002aa28fd9d56 in iothread_run (opaque=0x2aa2b339750) at
../../iothread.c:73
#4  0x000002aa2923e020 in qemu_thread_start (args=0x2aa2b2e7980) at
../../util/qemu-thread-posix.c:556
#5  0x000003ff9a187e66 in start_thread (arg=0x3ff8a821900) at
pthread_create.c:477
#6  0x000003ff9a07cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 2 (Thread 0x3ff8b1a4900 (LWP 456196)):
#0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1  0x000002aa2923db52 in qemu_futex_wait (f=0x2aa29c14244
<rcu_call_ready_event>, val=4294967295) at
/home/linux1/qemu/include/qemu/futex.h:29
#2  0x000002aa2923ddf6 in qemu_event_wait (ev=0x2aa29c14244
<rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:481
#3  0x000002aa2924cbd2 in call_rcu_thread (opaque=0x0) at ../../util/rcu.c:261
#4  0x000002aa2923e020 in qemu_thread_start (args=0x2aa2b26ac90) at
../../util/qemu-thread-posix.c:556
#5  0x000003ff9a187e66 in start_thread (arg=0x3ff8b1a4900) at
pthread_create.c:477
#6  0x000003ff9a07cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ff9d5fe440 (LWP 456194)):
#0  0x000003ff9a071c9c in __ppoll (fds=0x2aa2c46c2f0, nfds=5,
timeout=<optimized out>, sigmask=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:44
#1  0x000002aa2927a3e4 in qemu_poll_ns (fds=0x2aa2c46c2f0, nfds=5,
timeout=27206167) at ../../util/qemu-timer.c:348
#2  0x000002aa29272280 in os_host_main_loop_wait (timeout=27206167) at
../../util/main-loop.c:250
#3  0x000002aa29272434 in main_loop_wait (nonblocking=0) at
../../util/main-loop.c:531
#4  0x000002aa28901276 in qemu_main_loop () at ../../softmmu/runstate.c:727
#5  0x000002aa2887d2ce in main (argc=25, argv=0x3fff647eac8,
envp=0x3fff647eb98) at ../../softmmu/main.c:50
[Inferior 1 (process 456194) detached]

===========================================================
PROCESS: 456200
linux1    456200  455775  0 14:39 pts/0    00:00:00 [qemu-system-i38] <defunct>
/proc/456200/exe: No such file or directory.
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
warning: process 456200 is a zombie - the process has already terminated
ptrace: Operation not permitted.
/home/linux1/456200: No such file or directory.

===========================================================
PROCESS: 456266
linux1    456266  455775  0 14:39 pts/0    00:00:00 ./qemu-system-i386
-qtest unix:/tmp/qtest-455775.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-455775.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
target,debug-threads=on -m 150M -serial
file:/tmp/migration-test-dmqzpM/dest_serial -incoming defer -drive
file=/tmp/migration-test-dmqzpM/bootsect,format=raw -accel qtest
[New LWP 456268]
[New LWP 456269]
[New LWP 456270]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
0x000003ff9a271c9c in __ppoll (fds=0x2aa0435eb40, nfds=6,
timeout=<optimized out>, sigmask=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:44
44      ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 4 (Thread 0x3ff8a12f900 (LWP 456270)):
#0  futex_wait_cancelable (private=0, expected=0,
futex_word=0x2aa0459cbac) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0,
mutex=0x2aa02ddec88 <qemu_global_mutex>, cond=0x2aa0459cb80) at
pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x2aa0459cb80, mutex=0x2aa02ddec88
<qemu_global_mutex>) at pthread_cond_wait.c:638
#3  0x000002aa0243d498 in qemu_cond_wait_impl (cond=0x2aa0459cb80,
mutex=0x2aa02ddec88 <qemu_global_mutex>, file=0x2aa024e81e8
"../../softmmu/cpus.c", line=424) at
../../util/qemu-thread-posix.c:195
#4  0x000002aa01af4cc0 in qemu_wait_io_event (cpu=0x2aa0457d750) at
../../softmmu/cpus.c:424
#5  0x000002aa02172028 in mttcg_cpu_thread_fn (arg=0x2aa0457d750) at
../../accel/tcg/tcg-accel-ops-mttcg.c:124
#6  0x000002aa0243e020 in qemu_thread_start (args=0x2aa0459cbc0) at
../../util/qemu-thread-posix.c:556
#7  0x000003ff9a387e66 in start_thread (arg=0x3ff8a12f900) at
pthread_create.c:477
#8  0x000003ff9a27cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 3 (Thread 0x3ff8aa21900 (LWP 456269)):
#0  0x000003ff9a271b42 in __GI___poll (fds=0x3fefc003280, nfds=3,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x000003ff9c9d4386 in  () at /lib/s390x-linux-gnu/libglib-2.0.so.0
#2  0x000003ff9c9d4790 in g_main_loop_run () at
/lib/s390x-linux-gnu/libglib-2.0.so.0
#3  0x000002aa021d9d56 in iothread_run (opaque=0x2aa042ca750) at
../../iothread.c:73
#4  0x000002aa0243e020 in qemu_thread_start (args=0x2aa04278980) at
../../util/qemu-thread-posix.c:556
#5  0x000003ff9a387e66 in start_thread (arg=0x3ff8aa21900) at
pthread_create.c:477
#6  0x000003ff9a27cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 2 (Thread 0x3ff8b3a4900 (LWP 456268)):
#0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1  0x000002aa0243db52 in qemu_futex_wait (f=0x2aa02e14244
<rcu_call_ready_event>, val=4294967295) at
/home/linux1/qemu/include/qemu/futex.h:29
#2  0x000002aa0243ddf6 in qemu_event_wait (ev=0x2aa02e14244
<rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:481
#3  0x000002aa0244cbd2 in call_rcu_thread (opaque=0x0) at ../../util/rcu.c:261
#4  0x000002aa0243e020 in qemu_thread_start (args=0x2aa041fbc90) at
../../util/qemu-thread-posix.c:556
#5  0x000003ff9a387e66 in start_thread (arg=0x3ff8b3a4900) at
pthread_create.c:477
#6  0x000003ff9a27cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ff9d7fe440 (LWP 456266)):
#0  0x000003ff9a271c9c in __ppoll (fds=0x2aa0435eb40, nfds=6,
timeout=<optimized out>, sigmask=0x0) at
../sysdeps/unix/sysv/linux/ppoll.c:44
#1  0x000002aa0247a3e4 in qemu_poll_ns (fds=0x2aa0435eb40, nfds=6,
timeout=1000000000) at ../../util/qemu-timer.c:348
#2  0x000002aa02472280 in os_host_main_loop_wait (timeout=1000000000)
at ../../util/main-loop.c:250
#3  0x000002aa02472434 in main_loop_wait (nonblocking=0) at
../../util/main-loop.c:531
#4  0x000002aa01b01276 in qemu_main_loop () at ../../softmmu/runstate.c:727
#5  0x000002aa01a7d2ce in main (argc=27, argv=0x3ffe38fe7e8,
envp=0x3ffe38fe8c8) at ../../softmmu/main.c:50
[Inferior 1 (process 456266) detached]

thanks
-- PMM



* Re: [PULL 00/18] migration queue
  2022-03-14 16:56       ` Peter Maydell
@ 2022-03-14 17:07         ` Daniel P. Berrangé
  2022-03-14 17:15           ` Peter Maydell
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2022-03-14 17:07 UTC (permalink / raw)
  To: Peter Maydell
  Cc: thuth, quintela, s.reiter, qemu-devel, peterx,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Mon, Mar 14, 2022 at 04:56:18PM +0000, Peter Maydell wrote:
> On Tue, 8 Mar 2022 at 18:47, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >
> > * Philippe Mathieu-Daudé (philippe.mathieu.daude@gmail.com) wrote:
> > > I'm seeing an error on the s390x runner:
> > >
> > > ▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram:
> > > assertion failed: (bad == 0) ERROR
> > >
> > >  26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test            ERROR
> > > 78.87s   killed by signal 6 SIGABRT
> > >
> > > https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848
> >
> > Yeh, thuth mentioned that, it seems to only be s390 which is odd.
> > I'm not seeing anything obviously architecture dependent in that set, or
> > for that matter that plays with the ram migration stream much.
> > Is this reliable enough that someone with a tame s390 could bisect?
> 
> Didn't see a SIGABRT, but here's a gdb backtrace of a hang
> in the migration test on s390 host. I have also observed the
> migration test hanging on macos host, so I don't think this is
> s390-specific.
> 
> Process tree:
> migration-test(455775)-+-qemu-system-i38(456194)
>                        |-qemu-system-i38(456200)
>                        `-qemu-system-i38(456266)
> ===========================================================
> PROCESS: 455775
> linux1    455775  312266  5 14:36 pts/0    00:07:19
> ./tests/qtest/migration-test -tap -k
> [New LWP 455776]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
> __libc_read (nbytes=1, buf=0x3ffe69fd816, fd=4) at
> ../sysdeps/unix/sysv/linux/read.c:26
> 26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.


> 
> #5  0x000002aa1e894ede in qtest_vqmp (s=0x2aa200f6a20,
> fmt=0x2aa1e8f140c "{ 'execute': 'query-migrate' }", ap=0x3ffe69fdb80)
> at ../../tests/qtest/libqtest.c:749

So the test harness is waiting for a reply to 'query-migrate'.

This should be fast unless QEMU has hung in the main event
loop servicing monitor commands, or stopped.
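
To make the shape of that wait concrete, here is a minimal sketch of a status-polling loop like the one in the backtrace (wait_for_migration_status down to migrate_query). Everything in it is hypothetical: demo_wait_for_status, demo_fake_query and the max_polls bound are illustrative names, not libqtest API. In the backtrace each query is a blocking read on the QMP socket, so a QEMU that never replies would stall inside the query call itself rather than in the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical callback standing in for one 'query-migrate' round trip. */
typedef const char *(*demo_status_fn)(void *opaque);

/* Poll until the reported status equals goal, or give up after max_polls. */
static bool demo_wait_for_status(demo_status_fn query, void *opaque,
                                 const char *goal, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        const char *status = query(opaque);

        if (status && strcmp(status, goal) == 0) {
            return true;
        }
    }
    return false;
}

/* Fake status source for illustration: replays a fixed, non-empty list. */
struct demo_fake_source {
    const char *const *statuses;
    int count;
    int next;
};

static const char *demo_fake_query(void *opaque)
{
    struct demo_fake_source *src = opaque;

    if (src->next < src->count) {
        return src->statuses[src->next++];
    }
    return src->statuses[src->count - 1];  /* stay in the final state */
}
```

A source that steps through "active", "cancelling", "cancelled" reaches the goal "cancelled"; one that stays at "active" exhausts max_polls and the function returns false.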


> ===========================================================
> PROCESS: 456194
> linux1    456194  455775 85 14:39 pts/0    01:54:06 ./qemu-system-i386
> -qtest unix:/tmp/qtest-455775.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-455775.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> source,debug-threads=on -m 150M -serial
> file:/tmp/migration-test-dmqzpM/src_serial -drive
> file=/tmp/migration-test-dmqzpM/bootsect,format=raw -accel qtest
> [New LWP 456196]
> [New LWP 456197]
> [New LWP 456198]
> [New LWP 456229]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
> 0x000003ff9a071c9c in __ppoll (fds=0x2aa2c46c2f0, nfds=5,
> timeout=<optimized out>, sigmask=0x0) at
> ../sysdeps/unix/sysv/linux/ppoll.c:44
> 44      ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
> 
> Thread 5 (Thread 0x3fee0ff9900 (LWP 456229)):
> #0  futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0,
> expected=0, futex_word=0x2aa2c46e7e4) at
> ../sysdeps/nptl/futex-internal.h:320
> #1  do_futex_wait (sem=sem@entry=0x2aa2c46e7e0, abstime=0x0,
> clockid=0) at sem_waitcommon.c:112
> #2  0x000003ff9a191870 in __new_sem_wait_slow (sem=0x2aa2c46e7e0,
> abstime=0x0, clockid=0) at sem_waitcommon.c:184
> #3  0x000003ff9a19190e in __new_sem_wait (sem=<optimized out>) at sem_wait.c:42
> #4  0x000002aa2923da1e in qemu_sem_wait (sem=0x2aa2c46e7e0) at
> ../../util/qemu-thread-posix.c:358
> #5  0x000002aa289483cc in multifd_send_sync_main (f=0x2aa2b5f92d0) at
> ../../migration/multifd.c:610
> #6  0x000002aa28dfa30c in ram_save_iterate (f=0x2aa2b5f92d0,
> opaque=0x2aa29bf75d0 <ram_state>) at ../../migration/ram.c:3049
> #7  0x000002aa28958fee in qemu_savevm_state_iterate (f=0x2aa2b5f92d0,
> postcopy=false) at ../../migration/savevm.c:1296
> #8  0x000002aa28942d40 in migration_iteration_run (s=0x2aa2b3f9800) at
> ../../migration/migration.c:3607
> #9  0x000002aa289434da in migration_thread (opaque=0x2aa2b3f9800) at
> ../../migration/migration.c:3838
> #10 0x000002aa2923e020 in qemu_thread_start (args=0x2aa2b8b29e0) at
> ../../util/qemu-thread-posix.c:556
> #11 0x000003ff9a187e66 in start_thread (arg=0x3fee0ff9900) at
> pthread_create.c:477
> #12 0x000003ff9a07cbf6 in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 4 (Thread 0x3ff89f2f900 (LWP 456198)):
> #0  env_neg (env=0x2aa2b5f5030) at /home/linux1/qemu/include/exec/cpu-all.h:478
> #1  0x000002aa28f5376a in env_tlb (env=0x2aa2b5f5030) at
> /home/linux1/qemu/include/exec/cpu-all.h:502
> #2  0x000002aa28f538a8 in tlb_index (env=0x2aa2b5f5030, mmu_idx=2,
> addr=73265152) at /home/linux1/qemu/include/exec/cpu_ldst.h:366
> #3  0x000002aa28f574bc in tlb_set_page_with_attrs (cpu=0x2aa2b5ec750,
> vaddr=73265152, paddr=73265152, attrs=..., prot=7, mmu_idx=2,
> size=4096) at ../../accel/tcg/cputlb.c:1194
> #4  0x000002aa28cdfd3e in handle_mmu_fault (cs=0x2aa2b5ec750,
> addr=73265152, size=1, is_write1=0, mmu_idx=2) at
> ../../target/i386/tcg/sysemu/excp_helper.c:442
> #5  0x000002aa28cdfe90 in x86_cpu_tlb_fill (cs=0x2aa2b5ec750,
> addr=73265152, size=1, access_type=MMU_DATA_LOAD, mmu_idx=2,
> probe=false, retaddr=4393820748608) at
> ../../target/i386/tcg/sysemu/excp_helper.c:468
> #6  0x000002aa28f5794e in tlb_fill (cpu=0x2aa2b5ec750, addr=73265152,
> size=1, access_type=MMU_DATA_LOAD, mmu_idx=2, retaddr=4393820748608)
> at ../../accel/tcg/cputlb.c:1313
> #7  0x000002aa28f59982 in load_helper (env=0x2aa2b5f5030,
> addr=73265152, oi=3586, retaddr=4393820748608, op=MO_8,
> code_read=false, full_load=0x2aa28f59db0 <full_ldub_mmu>) at
> ../../accel/tcg/cputlb.c:1934
> #8  0x000002aa28f59e2e in full_ldub_mmu (env=0x2aa2b5f5030,
> addr=73265152, oi=3586, retaddr=4393820748608) at
> ../../accel/tcg/cputlb.c:2025
> #9  0x000002aa28f59e94 in helper_ret_ldub_mmu (env=0x2aa2b5f5030,
> addr=73265152, oi=3586, retaddr=4393820748608) at
> ../../accel/tcg/cputlb.c:2031
> #10 0x000003ff041ffbfa in code_gen_buffer ()
> #11 0x000002aa28f3cfba in cpu_tb_exec (cpu=0x2aa2b5ec750,
> itb=0x3ff441ffa00, tb_exit=0x3ff89f2af44) at
> ../../accel/tcg/cpu-exec.c:357
> #12 0x000002aa28f3e47e in cpu_loop_exec_tb (cpu=0x2aa2b5ec750,
> tb=0x3ff441ffa00, last_tb=0x3ff89f2af58, tb_exit=0x3ff89f2af44) at
> ../../accel/tcg/cpu-exec.c:847
> #13 0x000002aa28f3e970 in cpu_exec (cpu=0x2aa2b5ec750) at
> ../../accel/tcg/cpu-exec.c:1006
> #14 0x000002aa28f71a1e in tcg_cpus_exec (cpu=0x2aa2b5ec750) at
> ../../accel/tcg/tcg-accel-ops.c:68
> #15 0x000002aa28f71efe in mttcg_cpu_thread_fn (arg=0x2aa2b5ec750) at
> ../../accel/tcg/tcg-accel-ops-mttcg.c:96
> #16 0x000002aa2923e020 in qemu_thread_start (args=0x2aa2b60bb00) at
> ../../util/qemu-thread-posix.c:556
> #17 0x000003ff9a187e66 in start_thread (arg=0x3ff89f2f900) at
> pthread_create.c:477
> #18 0x000003ff9a07cbf6 in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 3 (Thread 0x3ff8a821900 (LWP 456197)):
> #0  0x000003ff9a071b42 in __GI___poll (fds=0x3fefc003280, nfds=3,
> timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> #1  0x000003ff9c7d4386 in  () at /lib/s390x-linux-gnu/libglib-2.0.so.0
> #2  0x000003ff9c7d4790 in g_main_loop_run () at
> /lib/s390x-linux-gnu/libglib-2.0.so.0
> #3  0x000002aa28fd9d56 in iothread_run (opaque=0x2aa2b339750) at
> ../../iothread.c:73
> #4  0x000002aa2923e020 in qemu_thread_start (args=0x2aa2b2e7980) at
> ../../util/qemu-thread-posix.c:556
> #5  0x000003ff9a187e66 in start_thread (arg=0x3ff8a821900) at
> pthread_create.c:477
> #6  0x000003ff9a07cbf6 in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 2 (Thread 0x3ff8b1a4900 (LWP 456196)):
> #0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
> #1  0x000002aa2923db52 in qemu_futex_wait (f=0x2aa29c14244
> <rcu_call_ready_event>, val=4294967295) at
> /home/linux1/qemu/include/qemu/futex.h:29
> #2  0x000002aa2923ddf6 in qemu_event_wait (ev=0x2aa29c14244
> <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:481
> #3  0x000002aa2924cbd2 in call_rcu_thread (opaque=0x0) at ../../util/rcu.c:261
> #4  0x000002aa2923e020 in qemu_thread_start (args=0x2aa2b26ac90) at
> ../../util/qemu-thread-posix.c:556
> #5  0x000003ff9a187e66 in start_thread (arg=0x3ff8b1a4900) at
> pthread_create.c:477
> #6  0x000003ff9a07cbf6 in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 1 (Thread 0x3ff9d5fe440 (LWP 456194)):
> #0  0x000003ff9a071c9c in __ppoll (fds=0x2aa2c46c2f0, nfds=5,
> timeout=<optimized out>, sigmask=0x0) at
> ../sysdeps/unix/sysv/linux/ppoll.c:44
> #1  0x000002aa2927a3e4 in qemu_poll_ns (fds=0x2aa2c46c2f0, nfds=5,
> timeout=27206167) at ../../util/qemu-timer.c:348
> #2  0x000002aa29272280 in os_host_main_loop_wait (timeout=27206167) at
> ../../util/main-loop.c:250
> #3  0x000002aa29272434 in main_loop_wait (nonblocking=0) at
> ../../util/main-loop.c:531
> #4  0x000002aa28901276 in qemu_main_loop () at ../../softmmu/runstate.c:727
> #5  0x000002aa2887d2ce in main (argc=25, argv=0x3fff647eac8,
> envp=0x3fff647eb98) at ../../softmmu/main.c:50
> [Inferior 1 (process 456194) detached]


No obvious sign of a hang that would cause it to fail to reply
to 'query-migrate'



> ===========================================================
> PROCESS: 456266
> linux1    456266  455775  0 14:39 pts/0    00:00:00 ./qemu-system-i386
> -qtest unix:/tmp/qtest-455775.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-455775.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> target,debug-threads=on -m 150M -serial
> file:/tmp/migration-test-dmqzpM/dest_serial -incoming defer -drive
> file=/tmp/migration-test-dmqzpM/bootsect,format=raw -accel qtest
> [New LWP 456268]
> [New LWP 456269]
> [New LWP 456270]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
> 0x000003ff9a271c9c in __ppoll (fds=0x2aa0435eb40, nfds=6,
> timeout=<optimized out>, sigmask=0x0) at
> ../sysdeps/unix/sysv/linux/ppoll.c:44
> 44      ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
> 
> Thread 4 (Thread 0x3ff8a12f900 (LWP 456270)):
> #0  futex_wait_cancelable (private=0, expected=0,
> futex_word=0x2aa0459cbac) at ../sysdeps/nptl/futex-internal.h:183
> #1  __pthread_cond_wait_common (abstime=0x0, clockid=0,
> mutex=0x2aa02ddec88 <qemu_global_mutex>, cond=0x2aa0459cb80) at
> pthread_cond_wait.c:508
> #2  __pthread_cond_wait (cond=0x2aa0459cb80, mutex=0x2aa02ddec88
> <qemu_global_mutex>) at pthread_cond_wait.c:638
> #3  0x000002aa0243d498 in qemu_cond_wait_impl (cond=0x2aa0459cb80,
> mutex=0x2aa02ddec88 <qemu_global_mutex>, file=0x2aa024e81e8
> "../../softmmu/cpus.c", line=424) at
> ../../util/qemu-thread-posix.c:195
> #4  0x000002aa01af4cc0 in qemu_wait_io_event (cpu=0x2aa0457d750) at
> ../../softmmu/cpus.c:424
> #5  0x000002aa02172028 in mttcg_cpu_thread_fn (arg=0x2aa0457d750) at
> ../../accel/tcg/tcg-accel-ops-mttcg.c:124
> #6  0x000002aa0243e020 in qemu_thread_start (args=0x2aa0459cbc0) at
> ../../util/qemu-thread-posix.c:556
> #7  0x000003ff9a387e66 in start_thread (arg=0x3ff8a12f900) at
> pthread_create.c:477
> #8  0x000003ff9a27cbf6 in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 3 (Thread 0x3ff8aa21900 (LWP 456269)):
> #0  0x000003ff9a271b42 in __GI___poll (fds=0x3fefc003280, nfds=3,
> timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
> #1  0x000003ff9c9d4386 in  () at /lib/s390x-linux-gnu/libglib-2.0.so.0
> #2  0x000003ff9c9d4790 in g_main_loop_run () at
> /lib/s390x-linux-gnu/libglib-2.0.so.0
> #3  0x000002aa021d9d56 in iothread_run (opaque=0x2aa042ca750) at
> ../../iothread.c:73
> #4  0x000002aa0243e020 in qemu_thread_start (args=0x2aa04278980) at
> ../../util/qemu-thread-posix.c:556
> #5  0x000003ff9a387e66 in start_thread (arg=0x3ff8aa21900) at
> pthread_create.c:477
> #6  0x000003ff9a27cbf6 in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 2 (Thread 0x3ff8b3a4900 (LWP 456268)):
> #0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
> #1  0x000002aa0243db52 in qemu_futex_wait (f=0x2aa02e14244
> <rcu_call_ready_event>, val=4294967295) at
> /home/linux1/qemu/include/qemu/futex.h:29
> #2  0x000002aa0243ddf6 in qemu_event_wait (ev=0x2aa02e14244
> <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:481
> #3  0x000002aa0244cbd2 in call_rcu_thread (opaque=0x0) at ../../util/rcu.c:261
> #4  0x000002aa0243e020 in qemu_thread_start (args=0x2aa041fbc90) at
> ../../util/qemu-thread-posix.c:556
> #5  0x000003ff9a387e66 in start_thread (arg=0x3ff8b3a4900) at
> pthread_create.c:477
> #6  0x000003ff9a27cbf6 in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 1 (Thread 0x3ff9d7fe440 (LWP 456266)):
> #0  0x000003ff9a271c9c in __ppoll (fds=0x2aa0435eb40, nfds=6,
> timeout=<optimized out>, sigmask=0x0) at
> ../sysdeps/unix/sysv/linux/ppoll.c:44
> #1  0x000002aa0247a3e4 in qemu_poll_ns (fds=0x2aa0435eb40, nfds=6,
> timeout=1000000000) at ../../util/qemu-timer.c:348
> #2  0x000002aa02472280 in os_host_main_loop_wait (timeout=1000000000)
> at ../../util/main-loop.c:250
> #3  0x000002aa02472434 in main_loop_wait (nonblocking=0) at
> ../../util/main-loop.c:531
> #4  0x000002aa01b01276 in qemu_main_loop () at ../../softmmu/runstate.c:727
> #5  0x000002aa01a7d2ce in main (argc=27, argv=0x3ffe38fe7e8,
> envp=0x3ffe38fe8c8) at ../../softmmu/main.c:50
> [Inferior 1 (process 456266) detached]

No obvious sign of trouble here either.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PULL 00/18] migration queue
  2022-03-14 17:07         ` Daniel P. Berrangé
@ 2022-03-14 17:15           ` Peter Maydell
  2022-03-14 17:24             ` Daniel P. Berrangé
                               ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Peter Maydell @ 2022-03-14 17:15 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: thuth, quintela, s.reiter, qemu-devel, peterx,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé <berrange@redhat.com> wrote:
> So the test harness is waiting for a reply to 'query-migrate'.
>
> This should be fast unless QEMU has hung in the main event
> loop servicing monitor commands, or stopped.

I was kind of loose with the terminology -- I don't remember whether
it was actually hung in the sense of stopped entirely, or just
"sat in a loop waiting for a migration state that never arrives".
I'll try to look more closely if I can catch it in the act again.

One thing that makes this bug investigation trickier, incidentally,
is that the migration-test code seems to depend on userfaultfd.
That means you can't run it under 'rr'.

-- PMM



* Re: [PULL 00/18] migration queue
  2022-03-14 17:15           ` Peter Maydell
@ 2022-03-14 17:24             ` Daniel P. Berrangé
  2022-03-14 17:54             ` Dr. David Alan Gilbert
  2022-03-14 18:58             ` Peter Maydell
  2 siblings, 0 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2022-03-14 17:24 UTC (permalink / raw)
  To: Peter Maydell
  Cc: thuth, quintela, s.reiter, qemu-devel, peterx,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Mon, Mar 14, 2022 at 05:15:57PM +0000, Peter Maydell wrote:
> On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > So the test harness is waiting for a reply to 'query-migrate'.
> >
> > This should be fast unless QEMU has hung in the main event
> > loop servicing monitor commands, or stopped.
> 
> I was kind of loose with the terminology -- I don't remember whether
> it was actually hung in the sense of stopped entirely, or just
> "sat in a loop waiting for a migration state that never arrives".
> I'll try to look more closely if I can catch it in the act again.

Ah yes, if it is just forever migrating, that would match the
stack traces shown from the QEMUs. 

> One thing that makes this bug investigation trickier, incidentally,
> is that the migration-test code seems to depend on userfaultfd.
> That means you can't run it under 'rr'.

Yeah, we also can't turn on the tracing for a live QEMU since the
monitor connection is already in use. Kinda need to have a second
monitor instance present, that we can connect to for debugging
the migration state.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PULL 00/18] migration queue
  2022-03-14 17:15           ` Peter Maydell
  2022-03-14 17:24             ` Daniel P. Berrangé
@ 2022-03-14 17:54             ` Dr. David Alan Gilbert
  2022-03-14 18:08               ` Peter Maydell
  2022-03-14 18:58             ` Peter Maydell
  2 siblings, 1 reply; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-14 17:54 UTC (permalink / raw)
  To: Peter Maydell
  Cc: thuth, Daniel P. Berrangé,
	quintela, s.reiter, qemu-devel, peterx,
	open list:S390 general arch..., Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > So the test harness is waiting for a reply to 'query-migrate'.
> >
> > This should be fast unless QEMU has hung in the main event
> > loop servicing monitor commands, or stopped.
> 
> I was kind of loose with the terminology -- I don't remember whether
> it was actually hung in the sense of stopped entirely, or just
> "sat in a loop waiting for a migration state that never arrives".
> I'll try to look more closely if I can catch it in the act again.

Yeh, there's a big difference; still, if it's always in this test at
that point, then I think it's one for Juan; it looks like the multifd
cancel path.

> One thing that makes this bug investigation trickier, incidentally,
> is that the migration-test code seems to depend on userfaultfd.
> That means you can't run it under 'rr'.

That should only be the postcopy tests; the others shouldn't use that.

Dave

> 
> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PULL 00/18] migration queue
  2022-03-14 17:54             ` Dr. David Alan Gilbert
@ 2022-03-14 18:08               ` Peter Maydell
  2022-03-14 18:20                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2022-03-14 18:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: thuth, Daniel P. Berrangé,
	quintela, s.reiter, qemu-devel, peterx,
	open list:S390 general arch..., Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> Peter Maydell (peter.maydell@linaro.org) wrote:
> > One thing that makes this bug investigation trickier, incidentally,
> > is that the migration-test code seems to depend on userfaultfd.
> > That means you can't run it under 'rr'.
>
> That should only be the postcopy tests; the others shouldn't use that.

tests/qtest/migration-test.c:main() exits immediately without adding
any of the test cases if ufd_version_check() fails, so no userfaultfd
means no tests run at all, currently.
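
As a rough illustration of the behaviour one would presumably want
instead, here is a small Python sketch (the real code is C, in
tests/qtest/migration-test.c, and is not reproduced here). The
function name and the postcopy test paths are hypothetical and only
for illustration; the point is that only the postcopy cases are
gated on userfaultfd.

```python
def select_tests(has_userfaultfd):
    """Return the test paths to register; only postcopy needs userfaultfd."""
    # Tests that do not rely on userfaultfd are always registered.
    tests = [
        "/migration/precopy/tcp",
        "/migration/multifd/tcp/zlib",
        "/migration/multifd/tcp/zstd",
    ]
    # Hypothetical postcopy test names, added only when the check passes.
    if has_userfaultfd:
        tests += ["/migration/postcopy/unix", "/migration/postcopy/recovery"]
    return tests
```

With a gate like this, a failed userfaultfd check would shrink the
test list rather than empty it.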

-- PMM



* Re: [PULL 00/18] migration queue
  2022-03-14 18:08               ` Peter Maydell
@ 2022-03-14 18:20                 ` Dr. David Alan Gilbert
  2022-03-14 18:53                   ` Daniel P. Berrangé
  0 siblings, 1 reply; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-14 18:20 UTC (permalink / raw)
  To: Peter Maydell
  Cc: thuth, Daniel P. Berrangé,
	quintela, s.reiter, qemu-devel, peterx,
	open list:S390 general arch..., Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > Peter Maydell (peter.maydell@linaro.org) wrote:
> > > One thing that makes this bug investigation trickier, incidentally,
> > > is that the migration-test code seems to depend on userfaultfd.
> > > That means you can't run it under 'rr'.
> >
> > That should only be the postcopy tests; the others shouldn't use that.
> 
> tests/qtest/migration-test.c:main() exits immediately without adding
> any of the test cases if ufd_version_check() fails, so no userfaultfd
> means no tests run at all, currently.

Ouch! I could swear we had a fix for that.

Anyway, it would be really good to see what query-migrate was returning;
if it's stuck in running or cancelling then it's a problem with multifd
that needs to learn to let go if someone is trying to cancel.
If it's failed or similar then the test needs fixing to not lock up.

Dave

> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PULL 00/18] migration queue
  2022-03-14 18:20                 ` Dr. David Alan Gilbert
@ 2022-03-14 18:53                   ` Daniel P. Berrangé
  2022-03-15  2:41                     ` Peter Xu
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel P. Berrangé @ 2022-03-14 18:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Peter Maydell, thuth, quintela, s.reiter, qemu-devel, peterx,
	open list:S390 general arch..., Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Mon, Mar 14, 2022 at 06:20:54PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Maydell (peter.maydell@linaro.org) wrote:
> > On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
> > <dgilbert@redhat.com> wrote:
> > >
> > > Peter Maydell (peter.maydell@linaro.org) wrote:
> > > > One thing that makes this bug investigation trickier, incidentally,
> > > > is that the migration-test code seems to depend on userfaultfd.
> > > > That means you can't run it under 'rr'.
> > >
> > > That should only be the postcopy tests; the others shouldn't use that.
> > 
> > tests/qtest/migration-test.c:main() exits immediately without adding
> > any of the test cases if ufd_version_check() fails, so no userfaultfd
> > means no tests run at all, currently.
> 
> Ouch! I could swear we had a fix for that.
> 
> Anyway, it would be really good to see what query-migrate was returning;
> if it's stuck in running or cancelling then it's a problem with multifd
> that needs to learn to let go if someone is trying to cancel.
> If it's failed or similar then the test needs fixing to not lock up.

This patch of mine may well be helpful:

  https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03192.html

When debugging my TLS tests, various mistakes meant I ended up with
a failed session, but the test was spinning forever on 'query-migrate'.
It was waiting for it to finish one iteration, and never bothering to
validate that the reported status == active.

If that patch were merged, it might well cause the test to abort in an
assertion rather than spinning forever, if status == failed.

Of course someone would still need to find out why it failed, but
nonetheless, I think an assert is nicer than spinning forever.
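
To make the difference concrete, here is a minimal Python sketch of
such a polling loop. It is not the code from the patch or from
migration-test.c. `query_migrate` is a hypothetical callable standing
in for "send 'query-migrate' over QMP and return the reply's payload",
and I am assuming the pass counter is the `dirty-sync-count` field of
the `ram` section. The check is also looser than status == active: it
only rejects the terminal failure states.

```python
import time

# States from which a further migration pass will never arrive.
TERMINAL_FAILURES = {"failed", "cancelled"}


def wait_for_migration_pass(query_migrate, poll_interval=0.0, max_polls=1000):
    """Poll until the dirty-sync counter advances past its first seen value."""
    initial = None
    for _ in range(max_polls):
        info = query_migrate()
        status = info.get("status")
        # Abort loudly instead of spinning once the migration has ended badly.
        assert status not in TERMINAL_FAILURES, (
            f"migration ended with status {status!r}"
        )
        passes = info.get("ram", {}).get("dirty-sync-count", 0)
        if initial is None:
            initial = passes
        elif passes > initial:
            return passes
        time.sleep(poll_interval)
    raise TimeoutError("no migration pass completed within max_polls polls")
```

The `max_polls` bound is an extra assumption of the sketch, so that
even a migration stuck in a non-failed state cannot hang the caller.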

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PULL 00/18] migration queue
  2022-03-14 17:15           ` Peter Maydell
  2022-03-14 17:24             ` Daniel P. Berrangé
  2022-03-14 17:54             ` Dr. David Alan Gilbert
@ 2022-03-14 18:58             ` Peter Maydell
  2022-03-14 19:44               ` Peter Maydell
  2 siblings, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2022-03-14 18:58 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: thuth, quintela, s.reiter, qemu-devel, peterx,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Mon, 14 Mar 2022 at 17:15, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > So the test harness is waiting for a reply to 'query-migrate'.
> >
> > This should be fast unless QEMU has hung in the main event
> > loop servicing monitor commands, or stopped.
>
> I was kind of loose with the terminology -- I don't remember whether
> it was actually hung in the sense of stopped entirely, or just
> "sat in a loop waiting for a migration state that never arrives".
> I'll try to look more closely if I can catch it in the act again.

I just hit the abort case, narrowing it down to the
/i386/migration/multifd/tcp/zlib case, which can hit this without
any other tests being run:

$ QTEST_QEMU_BINARY=./qemu-system-i386 ./tests/qtest/migration-test \
    -tap -k -p /i386/migration/multifd/tcp/zlib
# random seed: R02S37eab07b59417f6cd7e26d94df0d3908
# Start of i386 tests
# Start of migration tests
# Start of multifd tests
# Start of tcp tests
# starting QEMU: exec ./qemu-system-i386 -qtest
unix:/tmp/qtest-782502.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-782502.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
source,debug-threads=on -m 150M -serial
file:/tmp/migration-test-H8Ggsm/src_serial -drive
file=/tmp/migration-test-H8Ggsm/bootsect,format=raw    -accel qtest
# starting QEMU: exec ./qemu-system-i386 -qtest
unix:/tmp/qtest-782502.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-782502.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
target,debug-threads=on -m 150M -serial
file:/tmp/migration-test-H8Ggsm/dest_serial -incoming defer -drive
file=/tmp/migration-test-H8Ggsm/bootsect,format=raw    -accel qtest
Memory content inconsistency at 5f76000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f77000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f78000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f79000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7a000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7b000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7c000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7d000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7e000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7f000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
and in another 17 pages**
ERROR:../../tests/qtest/migration-test.c:276:check_guests_ram:
assertion failed: (bad == 0)
Bail out! ERROR:../../tests/qtest/migration-test.c:276:check_guests_ram:
assertion failed: (bad == 0)
Aborted (core dumped)

This test seems to fail fairly frequently. I'll try a bisect...

thanks
-- PMM



* Re: [PULL 00/18] migration queue
  2022-03-14 18:58             ` Peter Maydell
@ 2022-03-14 19:44               ` Peter Maydell
  2022-03-15 14:39                 ` multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue) Peter Maydell
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2022-03-14 19:44 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: thuth, quintela, s.reiter, qemu-devel, peterx,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Mon, 14 Mar 2022 at 18:58, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 14 Mar 2022 at 17:15, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > So the test harness is waiting for a reply to 'query-migrate'.
> > >
> > > This should be fast unless QEMU has hung in the main event
> > > loop servicing monitor commands, or stopped.
> >
> > I was kind of loose with the terminology -- I don't remember whether
> > it was actually hung in the sense of stopped entirely, or just
> > "sat in a loop waiting for a migration state that never arrives".
> > I'll try to look more closely if I can catch it in the act again.
>
> I just hit the abort case, narrowing it down to the
> /i386/migration/multifd/tcp/zlib case, which can hit this without
> any other tests being run:

> This test seems to fail fairly frequently. I'll try a bisect...

On this s390 machine, this test has been intermittent since
it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
multifd support") in 2019. On that commit (after 31 successful
runs):

# random seed: R02S17937f515046216afcc72143266b3e1f
# Start of i386 tests
# Start of migration tests
# Start of multifd tests
# Start of tcp tests
# starting QEMU: exec ./build/i386/i386-softmmu/qemu-system-i386
-qtest unix:/tmp/qtest-861747.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-861747.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
source,debug-threads=on -m 150M -serial
file:/tmp/migration-test-7qODSs/src_serial -drive
file=/tmp/migration-test-7qODSs/bootsect,format=raw    -accel qtest
qemu-system-i386: -accel kvm: invalid accelerator kvm
qemu-system-i386: falling back to tcg
# starting QEMU: exec ./build/i386/i386-softmmu/qemu-system-i386
-qtest unix:/tmp/qtest-861747.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-861747.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
target,debug-threads=on -m 150M -serial
file:/tmp/migration-test-7qODSs/dest_serial -incoming defer -drive
file=/tmp/migration-test-7qODSs/bootsect,format=raw    -accel qtest
qemu-system-i386: -accel kvm: invalid accelerator kvm
qemu-system-i386: falling back to tcg
Memory content inconsistency at 5cff000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d00000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d01000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d02000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d03000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d04000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d05000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d06000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d07000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d08000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
and in another 118 pages**
ERROR:/home/linux1/qemu/tests/qtest/migration-test.c:268:check_guests_ram:
assertion failed: (bad == 0)
Bail out! ERROR:/home/linux1/qemu/tests/qtest/migration-test.c:268:check_guests_ram:
assertion failed: (bad == 0)
Aborted (core dumped)

-- PMM



* Re: [PULL 00/18] migration queue
  2022-03-14 18:53                   ` Daniel P. Berrangé
@ 2022-03-15  2:41                     ` Peter Xu
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Xu @ 2022-03-15  2:41 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Maydell, thuth, quintela, s.reiter, qemu-devel,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	hreitz, f.ebner, jinpu.wang

On Mon, Mar 14, 2022 at 06:53:29PM +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 14, 2022 at 06:20:54PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Maydell (peter.maydell@linaro.org) wrote:
> > > On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
> > > <dgilbert@redhat.com> wrote:
> > > >
> > > > Peter Maydell (peter.maydell@linaro.org) wrote:
> > > > > One thing that makes this bug investigation trickier, incidentally,
> > > > > is that the migration-test code seems to depend on userfaultfd.
> > > > > That means you can't run it under 'rr'.
> > > >
> > > > That should only be the postcopy tests; the others shouldn't use that.
> > > 
> > > tests/qtest/migration-test.c:main() exits immediately without adding
> > > any of the test cases if ufd_version_check() fails, so no userfaultfd
> > > means no tests run at all, currently.
> > 
> > Ouch! I could swear we had a fix for that.

https://lore.kernel.org/qemu-devel/20210615175523.439830-2-peterx@redhat.com/

As far as I remember, the pull containing this patch ran into issues
when it was applied, and the patch then got forgotten.

> > 
> > Anyway, it would be really good to see what query-migrate was returning;
> > if it's stuck in running or cancelling then it's a problem with multifd
> > that needs to learn to let go if someone is trying to cancel.
> > If it's failed or similar then the test needs fixing to not lock up.
> 
> This patch of mine may well be helpful:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03192.html
> 
> when debugging my TLS tests various mistakes meant I ended up with
> a failed session, but the test was spinning forever on 'query-migrate'.
> It was waiting for it to finish one iteration, and never bothering to
> validate that the reported status == active.
> 
> If that patch was merged, it might well cause the test to abort in an
> assertion rather than spinning forever, if status == failed.
> 
> Of course someone would still need to find out why it failed, but
> none the less, I think assert is nicer than spin forever.

Agreed.

-- 
Peter Xu




* multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
  2022-03-14 19:44               ` Peter Maydell
@ 2022-03-15 14:39                 ` Peter Maydell
  2022-03-15 15:03                   ` Peter Maydell
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2022-03-15 14:39 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Thomas Huth, Juan Quintela, s.reiter, QEMU Developers, Peter Xu,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	Hanna Reitz, f.ebner, Jinpu Wang

On Mon, 14 Mar 2022 at 19:44, Peter Maydell <peter.maydell@linaro.org> wrote:
> On Mon, 14 Mar 2022 at 18:58, Peter Maydell <peter.maydell@linaro.org> wrote:
> > I just hit the abort case, narrowing it down to the
> > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > any other tests being run:
>
> > This test seems to fail fairly frequently. I'll try a bisect...
>
> On this s390 machine, this test has been intermittent since
> it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> multifd support") in 2019.

I have tried (on current master) runs of several of the other
migration tests, and:
 * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
   failing
 * /i386/migration/precopy/tcp completed 4669 iterations without
   failing
 * /i386/migration/multifd/tcp/zlib fails usually within the first
   10 iterations (the most I ever saw it manage was 32)

So whatever this is, it seems like it might be specific to the
zlib code somehow?
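
For anyone wanting to repeat this kind of count, a loop along the
following lines would do. It is only a sketch, not the script used for
the numbers above. `run_until_failure` and its parameters are names
made up for illustration, and the `run` argument exists only so the
loop can be exercised without a QEMU build.

```python
import os
import subprocess


def run_until_failure(cmd, max_iterations, env=None, run=subprocess.run):
    """Return the 1-based iteration at which cmd first failed, else None."""
    # Overlay any extra variables on top of the current environment.
    full_env = dict(os.environ, **(env or {}))
    for iteration in range(1, max_iterations + 1):
        result = run(
            cmd,
            env=full_env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # A non-zero code covers both test failures and SIGABRT (negative).
        if result.returncode != 0:
            return iteration
    return None
```

For the failing case in this thread, the call would look something
like
run_until_failure(["./tests/qtest/migration-test", "-p",
"/i386/migration/multifd/tcp/zlib"], 100,
env={"QTEST_QEMU_BINARY": "./qemu-system-i386"}).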

thanks
-- PMM



* Re: [PULL 00/18] migration queue
  2022-03-08 18:47     ` Dr. David Alan Gilbert
  2022-03-14 16:56       ` Peter Maydell
@ 2022-03-15 14:53       ` Christian Borntraeger
  1 sibling, 0 replies; 47+ messages in thread
From: Christian Borntraeger @ 2022-03-15 14:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Philippe Mathieu-Daudé, thuth
  Cc: Peter Maydell, Ilya Leoshkevich, quintela, s.reiter, qemu-devel,
	peterx, open list:S390 general arch...,
	hreitz, f.ebner, jinpu.wang

Am 08.03.22 um 19:47 schrieb Dr. David Alan Gilbert:
> * Philippe Mathieu-Daudé (philippe.mathieu.daude@gmail.com) wrote:
>> On 3/3/22 15:46, Peter Maydell wrote:
>>> On Wed, 2 Mar 2022 at 18:32, Dr. David Alan Gilbert (git)
>>> <dgilbert@redhat.com> wrote:
>>>>
>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>
>>>> The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:
>>>>
>>>>     Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging (2022-03-02 12:38:46 +0000)
>>>>
>>>> are available in the Git repository at:
>>>>
>>>>     https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b
>>>>
>>>> for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:
>>>>
>>>>     migration: Remove load_state_old and minimum_version_id_old (2022-03-02 18:20:45 +0000)
>>>>
>>>> ----------------------------------------------------------------
>>>> Migration/HMP/Virtio pull 2022-03-02
>>>>
>>>> A bit of a mix this time:
>>>>     * Minor fixes from myself, Hanna, and Jack
>>>>     * VNC password rework by Stefan and Fabian
>>>>     * Postcopy changes from Peter X that are
>>>>       the start of a larger series to come
>>>>     * Removing the prehistoic load_state_old
>>>>       code from Peter M
>>
>> I'm seeing an error on the s390x runner:
>>
>> ▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram:
>> assertion failed: (bad == 0) ERROR
>>
>>   26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test            ERROR
>> 78.87s   killed by signal 6 SIGABRT
>>
>> https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848
> 
> Yeh, thuth mentioned that, it seems to only be s390 which is odd.
> I'm not seeing anything obviously architecture dependent in that set, or
> for that matter that plays with the ram migration stream much.
> Is this reliable enough that someone with a tame s390 could bisect?

I just asked Peter to try with DFLTCC=0 to disable the hardware acceleration. Maybe
the zlib library still has a bug? (We are not aware of any problem right now).
In case DFLTCC makes a difference, this would be something for Ilya to look at.




* Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
  2022-03-15 14:39                 ` multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue) Peter Maydell
@ 2022-03-15 15:03                   ` Peter Maydell
  2022-03-15 15:30                     ` Peter Maydell
  2022-03-15 16:14                     ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 47+ messages in thread
From: Peter Maydell @ 2022-03-15 15:03 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Thomas Huth, Christian Borntraeger, Ilya Leoshkevich,
	Juan Quintela, s.reiter, QEMU Developers, Peter Xu,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	Hanna Reitz, f.ebner, Jinpu Wang

On Tue, 15 Mar 2022 at 14:39, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 14 Mar 2022 at 19:44, Peter Maydell <peter.maydell@linaro.org> wrote:
> > On Mon, 14 Mar 2022 at 18:58, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > I just hit the abort case, narrowing it down to the
> > > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > > any other tests being run:
> >
> > > This test seems to fail fairly frequently. I'll try a bisect...
> >
> > On this s390 machine, this test has been intermittent since
> > it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> > multifd support") in 2019.
>
> I have tried (on current master) runs of various of the other
> migration tests, and:
>  * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
>    failing
>  * /i386/migration/precopy/tcp completed 4669 iterations without
>    failing
>  * /i386/migration/multifd/tcp/zlib fails usually within the first
>    10 iterations (the most I ever saw it manage was 32)
>
> So whatever this is, it seems like it might be specific to the
> zlib code somehow ?

Maybe we're running into this bug
https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
("zlib: compressBound() returns an incorrect result on z15") ?

That bug report claims it doesn't affect focal, though, which
is what we're running on this box (specifically, the zlib1g
package is version 1:1.2.11.dfsg-2ubuntu1.2).

A run with DFLTCC=0 has made it past 60 iterations so far, which
suggests that that does serve as a workaround for the bug.

thanks
-- PMM



* Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
  2022-03-15 15:03                   ` Peter Maydell
@ 2022-03-15 15:30                     ` Peter Maydell
  2022-03-15 15:40                       ` Daniel P. Berrangé
  2022-03-15 16:14                     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2022-03-15 15:30 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Thomas Huth, Christian Borntraeger, Ilya Leoshkevich,
	Juan Quintela, s.reiter, QEMU Developers, Peter Xu,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	Hanna Reitz, f.ebner, Jinpu Wang

On Tue, 15 Mar 2022 at 15:03, Peter Maydell <peter.maydell@linaro.org> wrote:
> Maybe we're running into this bug
> https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> ("zlib: compressBound() returns an incorrect result on z15") ?

Full repro info, since it's a bit hidden in this long thread:

Build an i386 guest QEMU; I used this configure command:

'../../configure' '--target-list=i386-softmmu' '--enable-debug' \
  '--with-pkgversion=pm215' '--disable-docs'

Then run the multifd/tcp/zlib test in a tight loop:

X=1; while QTEST_QEMU_BINARY=./build/i386/i386-softmmu/qemu-system-i386 \
  ./build/i386/tests/qtest/migration-test -tap -k -p \
  /i386/migration/multifd/tcp/zlib ; do echo $X; X=$((X+1)); done

Without DFLTCC=0 it fails typically within 5 or so iterations;
the longest I've ever seen it go is about 32.
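
For anyone wanting to compare the two cases side by side, here is a rough
sketch of the same loop wrapped in a shell function with an iteration cap,
so that a run with DFLTCC=0 terminates instead of looping forever. The
function name run_until_fail and the cap of 100 are illustrative only;
nothing like this exists in the QEMU tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): run a command repeatedly until it
# fails or MAX runs have passed, then report how many runs succeeded.
# The command's own output is discarded to keep the summary readable.
# usage: run_until_fail MAX CMD [ARGS...]
run_until_fail() {
    max=$1
    shift
    n=0
    while [ "$n" -lt "$max" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "failed after $n successful iterations"
            return 1
        fi
        n=$((n + 1))
    done
    echo "passed $max iterations"
    return 0
}

# Example invocations (paths as in the loop above), one with the hardware
# deflate acceleration disabled and one without:
#   export QTEST_QEMU_BINARY=./build/i386/i386-softmmu/qemu-system-i386
#   run_until_fail 100 env DFLTCC=0 ./build/i386/tests/qtest/migration-test \
#       -tap -k -p /i386/migration/multifd/tcp/zlib
#   run_until_fail 100 ./build/i386/tests/qtest/migration-test \
#       -tap -k -p /i386/migration/multifd/tcp/zlib
```

DFLTCC=0 is passed through env rather than as a prefix assignment because
whether a prefix assignment on a shell function call reaches child processes
differs between shells.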

-- PMM



* Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
  2022-03-15 15:30                     ` Peter Maydell
@ 2022-03-15 15:40                       ` Daniel P. Berrangé
  2022-03-15 15:44                         ` multifd/tcp/zlib intermittent abort Thomas Huth
  2022-03-15 15:46                         ` multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue) Peter Maydell
  0 siblings, 2 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2022-03-15 15:40 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Thomas Huth, Christian Borntraeger, Ilya Leoshkevich,
	Juan Quintela, s.reiter, QEMU Developers, Peter Xu,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	Hanna Reitz, f.ebner, Jinpu Wang

On Tue, Mar 15, 2022 at 03:30:27PM +0000, Peter Maydell wrote:
> On Tue, 15 Mar 2022 at 15:03, Peter Maydell <peter.maydell@linaro.org> wrote:
> > Maybe we're running into this bug
> > https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> > ("zlib: compressBound() returns an incorrect result on z15") ?
> 
> Full repro info, since it's a bit hidden in this long thread:
> 
> Build an i386 guest QEMU; I used this configure command:
> 
> '../../configure' '--target-list=i386-softmmu' '--enable-debug'
> '--with-pkgversion=pm215' '--disable-docs'
> 
> Then run the multifd/tcp/zlib test in a tight loop:
> 
> X=1; while QTEST_QEMU_BINARY=./build/i386/i386-softmmu/qemu-system-i386
> ./build/i386/tests/qtest/migration-test  -tap -k -p
> /i386/migration/multifd/tcp/zlib ; do echo $X; X=$((X+1)); done
> 
> Without DFLTCC=0 it fails typically within 5 or so iterations;
> the longest I've ever seen it go is about 32.

So if this is a host OS package bug we punt to OS vendor to fix,
and just apply workaround in our CI ?  eg

$ git diff
diff --git a/.travis.yml b/.travis.yml
index c3c8048842..6da4c9f640 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -218,6 +218,7 @@ jobs:
         - TEST_CMD="make check check-tcg V=1"
         - CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
         - UNRELIABLE=true
+        - DFLTCC=0
       script:
         - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
         - |



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: multifd/tcp/zlib intermittent abort
  2022-03-15 15:40                       ` Daniel P. Berrangé
@ 2022-03-15 15:44                         ` Thomas Huth
  2022-03-15 17:01                           ` Daniel P. Berrangé
  2022-03-15 15:46                         ` multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue) Peter Maydell
  1 sibling, 1 reply; 47+ messages in thread
From: Thomas Huth @ 2022-03-15 15:44 UTC (permalink / raw)
  To: Daniel P. Berrangé, Peter Maydell
  Cc: Christian Borntraeger, Ilya Leoshkevich, Juan Quintela, s.reiter,
	QEMU Developers, Peter Xu, Dr. David Alan Gilbert,
	open list:S390 general arch..., Philippe Mathieu-Daudé,
	Hanna Reitz, f.ebner, Jinpu Wang

On 15/03/2022 16.40, Daniel P. Berrangé wrote:
> On Tue, Mar 15, 2022 at 03:30:27PM +0000, Peter Maydell wrote:
>> On Tue, 15 Mar 2022 at 15:03, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> Maybe we're running into this bug
>>> https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
>>> ("zlib: compressBound() returns an incorrect result on z15") ?
>>
>> Full repro info, since it's a bit hidden in this long thread:
>>
>> Build an i386 guest QEMU; I used this configure command:
>>
>> '../../configure' '--target-list=i386-softmmu' '--enable-debug'
>> '--with-pkgversion=pm215' '--disable-docs'
>>
>> Then run the multifd/tcp/zlib test in a tight loop:
>>
>> X=1; while QTEST_QEMU_BINARY=./build/i386/i386-softmmu/qemu-system-i386
>> ./build/i386/tests/qtest/migration-test  -tap -k -p
>> /i386/migration/multifd/tcp/zlib ; do echo $X; X=$((X+1)); done
>>
>> Without DFLTCC=0 it fails typically within 5 or so iterations;
>> the longest I've ever seen it go is about 32.
> 
> So if this is a host OS package bug we punt to OS vendor to fix,
> and just apply workaround in our CI ?  eg
> 
> $ git diff
> diff --git a/.travis.yml b/.travis.yml
> index c3c8048842..6da4c9f640 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -218,6 +218,7 @@ jobs:
>           - TEST_CMD="make check check-tcg V=1"
>           - CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
>           - UNRELIABLE=true
> +        - DFLTCC=0
>         script:
>           - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
>           - |

Sounds like a good idea - but you should certainly add a proper comment 
here, too, so that we can later remind ourselves to remove the workaround again.

  Thomas




* Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
  2022-03-15 15:40                       ` Daniel P. Berrangé
  2022-03-15 15:44                         ` multifd/tcp/zlib intermittent abort Thomas Huth
@ 2022-03-15 15:46                         ` Peter Maydell
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Maydell @ 2022-03-15 15:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Thomas Huth, Christian Borntraeger, Ilya Leoshkevich,
	Juan Quintela, s.reiter, QEMU Developers, Peter Xu,
	Dr. David Alan Gilbert, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	Hanna Reitz, f.ebner, Jinpu Wang

On Tue, 15 Mar 2022 at 15:41, Daniel P. Berrangé <berrange@redhat.com> wrote:
> So if this is a host OS package bug we punt to OS vendor to fix,
> and just apply workaround in our CI ?  eg
>
> $ git diff
> diff --git a/.travis.yml b/.travis.yml
> index c3c8048842..6da4c9f640 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -218,6 +218,7 @@ jobs:
>          - TEST_CMD="make check check-tcg V=1"
>          - CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
>          - UNRELIABLE=true
> +        - DFLTCC=0
>        script:
>          - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
>          - |
>

Yes, that seems like the best approach. We also need to
adjust the gitlab CI config for the s390-host jobs. (In that
case we control the system being used so if there's a fixed
zlib we could install it, but for the travis stuff we'll probably
need the workaround for some time.)

-- PMM



* Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
  2022-03-15 15:03                   ` Peter Maydell
  2022-03-15 15:30                     ` Peter Maydell
@ 2022-03-15 16:14                     ` Dr. David Alan Gilbert
  2022-03-15 16:21                       ` Peter Maydell
  1 sibling, 1 reply; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-15 16:14 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Thomas Huth, Christian Borntraeger, Daniel P. Berrangé,
	Ilya Leoshkevich, Juan Quintela, s.reiter, QEMU Developers,
	Peter Xu, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	Hanna Reitz, f.ebner, Jinpu Wang

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Tue, 15 Mar 2022 at 14:39, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Mon, 14 Mar 2022 at 19:44, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > On Mon, 14 Mar 2022 at 18:58, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > > I just hit the abort case, narrowing it down to the
> > > > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > > > any other tests being run:
> > >
> > > > This test seems to fail fairly frequently. I'll try a bisect...
> > >
> > > On this s390 machine, this test has been intermittent since
> > > it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> > > multifd support") in 2019.
> >
> > I have tried (on current master) runs of various of the other
> > migration tests, and:
> >  * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
> >    failing
> >  * /i386/migration/precopy/tcp completed 4669 iterations without
> >    failing
> >  * /i386/migration/multifd/tcp/zlib fails usually within the first
> >    10 iterations (the most I ever saw it manage was 32)
> >
> > So whatever this is, it seems like it might be specific to the
> > zlib code somehow ?
> 
> Maybe we're running into this bug
> https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> ("zlib: compressBound() returns an incorrect result on z15") ?

The initial description of compressBound being wrong doesn't
feel like it would cause that; it claims it would trigger an error
(I'm not sure how good we are at spotting that!); but then later
in the description it says:

'Mistakes in dfltcc_free_window OF and especially DEFLATE_BOUND_COMPLEN,
  (incl. the bit definitions), may cause various and unforseen defects'

Certainly looks like a 'various and unforseen defect'.

Dave

> That bug report claims it doesn't affect focal, though, which
> is what we're running on this box (specifically, the zlib1g
> package is version 1:1.2.11.dfsg-2ubuntu1.2).
> 
> A run with DFLTCC=0 has made it past 60 iterations so far, which
> suggests that that does serve as a workaround for the bug.
> 
> thanks
> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
  2022-03-15 16:14                     ` Dr. David Alan Gilbert
@ 2022-03-15 16:21                       ` Peter Maydell
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Maydell @ 2022-03-15 16:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Thomas Huth, Christian Borntraeger, Daniel P. Berrangé,
	Ilya Leoshkevich, Juan Quintela, s.reiter, QEMU Developers,
	Peter Xu, open list:S390 general arch...,
	Philippe Mathieu-Daudé,
	Hanna Reitz, f.ebner, Jinpu Wang

On Tue, 15 Mar 2022 at 16:15, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> The initial description of compressBound being wrong doesn't
> feel like it would cause that; it claims it would trigger an error
> (I'm not sure how good we are at spotting that!); but then later
> in the description it says:
>
> 'Mistakes in dfltcc_free_window OF and especially DEFLATE_BOUND_COMPLEN,
>   (incl. the bit definitions), may cause various and unforseen defects'
>
> Certainly looks like a 'various and unforseen defect'.

Mmm. I couldn't get the testcase in that bug to fail on the machine
I see the migration-test fails on, so it presumably is a different bug
(or just faintly possibly a QEMU bug that's only tickled by the
specifics of the accelerated zlib behaviour).

-- PMM



* Re: multifd/tcp/zlib intermittent abort
  2022-03-15 15:44                         ` multifd/tcp/zlib intermittent abort Thomas Huth
@ 2022-03-15 17:01                           ` Daniel P. Berrangé
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel P. Berrangé @ 2022-03-15 17:01 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Peter Maydell, f.ebner, Ilya Leoshkevich, Juan Quintela,
	s.reiter, QEMU Developers, Peter Xu, Dr. David Alan Gilbert,
	open list:S390 general arch..., Philippe Mathieu-Daudé,
	Hanna Reitz, Christian Borntraeger, Jinpu Wang

On Tue, Mar 15, 2022 at 04:44:37PM +0100, Thomas Huth wrote:
> On 15/03/2022 16.40, Daniel P. Berrangé wrote:
> > On Tue, Mar 15, 2022 at 03:30:27PM +0000, Peter Maydell wrote:
> > > On Tue, 15 Mar 2022 at 15:03, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > > Maybe we're running into this bug
> > > > https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> > > > ("zlib: compressBound() returns an incorrect result on z15") ?
> > > 
> > > Full repro info, since it's a bit hidden in this long thread:
> > > 
> > > Build an i386 guest QEMU; I used this configure command:
> > > 
> > > '../../configure' '--target-list=i386-softmmu' '--enable-debug'
> > > '--with-pkgversion=pm215' '--disable-docs'
> > > 
> > > Then run the multifd/tcp/zlib test in a tight loop:
> > > 
> > > X=1; while QTEST_QEMU_BINARY=./build/i386/i386-softmmu/qemu-system-i386
> > > ./build/i386/tests/qtest/migration-test  -tap -k -p
> > > /i386/migration/multifd/tcp/zlib ; do echo $X; X=$((X+1)); done
> > > 
> > > Without DFLTCC=0 it fails typically within 5 or so iterations;
> > > the longest I've ever seen it go is about 32.
> > 
> > So if this is a host OS package bug we punt to OS vendor to fix,
> > and just apply workaround in our CI ?  eg
> > 
> > $ git diff
> > diff --git a/.travis.yml b/.travis.yml
> > index c3c8048842..6da4c9f640 100644
> > --- a/.travis.yml
> > +++ b/.travis.yml
> > @@ -218,6 +218,7 @@ jobs:
> >           - TEST_CMD="make check check-tcg V=1"
> >           - CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
> >           - UNRELIABLE=true
> > +        - DFLTCC=0
> >         script:
> >           - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
> >           - |
> 
> Sounds like a good idea - but you should certainly add a proper comment
> here, too, so that we can later remind ourselves to remove the workaround
> again.

FYI, I don't have time to actually test this for real with Travis right
now, so I'll leave it to someone else to test and submit a formal patch.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PULL 00/18] migration queue
  2022-04-21 18:40 Dr. David Alan Gilbert (git)
@ 2022-04-22  5:02 ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2022-04-22  5:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, peterx, berrange; +Cc: quintela

On 4/21/22 11:40, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The following changes since commit 28298069afff3eb696e4995e63b2579b27adf378:
> 
>    Merge tag 'misc-pull-request' of gitlab.com:marcandre.lureau/qemu into staging (2022-04-21 09:27:54 -0700)
> 
> are available in the Git repository at:
> 
>    https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220421a
> 
> for you to fetch changes up to 552de79bfdd5e9e53847eb3c6d6e4cd898a4370e:
> 
>    migration: Read state once (2022-04-21 19:36:46 +0100)
> 
> ----------------------------------------------------------------
> V2: Migration pull 2022-04-21
> 
>    Dan: Test fixes and improvements (TLS mostly)
>    Peter: Postcopy improvements
>    Me: Race fix for info migrate, and compilation fix
> 
> V2:
>    Fixed checkpatch nit of unneeded NULL check
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as appropriate.


r~



> 
> ----------------------------------------------------------------
> Daniel P. Berrangé (9):
>        tests: improve error message when saving TLS PSK file fails
>        tests: support QTEST_TRACE env variable
>        tests: print newline after QMP response in qtest logs
>        migration: fix use of TLS PSK credentials with a UNIX socket
>        tests: switch MigrateStart struct to be stack allocated
>        tests: merge code for UNIX and TCP migration pre-copy tests
>        tests: introduce ability to provide hooks for migration precopy test
>        tests: switch migration FD passing test to use common precopy helper
>        tests: expand the migration precopy helper to support failures
> 
> Dr. David Alan Gilbert (2):
>        migration: Fix operator type
>        migration: Read state once
> 
> Peter Xu (7):
>        migration: Postpone releasing MigrationState.hostname
>        migration: Drop multifd tls_hostname cache
>        migration: Add pss.postcopy_requested status
>        migration: Move migrate_allow_multifd and helpers into migration.c
>        migration: Export ram_load_postcopy()
>        migration: Move channel setup out of postcopy_try_recover()
>        migration: Allow migrate-recover to run multiple times
> 
>   migration/channel.c                 |   1 -
>   migration/migration.c               |  66 ++++---
>   migration/migration.h               |   4 +-
>   migration/multifd.c                 |  29 +--
>   migration/multifd.h                 |   4 -
>   migration/ram.c                     |  10 +-
>   migration/ram.h                     |   1 +
>   migration/savevm.c                  |   3 -
>   migration/tls.c                     |   4 -
>   tests/qtest/libqtest.c              |  13 +-
>   tests/qtest/migration-test.c        | 368 ++++++++++++++++++++----------------
>   tests/unit/crypto-tls-psk-helpers.c |   2 +-
>   12 files changed, 267 insertions(+), 238 deletions(-)
> 
> 




* Re: [PULL 00/18] migration queue
  2022-04-21 16:40 Dr. David Alan Gilbert (git)
@ 2022-04-21 18:44 ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 47+ messages in thread
From: Dr. David Alan Gilbert @ 2022-04-21 18:44 UTC (permalink / raw)
  To: qemu-devel, peterx, berrange; +Cc: quintela

* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The following changes since commit 401d46789410e88e9e90d76a11f46e8e9f358d55:
> 
>   Merge tag 'pull-target-arm-20220421' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-04-21 08:04:43 -0700)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220421b
> 
> for you to fetch changes up to 25e7d2fd25d133a9f714443974b51e50416546a5:
> 
>   migration: Read state once (2022-04-21 17:33:50 +0100)

Oops, this has a checkpatch nit; just reposted a fixed version.

Dave

> ----------------------------------------------------------------
> Migration pull 2022-04-21
> 
>   Dan: Test fixes and improvements (TLS mostly)
>   Peter: Postcopy improvements
>   Me: Race fix for info migrate, and compilation fix
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> ----------------------------------------------------------------
> Daniel P. Berrangé (9):
>       tests: improve error message when saving TLS PSK file fails
>       tests: support QTEST_TRACE env variable
>       tests: print newline after QMP response in qtest logs
>       migration: fix use of TLS PSK credentials with a UNIX socket
>       tests: switch MigrateStart struct to be stack allocated
>       tests: merge code for UNIX and TCP migration pre-copy tests
>       tests: introduce ability to provide hooks for migration precopy test
>       tests: switch migration FD passing test to use common precopy helper
>       tests: expand the migration precopy helper to support failures
> 
> Dr. David Alan Gilbert (2):
>       migration: Fix operator type
>       migration: Read state once
> 
> Peter Xu (7):
>       migration: Postpone releasing MigrationState.hostname
>       migration: Drop multifd tls_hostname cache
>       migration: Add pss.postcopy_requested status
>       migration: Move migrate_allow_multifd and helpers into migration.c
>       migration: Export ram_load_postcopy()
>       migration: Move channel setup out of postcopy_try_recover()
>       migration: Allow migrate-recover to run multiple times
> 
>  migration/channel.c                 |   1 -
>  migration/migration.c               |  68 ++++---
>  migration/migration.h               |   4 +-
>  migration/multifd.c                 |  29 +--
>  migration/multifd.h                 |   4 -
>  migration/ram.c                     |  10 +-
>  migration/ram.h                     |   1 +
>  migration/savevm.c                  |   3 -
>  migration/tls.c                     |   4 -
>  tests/qtest/libqtest.c              |  13 +-
>  tests/qtest/migration-test.c        | 368 ++++++++++++++++++++----------------
>  tests/unit/crypto-tls-psk-helpers.c |   2 +-
>  12 files changed, 269 insertions(+), 238 deletions(-)
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* [PULL 00/18] migration queue
@ 2022-04-21 18:40 Dr. David Alan Gilbert (git)
  2022-04-22  5:02 ` Richard Henderson
  0 siblings, 1 reply; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-04-21 18:40 UTC (permalink / raw)
  To: qemu-devel, peterx, berrange; +Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The following changes since commit 28298069afff3eb696e4995e63b2579b27adf378:

  Merge tag 'misc-pull-request' of gitlab.com:marcandre.lureau/qemu into staging (2022-04-21 09:27:54 -0700)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220421a

for you to fetch changes up to 552de79bfdd5e9e53847eb3c6d6e4cd898a4370e:

  migration: Read state once (2022-04-21 19:36:46 +0100)

----------------------------------------------------------------
V2: Migration pull 2022-04-21

  Dan: Test fixes and improvements (TLS mostly)
  Peter: Postcopy improvements
  Me: Race fix for info migrate, and compilation fix

V2:
  Fixed checkpatch nit of unneeded NULL check

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

----------------------------------------------------------------
Daniel P. Berrangé (9):
      tests: improve error message when saving TLS PSK file fails
      tests: support QTEST_TRACE env variable
      tests: print newline after QMP response in qtest logs
      migration: fix use of TLS PSK credentials with a UNIX socket
      tests: switch MigrateStart struct to be stack allocated
      tests: merge code for UNIX and TCP migration pre-copy tests
      tests: introduce ability to provide hooks for migration precopy test
      tests: switch migration FD passing test to use common precopy helper
      tests: expand the migration precopy helper to support failures

Dr. David Alan Gilbert (2):
      migration: Fix operator type
      migration: Read state once

Peter Xu (7):
      migration: Postpone releasing MigrationState.hostname
      migration: Drop multifd tls_hostname cache
      migration: Add pss.postcopy_requested status
      migration: Move migrate_allow_multifd and helpers into migration.c
      migration: Export ram_load_postcopy()
      migration: Move channel setup out of postcopy_try_recover()
      migration: Allow migrate-recover to run multiple times

 migration/channel.c                 |   1 -
 migration/migration.c               |  66 ++++---
 migration/migration.h               |   4 +-
 migration/multifd.c                 |  29 +--
 migration/multifd.h                 |   4 -
 migration/ram.c                     |  10 +-
 migration/ram.h                     |   1 +
 migration/savevm.c                  |   3 -
 migration/tls.c                     |   4 -
 tests/qtest/libqtest.c              |  13 +-
 tests/qtest/migration-test.c        | 368 ++++++++++++++++++++----------------
 tests/unit/crypto-tls-psk-helpers.c |   2 +-
 12 files changed, 267 insertions(+), 238 deletions(-)




* [PULL 00/18] migration queue
@ 2022-04-21 16:40 Dr. David Alan Gilbert (git)
  2022-04-21 18:44 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 47+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2022-04-21 16:40 UTC (permalink / raw)
  To: qemu-devel, peterx, berrange; +Cc: quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The following changes since commit 401d46789410e88e9e90d76a11f46e8e9f358d55:

  Merge tag 'pull-target-arm-20220421' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-04-21 08:04:43 -0700)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220421b

for you to fetch changes up to 25e7d2fd25d133a9f714443974b51e50416546a5:

  migration: Read state once (2022-04-21 17:33:50 +0100)

----------------------------------------------------------------
Migration pull 2022-04-21

  Dan: Test fixes and improvements (TLS mostly)
  Peter: Postcopy improvements
  Me: Race fix for info migrate, and compilation fix

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

----------------------------------------------------------------
Daniel P. Berrangé (9):
      tests: improve error message when saving TLS PSK file fails
      tests: support QTEST_TRACE env variable
      tests: print newline after QMP response in qtest logs
      migration: fix use of TLS PSK credentials with a UNIX socket
      tests: switch MigrateStart struct to be stack allocated
      tests: merge code for UNIX and TCP migration pre-copy tests
      tests: introduce ability to provide hooks for migration precopy test
      tests: switch migration FD passing test to use common precopy helper
      tests: expand the migration precopy helper to support failures

Dr. David Alan Gilbert (2):
      migration: Fix operator type
      migration: Read state once

Peter Xu (7):
      migration: Postpone releasing MigrationState.hostname
      migration: Drop multifd tls_hostname cache
      migration: Add pss.postcopy_requested status
      migration: Move migrate_allow_multifd and helpers into migration.c
      migration: Export ram_load_postcopy()
      migration: Move channel setup out of postcopy_try_recover()
      migration: Allow migrate-recover to run multiple times

 migration/channel.c                 |   1 -
 migration/migration.c               |  68 ++++---
 migration/migration.h               |   4 +-
 migration/multifd.c                 |  29 +--
 migration/multifd.h                 |   4 -
 migration/ram.c                     |  10 +-
 migration/ram.h                     |   1 +
 migration/savevm.c                  |   3 -
 migration/tls.c                     |   4 -
 tests/qtest/libqtest.c              |  13 +-
 tests/qtest/migration-test.c        | 368 ++++++++++++++++++++----------------
 tests/unit/crypto-tls-psk-helpers.c |   2 +-
 12 files changed, 269 insertions(+), 238 deletions(-)




end of thread, other threads:[~2022-04-22  5:04 UTC | newest]

Thread overview: 47+ messages
2022-03-02 18:29 [PULL 00/18] migration queue Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 01/18] clock-vmstate: Add missing END_OF_LIST Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 02/18] virtiofsd: Let meson check for statx.stx_mnt_id Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 03/18] monitor/hmp: add support for flag argument with value Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 04/18] qapi/monitor: refactor set/expire_password with enums Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 05/18] qapi/monitor: allow VNC display id in set/expire_password Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 06/18] migration/rdma: set the REUSEADDR option for destination Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 07/18] migration: Dump sub-cmd name in loadvm_process_command tp Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 08/18] migration: Finer grained tracepoints for POSTCOPY_LISTEN Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 09/18] migration: Tracepoint change in postcopy-run bottom half Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 10/18] migration: Introduce postcopy channels on dest node Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 11/18] migration: Dump ramblock and offset too when non-same-page detected Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 12/18] migration: Add postcopy_thread_create() Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 13/18] migration: Move static var in ram_block_from_stream() into global Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 14/18] migration: Enlarge postcopy recovery to capture !-EIO too Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 15/18] migration: postcopy_pause_fault_thread() never fails Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 16/18] migration: Add migration_incoming_transport_cleanup() Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 17/18] tests: Pass in MigrateStart** into test_migrate_start() Dr. David Alan Gilbert (git)
2022-03-02 18:29 ` [PULL 18/18] migration: Remove load_state_old and minimum_version_id_old Dr. David Alan Gilbert (git)
2022-03-03 14:46 ` [PULL 00/18] migration queue Peter Maydell
2022-03-08 18:36   ` Philippe Mathieu-Daudé
2022-03-08 18:47     ` Dr. David Alan Gilbert
2022-03-14 16:56       ` Peter Maydell
2022-03-14 17:07         ` Daniel P. Berrangé
2022-03-14 17:15           ` Peter Maydell
2022-03-14 17:24             ` Daniel P. Berrangé
2022-03-14 17:54             ` Dr. David Alan Gilbert
2022-03-14 18:08               ` Peter Maydell
2022-03-14 18:20                 ` Dr. David Alan Gilbert
2022-03-14 18:53                   ` Daniel P. Berrangé
2022-03-15  2:41                     ` Peter Xu
2022-03-14 18:58             ` Peter Maydell
2022-03-14 19:44               ` Peter Maydell
2022-03-15 14:39                 ` multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue) Peter Maydell
2022-03-15 15:03                   ` Peter Maydell
2022-03-15 15:30                     ` Peter Maydell
2022-03-15 15:40                       ` Daniel P. Berrangé
2022-03-15 15:44                         ` multifd/tcp/zlib intermittent abort Thomas Huth
2022-03-15 17:01                           ` Daniel P. Berrangé
2022-03-15 15:46                         ` multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue) Peter Maydell
2022-03-15 16:14                     ` Dr. David Alan Gilbert
2022-03-15 16:21                       ` Peter Maydell
2022-03-15 14:53       ` [PULL 00/18] migration queue Christian Borntraeger
2022-04-21 16:40 Dr. David Alan Gilbert (git)
2022-04-21 18:44 ` Dr. David Alan Gilbert
2022-04-21 18:40 Dr. David Alan Gilbert (git)
2022-04-22  5:02 ` Richard Henderson
